
How to Train PointGoal Navigation Agents
on a (Sample and Compute) Budget

Erik Wijmans1,2, Irfan Essa1,3, and Dhruv Batra2,1
1Georgia Institute of Technology, 2Facebook AI Research, 3Google Research, Atlanta
{etw, irfan, dbatra}@gatech.edu
Abstract

PointGoal Navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge (habitat19iccv). In this paper, we study PointGoal Navigation under both a sample budget (75 million frames) and a compute budget (1 GPU for 1 day). We conduct an extensive set of experiments, cumulatively totaling over 50,000 GPU-hours, that let us identify and discuss a number of ostensibly minor but significant design choices: the advantage estimation procedure (a key component in training), the visual encoder architecture, and a seemingly minor hyper-parameter change. Overall, these design choices lead to considerable and consistent improvements over the baselines presented in Savva et al. (habitat19iccv). Under a sample budget, performance for RGB-D agents improves by 8 SPL on Gibson (14% relative improvement) and 20 SPL on Matterport3D (38% relative improvement). Under a compute budget, performance for RGB-D agents improves by 19 SPL on Gibson (32% relative improvement) and 35 SPL on Matterport3D (220% relative improvement). We hope our findings and recommendations will serve to make the community's experiments more efficient.
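For context on the first design choice above: in PPO-style training pipelines such as those used for Habitat agents, the advantage estimator is typically Generalized Advantage Estimation (GAE). The sketch below is a minimal, illustrative PyTorch implementation under assumed tensor shapes; the function name, signature, and default coefficients are our assumptions, not the authors' implementation.

import torch

def compute_gae(rewards, values, masks, gamma=0.99, tau=0.95):
    """Minimal sketch of Generalized Advantage Estimation (GAE).

    rewards, masks: tensors of shape (T,); values: shape (T + 1,),
    where values[T] bootstraps the return past the end of the rollout.
    masks[t] is 0.0 at episode boundaries and 1.0 otherwise.
    """
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(rewards.shape[0])):
        # One-step TD error; the mask stops bootstrapping across episodes.
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # Exponentially weighted sum of TD errors (the GAE recursion).
        gae = delta + gamma * tau * masks[t] * gae
        advantages[t] = gae
    return advantages

In practice these advantages are often normalized (mean-subtracted and divided by their standard deviation) per mini-batch before the policy update; whether and how that normalization is done is exactly the kind of ostensibly minor choice the paper examines.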

1 Introduction

Galvanized by fast simulation platforms