Robotic Detection of a Human-Comprehensible Gestural Language for Underwater Multi-Human-Robot Collaboration

Sadman Sakib Enan¹, Michael Fulton² and Junaed Sattar³
Abstract.

In this paper, we present a motion-based robotic communication framework that enables non-verbal communication among autonomous underwater vehicles (AUVs) and human divers. We design a gestural language for AUV-to-AUV communication which can be easily understood by divers observing the conversation, unlike typical radio-frequency-, light-, or audio-based AUV communication. To allow AUVs to visually understand a gesture from another AUV, we propose a deep network (RRCommNet) which exploits a self-attention mechanism to learn to recognize each message by extracting maximally discriminative spatio-temporal features. We train this network on diverse simulated and real-world data. Our experimental evaluations, both in simulation and in closed-water robot trials, demonstrate that the proposed RRCommNet architecture is able to decipher gesture-based messages with an average accuracy of 88-94% on simulated data and 73-83% on real data, depending on the version of the model used. Further, through a message transcription study with human participants, we show that the proposed language can be understood by humans, with an overall transcription accuracy of 88%. Finally, we discuss the inference runtime of RRCommNet on embedded GPU hardware for real-time use on board AUVs in the field.
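To make the self-attention idea concrete, the following is a minimal sketch of how a sequence of per-frame gesture features might be classified with a Transformer-style encoder. The class name, feature dimension, pooling scheme, and number of message classes are illustrative assumptions for exposition only; this is not the actual RRCommNet architecture described in the paper.

```python
# Illustrative sketch only: a minimal self-attention classifier over
# spatio-temporal gesture features. All names and dimensions are assumptions.
import torch
import torch.nn as nn


class GestureAttentionClassifier(nn.Module):
    def __init__(self, feat_dim=256, num_heads=4, num_layers=2, num_messages=10):
        super().__init__()
        # Per-frame visual features (e.g., from a CNN backbone) are treated
        # as a temporal sequence; self-attention mixes information across frames.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(feat_dim, num_messages)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim)
        attended = self.encoder(frame_feats)   # self-attention over time
        pooled = attended.mean(dim=1)          # temporal average pooling
        return self.classifier(pooled)         # per-message logits


# Example: classify a batch of two 16-frame gesture clips into one of 10 messages.
model = GestureAttentionClassifier()
logits = model(torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 10])
```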

This paper has been accepted for presentation at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). This work was supported by the US National Science Foundation award #00074041, a UMII-MnDRIVE Fellowship, and the MnRI Seed Grant. The authors are with the Department of Computer Science & Engineering and the Minnesota Robotics Institute, University of Minnesota, MN, USA. {¹enan0001, ²fulto081, ³junaed}@umn.edu

1. Introduction

Over the last several decades, applications of autonomous underwater vehicles (AUVs) (sattar2008enabling; edge2020design) have multiplied and diversified (e.g., environmental monitoring and mapping (fulton2019robotic; weidner2017underwater), submarine cable and wreckage inspection (bingham2010robotic), and search and navigation (koreitem2020one-shot; xanthidis2020navigation)), driven by ever-increasing on-board computational power, increasing affordability, and ease of use. The majority of these applications involve multiple AUVs and/or their human diver companions, often interacting with one another to work effectively as a team (islam2018understanding; hong2020visual). Thus, robust underwater human-to-robot and robot-to-human interaction capabilities are of utmost value. A common language comprehensible to both humans and AUVs would greatly enhance such underwater multi-human-robot interaction (m/HRI) missions (see Fig. LABEL:fig:intro).

When designing such a communication protocol, challenges unique to the underwater domain need to be considered. Traditional sensory media, such as radio and other electromagnetic (EM) modalities, suffer from signal attenuation and degradation underwater (qureshi2016rf), which limits their use mostly to surface operations. Although acoustic signals work quite well in underwater settings (farr2010an), such inter-AUV communication signals are typically incomprehensible to humans. Our recent work on robot motion for AUV-to-diver communication has demonstrated that motion can be used to communicate effectively with divers (fulton2019robot; fulton_rcvm-thri_2022). Similarly, research on the use of motion for inter-AUV communication has shown the same for AUV-to-AUV communication