
Decentralized and Communication-Free Multi-Robot
Navigation through Distributed Games

Brian Reily1, Terran Mott2, and Hao Zhang2
1: Brian Reily is affiliated with the DEVCOM Army Research Laboratory through ORAU. Email: [email protected]
2: Terran Mott and Hao Zhang are with the Human-Centered Robotics Lab in the Department of Computer Science at the Colorado School of Mines, Golden, CO, 80401. Email: {terranmott, hzhang}@mines.edu
Abstract

Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members capable of acting without explicit communication with their neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Each robot plays its own distributed game, estimating the behavior of its local teammates in order to identify actions that move it toward the goal while avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that the generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.

I Introduction

Multi-robot teams are a viable solution for a variety of applications that involve distributed or simultaneous tasks, such as disaster response [1], area exploration [2], and search and rescue [3]. To effectively address such applications, robots must be able to move through complex environments and arrive at waypoints and goal positions [4]. While planning its own navigation actions, each robot must avoid obstacles as well as collisions with teammates that are making their own movements. Accordingly, each robot needs the capability to anticipate both the actions of its local teammates and how its own actions will affect its neighbors.

In order to ensure the resilience of multi-robot teams, robots must be able to work in a decentralized manner in the real world [5, 6]. Complex environments can be dangerous, and relying on a single robot to provide centralized control can cause an entire multi-robot system to fail due to the failure of only that single robot [7]. Decentralized systems are resilient to this, continuing to operate despite failures of individual robots [8]. In addition, in a resilient multi-robot system, robots should be able to operate without requiring the capability to communicate with their teammates. Multi-robot teams are often composed of low-cost robots that may lack expensive communication capabilities, or teams may lose members when obstacles and adversaries interfere with communication. Together, enabling decentralized behaviors and allowing robots to operate without communication with their neighbors is vital for the success of the multi-robot team.

Figure 1: Our approach enables decentralized and communication-free navigation. The robot highlighted in blue chooses its strategy after considering the expected strategies of its neighbors and the effect of its strategy on them, without actually communicating. In this way, the blue robot can avoid collisions, maintain team cohesion, avoid obstacles, and make progress towards a goal position without direct communication.

Due to its importance, multi-robot navigation has been explored by a wide variety of methods. Many methods have been designed based on centralized planning and control, which require a master robot for centralized computing and communication among all teammates [9, 10, 11]. Multiple other methods have been developed to work in a decentralized manner, but still rely on communication among robots [12, 13]. The problem of navigating multiple robots towards a goal in a completely decentralized and communication-free manner has not been well addressed, mainly because without communication it is challenging for a robot to anticipate the actions of neighboring robots that dynamically make independent decisions [14, 15]. Recently, game theory has attracted increasing attention for coordinating robots without communication [16]. Game theoretic methods have been developed to identify actions in the context of their effect on teammates and adversaries for two-player applications such as driving [17, 18] or racing [19], as well as larger games in known environments for surveillance [20]. However, they have not been previously utilized to address the problem of navigating a variable-size multi-robot system through a complex environment to a goal position.

In this paper, we propose a novel game theoretic and communication-free approach to multi-robot navigation. Our approach assumes that a robot is able to locally observe locations of nearby obstacles and robots in the multi-robot system, but does not require communication with teammates or knowledge of the entire system. Then, our approach models each robot and its neighboring teammates with a strategic game, formulating multi-robot navigation as a problem of solving a game for each robot at each time step. These games are solved in a distributed fashion, with each robot playing its own individual game. In each game, illustrated in Figure 1, a robot estimates the behaviors of its neighbors in order to identify movements that avoid collisions, while also progressing towards stationary or dynamic goal positions and avoiding obstacles. Collision-free actions are determined by each robot by minimizing a cost function, and we theoretically prove that the resulting strategies approach a Nash equilibrium, i.e., a state in the game where no player can unilaterally improve its outcome.

The contribution of this paper is twofold:

  1. We propose a novel formulation that models multi-robot navigation as multiple distributed normal-form games, allowing individual robots to form precise expectations about the actions of their neighbors and the effects of their own actions, which then enables communication-free and decentralized multi-robot navigation.

  2. We theoretically prove that these games approach stable Nash equilibria, which also enables each robot in the team to identify its optimal strategy without the requirement of communication with other teammates.

II Related Work

II-A Multi-Robot Maneuver

Many multi-robot navigation methods require explicit communication among robots, either to avoid collisions with obstacles by generating a consensus behavior among robots [21] or to calculate a global gradient that controls the overall shape of the multi-robot formation [9]. Several methods have used leader and follower models to navigate a multi-robot system. By controlling the movements of a subset of robots, the overall behavior of the system was influenced [22]. A single leader was utilized in [23], basing its movements on influencing nearby naive agents to follow it to goal positions. A single leader was also utilized in [24] and multiple leaders were used in [10], but both required guaranteed connectivity with the leader. A consistent drawback of these models is their required communication, either for centralized control or for knowledge of follower positions in the environment.

Recently, several decentralized methods have been implemented. Graph neural networks have enabled decentralized behavior policies [13, 25], but relied on local communication that allowed information to propagate globally. Graph temporal logic was applied to swarm guidance in order to create policies that scale to large numbers of agents [26], but relied on knowledge of the overall agent distribution. Deep reinforcement learning was used to learn navigation policies around complex obstacles, allowing decentralized policies but requiring full knowledge of the state of the multi-robot system for training [27, 12].

II-B Game Theory for Robotics

Game theory seeks to explain and analyze the interaction of rational agents accommodating their own preferences and expectations. Modelling systems as games enables agents to analyze actions based on how they impact other agents. Many methods have been designed to find solutions to games, including optimization and reinforcement learning [28, 29, 30]. Game theory has been applied in several multi-robot system applications [31], due to its ability to consider the interaction of multiple agents and their competing motivations.

In autonomous driving, game theory has been applied to the problem of lane changing and merging [18]. Deep reinforcement learning was used to solve two-player merging games in [17], which relied on defining drivers as proactive or passive. Roles for two players were also used in [32], which modelled lane changing as a Stackelberg game with implicit leaders and followers. Multiple drivers were modelled in [33], which solved games through optimization by converting constraints such as collision avoidance into Lagrangian multiplier terms. In addition to lane changing, game theory has also been used to generate racing trajectories that avoid collisions while still competing with other agents [19, 34].

Despite the exploration of game theory in autonomous driving, previous research has mostly been limited to two-player systems (i.e., two cars). Recently, game theory has been applied to several multi-robot scenarios. Many approaches have considered task allocation, both with the requirement of communication [35] and without communication being necessary [36, 37, 38]. Collision avoidance has been considered, but has required the agents to be capable of communicating with each other [mylvaganam2017differential]. Multi-robot sensing and patrolling have been the other main scenarios explored with game theoretic models, but approaches have required prior mappings of the environment [39, 20] or still require communication between agents [40].

Game theoretic models have been adopted for their ability to model other agents in a system, but they have not been well studied theoretically for generating navigational behaviors for a large multi-robot system in complex environments without the ability to communicate between robots.

III Our Proposed Approach

III-A Multi-Robot Navigation as a Decentralized Game

We consider a multi-robot system consisting of $N$ robots. Each robot has a position $\mathbf{p}\in\mathbb{R}^{d}$, where $\mathbf{p}_{i}$ is the position of the $i$-th robot. At each time step $t$, each robot decides to take an action $\mathbf{a}_{i}^{t}$ and update its position. We define a goal $\mathbf{g}\in\mathbb{R}^{d}$ that is the position that the multi-robot system is attempting to navigate to, which is known to each robot. This goal can be stationary (e.g., a fixed location in the environment) or dynamic (e.g., a moving human whom the multi-robot system must follow). We define the multi-robot system to have reached the goal state when $\|\mathbf{p}_{+}-\mathbf{g}\|_{2}<T$, where $\mathbf{p}_{+}$ is the geometric center of the robot system and $T$ is a distance threshold.

We define an ideal interaction distance $R$ between robots: if $\|\mathbf{p}_{i}-\mathbf{p}_{j}\|_{2}\ll R$ then the $i$-th and $j$-th robots are at risk of collision (with the collision risk rising as $\mathbf{p}_{i}$ approaches $\mathbf{p}_{j}$), and if $\|\mathbf{p}_{i}-\mathbf{p}_{j}\|_{2}\gg R$ then there is a risk of losing cohesion between robots $i$ and $j$ (with this risk rising as $\mathbf{p}_{i}$ moves further away from $\mathbf{p}_{j}$). Successful cohesion in the multi-robot system results from maintaining $\|\mathbf{p}_{i}-\mathbf{p}_{j}\|_{2}=R\;\forall\,i,j\in\{1,\dots,N\},\,i\neq j$. Finally, we consider a set of $O$ obstacles in the environment, with $\mathbf{o}_{j}$ the position of the $j$-th obstacle. Robots should take actions such that $\|\mathbf{p}_{i}-\mathbf{o}_{j}\|_{2}>0$, so as to avoid collisions with obstacles. We assume each robot is able to locally perceive neighboring robots and obstacles.
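For concreteness, the goal condition can be checked directly from the robot positions. The following is a minimal sketch; the function and variable names are ours, not the paper's:

```python
import numpy as np

def goal_reached(P, g, T):
    """Goal condition ||p_+ - g||_2 < T, where p_+ is the geometric
    center of the robot positions.

    P: (N, d) array of robot positions; g: (d,) goal; T: distance threshold.
    """
    p_plus = P.mean(axis=0)  # geometric center p_+ of the system
    return np.linalg.norm(p_plus - g) < T
```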

Then, we formulate the problem of multi-robot navigation with multiple distributed and differentiable simultaneous strategic games $\mathcal{G}=\{\mathcal{N},\mathcal{X},\mathcal{J}\}$, where each robot is the main player of a game based on its local environment. Each game is simultaneous, as the robot plays its game at each time step without knowledge of how other robots will move. Each game is also strategic, where players are unable to coordinate through explicit communication. $\mathcal{N}$ represents the set of players, whose cardinality is denoted as $N'$. This set of players is defined as the main player robot and the neighboring robots within sensing distance. $\mathcal{X}=\{X_{1},\dots,X_{N'}\}$ represents the strategy set for game $\mathcal{G}$, where $X_{i}$ represents the set of available strategies of the $i$-th player. We define a strategy $\mathbf{x}_{i}\in X_{i}$ to be a movement update available to the $i$-th robot given $\mathbf{p}_{i}$ and action $\mathbf{a}_{i}$:

$\mathbf{x}_{i}=\mathbf{p}_{i}+\alpha\mathbf{a}_{i}$ (1)

where $\mathbf{a}_{i}$ satisfies $\|\mathbf{a}_{i}\|_{2}=1$ and $\alpha$ is a hyperparameter controlling the importance of the action. Values of $\alpha$ are discussed later in the paper when considering the theoretical aspects of the games.

We define $\mathcal{J}=\{J_{1}(\mathbf{x}_{1},\mathbf{x}_{-1}),\dots,J_{N'}(\mathbf{x}_{N'},\mathbf{x}_{-N'})\}$ as the set of differentiable cost functions that govern the action of each player, where $\mathbf{x}_{i}$ is the strategy of the $i$-th player and $\mathbf{x}_{-i}$ is the strategy set of all other players, i.e., $\mathbf{x}_{-i}=(\mathbf{x}_{1},\dots,\mathbf{x}_{i-1},\mathbf{x}_{i+1},\dots,\mathbf{x}_{N'})$. Each player is assumed to be rational, i.e., each player will select a strategy $\mathbf{x}_{i}$ that minimizes its cost function $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$, while expecting that every other player is selecting strategies in $\mathbf{x}_{-i}$ that minimize their own cost functions. We denote the optimal strategy for player $i$ as $\mathbf{x}_{i}^{*}$ and the optimal strategies for the other players as $\mathbf{x}_{-i}^{*}$. The cost function common to each player is defined as:

$J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})=\beta_{1}\|\mathbf{x}_{i}-\mathbf{g}\|_{2}^{2}+\beta_{2}C_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})-\beta_{3}\sum_{o=1}^{O}\|\mathbf{x}_{i}-\mathbf{o}_{o}\|_{2}^{2}$ (2)

where $C_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ represents a cost related to cohesion with other players within sensing distance, defined as $C_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})=\sum_{n=1,n\neq i}^{N'}\|\mathbf{x}_{i}-\mathbf{z}_{n}\|_{2}^{2}$. Here $\mathbf{z}_{n}$ is a point on the axis between a sensed robot $n$ and robot $i$, at distance $R$ from robot $n$ (e.g., if the distance between the robots is less than $R$, this cost encourages robot $i$ to move away from robot $n$). It is calculated from the position of robot $i$ using the estimated strategy $\mathbf{x}_{n}$ of the neighboring robot $n$, such that $\mathbf{z}_{n}=\mathbf{x}_{n}+R(\mathbf{p}_{i}-\mathbf{x}_{n})/\|\mathbf{p}_{i}-\mathbf{x}_{n}\|_{2}$.

Hyperparameters $\beta_{1},\beta_{2},\beta_{3}$ balance the different influences and are valued such that $\beta_{1}\geq O\beta_{3}-N'\beta_{2}$. $\beta_{1}$ controls the weighting of progress towards the goal; this part of the cost function decreases as the $i$-th robot makes progress towards the goal. $\beta_{2}$ controls the weighting of cohesion with neighboring robots; this cohesion cost is minimized when the $i$-th robot is at a distance $R$ from every neighbor. $\beta_{3}$ controls the weighting of navigating to avoid obstacles; as robots move away from obstacles, this term decreases the overall cost function.
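To make the cost concrete, the sketch below implements Eq. (2) and the cohesion target $\mathbf{z}_{n}$. It assumes $\mathbf{z}_{n}$ is anchored at the neighbor's estimated strategy as described above; all names are illustrative, not from the paper:

```python
import numpy as np

def cohesion_target(p_i, x_n, R):
    """Point z_n on the axis between robot i and neighbor n, at distance R
    from the neighbor's estimated strategy x_n."""
    axis = (p_i - x_n) / np.linalg.norm(p_i - x_n)
    return x_n + R * axis

def cost(x_i, p_i, x_others, obstacles, g, R, b1, b2, b3):
    """Cost J_i of Eq. (2): goal progress, cohesion, and obstacle proximity."""
    goal_term = b1 * np.sum((x_i - g) ** 2)
    cohesion_term = b2 * sum(np.sum((x_i - cohesion_target(p_i, x_n, R)) ** 2)
                             for x_n in x_others)
    obstacle_term = b3 * sum(np.sum((x_i - o) ** 2) for o in obstacles)
    return goal_term + cohesion_term - obstacle_term
```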

III-B Solution and Analysis of the Game

In order to identify optimal behaviors for each robot, our approach constructs the described game $\mathcal{G}_{i}$ for each robot $i\in\{1,\dots,N\}$ at each time step. Algorithm 1 solves the game required to enable decentralized and communication-free multi-robot navigation. The algorithm identifies the players involved in each game $\mathcal{G}_{i}$, consisting of only the local neighbors of robot $i$, and initializes a null strategy for the main player. Then, it minimizes each neighboring player's cost function to identify optimal strategies $\mathbf{x}_{-i}^{*}$. From these, it updates the main player's cost function and identifies the optimal strategy $\mathbf{x}_{i}^{*}$ that minimizes it. When Algorithm 1 converges after iterating, $\mathbf{x}_{i}^{*}$ is adopted as the strategy for robot $i$. In empirical evaluation, Algorithm 1 converges quickly, typically in fewer than $N'$ iterations.

For each game, we have two objectives for the main player (denoted as robot $i$). First, our approach should minimize the cost function $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ described in Eq. (2), in order to identify the best strategy for this main player. Second, our approach should generate strategies that move in the direction of a Nash equilibrium for the game, i.e., a state where no player can improve its strategy without other players changing their strategies. If every player acts rationally, as we assume, then a Nash equilibrium means that robot $i$ has identified a strategy $\mathbf{x}_{i}$ that cannot be improved upon, assuming its expectations of its neighbors' optimal strategies $\mathbf{x}_{-i}^{*}$ are accurate.

III-B1 Minimizing the Cost Function

First, we consider the objective of minimizing the cost function $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ for the main player of the game $\mathcal{G}_{i}$. We have defined our game $\mathcal{G}$ and our cost functions $\mathcal{J}$ as differentiable, and so we are able to utilize derivative-based methods in order to identify minima, which correspond to equilibria in the game. To take advantage of this fact, we define the action $\mathbf{a}_{i}$ that is part of a strategy $\mathbf{x}_{i}$ with the gradient of the cost function:

$\mathbf{a}_{i}=-\nabla_{i}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ (3)

with the exact gradient of Eq. (2) omitted for space. Just as gradient descent moves in the direction that minimizes a function, by defining our actions this way robots make physical movements in the direction that minimizes their cost functions. Then, $\mathbf{a}_{i}$ is normalized to a unit vector by $\mathbf{a}_{i}/\|\mathbf{a}_{i}\|_{2}$ in order to represent real-world restrictions on robot movements at a single time step.
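Since the paper omits the exact gradient, the sketch below uses our own closed-form differentiation of Eq. (2), treating each cohesion target $\mathbf{z}_{n}$ as fixed with respect to $\mathbf{x}_{i}$; it reuses `cohesion_target` from the sketch above:

```python
def action(x_i, p_i, x_others, obstacles, g, R, b1, b2, b3):
    """Action a_i = -grad_i J_i (Eq. 3), normalized to unit length (Eq. 1)."""
    grad = 2 * b1 * (x_i - g)                  # goal term gradient
    for x_n in x_others:                       # cohesion terms, z_n held fixed
        grad += 2 * b2 * (x_i - cohesion_target(p_i, x_n, R))
    for o in obstacles:                        # obstacle terms (subtracted in J_i)
        grad -= 2 * b3 * (x_i - o)
    a = -grad
    return a / np.linalg.norm(a)               # unit norm: per-step motion limit
```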

Input: A multi-robot system with $N$ robots.
Output: Optimal strategies $\mathbf{x}_{i}^{*}$ for each robot.
for $i\leftarrow 1$ to $N$ do
      Identify positions of neighboring robots and obstacles within sensing distance of robot $i$.
      Construct game $\mathcal{G}_{i}=\{\mathcal{N},\mathcal{X},\mathcal{J}\}$.
      Initialize strategy $\mathbf{x}_{i}$ with a zero action $\mathbf{a}_{i}$.
      repeat
            Estimate strategies $\mathbf{x}_{-i}^{*}$ for neighbors.
            Update strategy $\mathbf{x}_{i}$ to minimize $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i}^{*})$.
      until convergence
end for
Algorithm 1: Solving the Decentralized Game.
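A minimal sketch of Algorithm 1 for a single robot follows, reusing the `cost` and `action` helpers above. As a simplification of the full neighbor games, each neighbor's strategy is here estimated against the main player's current strategy only; the convergence tolerance and iteration cap are our assumptions:

```python
def solve_game(i, P, obstacles, g, R, alpha, betas, sense_radius, max_iter=20):
    """One play of robot i's game: estimate neighbor strategies x_{-i}*,
    then update x_i to minimize J_i against them, until convergence."""
    b1, b2, b3 = betas
    p_i = P[i]
    neighbors = [j for j in range(len(P))
                 if j != i and np.linalg.norm(P[j] - p_i) <= sense_radius]
    x_i = p_i.copy()                            # null strategy (zero action)
    for _ in range(max_iter):
        # Estimated optimal strategies of the neighbors (no communication).
        x_neg_i = [P[j] + alpha * action(P[j], P[j], [x_i], obstacles,
                                         g, R, b1, b2, b3)
                   for j in neighbors]
        x_new = p_i + alpha * action(x_i, p_i, x_neg_i, obstacles,
                                     g, R, b1, b2, b3)
        if np.linalg.norm(x_new - x_i) < 1e-6:  # converged
            break
        x_i = x_new
    return x_i
```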

We now introduce constraints on $\mathcal{X}$ to include explicit collision avoidance, both with obstacles and with neighboring robots. These constraints ensure that the valid set of strategies $X_{i}$ for robot $i$ does not allow actions that would move it within collision distance of a neighbor's or obstacle's estimated position:

$X_{i}=\big\{\mathbf{x}_{i}:\mathbf{x}_{i}=\mathbf{p}_{i}+\alpha\mathbf{a}_{i},\;\|\mathbf{a}_{i}\|_{2}=1,\;\|\mathbf{x}_{i}-\mathbf{p}_{j}\|_{2}\geq\tau,\;\|\mathbf{x}_{i}-\mathbf{o}_{k}\|_{2}\geq\tau\;\;\forall j\in\mathcal{N},\,i\neq j,\,k\in\{1,\dots,O\}\big\}$ (4)

where $\tau$ is the distance at which a collision occurs, dependent on the physical dimensions of the robots. Though we propose a general solution where valid movements can occur in any direction (as with a holonomic robot), additional constraints can be applied here in correspondence with a robot's physical and kinematic limitations.
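The constraints of Eq. (4) reduce to a feasibility check on a candidate strategy, as in the sketch below; how an infeasible gradient action is repaired (e.g., by sampling alternative unit actions) is an implementation choice the paper does not specify:

```python
import numpy as np

def feasible(x_i, i, P, obstacles, tau):
    """Check Eq. (4): the candidate strategy x_i must remain at least tau
    from every neighbor position and every obstacle."""
    clear_of_robots = all(np.linalg.norm(x_i - P[j]) >= tau
                          for j in range(len(P)) if j != i)
    clear_of_obstacles = all(np.linalg.norm(x_i - o) >= tau for o in obstacles)
    return clear_of_robots and clear_of_obstacles
```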

Given Eqs. (3) and (4), we prove in Theorem 1 that our approach minimizes the cost function of each robot.

Theorem 1.

The inner loop of Algorithm 1 minimizes the value of $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ at each iteration for the main player $i$.

Proof.

Eq. (1) computes a strategy $\mathbf{x}_{i}$ for the $i$-th player by $\mathbf{x}_{i}=\mathbf{p}_{i}+\alpha\mathbf{a}_{i}$. From Eq. (1), we can also define the null strategy $\mathbf{x}_{i}^{0}=\mathbf{p}_{i}$, obtained with the action $\mathbf{a}_{i}=\mathbf{0}$, meaning that $J_{i}(\mathbf{x}_{i}^{0},\mathbf{x}_{-i})$ returns the cost at the current position of robot $i$. Finally, Eq. (3) computes $\mathbf{a}_{i}=-\nabla_{i}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$. Combining these terms results in:

$J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})=J_{i}(\mathbf{x}_{i}^{0},\mathbf{x}_{-i})-\alpha\nabla_{i}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ (5)

Hence, for sufficiently small step sizes $\alpha$, Eq. (5) is equivalent to a gradient descent step, and thus decreases the cost function:

$J_{i}(\mathbf{x}_{i}^{t+1},\mathbf{x}_{-i}^{t+1})\leq J_{i}(\mathbf{x}_{i}^{t},\mathbf{x}_{-i}^{t})$ (6)

Further proof that gradient descent decreases function values is omitted for brevity. ∎
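As an illustrative numerical check of Theorem 1 (with made-up positions and hyperparameters, reusing the `cost` and `action` sketches above), stepping from the null strategy along the normalized negative gradient should not increase the cost for a sufficiently small $\alpha$:

```python
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 10.0, size=(5, 2))      # five robots in the plane
obstacles = [np.array([4.0, 4.0])]
g = np.array([9.0, 9.0])
b1, b2, b3, R, alpha = 1.0, 0.1, 0.05, 2.0, 0.1

p_i = P[0]
x_others = [P[j] for j in range(1, 5)]       # neighbors assumed stationary
J_null = cost(p_i, p_i, x_others, obstacles, g, R, b1, b2, b3)
a = action(p_i, p_i, x_others, obstacles, g, R, b1, b2, b3)
J_step = cost(p_i + alpha * a, p_i, x_others, obstacles, g, R, b1, b2, b3)
print(J_step <= J_null)                      # True for sufficiently small alpha
```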

III-B2 Approaching a Nash Equilibrium

We now prove that the strategies generated for all players $i\in\mathcal{N}$ move each game in the direction of a Nash equilibrium. A Nash equilibrium results when no single agent can improve its position without other agents changing their strategies. In our case, where all agents are rational in being motivated to minimize their defined cost functions, the Nash equilibrium is also the optimal solution to a game. Formally, a Nash equilibrium is achieved when an optimal strategy $\mathbf{x}_{i}^{*}$ is identified such that:

$J_{i}(\mathbf{x}_{i}^{*},\mathbf{x}_{-i}^{*})\leq J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i}^{*})\quad\forall\mathbf{x}_{i}\in X_{i},\;i\in\mathcal{N}$ (7)

Mathematically, a Nash equilibrium occurs when the first derivative of each $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})\in\mathcal{J}$ equals $\mathbf{0}$:

$\nabla_{i}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})=\mathbf{0}\quad\forall i\in\mathcal{N}$ (8)

The set of first derivatives of all cost functions is defined as the game gradient:

$\nabla\mathcal{J}=[(\nabla_{1}J_{1}(\mathbf{x}_{1},\mathbf{x}_{-1}))^{\top},\dots,(\nabla_{N'}J_{N'}(\mathbf{x}_{N'},\mathbf{x}_{-N'}))^{\top}]^{\top}$

A Nash equilibrium requires that the game gradient $\nabla\mathcal{J}=\mathbf{0}$. Additionally, a stable Nash equilibrium occurs when the second derivative of each cost function is positive, that is, when the Jacobian of the cost functions is positive:

$\nabla_{i}^{2}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})\geq\mathbf{0}\quad\forall i\in\mathcal{N}$ (9)

In light of the above rigorous conditions, and given that all $J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})$ defined by Eq. (2) are parameterized by the cascading strategies of neighboring robots (i.e., when a neighboring robot moves, it affects the cost functions of all nearby robots), a Nash equilibrium may not be achievable in an individual game (i.e., since a game is played at each time step and robots can only move a limited distance per time step). Instead, we prove that strategies identified by Algorithm 1 approach a stable Nash equilibrium.

Theorem 2.

Algorithm 1 decides strategies for each player $i\in\mathcal{N}$ such that $\mathcal{G}_{i}$ approaches a stable Nash equilibrium.

Proof.

In Eq. (3), we define the action $\mathbf{a}_{i}$ using the gradient of the cost function. In Theorem 1, we have proven that defining our action in this way minimizes the cost function and thus approaches $\nabla_{i}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})=\mathbf{0}$. By taking this action (or an action in its direction, after normalizing to unit length), the $i$-th robot moves in the direction that minimizes its cost function. As Theorem 1 holds for all $i\in\mathcal{N}$, the result of applying Algorithm 1 for each robot is that:

$\nabla_{i}J_{i}(\mathbf{x}_{i}^{t},\mathbf{x}_{-i}^{t})\to\mathbf{0}\;\text{as}\;t\to\infty\quad\forall i\in\mathcal{N}$ (10)

Thus, our approach results in each game approaching a Nash equilibrium.

Figure 2 [(a) Obstacle Field; (b) Area Obstacles; (c) Dynamic Goal]: Qualitative results showing the navigation of a multi-robot system through an instance of each scenario. Robots are represented by black triangles, with their paths shown in light grey. Goal positions are marked by red 'x' marks, with obstacles marked in blue.

Additionally, we earlier required that the hyperparameters of the cost function be chosen such that $\beta_{1}\geq O\beta_{3}-N'\beta_{2}$. In order for a Nash equilibrium to be stable, the Jacobian of the cost functions must be positive, requiring $\nabla_{i}^{2}J_{i}(\mathbf{x}_{i},\mathbf{x}_{-i})\geq\mathbf{0}\;\forall i\in\mathcal{N}$. The second derivative of our cost function is:

$\nabla_{i}^{2}J_{i}=2\beta_{1}+2N'\beta_{2}-2O\beta_{3}$ (11)

Therefore, when the above condition on the hyperparameter values is met, the Jacobian of the cost function is positive. Thus, each robot $i$ in the game $\mathcal{G}_{i}$ moves in the direction of a stable Nash equilibrium, which corresponds to an optimal movement for each robot. ∎
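The stability condition of Eq. (11) reduces to a one-line check on the hyperparameters; a sketch with illustrative values:

```python
def stable(b1, b2, b3, n_prime, n_obstacles):
    """Eq. (11): 2*b1 + 2*N'*b2 - 2*O*b3 >= 0, which the hyperparameter
    condition b1 >= O*b3 - N'*b2 guarantees."""
    return 2 * b1 + 2 * n_prime * b2 - 2 * n_obstacles * b3 >= 0

print(stable(1.0, 0.1, 0.05, 4, 1))  # True: 2 + 0.8 - 0.1 >= 0
```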

Theorems 1 and 2 prove that our approach minimizes the cost function for each robot in each game and that every game moves robots in the direction of a stable Nash equilibrium, i.e., an optimal strategy for each robot. Thus, our approach enables optimal multi-robot navigation under our defined cost function, moving towards the goal, maintaining ideal system cohesion, and avoiding collisions with obstacles.

IV Experimental Results

We perform extensive evaluation of our approach, comparing it to two baseline versions (setting either $\beta_{2}$ or $\beta_{3}$ equal to 0), a greedy approach that moves towards the goal without consideration of obstacles or collisions, and a random approach to provide a worst-case comparison (as widely used in other works [23, 10]). Both the greedy and random approaches are restricted to be collision-free, similar to the strategy constraints in Eq. (4). We also evaluate with added sensor noise to make robots less certain of the locations of their neighbors and obstacles. Each scenario is simulated 100 times to gain an understanding of performance with a variety of starting formations, obstacle arrangements, and goal locations.

We evaluate and compare our approach using four metrics. First, Cohesion quantifies the average distance between all robots in the system; ideally, the cohesion distance of a system is equal to $R$, the defined interaction distance that avoids collisions and preserves cohesion. Second, Connectivity quantifies the connectivity of the system. We define a binary adjacency matrix whose $ij$-th element equals $1$ if the $i$-th and $j$-th robots are within sensing distance of each other, and $0$ otherwise. We then calculate the second-smallest eigenvalue of the Laplacian of this matrix, known as the algebraic connectivity of the graph [41]. We present this metric as a percentage, where $0$ indicates the system is not connected and $100$ indicates the system is fully connected. Though our approach does not require communication, connectivity has other benefits, such as relaying information about malfunctions or events. Third, Nearest Obstacle reports the average nearest distance to an obstacle at each iteration. Fourth, Distance to Goal shows the progress of each compared approach towards the goal position, reporting $\|\mathbf{p}_{+}-\mathbf{g}\|_{2}$ at each time step.
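The Connectivity metric can be computed from robot positions alone, as sketched below; scaling $\lambda_{2}$ by its value for the complete graph ($\lambda_{2}=N$) to obtain a percentage is our assumption about the normalization:

```python
import numpy as np

def connectivity_percent(P, sense_radius):
    """Algebraic connectivity [41]: second-smallest eigenvalue of the graph
    Laplacian of the sensing graph, scaled so 0 = disconnected and
    100 = fully connected (lambda_2 of the complete graph K_N is N)."""
    N = len(P)
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            if np.linalg.norm(P[i] - P[j]) <= sense_radius:
                A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A            # graph Laplacian
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]  # algebraic connectivity
    return 100.0 * lam2 / N
```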

IV-A Evaluation in Obstacle Field

We first evaluated the Obstacle Field scenario seen in Figure 2(a), with quantitative results in Figure 3. Our full approach completes simulations closest to the ideal interaction distance in Figure 3(a), while maintaining it well as time progresses. The $\beta_{3}=0$ approach performs similarly well, as it still considers system cohesion. The greedy approach and the $\beta_{2}=0$ approach, which do not value cohesion, result in robots grouped much too close together. Figure 3(b) displays the connectivity achieved by different methods over the course of the simulation. Both our full approach and our $\beta_{3}=0$ approach maintain connectivity well, as does the greedy approach, which indirectly causes robots to move towards one another at the goal. Our $\beta_{2}=0$ approach is less able to maintain a connected system as it moves through obstacles, because it does not value spacing between robots, but it increases connectivity as the system reaches the goal position.

Figure 3 [(a) Cohesion; (b) Connectivity; (c) Nearest Obstacle; (d) Distance to Goal]: Results for the Obstacle Field scenario. Results for our full approach are marked with a solid black line. Comparison approaches are marked with dashed lines, with $\beta_{2}=0$ in green, $\beta_{3}=0$ in red, $10\%$ noise in blue, random in black, and greedy in purple. These colors are also used later. [Best viewed in color.]
Figure 4 [(a) Cohesion; (b) Connectivity; (c) Nearest Obstacle; (d) Distance to Goal]: Results for the Area Obstacles scenario. [Best viewed in color.]
Figure 5 [(a) Cohesion; (b) Connectivity; (c) Nearest Obstacle; (d) Distance to Goal]: Results for the Dynamic Goal scenario. [Best viewed in color.]

Figure 3(c) shows the average nearest obstacle to any robot. Here, we see that our $\beta_{3}=0$ approach performs poorly for the first time, with it and the greedy approach moving very near to obstacles. Our full approach and our $\beta_{2}=0$ approach perform well, as they value distance from obstacles. Finally, Figure 3(d) shows the distance to the goal over time for each approach. Only the random approach fails to make progress towards the goal. Overall, we see that our full approach performs the best, making progress towards the goal while staying close to the ideal system cohesion, maintaining connectivity, and avoiding obstacles. Each baseline version performs poorly somewhere, with $\beta_{2}=0$ maintaining a poor interaction distance and connectivity and $\beta_{3}=0$ navigating much closer to obstacles.

IV-B Evaluation in Area Obstacles

We next evaluated the Area Obstacles scenario seen in Figure 2(b), in which large square obstacles lie between the multi-robot system and the goal position. These larger obstacles interfere with robot movement much more than point obstacles, and so cause larger variation in evaluation. First, in Figure 4(a) we see that our approach is the only one capable of ending a simulation with robots distributed at the ideal cohesion distance. The alternate versions of our approach result in robots grouped much too closely together, while the greedy approach ends with robots spaced too far apart, as robots become stuck navigating through the large obstacles. Figure 4(b) shows a similar result for connectivity. All approaches struggle to maintain a fully connected system as the robots move through the large obstacles. Our approach and its variations perform the best, ensuring the multi-robot system stays together instead of separating around obstacles and resulting in the highest connectivity scores. The greedy approach, which attains high connectivity when moving through simple point obstacles, has much more trouble maintaining connectivity as robots become stuck without an algorithm to plan navigational behaviors around the large obstacles.

In Figure 4(c) we see the distance to the nearest obstacle as robots progress through the environment. As with the Obstacle Field scenario, our full approach and the $\beta_{2}=0$ approach are able to maintain significant distance from the large obstacles, showing the importance of the obstacle avoidance term in the cost function. Again we see the $\beta_{3}=0$ and greedy approaches result in the lowest average distance to an obstacle, indicating a high risk of being stuck behind an obstacle or nearing a collision. Finally, in Figure 4(d) we see that our approach and its variations are the only approaches able to consistently reach a goal position through the Area Obstacles. These approaches, because of their ability to plan actions in the context of their environments and their neighbors, allow the multi-robot system to effectively navigate the large obstacles. The greedy approach, as can be expected from its poor maintenance of system cohesion and its nearness to obstacles, regularly fails to navigate the entire system to the goal position.

IV-C Evaluation with Dynamic Goal

Finally, we evaluate the scenario seen in Figure 2(c), in which there is a Dynamic Goal, as if the multi-robot system were tasked with following waypoints or a human through an environment. We see similar results to the earlier scenarios. The naive greedy approach causes robots to bunch together, maintaining cohesion but putting the system at risk of collisions. Our baseline approaches again each struggle in the cost areas that their hyperparameters omit. When $\beta_{2}=0$, the system maintains connectivity at the expense of allowing robots to become dangerously close together. When $\beta_{3}=0$, the system maintains a safe distance between robots but risks collisions with obstacles.

However, our full approach again balances the evaluated metrics. It is able to maintain an ideal interaction distance among robots, keeping them at a safe distance to avoid collisions but still achieving full connectivity. This full approach is also able to maintain a safe distance from obstacles in the environment, while still making progress towards the waypoints.

IV-D Robustness to Noise

In order to understand how our approach would perform with the noise inherent in physical robot operations (i.e., uncertainty about the positions of detected neighbors and obstacles), we conduct evaluations with locations in the environment perturbed by $10\%$ sensor noise. Results with this added noise are shown in blue in Figures 3-5. Our approach does lose performance when noise is added, but still outperforms both of our baseline methods and the greedy method. Because robots play their games anew before each action, errors in observation do not accumulate or persist to cause long-term issues. Additionally, as our approach is decentralized, errors from the added noise do not propagate through the system, and are restricted to individual robots. Accordingly, our approach is robust to the sensor and movement noise encountered in physical multi-robot systems.
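One plausible reading of the $10\%$ sensor noise setting, used here only for illustration, perturbs each sensed position with zero-mean Gaussian noise scaled to $10\%$ of the sensing range; the noise model and parameters below are our assumptions:

```python
import numpy as np

def noisy_position(p, noise_frac, sensing_range, rng):
    """Perturb a sensed position with zero-mean Gaussian noise whose standard
    deviation is noise_frac of the sensing range (an assumed noise model)."""
    return p + rng.normal(scale=noise_frac * sensing_range, size=p.shape)

# Each robot would apply this to every sensed neighbor and obstacle position
# before constructing and playing its game, e.g.:
rng = np.random.default_rng(1)
sensed = noisy_position(np.array([3.0, 2.0]), 0.10, 5.0, rng)
```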

V Conclusion

In this paper, we propose a novel game theoretic method for decentralized and communication-free multi-robot navigation. Our approach is proven to identify strategies that approach a stable Nash equilibrium, resulting in optimal behaviors for each robot. We show that our approach enables a multi-robot system to navigate to a goal position while remaining robust to errors from sensor noise, and that varying the hyperparameter values demonstrates the importance of the terms that encourage avoiding obstacles, avoiding collisions with neighboring robots, and maintaining connectivity.

References

  • [1] M. Erdelj, M. Król, and E. Natalizio, “Wireless sensor networks and multi-UAV systems for natural disaster management,” Computer Networks, vol. 124, pp. 72–86, 2017.
  • [2] B. Bayat, N. Crasta, A. Crespi, A. M. Pascoal, and A. Ijspeert, “Environmental monitoring using autonomous vehicles: a survey of recent searching techniques,” Current Opinion in Biotechnology, vol. 45, pp. 76–84, 2017.
  • [3] A. Goldhoorn, A. Garrell, R. Alquézar, and A. Sanfeliu, “Searching and tracking people with cooperative mobile robots,” Autonomous Robots, vol. 42, no. 4, pp. 739–759, 2018.
  • [4] T. Manderson, J. C. Gamboa, S. Wapnick, J.-F. Tremblay, F. Shkurti, D. Meger, and G. Dudek, “Vision-based goal-conditioned policies for underwater navigation in the presence of obstacles,” in Robotics: Science and Systems (RSS), 2020.
  • [5] M. Smyrnakis and S. M. Veres, “Coordination of control in robot teams using game-theoretic learning,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 1194–1202, 2014.
  • [6] J. Banfi, N. Basilico, and S. Carpin, “Optimal redeployment of multirobot teams for communication maintenance,” in International Conference on Intelligent Robots and Systems (IROS), 2018.
  • [7] W. Luo and K. Sycara, “Minimum k-connectivity maintenance for robust multi-robot systems,” in International Conference on Intelligent Robots and Systems (IROS), 2019.
  • [8] J. Li, W. Abbas, M. Shabbir, and X. Koutsoukos, “Resilient distributed diffusion for multi-robot systems using centerpoint,” in Robotics: Science and Systems (RSS), 2020.
  • [9] M. Aranda, J. A. Corrales, and Y. Mezouar, “Deformation-based shape control with a multirobot system,” in International Conference on Robotics and Automation (ICRA), 2019.
  • [10] B. Reily, C. Reardon, and H. Zhang, “Leading multi-agent teams to multiple goals while maintaining communication,” in Robotics: Science and Systems (RSS), 2020.
  • [11] A. Varava, K. Hang, D. Kragic, and F. T. Pokorny, “Herding by caging: a topological approach towards guiding moving agents via mobile robots.,” in Robotics: Science and Systems (RSS), 2017.
  • [12] J. Lin, X. Yang, P. Zheng, and H. Cheng, “Connectivity guaranteed multi-robot navigation via deep reinforcement learning,” in Conference on Robot Learning (CoRL), 2019.
  • [13] E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, “Learning decentralized controllers for robot swarms with graph neural networks,” in Conference on Robot Learning (CoRL), 2019.
  • [14] R. Emery-Montemerlo, Game-Theoretic Control for Robot Teams. PhD thesis, Rutgers, The State University of New Jersey, 2005.
  • [15] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun, “Game theoretic control for robot teams,” in International Conference on Robotics and Automation (ICRA), 2005.
  • [16] S. Bansal, J. Xu, A. Howard, and C. Isbell, “A bayesian framework for nash equilibrium inference in human-robot parallel play,” in Robotics: Science and Systems (RSS), 2020.
  • [17] G. Ding, S. Aghli, C. Heckman, and L. Chen, “Game-theoretic cooperative lane changing using data-driven models,” in International Conference on Intelligent Robots and Systems (IROS), 2018.
  • [18] A. Liniger and L. Van Gool, “Safe motion planning for autonomous driving using an adversarial road model,” in Robotics: Science and Systems (RSS), 2020.
  • [19] M. Wang, Z. Wang, J. Talbot, J. C. Gerdes, and M. Schwager, “Game theoretic planning for self-driving cars in competitive scenarios,” in Robotics: Science and Systems (RSS), 2019.
  • [20] E. Hernández, A. Barrientos, and J. Del Cerro, “Selective smooth fictitious play: An approach based on game theory for patrolling infrastructures with a multi-robot system,” Expert Systems with Applications, vol. 41, no. 6, pp. 2897–2913, 2014.
  • [21] J. Alonso-Mora, E. Montijano, M. Schwager, and D. Rus, “Distributed multi-robot formation control among obstacles: A geometric and optimization approach with consensus,” in International Conference on Robotics and Automation (ICRA), 2016.
  • [22] L. Sabattini, C. Secchi, M. Lotti, and C. Fantuzzi, “Coordinated motion for multi-robot systems under time varying communication topologies,” in International Conference on Robotics and Automation (ICRA), 2016.
  • [23] M. Kwon, M. Li, A. Bucquet, and D. Sadigh, “Influencing leading and following in human-robot teams,” in Robotics: Science and Systems (RSS), 2019.
  • [24] C. Vrohidis, C. P. Bechlioulis, and K. J. Kyriakopoulos, “Safe decentralized and reconfigurable multi-agent control with guaranteed convergence,” in International Conference on Robotics and Automation (ICRA), 2017.
  • [25] A. Khan, E. Tolstaya, A. Ribeiro, and V. Kumar, “Graph policy gradients for large scale robot control,” in Conference on Robot Learning (CoRL), 2019.
  • [26] F. Djeumou, Z. Xu, and U. Topcu, “Probabilistic swarm guidance subject to graph temporal logic specifications,” in Robotics: Science and Systems (RSS), 2020.
  • [27] J. Lin, X. Yang, P. Zheng, and H. Cheng, “End-to-end decentralized multi-robot navigation in unknown complex environments via deep reinforcement learning,” in International Conference on Mechatronics and Automation (ICMA), 2019.
  • [28] P. Mertikopoulos and W. H. Sandholm, “Learning in games via reinforcement and regularization,” Mathematics of Operations Research, vol. 41, no. 4, pp. 1297–1324, 2016.
  • [29] M. T. Basar, A. Haurie, and G. Zaccour, “Nonzero-sum differential games,” in Handbook of Dynamic Game Theory, pp. 61–110, 2018.
  • [30] J. Y. Halpern and Y. Moses, “Characterizing solution concepts in terms of common knowledge of rationality,” International Journal of Game Theory, vol. 46, no. 2, pp. 457–473, 2017.
  • [31] C. Robin and S. Lacroix, “Multi-robot target detection and tracking: taxonomy and survey,” Autonomous Robots, vol. 40, no. 4, pp. 729–760, 2016.
  • [32] J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and A. D. Dragan, “Hierarchical game-theoretic planning for autonomous vehicles,” in International Conference on Robotics and Automation (ICRA), 2019.
  • [33] S. L. Cleac’h, M. Schwager, and Z. Manchester, “Algames: A fast solver for constrained dynamic games,” in Robotics: Science and Systems (RSS), 2020.
  • [34] Z. Wang, R. Spica, and M. Schwager, “Game theoretic motion planning for multi-robot racing,” in Distributed Autonomous Robotic Systems, pp. 225–238, 2019.
  • [35] J. J. Roldán, J. Del Cerro, and A. Barrientos, “Should we compete or should we cooperate? applying game theory to task allocation in drone swarms,” in International Conference on Intelligent Robots and Systems (IROS), 2018.
  • [36] A. Kanakia, B. Touri, and N. Correll, “Modeling multi-robot task allocation with limited information as global game,” Swarm Intelligence, vol. 10, no. 2, pp. 147–160, 2016.
  • [37] T. Gao and S. Bhattacharya, “Multirobot charging strategies: A game-theoretic approach,” IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2823–2830, 2019.
  • [38] W. Dai, H. Lu, J. Xiao, and Z. Zheng, “Task allocation without communication based on incomplete information game theory for multi-robot systems,” Journal of Intelligent & Robotic Systems, vol. 94, no. 3-4, pp. 841–856, 2019.
  • [39] Y. Meng, “Multi-robot searching using game-theory based approach,” International Journal of Advanced Robotic Systems, vol. 5, no. 4, p. 44, 2008.
  • [40] V. Ramaswamy, S. Moon, E. W. Frew, and N. Ahmed, “Mutual information based communication aware path planning: A game theoretic perspective,” in International Conference on Intelligent Robots and Systems (IROS), 2016.
  • [41] M. Fiedler, “Algebraic connectivity of graphs,” Czechoslovak Mathematical Journal, vol. 23, no. 2, pp. 298–305, 1973.