Decentralized Multi-Agent Active Search and Tracking
when Targets Outnumber Agents
Abstract
Multi-agent multi-target tracking has a wide range of applications, including wildlife patrolling, security surveillance, and environment monitoring. Such algorithms often make restrictive assumptions: the number of targets and/or their initial locations may be assumed known, or agents may be pre-assigned to monitor disjoint partitions of the environment, reducing the burden of exploration. This also limits applicability when there are fewer agents than targets, since agents are unable to continuously follow all targets in their fields of view. Multi-agent tracking algorithms additionally assume inter-agent synchronization of observations, or the presence of a central controller to coordinate joint actions. Instead, we focus on the setting of decentralized multi-agent, multi-target, simultaneous active search-and-tracking with asynchronous inter-agent communication. Our proposed algorithm DecSTER uses a sequential Monte Carlo implementation of the probability hypothesis density filter for posterior inference, combined with Thompson sampling for decentralized multi-agent decision making. We compare different action selection policies, focusing on scenarios where targets outnumber agents. In simulation, we demonstrate that DecSTER is robust to unreliable inter-agent communication and outperforms information-greedy baselines in terms of the Optimal Sub-Pattern Assignment (OSPA) metric for different numbers of targets and varying team sizes.
I Introduction
Searching for targets, detecting objects of interest (OOIs), localizing and following them are tasks integral to several robotics applications, and one or more of these sub-problems have been widely studied in a number of settings. For example, in informative path planning [1] or simultaneous localization and mapping (SLAM) [2], where the OOIs are fixed, an agent (or robot) adaptively selects actions to detect and localize the targets. Since the environment (i.e., the target distribution) is stationary, the agent tends to explore unseen parts of its surroundings more than it exploits already observed viewpoints. In contrast, when targets are dynamic, the environment is non-stationary. Agents tracking an unknown number of moving targets must therefore trade off between exploring the possibly unobserved parts of the environment and exploiting their own posterior estimates to localize the previously detected targets at the current timestep. Unfortunately, prior work in multi-target tracking (MTT) has often assumed that the environment is known and exploration is not of primary concern [3]. Moreover, with multiple agents, existing MTT methods simplify the explore-exploit dilemma by separating search and tracking into sequential tasks where each agent is assigned to track a target as soon as it is found [4]. Another approach is to assign sub-teams to execute these tasks separately [5]. Further, the majority of these multi-agent multi-target tracking (MAMTT) algorithms either require a central controller to coordinate joint tracking actions and facilitate target hand-offs among agents, or depend on synchronized inter-agent communication for distributed inference and decision making. Such conditions may not be feasible in practice: the environment may be unstructured and unknown, OOIs may need to be simultaneously detected and localized (i.e., without a separation between search and tracking phases), there may not be enough agents to continuously monitor all the targets, and unreliable communication channels may hinder inter-agent synchronization at each timestep.
In this work, we aim to develop a more practical approach to the MAMTT setup. In particular, we focus on the setting where agents are outnumbered by targets, so that the multi-agent team is unable to continuously cover all targets in their fields of view. The number of targets and their initial locations are unknown. Therefore, agents need to interact with the environment and collect observations by adaptively making explore-exploit decisions. Although it is not feasible to continuously track all targets, our goal is to produce an increasingly accurate estimate of the number and positions of all targets over time. We assume that there is no central controller, and agents share their observations asynchronously with teammates whenever possible. In other words, agents do not wait to receive observations, action selection policies or environment beliefs from their teammates, and can continue their online decision making even when communication is unreliable or unavailable.
Contributions. We propose a decentralized and asynchronous multi-agent algorithm, called DecSTER (Decentralized Multi-Agent Active Search-and-Tracking without continuous coverage), for simultaneous multi-target active search and tracking without continuously following the targets. In simulation, we compare a number of common decision making objectives from the tracking literature after adapting them to our simultaneous search-and-tracking setting. Our results show that DecSTER outperforms competitive baselines across different team sizes and target distributions in terms of the OSPA tracking performance.
II Related Work
Target detection and tracking are both widely studied problems, typically considered as distinct tasks in various applications like search and rescue [6], security surveillance [7] and computer games [8]. Robin and Lacroix [3] provide a detailed survey of the many different approaches and the taxonomy used in robotics and related fields for such scenarios. Here, we discuss the prior work most relevant in our context.
The single target state is commonly modeled assuming linear dynamics with additive Gaussian noise, using the Kalman filter [9] or using non-parametric particle filters [10]. In multi-target settings, alternative approaches like Multiple Hypothesis Tracker (MHT) [11], Joint Probabilistic Data Association (JPDA) [12] and Probability Hypothesis Density (PHD) filter [13] have been proposed, all of which differ in how they perform data association [14]. The PHD filter is particularly suited when unique identities for each target are not required, for example, in search and rescue tasks, where agents should detect and localize all survivors. In this work, we build on the Sequential Monte Carlo (SMC) formulation of the PHD filter presented in [15].
Prior work in MAMTT algorithms primarily considers centralized or distributed settings, the latter still necessitating synchronized communication among subgroups of agents at each time step. Coupled with a PHD filter, some of the common action selection methods previously proposed for tracking include mutual information and expected count based objectives [16], Renyi divergence maximization [4] and Lloyd's algorithm for Voronoi-cell based control [17, 18]. [4] discusses the benefits of a simultaneous search-and-tracking algorithm, but their proposed method requires that agents transition from searching to tracking upon target detection, forgoing further exploration. In a similar spirit, [19] proposes solving an information gain based multi-objective optimization problem over a finite planning horizon to decide the best (greedy) joint action for a centralized search-and-tracking task. In contrast with these deterministic objectives, [20] demonstrates the superior performance of stochastic optimization methods like Particle Swarm Optimization (PSO) and Simulated Annealing (SA) for better coverage and localization in such settings. Unfortunately, none of these prior approaches apply to the decentralized and asynchronous multi-agent setup where agents are unable to support continuous target coverage.
Recently, some learning based approaches have been proposed for tracking. [21] considers the single-agent setting, assuming a known number of (one or two) targets which only start moving after being first observed by the agent. [22, 23] both propose GNN-based algorithms, trained by imitation of a centralized expert and deployed in a distributed inference and decision making setup with synchronized communication. While [22] does not consider dynamic targets, [23] simplifies the problem to deterministic optimal control over a fixed horizon, differing from our more complex setting.
III Problem Setup
Consider a team of UAVs tasked with searching for and tracking an unknown number of moving targets in a rectangular 2-dimensional (2D) region (Fig. 1(a)). The agents can self-localize. They are equipped with noisy sensors that provide the 2D location coordinates of possible targets in their current field of view (FOV). The targets can move in any direction in the search space at different speeds, and we assume that agents typically move faster than targets. Each agent's FOV covers a contiguous rectangular region of the search space, and agents may choose to observe a wider (smaller) area from a greater (lower) vertical height but with more (less) observation noise. We therefore consider a hierarchical region sensing action space for each agent. Agents can communicate asynchronously with their teammates. Over time, agents observe different parts of the search space to detect and track all targets in the environment.
III-A Target and Measurement Representations
The state of each target comprises its 2D coordinates and velocities, $\mathbf{x} = [p_x, p_y, v_x, v_y]^\top$. Since both the number of targets and their true locations are unknown, we follow the Random Finite Set (RFS) representation for the multi-target state $X_t$, where the cardinality $|X_t|$ follows a Poisson distribution and, for a given cardinality, the set elements are sampled i.i.d. from a uniform distribution [24]. Following prior work, we use the Probability Hypothesis Density (PHD) filter [13] to maintain a belief over the RFS $X_t$.
The PHD is the first statistical moment of a distribution over RFSs. In this case, it is a density $D(\mathbf{x})$ over the state space of targets such that, for any region $S$, the expected cardinality of the target RFS in that region is $\int_S D(\mathbf{x})\,d\mathbf{x}$. The PHD filter tracks the evolving target density over the search space using models of target motion and measurements gathered by the agents. The measurements $Z_t$ are also modeled as a (Poisson) RFS, as are the clutter (false positives) and target births.
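To make the intensity interpretation concrete, the following is a minimal sketch (our illustration, not the paper's code) of the expected-count computation for an SMC-PHD particle set; the array shapes and numeric values are purely illustrative.

```python
import numpy as np

def expected_count(particles, weights, region):
    """Expected number of targets inside an axis-aligned region.

    particles: (K, 2) array of 2D particle positions approximating the PHD.
    weights:   (K,)  array of non-negative particle weights.
    region:    (xmin, xmax, ymin, ymax) bounds of the query region S.
    The integral of the PHD over S reduces to the sum of the weights of
    the particles that fall inside S.
    """
    xmin, xmax, ymin, ymax = region
    inside = ((particles[:, 0] >= xmin) & (particles[:, 0] <= xmax) &
              (particles[:, 1] >= ymin) & (particles[:, 1] <= ymax))
    return weights[inside].sum()

# Example: ~2 expected targets represented by 1000 weighted particles.
rng = np.random.default_rng(0)
particles = rng.uniform(0, 10, size=(1000, 2))
weights = np.full(1000, 2.0 / 1000)
print(expected_count(particles, weights, (0, 10, 0, 10)))  # ~2.0
```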
III-B Sensing Model
An agent with pose $s_t$ executes a sensing action $a_t$, receiving a measurement set $Z_t$. Any target $\mathbf{x}$ within the agent's FOV may generate a measurement $\mathbf{z} \in Z_t$ with a probability of detection $p_d(\mathbf{x}, s_t)$. In this work, we assume a constant $p_d(\mathbf{x}, s_t) = p_d$ when the target $\mathbf{x}$ is within the FOV at $s_t$, and $p_d(\mathbf{x}, s_t) = 0$ otherwise. The agent follows a linear sensing model with additive i.i.d. white noise: $\mathbf{z} = H\mathbf{x} + \mathbf{w}$, where $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \Sigma_w)$. Additionally, $Z_t$ includes i.i.d. false positives with clutter rate $c(\mathbf{z})$.
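As an illustration, this sensing model can be simulated as below; the detection probability, noise scale and clutter rate are placeholder values rather than the paper's tuned settings.

```python
import numpy as np

def simulate_measurements(targets, fov, p_d=0.9, noise_std=0.5,
                          clutter_rate=1.0, rng=None):
    """Sketch of the linear sensing model: each target inside the FOV is
    detected with probability p_d and observed with additive white noise;
    false positives arrive as a Poisson-distributed number of uniform
    clutter points inside the FOV. All defaults are illustrative."""
    rng = rng or np.random.default_rng()
    xmin, xmax, ymin, ymax = fov
    Z = []
    for x in targets:  # targets: iterable of 2D positions
        in_fov = xmin <= x[0] <= xmax and ymin <= x[1] <= ymax
        if in_fov and rng.random() < p_d:
            Z.append(x + rng.normal(0.0, noise_std, size=2))
    for _ in range(rng.poisson(clutter_rate)):
        Z.append(rng.uniform([xmin, ymin], [xmax, ymax]))
    return Z
```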
The (noisy) target dynamics from state $\mathbf{x}'$ at time $t-1$ to $\mathbf{x}$ at time $t$ are captured by the target motion model $f(\mathbf{x} \mid \mathbf{x}')$. The survival probability $p_s$ denotes the target's chance of persisting over successive time steps. The PHD filter formulates the following steps to propagate the posterior density $D_t(\mathbf{x})$ over target states. (For a detailed treatment of the PHD filter, please refer to [24].)
Prediction:
$$D_{t|t-1}(\mathbf{x}) = b_t(\mathbf{x}) + \int p_s\, f(\mathbf{x} \mid \mathbf{x}')\, D_{t-1}(\mathbf{x}')\, d\mathbf{x}' \qquad (1)$$
Update:
$$D_t(\mathbf{x}) = \big(1 - p_d(\mathbf{x}, s_t)\big)\, D_{t|t-1}(\mathbf{x}) + \sum_{\mathbf{z} \in Z_t} \frac{\psi_{\mathbf{z}}(\mathbf{x})\, D_{t|t-1}(\mathbf{x})}{\eta(\mathbf{z})} \qquad (2)$$
$$\psi_{\mathbf{z}}(\mathbf{x}) = p_d(\mathbf{x}, s_t)\, g(\mathbf{z} \mid \mathbf{x}, s_t) \qquad (3)$$
$$\eta(\mathbf{z}) = c(\mathbf{z}) + \int \psi_{\mathbf{z}}(\mathbf{x})\, D_{t|t-1}(\mathbf{x})\, d\mathbf{x} \qquad (4)$$
Here, $p_d(\mathbf{x}, s_t)$ is the probability that the agent at pose $s_t$ receives a measurement from a target $\mathbf{x}$, $g(\mathbf{z} \mid \mathbf{x}, s_t)$ is the measurement likelihood model, and $b_t(\mathbf{x})$ is the birth density. The PHD filter can handle appearing and disappearing targets by defining an appropriate birth density over the search space, but in our experiments the number of ground truth targets is fixed.
Remark 1. As [13] explains, using the first order moment to approximate the multi-target belief and deriving recursive PHD update equations to approximate the evolving posterior is justifiable when both the sensor covariances and the false alarm density are small, so that observations from true targets are distributed around the target states with negligible spread and there is little noise due to false alarms. Additionally, in our SMC-PHD implementation following [15], we ensure that particle impoverishment is avoided during the update and propagation of the posterior density estimate.
Decentralized Asynchronous Multi-Agent Setup. In our setup, each agent maintains its PHD estimate and decides its next sensing action in a decentralized manner. There is no central controller and inter-agent communication is asynchronous (Fig. 1(b)). Note that agents are not independent, since they share their own observations and incorporate the measurements received from their teammates in subsequent PHD filter update steps. Our setting is also different from the distributed computation setup in prior work [17] where each agent completes a part of the centralized update step and relies on inter-agent synchronized communication to maintain a global PHD estimate across all agents.
Since targets are in continuous motion, our agents must be able to deal with the uncertainty arising from observation noise as well as from the asynchronous communication of time-dependent observations in their posterior PHD updates. In order to enable time-ordered assimilation of received observations by all agents, we assume that any agent communicates the tuple $(t', a_{t'}, Z_{t'})$, where $a_{t'}$ and $Z_{t'}$ are respectively the agent's sensing action and measurement set at time $t'$.
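One simple way to realize this time-ordered assimilation is a priority queue keyed by timestamp. The sketch below is a hypothetical helper (class and method names are ours, not part of the published algorithm) that buffers asynchronously received tuples and releases them chronologically before each PHD update.

```python
import heapq

class AsyncObservationBuffer:
    """Hypothetical helper: collects (timestamp, action, measurements)
    tuples received from teammates at arbitrary times and releases them
    in timestamp order, so PHD updates can assimilate observations
    chronologically."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker for equal timestamps

    def receive(self, t, action, measurements):
        heapq.heappush(self._heap, (t, self._counter, action, measurements))
        self._counter += 1

    def drain(self):
        """Yield all buffered tuples in time order."""
        while self._heap:
            t, _, a, Z = heapq.heappop(self._heap)
            yield t, a, Z
```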
IV Our Approach
We now describe our algorithm DecSTER for multi-agent active search and tracking without continuous coverage.
Notation. Agent $j$ at time $t$ has a history of available actions and observations $\mathcal{H}^j_t$. Using $\mathcal{H}^j_t$, it computes the PHD $D^j_t(\mathbf{x})$ (Eqs. 1 and 2) over the target RFS. In our SMC-PHD implementation, $D^j_t(\mathbf{x}) = \sum_{k=1}^{K} w^k_t\, \delta_{\mathbf{x}^k_t}(\mathbf{x})$, where $\{\mathbf{x}^k_t\}_{k=1}^{K}$ are the particles with weights $\{w^k_t\}_{k=1}^{K}$.
The SMC-PHD filter propagation steps follow from [15]. (We will make our code publicly available after publication.) In our decentralized setup, each agent $j$ maintains its own posterior PHD $D^j_t(\mathbf{x})$. Next, we describe the decision making step executed by agent $j$ at time $t$.
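For concreteness, the following is a minimal single-agent sketch of the SMC-PHD prediction and update steps (Eqs. 1-4), under the simplifying assumptions of a constant-velocity motion model, a constant $p_d$ inside the FOV, and a constant clutter intensity; all numeric defaults are illustrative, not the paper's tuned values.

```python
import numpy as np

def phd_predict(particles, weights, birth_particles, birth_weights,
                p_s=0.99, q_std=0.1, rng=None):
    """Prediction step (Eq. 1), sketched: propagate surviving particles
    through a noisy constant-velocity motion model and append birth
    particles. State per particle: [px, py, vx, vy]."""
    rng = rng or np.random.default_rng()
    F = np.eye(4)
    F[0, 2] = F[1, 3] = 1.0  # unit time step
    moved = particles @ F.T + rng.normal(0.0, q_std, size=particles.shape)
    return (np.vstack([moved, birth_particles]),
            np.concatenate([p_s * weights, birth_weights]))

def phd_update(particles, weights, Z, p_d=0.9, noise_std=0.5, clutter=1e-3):
    """Update step (Eqs. 2-4), sketched: reweight particles by the missed
    detection term plus one likelihood-normalized term per measurement.
    For brevity, p_d is applied to all particles (i.e., the whole space
    is treated as in-FOV) and the clutter intensity is a constant."""
    new_w = (1.0 - p_d) * weights
    for z in Z:
        d2 = np.sum((particles[:, :2] - z) ** 2, axis=1)
        g = np.exp(-0.5 * d2 / noise_std**2) / (2 * np.pi * noise_std**2)
        psi = p_d * g * weights                      # psi_z(x) * D_{t|t-1}(x)
        new_w = new_w + psi / (clutter + psi.sum())  # denominator is Eq. 4
    return new_w
```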
Thompson sampling for decision making. Prior work in multi-agent active search with static targets has demonstrated the effectiveness of Thompson sampling (TS) as a decentralized decision making algorithm, both in theory [25] and in practice [26]. TS ensures stochasticity in decision making by sampling different plausible realizations of the ground truth from the posterior belief and selecting the best action to maximize the desired reward for a particular sample. The uncertainty in the agent's belief over the state space is reflected in the posterior samples, which makes TS suitable for driving exploration and exploitation. [5] couples a TS-based active search strategy with the deterministic Lloyd's algorithm for tracking, but in their setup, agents are pre-assigned to only one of the search or tracking tasks. Instead, we propose a TS strategy that enables agents to naturally trade off exploratory sensing actions, which might discover undetected targets, against exploitative sensing actions, which help localize and track the previously detected dynamic targets, in our simultaneous search-and-tracking setting.
To the best of our knowledge, prior work has not studied the problem of TS in a continuous (not discretized) search space with a PHD posterior. This is challenging because the PHD is not a distribution and does not include second order uncertainty information, whereas TS is typically applied in the Bayesian setting with the samples drawn from a posterior distribution for which both first and second order moment estimates are available [27]. Prior work in [28] has proposed particle Thompson sampling (PTS) and regenerative PTS (RPTS) algorithms for particle filters where particles are sampled proportional to their weights. Therefore, we adopt a similar principle in our first proposed TS strategy for the SMC-PHD posterior, called TS-PHD-I (Algorithm 1).
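A minimal sketch of this weight-proportional sampling follows (our illustration, not the paper's reference implementation); the `n_samples` argument stands in for the fixed sample size chosen in Algorithm 1.

```python
import numpy as np

def ts_phd_1(particles, weights, n_samples, rng=None):
    """TS-PHD-I sketch: draw a Thompson sample of possible target states
    by sampling particles with probability proportional to their weights
    (cf. particle Thompson sampling [28])."""
    rng = rng or np.random.default_rng()
    probs = weights / weights.sum()
    idx = rng.choice(len(particles), size=n_samples, p=probs)
    return particles[idx]
```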
We note that this method has drawbacks. It tends to sample more particles from the regions in the PHD where the agent already estimates targets might be present. The samples drawn are thus more likely to be biased against regions of the target state space where the agent might be less certain about its observations owing to false positives or missed detections. Furthermore, this method does a poor job of modeling the uncertainty about the number of true targets.
To address the drawbacks of Algorithm 1, we now describe a second approach to Thompson sampling from our SMC-PHD posterior (Algorithm 2). Recall that the expected cardinality of the target RFS over a region $S$ is given by $\int_S D(\mathbf{x})\,d\mathbf{x}$. In the SMC-PHD representation [15], this integral reduces to the sum of the weights of the particles lying in $S$. Further, [13] shows that the PHD is the best Poisson approximation of the multi-target posterior in terms of KL divergence. We therefore draw a sample $n$ of the cardinality of the target RFS from a Poisson distribution whose mean is the total particle weight. Then we sample $n$ locations of the possible targets by drawing from a mixture of the already estimated target locations in the PHD and locations drawn uniformly at random over the search space. These samples form our Thompson sample.
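A minimal sketch of this two-stage sampling is given below (again our illustration); for simplicity the particles are treated as 2D positions, and the mixing ratio `mix` between estimated and uniform locations is an assumed placeholder.

```python
import numpy as np

def ts_phd_2(particles, weights, bounds, mix=0.8, rng=None):
    """TS-PHD-II sketch: first sample the target-set cardinality n from a
    Poisson whose mean is the total PHD mass (sum of weights), then place
    the n sampled targets by mixing draws from the weighted particles
    (estimated targets) with uniform draws over the search space
    (possibly undetected targets).

    particles: (K, 2) particle positions; bounds: (xmin, xmax, ymin, ymax).
    """
    rng = rng or np.random.default_rng()
    total = weights.sum()
    n = rng.poisson(total)
    if n == 0 or total == 0:
        return np.empty((0, 2))
    probs = weights / total
    xmin, xmax, ymin, ymax = bounds
    samples = []
    for _ in range(n):
        if rng.random() < mix:
            samples.append(particles[rng.choice(len(particles), p=probs)])
        else:
            samples.append(rng.uniform([xmin, ymin], [xmax, ymax]))
    return np.array(samples)
```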
Objective. The Optimal Sub-Pattern Assignment (OSPA) metric is typically used in the MTT literature for evaluating the tracking performance of an algorithm and is defined as an error between two sets. Given sets $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$, where $m \le n$ without loss of generality, the OSPA metric of order $p$ with cut-off $c$ is
$$\bar{d}_p^{(c)}(X, Y) = \left( \frac{1}{n} \left( \min_{\pi \in \Pi_n} \sum_{i=1}^{m} d^{(c)}\big(x_i, y_{\pi(i)}\big)^p + c^p (n - m) \right) \right)^{1/p},$$
where $d^{(c)}(x, y) = \min(c, \lVert x - y \rVert)$, $c$ is the cut-off distance, and $\Pi_n$ is the set of all permutations of $\{1, \dots, n\}$. The distance error component of the OSPA computes the minimum cost assignment between $X$ and $Y$, such that $x_i$ is matched to $y_{\pi(i)}$ only when they are within a distance $c$ of each other.
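For reference, the OSPA metric can be computed exactly with the Hungarian algorithm; below is a minimal sketch using SciPy, with `c` and `p` as the cut-off and order defined above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0, p=1):
    """OSPA distance between finite sets X and Y of 2D points (arrays of
    shape (m, 2) and (n, 2)). Cut-off-truncated pairwise distances are
    matched optimally via the Hungarian algorithm; unmatched points incur
    the cardinality penalty c."""
    X = np.asarray(X, dtype=float).reshape(-1, 2)
    Y = np.asarray(Y, dtype=float).reshape(-1, 2)
    m, n = len(X), len(Y)
    if m > n:  # enforce m <= n (OSPA is symmetric)
        X, Y, m, n = Y, X, n, m
    if n == 0:
        return 0.0
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    D = np.minimum(D, c) ** p
    rows, cols = linear_sum_assignment(D)
    cost = D[rows, cols].sum() + (c ** p) * (n - m)
    return (cost / n) ** (1.0 / p)
```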
Given the true target set $X^\star_t$ and an estimated set $\hat{X}_t$ of possible target locations, our goal is to minimize $\bar{d}_p^{(c)}(X^\star_t, \hat{X}_t)$. Since the ground truth $X^\star_t$ is unknown, each agent instead draws a Thompson sample $\tilde{X}^j_t$ from its predicted PHD $D^j_{t|t-1}$. Assuming the observations for any action $a$ are generated by $\tilde{X}^j_t$, and denoting by $\hat{X}^j_{t,a}$ the estimated target set following the corresponding PHD filter update, agent $j$ then selects:
$$a^j_t = \operatorname*{arg\,min}_{a} \; \bar{d}_p^{(c)}\big(\tilde{X}^j_t, \hat{X}^j_{t,a}\big) \qquad (5)$$
Algorithm 3 outlines our proposed algorithm, DecSTER. In our decentralized and asynchronous multi-agent setting, each agent $j$ individually runs DecSTER with its own sampled $\tilde{X}^j_t$. Hence the stochasticity in the sampling procedure enables agents to make decentralized explore-exploit decisions for simultaneous search-and-tracking in their action space.
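Putting the pieces together, the decision step in Eq. 5 can be sketched as follows; the helper callables stand in for the components described above (observation simulation, PHD update, target extraction, OSPA) and are assumptions of this illustration rather than the paper's exact interfaces.

```python
def select_action(actions, phd_particles, phd_weights, ts_sample,
                  simulate_obs, phd_update_fn, estimate_targets, ospa_fn):
    """Sketch of Eq. 5: for each candidate sensing action, pretend the
    Thompson sample is the ground truth, simulate the observations that
    action would yield, run a pseudo PHD update, extract the resulting
    target estimate, and keep the action whose estimate is closest (in
    OSPA) to the Thompson sample."""
    best_action, best_cost = None, float("inf")
    for a in actions:
        Z = simulate_obs(a, ts_sample)                    # pseudo measurements
        w = phd_update_fn(phd_particles, phd_weights, Z)  # pseudo update
        X_hat = estimate_targets(phd_particles, w)        # estimated target set
        cost = ospa_fn(ts_sample, X_hat)
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action
```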
Remark 2. Prior work in search-and-tracking [4, 5] tends to separate the search and tracking phases of the task, and maintains either a visit count or a dynamic occupancy grid to compute the action selection objective during the exploration phase. Such methods scale poorly with the size of the environment since agents need to maintain a discretization over the search space [19]. In contrast, our SMC inference for the multi-target belief is parallelizable over the particles in the posterior PHD, while our TS-based decision making scales with increasing team size.
V Results
We now describe our experimental setup. Consider a rectangular 2D search space in which targets move, with their starting locations and velocities chosen uniformly at random subject to a bound on the maximum target speed. A team of agents is tasked with search-and-tracking of all the targets over a fixed number of timesteps. The agents' action space consists of hierarchical region sensing actions with FOVs of width 1, 2, 4 and 8 units. Since actions with a larger FOV receive noisier observations, we increase the false positive (clutter) rate accordingly for action widths 1, 2, 4 and 8. The survival probability $p_s$ in the PHD filter and the detection probability $p_d$ for targets in the agent's FOV are held constant. In our SMC-PHD implementation following [15], we initialize 100 new (birth) particles per observation and re-sample 1000 particles per estimated target, following the low variance sampling method in [29]. The sample size used in Algorithm 1 is likewise held fixed across experiments.
The agents assume a linear target motion model $\mathbf{x}_t = F\mathbf{x}_{t-1} + \mathbf{q}_t$, with constant-velocity transition matrix $F$ and Gaussian process noise $\mathbf{q}_t \sim \mathcal{N}(\mathbf{0}, \Sigma_q)$. The sensing model is $\mathbf{z}_t = H\mathbf{x}_t + \mathbf{w}_t$, where $H$ extracts the 2D position coordinates and $\mathbf{w}_t \sim \mathcal{N}(\mathbf{0}, \Sigma_w)$. For the OSPA metric, the cut-off distance $c$ and the order $p$ are fixed across all experiments.
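As an illustration of this setup, a near-constant-velocity simulation step might look as follows; the noise scale and the clamping of positions to the search space are assumptions of this sketch, not the paper's exact parameters.

```python
import numpy as np

def step_targets(states, dt=1.0, q_std=0.05, bounds=None, rng=None):
    """Sketch of the near-constant-velocity target motion model:
    x_t = F x_{t-1} + q with Gaussian process noise q.

    states: (n, 4) array of [px, py, vx, vy] target states.
    bounds: optional (xmin, xmax, ymin, ymax) to keep targets in bounds.
    """
    rng = rng or np.random.default_rng()
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt  # position advances by velocity * dt
    new = states @ F.T + rng.normal(0.0, q_std, size=states.shape)
    if bounds is not None:
        xmin, xmax, ymin, ymax = bounds
        new[:, 0] = np.clip(new[:, 0], xmin, xmax)
        new[:, 1] = np.clip(new[:, 1], ymin, ymax)
    return new
```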
Remark 3. Our experimental setup is intended to illustrate the abilities of DecSTER for decentralized and asynchronous multi-agent multi-target search and tracking. The action space is chosen so that there is a non-trivial explore-exploit decision to be made by the agents. The maximum target speed is such that targets may cover a considerable distance in the search space over the course of an episode.
In the following experiments, we measure performance in terms of the average OSPA for the entire team of agents. The plots show the mean across 10 random trials, with the shaded regions indicating standard error. Each trial differs in the initialization of the target locations and their velocities.
V-A Comparing TS-PHD-I with TS-PHD-II
First, we compare the performance of DecSTER using the two proposed approaches for Thompson sampling from a PHD posterior. Fig. 2 compares OSPA against the number of measurements per agent for DecSTER-I and DecSTER-II, which use TS-PHD-I and TS-PHD-II respectively in line 3 of Algorithm 3. We observe that decision making with TS-PHD-II consistently outperforms that with TS-PHD-I. Since TS-PHD-II samples both the cardinality and the locations of the target RFS from the PHD, the samples for different agents are sufficiently diverse to capture the uncertainty regarding the true multi-target ground truth. In contrast, the samples from TS-PHD-I are generally clustered around the agent's current estimate of target locations. We also empirically observed an improvement in the OSPA performance with TS-PHD-II when, for a particular sampled cardinality $n$, an agent considers multiple samples of target locations and averages the reward in Eq. 5 over them. Our results using DecSTER-II consider 10 such samples per action selection step for any agent $j$. We also allow DecSTER-I to similarly average over multiple samples, but this does not improve the sample diversity and does not lead to performance improvement.
Fig. 2 further demonstrates the scalability of TS in the decentralized multi-agent active search-and-tracking setting: as the team size grows, agents achieve similar OSPA with proportionally fewer measurements per agent [25].
V-B Baseline comparisons
We compare DecSTER-I and DecSTER-II with the following baselines. Note that all of them use the same PHD filter inference method, but differ in the action selection policy; a sketch of the Renyi divergence computation follows the list.
1) RANDOM. Each agent selects its next sensing action uniformly at random. 2) RENYI. At time $t$, agent $j$ computes the predicted PHD $D^j_{t|t-1}$ and generates a (pseudo) measurement set for any action $a$, assuming the estimated target set from $D^j_{t|t-1}$ as ground truth. It then selects the action that maximizes the Renyi divergence (with parameter $\alpha$) between $D^j_{t|t-1}$ and its expected updated PHD (Eq. 2). With the SMC-PHD formulation, the Renyi divergence is [30]:
$$D_\alpha \approx \frac{1}{\alpha - 1} \left[ \sum_k \big(w'_k\big)^{\alpha} \big(w_k\big)^{1-\alpha} - \alpha \sum_k w'_k - (1 - \alpha) \sum_k w_k \right] \qquad (6)$$
where $w_k$ and $w'_k$ are the weights of the $k$-th particle in $D^j_{t|t-1}$ and the (pseudo) updated PHD respectively. 3) TS-RENYI. We modify RENYI to use a Thompson sample drawn with TS-PHD-II for computing the (pseudo) measurement set and the updated weights $w'_k$.
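For intuition, the particle-weight form of Eq. 6 can be sketched as below; it follows the Rényi divergence between Poisson point processes derived in [30], but treat the exact normalization as illustrative.

```python
import numpy as np

def renyi_divergence_phd(w_pred, w_upd, alpha=0.5, eps=1e-12):
    """Sketch of the particle-weight Renyi divergence between the
    predicted PHD (weights w_pred) and the pseudo-updated PHD (weights
    w_upd), both defined on the same particle support. Returns 0 when the
    two weight vectors coincide and a non-negative value otherwise."""
    w_pred = np.maximum(np.asarray(w_pred, dtype=float), eps)
    w_upd = np.maximum(np.asarray(w_upd, dtype=float), eps)
    cross = np.sum(w_upd**alpha * w_pred**(1.0 - alpha))
    return (cross - alpha * w_upd.sum()
            - (1.0 - alpha) * w_pred.sum()) / (alpha - 1.0)
```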
Fig. 3 shows that our proposed method outperforms all the baselines for different numbers of targets and team sizes. We observe that RENYI agents are information-greedy; the lack of stochasticity in their decision making objective therefore leads different agents to select the same action in the decentralized asynchronous multi-agent setting. Moreover, the computation in Eq. 6 depends only on the particles in $D^j_{t|t-1}$ and does not account for previously undetected targets. This highlights the drawback of using the Renyi divergence as an optimization objective for explore-exploit decisions in the search-and-tracking setting, in contrast with its success in the tracking setting where exploration is not a concern [4]. This motivated us to propose the TS-RENYI baseline in order to encourage exploration with samples drawn from TS-PHD-II. We observe that TS-RENYI still does not perform noticeably better than RENYI. This is because the weights of the particles in the SMC-PHD filter relate only to the expected cardinality of the target set, so Eq. 6 does not account for any measure of the distance error between the predicted PHD (or the Thompson sample) and the estimate from the updated PHD. In contrast, the OSPA objective accounts for both the localization error and the cardinality error in the estimated target set. Thus we observe that our algorithm DecSTER-I is competitive with or outperforms the random sensing and information-greedy baselines, and DecSTER-II consistently achieves the lowest OSPA among all with the same number of measurements per agent. Based on these results, we consider DecSTER-II our best approach in this setting, labeled DecSTER in the following experiments.
DecSTER vs. DecSTER-C. Motivated by prior work [16] that showed the effectiveness of maximizing the expected number of target detections for action selection in multi-agent tracking, we introduce the DecSTER-C baseline, where agents select actions minimizing only the cardinality error term of the OSPA. Unlike DecSTER, each agent with DecSTER-C draws multiple samples of the cardinality to account for the uncertainty about the real number of targets in its objective. Fig. 4 shows that DecSTER still outperforms DecSTER-C, indicating that jointly considering detection and localization error in the decision making objective is more advantageous for TS-guided action selection in our search-and-tracking setting. This further supports our earlier observations regarding the drawbacks of using the particle weight based Renyi divergence optimization objective in this setting.
V-C Robustness to Communication Delays
Multi-agent systems benefit from leveraging observations shared by their teammates. Agents in our decentralized and asynchronous setting benefit from any information shared by their teammates, but they can continue searching for and tracking targets without waiting for such communication. Therefore, we now analyze the robustness of DecSTER under unreliable inter-agent communication. In simulation, each agent chooses to communicate its own observation at time $t$, along with any prior observations it has not yet shared with its teammates, with probability $p_{\text{comm}}$. The setting $p_{\text{comm}} = 1$ corresponds to our description and analysis of DecSTER in Fig. 3. We observe a graceful decay in the OSPA performance with decreasing rates of inter-agent communication in Fig. 5, both when targets outnumber agents and vice versa. Compared to prior work in the centralized or distributed multi-agent tracking settings [3], DecSTER does not depend on synchronized communication within the team, so agents can adapt and continue their search-and-tracking tasks even when communication is unreliable or unavailable.
VI Conclusion
We introduce DecSTER, a novel decentralized and asynchronous algorithm for multi-agent multi-target active search-and-tracking that relaxes the restrictive assumption of requiring continuous target coverage. In simulation, DecSTER outperforms competitive baselines that greedily optimize for information gain or expected target detections. A key contribution is adapting TS to effectively drive exploration and exploitation using the SMC-PHD filter. Future work includes theoretical analysis of the proposed TS methods and learning improved models of environment uncertainty for non-stationary multi-target tracking. Building on the recent success of TS-based multi-agent active search [26], we also aim to validate DecSTER's performance when deployed on teams of physical robots in the real world.
References
- [1] M. Popović, T. Vidal-Calleja, G. Hitz, J. J. Chung, I. Sa, R. Siegwart, and J. Nieto, “An informative path planning framework for UAV-based terrain monitoring,” Autonomous Robots, vol. 44, pp. 889–911, 2020.
- [2] J. A. Placed, J. Strader, H. Carrillo, N. Atanasov, V. Indelman, L. Carlone, and J. A. Castellanos, “A survey on active simultaneous localization and mapping: State of the art and new frontiers,” IEEE Transactions on Robotics, 2023.
- [3] C. Robin and S. Lacroix, “Multi-robot target detection and tracking: taxonomy and survey,” Autonomous Robots, vol. 40, pp. 729–760, 2016.
- [4] S. Papaioannou, P. Kolios, T. Theocharides, C. G. Panayiotou, and M. M. Polycarpou, “A cooperative multiagent probabilistic framework for search and track missions,” IEEE Transactions on Control of Network Systems, vol. 8, no. 2, pp. 847–858, 2020.
- [5] J. Chen and P. Dames, “Active multi-target search using distributed Thompson sampling,” 2022.
- [6] R. R. Murphy, “Human-robot interaction in rescue robotics,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 34, no. 2, pp. 138–153, 2004.
- [7] L. Doitsidis, S. Weiss, A. Renzaglia, M. W. Achtelik, E. Kosmatopoulos, R. Siegwart, and D. Scaramuzza, “Optimal surveillance coverage for teams of micro aerial vehicles in GPS-denied environments using onboard vision,” Autonomous Robots, vol. 33, pp. 173–188, 2012.
- [8] T. Oskam, R. W. Sumner, N. Thuerey, and M. Gross, “Visibility transition planning for dynamic camera control,” in Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009, pp. 55–65.
- [9] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
- [10] A. Doucet, N. De Freitas, N. J. Gordon, et al., Sequential Monte Carlo methods in practice. Springer, 2001, vol. 1, no. 2.
- [11] S. S. Blackman, “Multiple hypothesis tracking for multiple target tracking,” IEEE Aerospace and Electronic Systems Magazine, vol. 19, no. 1, pp. 5–18, 2004.
- [12] T. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar tracking of multiple targets using joint probabilistic data association,” IEEE Journal of Oceanic Engineering, vol. 8, no. 3, pp. 173–184, 1983.
- [13] R. P. Mahler, “Multitarget Bayes filtering via first-order multitarget moments,” IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1152–1178, 2003.
- [14] L. D. Stone, R. L. Streit, T. L. Corwin, and K. L. Bell, Bayesian multiple target tracking. Artech House, 2013.
- [15] B. Ristic, D. Clark, and B.-N. Vo, “Improved SMC implementation of the PHD filter,” in 2010 13th International Conference on Information Fusion. IEEE, 2010, pp. 1–8.
- [16] P. Dames, P. Tokekar, and V. Kumar, “Detecting, localizing, and tracking an unknown number of moving targets using a team of mobile robots,” The International Journal of Robotics Research, vol. 36, no. 13-14, pp. 1540–1553, 2017.
- [17] P. M. Dames, “Distributed multi-target search and tracking using the PHD filter,” Autonomous Robots, vol. 44, no. 3-4, pp. 673–689, 2020.
- [18] J. Chen and P. Dames, “Distributed multi-target search and tracking using a coordinated team of ground and aerial robots,” in Robotics: Science and Systems, 2020.
- [19] H. Van Nguyen, B.-N. Vo, B.-T. Vo, H. Rezatofighi, and D. C. Ranasinghe, “Multi-objective multi-agent planning for discovering and tracking multiple mobile objects,” arXiv preprint arXiv:2203.04551, 2022.
- [20] P. Xin and P. Dames, “Comparing stochastic optimization methods for multi-robot, multi-target tracking,” in International Symposium on Distributed and Autonomous Systems, 2022.
- [21] H. Jeong, H. Hassani, M. Morari, D. D. Lee, and G. J. Pappas, “Deep reinforcement learning for active target tracking,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1825–1831.
- [22] L. Zhou, V. D. Sharma, Q. Li, A. Prorok, A. Ribeiro, P. Tokekar, and V. Kumar, “Graph neural networks for decentralized multi-robot target tracking,” in 2022 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2022, pp. 195–202.
- [23] M. Tzes, N. Bousias, E. Chatzipantazis, and G. J. Pappas, “Graph neural networks for multi-robot active information acquisition,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 3497–3503.
- [24] R. P. Mahler, Advances in statistical multisource-multitarget information fusion. Artech House, 2014.
- [25] R. Ghods, A. Banerjee, and J. Schneider, “Decentralized multi-agent active search for sparse signals,” in Uncertainty in Artificial Intelligence. PMLR, 2021, pp. 696–706.
- [26] N. A. Bakshi, T. Gupta, R. Ghods, and J. Schneider, “GUTS: Generalized uncertainty-aware Thompson sampling for multi-agent active search,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 7735–7741.
- [27] D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, Z. Wen, et al., “A tutorial on thompson sampling,” Foundations and Trends® in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018.
- [28] Z. Zhou, B. Hajek, N. Choi, and A. Walid, “Regenerative particle Thompson sampling,” arXiv preprint arXiv:2203.08082, 2022.
- [29] S. Thrun, “Probabilistic robotics,” Communications of the ACM, vol. 45, no. 3, pp. 52–57, 2002.
- [30] B. Ristic, B.-N. Vo, and D. Clark, “A note on the reward function for phd filters with sensor control,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1521–1529, 2011.