
Corresponding author: Indrajeet Yadav, University of Delaware, 126 Spencer Lab, Newark, DE 19716, USA.

Receding Horizon Navigation and Target Tracking for Aerial Detection of Transient Radioactivity

Indrajeet Yadav, Michael Sebok, and Herbert G. Tanner (University of Delaware, USA). [email protected]
Abstract

The paper presents a receding horizon planning and control strategy for quadrotor-type micro aerial vehicles (mavs) to navigate reactively and intercept a moving target in a cluttered, unknown, and dynamic environment. Leveraging a lightweight short-range sensor that generates a point-cloud within a relatively narrow and short field of view (fov), and an ssd-MobileNet based deep neural network running on board the mav, the proposed motion planning and control strategy produces safe and dynamically feasible mav trajectories within the sensor fov, which the vehicle uses to autonomously navigate, pursue, and intercept its moving target. This task is completed without reliance on a global planner or prior information about the environment or the moving target. The effectiveness of the reported planner is demonstrated numerically and experimentally in cluttered indoor and outdoor environments featuring maximum speeds of up to 4.5–5 m/s.

keywords:
Receding Horizon Motion Planning, Target Tracking, Reactive Obstacle Avoidance, Aerial Radiation Detection

1 Introduction

Overview

The work reported in this paper is motivated in part by application problems in the field of nuclear nonproliferation. In this context, there can be instances where one needs to quickly detect weak radiation sources that could be in transit through generally unknown, and sometimes gps-denied, environments. With mavs now being a widely available and accessible technology, and with the possibility of mounting lightweight (albeit still low-efficiency) commercial off-the-shelf (cots) radiation detectors on them, such a detection task is now possible using aerial means.

The approach to such problems advocated here utilizes an algorithmic pipeline that combines reactive receding horizon navigation with target tracking for mavs, a visual-inertial (vi) state estimation algorithm, an ssd-MobileNetV2 based visual target identification and tracking module, and a fast likelihood ratio-based binary decision-making algorithm. Reactive planning is achieved through a new model predictive control (mpc)-type motion planner that fully incorporates the nonlinear vehicle dynamics of the mav (cf. Tanner and Piovesan (2010)) and utilizes real-time sensor data, in the form of a point-cloud generated by an onboard rgb-d camera, to select a (probabilistically optimal) safe path toward the target within the field of view. The reactive planner solves a multi-objective optimal control problem in real time, and thus balances detection accuracy and decision-making speed against platform flight agility limitations. The product of this optimization is a dynamically compatible minimum-snap trajectory that fits along the kinematic reference path. A nonlinear geometric controller on the mav then tracks this trajectory in a receding horizon fashion.

Refer to caption
Figure 1: Custom-built quadrotor, capable of on-board planning, control, state estimation, ssd-MobileNet based target tracking, and radiation measurement, tracking the Jackal ground vehicle.

The planning and control architecture is implemented using low-cost cots hardware, limited-range sensors, and computation devices with average capabilities. With these resources, the system has demonstrated speeds up to 4.5 m/s in cluttered indoor as well as outdoor environments. In the experimental results reported in this paper, the planner enables the mav to intercept static or dynamic ground targets. Such interception maneuvers have been shown to increase the likelihood of correct identification of the presence of a radioactive source on the target (Sun and Tanner, 2015). In cases where the local information is insufficient to construct a feasible trajectory to the goal, the planner is designed to safely stop the vehicle. The features and capabilities of the reported reactive planning and control architecture are demonstrated in case studies of mobile radiation detection, in which mavs autonomously determine whether radiation sources are present on a ground vehicle moving in cluttered environments (Fig. 1).

Background and Related Work

A quadrotor-type mav is an inexpensive, lightweight, and agile sensor platform, suitable for many applications in areas including surveillance, aerial photography and mapping, precision agriculture, construction and defense. Although mavs with various degrees of autonomy have been deployed in these application domains, the prevailing assumption is that the environment is at least partially known, so that a motion plan can be generated a priori and then used for steering the vehicle to a desired goal. To plan its motion in unknown environments, however, a robotic vehicle needs to construct and update a (local) environment map online; recent mav literature addresses this problem using an onboard perception stack (Oleynikova et al., 2017; Hornung et al., 2013).

Perception-Based Reactive Navigation

As long as vehicle dynamic constraints are respected, safe mav navigation and aggressive maneuvering are possible by combining a reference trajectory generation process that splices together time polynomials between waypoint poses (keyframes) (Mellinger and Kumar, 2011; Richter et al., 2016) with a differential-geometric quadrotor point stabilization or tracking controller (Lee et al., 2010), which is then utilized to transition between these keyframes. While platform-specific constraints on vehicle dynamics can be identified by experimental testing, ensuring safety during operation requires either complete prior knowledge of the environment, or some way of acquiring the missing information in real time through onboard sensing; both options involve several nontrivial and open research questions (Cadena et al., 2016).

Early work on online obstacle avoidance focused on building a new, or updating a prior, map of the environment. In this context, a continuous-time trajectory optimization using octomap (Hornung et al., 2013) is employed (Oleynikova et al., 2016), utilizing a local planner to re-generate a safe trajectory that assigns collision costs based on a (computationally expensive) Euclidean signed distance field (esdf) map. In unknown environments, Gao and Shen (2016) construct a navigation approach utilizing online planning to produce a point-cloud map of the environment using a Velodyne 3d lidar, and use the map to find safe corridors through which their mav plans its motion. To navigate to destinations outside the sensor range, a sequence of predefined waypoints is needed. Landry et al. (2016) assume knowledge of obstacle location and geometry, and locally decompose the available free space into convex polyhedra to generate safe paths for the vehicle. Variants of such approaches (Liu et al., 2017) consider the workspace represented as a 3d grid map with uniform voxels, which are used to create a convex safe flight corridor (sfc).

More recent reactive motion planning algorithms specifically for aerial vehicles include lightweight egospace-based algorithms extended to a quadrotor’s configuration dynamics (Fragoso et al., 2018), and methods that reactively sample safe trajectories in the field of view of the quadrotor while decoupling local obstacle avoidance from global guidance using a global planner (Ryll et al., 2019). Sikang Liu et al. (2016) report on an impressive receding horizon-based approach to (local) planning that involves only limited onboard sensing, and which utilizes a local uniform-resolution volumetric occupancy grid map and a cost map to find and navigate to safe frontier points (local goals that are closest to the global goal).

All aforementioned approaches either require some type of prior information and a global planner that generates a sequence of waypoints, or rely on (mav payload-taxing) high-range sensors. They are particularly effective for navigating to static configurations; if, however, the mav destination changes over time, or if the generation of a prior map is impossible, they cannot guarantee a priori the existence of a sfc, or ensure that intermediate waypoints will be within sensor range.

Yet even within a reactive navigation framework, some convergence guarantees can still be provided (Yadav and Tanner, 2021a) based on appropriate extensions of the navigation function approach (Rimon and Koditschek, 1992). Recent advances in such techniques have independently demonstrated success in online calculation of convergent vector fields using limited range onboard sensors (Vasilopoulos et al., 2020; Arslan and Koditschek, 2019), where unknown obstacles are detected online using a deep neural network (dnn) trained on the predefined class of object geometries. Such approaches have been successfully tested on fully actuated or differential drive robots at relatively low speeds; their applicability to fast moving mavs with underactuated higher order dynamics remains to be demonstrated.

Target Tracking

While the area of simultaneous localization, mapping, and moving object tracking (slammot) has been the focus of recent research efforts (Wang et al., 2007; Chojnacki and Indelman, 2018), there are few implementations of active visual-inertial target tracking (Chen et al., 2016) that have demonstrated impressive performance using a quadrotor. Conceptually closer to the one reported in this paper is the aerial target tracking work of Thomas et al. (2017), who demonstrate tracking of a spherical rolling target. That approach employed a geometric technique similar to visual servoing for tracking the target, combined with a receding horizon strategy that penalizes velocity and position errors. Alternatively, if the environment is known, a reactive motion planning scheme can utilize multi-objective optimization for obstacle avoidance and target tracking (Penin et al., 2018).

These approaches assume that the target is either known, detected via some type of a tag, or can be localized using some form of visual servoing (which depends on the shape of the target). Alternative means of target detection include approaches based on convolutional neural networks, which have gained popularity in active quadrotor motion planning and target tracking with the advent of computationally efficient networks such as the single shot detector (ssd) (Liu et al., 2016), you only look once (yolo) (Redmon et al., 2016), and their variants, and seem particularly effective at high sensor speeds. For instance, DroNet architectures (Loquercio et al., 2018) are utilized in drone racing involving dynamically changing environments (Loquercio et al., 2020). In another related application of forest trail navigation, a multi-column dnn is used to predict the turns in the trail and guide the quadrotor (Giusti et al., 2016), although that particular perception and control stack runs remotely on a laptop. Smolyanskiy et al. (2017) utilize a yolo network but run the planning and perception stack onboard the quadrotor on an nvidia tx1. A human gaze-driven quadrotor navigation approach utilizing an ssd network running on an nvidia tx2 features eye-tracking glasses along with a camera and an inertial measurement unit (imu), which are combined to estimate the position of the human with respect to the quadrotor and steer the mav (Yuan et al., 2019).

As far as the state of the art in experimental implementations is concerned, recent literature reports quadrotor navigation results at impressively high speeds (Liu et al., 2017; Mohta et al., 2018; Ryll et al., 2019); yet most of the systems either involve a global planner (Ryll et al., 2019), or employ high-end and expensive sensors with extended range, e.g., a Velodyne vlp-16 or a Hokuyo ust-20lx lidar mounted on a gimbal to provide a 270° fov (Liu et al., 2017; Mohta et al., 2018), which can offer significantly more advanced perception capabilities compared to the 69.4° × 42.5° fov sensor utilized in this work. In addition, the top speeds reported by Ryll et al. (2019) were achieved in open areas and with average flight speeds of 2.4 m/s. In the absence of a global planner, however, a planner with myopic vision cannot arbitrarily increase the speed in anticipation of an unseen obstacle. Moreover, the majority of experimental results reported relate to the case of converging to static final configurations.

Thus, existing robotics literature on mavs covers either safe and agile point-to-point navigation in unknown environments, or autonomous tracking of a moving target, but there is scarce, if any, work on the combined problem. In addition, the signal and data processing component, that is, what will be done with the measurement data once they have been collected, is typically an afterthought; not much effort is directed to designing the measurement process so that it serves the needs of decision-making based on these data.

Paper Organization and Contributions

At a high level, the approach of this paper is different: motion planning, safe navigation, target tracking and decision-making components are integrated and co-designed to operate in concert with each other. In this regard, this paper makes technical contributions to real-time, sensor-based reactive navigation and target tracking in the form of a motion planning and control methodology applicable to problem instances involving either static or moving navigation goals. The planning and control architecture is unique because:

  1. it is applicable to cases with both static and moving goal configurations;

  2. it generates trajectories in a new way by efficiently solving a multi-objective optimal control problem;

  3. it is experimentally tested indoors and outdoors with mav speeds of up to 4.5–5 m/s; and

  4. it is demonstrated to be capable of solving a challenging radiation detection problem.

2 Overview of the Architecture

The block diagram of Fig. 2 shows the flow of information within the motion planning, state estimation, and platform control architecture. Starting with the rgb-d camera, a point-cloud is produced and then used to frame the obstacle-free portion of the workspace. On-board state estimation is performed through a visual-inertial msckf navigation stack (Open-vins (Geneva et al., 2020)). Given the free workspace and the state estimates, the receding horizon planner (i) generates a candidate minimum snap trajectory from the current position of the robot to each of the free points in the field of view based on Pontryagin's minimum principle; (ii) selects the point in the fov closest to the target (henceforth referred to as the intermediate point); (iii) assigns a cost to each trajectory as a weighted sum of the trajectory's proximity to obstacles and the proximity of the trajectory's end point to the intermediate point; and finally, (iv) selects the trajectory with the least such cost as the reference trajectory for the mav to follow. The end point of this selected minimum-snap reference trajectory is referred to as the local goal. An initial segment of the reference trajectory that ends at the local goal is presented to a differential-geometric tracking controller, which initiates steering of the mav to the designated local goal point within the fov. In a typical receding horizon fashion, before the end of that initial segment is reached, the mav uses an updated set of point-cloud data to generate a new reference trajectory, starting from a point on the current trajectory associated with the end of some designated control time horizon. Once in the neighborhood of this point, the mav transitions smoothly to tracking the new reference trajectory. This constitutes a planning cycle of duration $\delta t$. The planning, and concurrent trajectory tracking, cycle repeats until the final destination is reached, when the latter is static.

If the destination is a moving target, an ssd MobileNet-v2 based target tracker calculates the relative positions between the target and the mav, which the planner utilizes to generate intercepting trajectories.

Refer to caption
Figure 2: Block diagram of the motion planning and control architecture. The arrows indicate the direction of information flow. The dashed lines indicate that the input to the planner is either a user-specified static goal or the relative position of the target obtained from ssd-MobileNet.

3 Radiation Detection Preliminaries

Application Case Study Background

The robotics application that motivated this work was detecting weak mobile sources of radiation using aerial mobile sensors. One possibility for realizing an inexpensive, fully autonomous aerial radiation detector is to take a quadrotor mav, mount a cots radiation counter on it, and allow it to maneuver itself in ways that let it determine as quickly as possible whether or not its given moving target is radioactive. While a network of static detectors can also be deployed for the detection task, due to the strong (inverse square law) dependence of the process intensity on the distance between sensor and source (Nemzek et al., 2004), the static network will experience a dramatic decrease of its signal-to-noise ratio (snr) that will significantly reduce its detection accuracy and speed.

Detecting a mobile radioactive source based on a series of discrete events associated with the arrival of particles or gamma rays (indistinguishable from each other), can be formulated as an inhomogeneous likelihood ratio test (lrt) within the Neyman-Pearson framework (Pahlajani et al., 2014). In this framework, analytical (Chernoff) bounds on error probabilities can be derived in terms of relative sensor-source distance and used as proxies for the otherwise intractable true detection error probabilities (Pahlajani et al., 2014), to facilitate the design of optimal control laws that maximize detection performance in terms of speed and accuracy (Sun and Tanner, 2015; Yadav and Tanner, 2019).

Neyman-Pearson based Radiation Detection

A sensor measurement event—for the case of a Geiger-Muller (gm) detector, for instance—is associated with the discrete random process where a gamma-ray (from the source or background) hits the sensitive area of the detector and records a “count.” This is mathematically modeled in a probabilistic setup (Pahlajani et al., 2014; Sun and Tanner, 2015) as follows. Assume that there is a counting process $N_t$ for $t\in[0,T]$, on a measurable space $(\Omega,\mathscr{F})$. In this context, $N_t$ essentially represents the number of times gamma-rays have hit the radiation counter located at position $x\in\mathbb{R}^{3}$, up to (and including) time $t\in[0,T]$. This counting process is assumed to be a Poisson process (Snyder, 1975). The sensor position $x$ is known over the whole interval $[0,T]$.

The detector can entertain two competing hypotheses, $H_0$ and $H_1$, each expressing an opinion as to whether its cumulative count rate can be attributed to the presence of a radiation source of intensity $a$ (in counts per unit time, i.e., counts per second (cps)), located at a possibly time-varying position $y\in\mathbb{R}^{3}$, which will be referred to as the target. Hypothesis $H_0$ asserts that the target is benign, while hypothesis $H_1$ claims that the target carries a source of intensity $a$. The two hypotheses $H_0$ and $H_1$ correspond, respectively, to two distinct probability measures $\mathbb{P}_0$ and $\mathbb{P}_1$ on $(\Omega,\mathscr{F})$. With respect to measure $\mathbb{P}_0$, the process $N_t$ is a Poisson process with intensity $\beta(t)$, i.e., the intensity of naturally occurring background radiation; with respect to $\mathbb{P}_1$, however, the same process is Poisson with intensity $\beta(t)+\nu(t)$, where $\nu(t)$ represents the intensity of the source (whenever present) as perceived by the sensor at time $t$. The functions $\beta(t)$ and $\nu(t)$, defined on $[0,T]$, are assumed to be bounded, continuous and strictly positive (Pahlajani et al., 2014).

Function $\nu(t)$ should implicitly reflect the inverse square law dependence of the source intensity perceived by the sensor on the distance between the sensor and the source (Nemzek et al., 2004). If $\chi$ denotes the sensor's cross-section coefficient, one possibility for expressing the detector's perceived intensity is in the form:

\nu(t)=\frac{\chi a}{2\chi+\|y(t)-x(t)\|^{2}}\,. \qquad (1)
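For concreteness, the perceived-intensity model (1) is simple to evaluate numerically; the following minimal Python sketch (the function and argument names are illustrative, not part of the reported implementation) computes $\nu$ for given sensor and source positions.

```python
import numpy as np

def perceived_intensity(x, y, chi, a):
    """Perceived source intensity nu(t) of eq. (1), for sensor position x and
    source position y (3-vectors), cross-section coefficient chi, and source
    strength a in counts per second."""
    d2 = np.sum((np.asarray(y, float) - np.asarray(x, float)) ** 2)
    return chi * a / (2.0 * chi + d2)
```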

A test for choosing between $H_0$ and $H_1$ faces the risk of two types of errors. One of them is a false alarm, which occurs when the outcome of the test is in favor of $H_1$ while $H_0$ is instead the correct hypothesis; the other is a missed detection, in which one sides with $H_0$ when in fact $H_1$ is true. In this setting, the optimal test for deciding between $H_0$ and $H_1$ is an lrt obtained as follows (Pahlajani et al., 2014). Let $\tau_n$ for $n\geq 1$ denote the $n$th jump time of $N_t$ (jumps occur when the sensor registers a count), and with the convention that $\prod_{n=1}^{0}(\cdot)=1$, let

L_{T}=\exp\left(-\int_{0}^{T}\nu(s)\,\mathrm{d}s\right)\prod_{n=1}^{N_{T}}\left(1+\frac{\nu(\tau_{n})}{\beta(\tau_{n})}\right) \qquad (2)

be the likelihood ratio. Then for a specific fixed threshold $\gamma>0$, the test

L_{T}\ \mathop{\gtrless}_{H_{0}}^{H_{1}}\ \gamma \qquad (3)

is optimal in the (Neyman-Pearson) sense. This means that (3) minimizes the probability of a missed detection under a given upper bound constraint on the probability of false alarm. With $\mu(t)\triangleq 1+\tfrac{\nu(t)}{\beta(t)}$, constants $p\in(0,1)$ and $\eta\triangleq\log\gamma$, and the quantity

\Lambda(p)\triangleq\int_{0}^{T}\left[\mu(s)^{p}-p\,\mu(s)+p-1\right]\beta(s)\,\mathrm{d}s\,, \qquad (4)

one can now extract analytical expressions for Chernoff bounds on the probabilities of false alarm $P_F$ and missed detection $P_M$ (Pahlajani et al., 2014).
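To illustrate how the test (2)-(3) would be evaluated on recorded data, the sketch below computes the log-likelihood ratio by numerical quadrature on a time grid; the discretization and the helper names are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def lrt_decision(t_grid, nu, beta, jump_times, log_gamma):
    """Evaluate log L_T of eq. (2) and apply test (3).  nu and beta are arrays of
    nu(t) and beta(t) sampled on t_grid over [0, T]; jump_times are the recorded
    count arrival times tau_n."""
    integral = np.trapz(nu, t_grid)                       # the exp(-int nu ds) factor
    nu_j = np.interp(jump_times, t_grid, nu)
    beta_j = np.interp(jump_times, t_grid, beta)
    log_LT = -integral + np.sum(np.log1p(nu_j / beta_j))  # sum of log(1 + nu/beta) at jump times
    return log_LT, bool(log_LT > log_gamma)               # True: decide H1 (source present)
```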

If an upper limit $\alpha>0$ is set on the bound on the probability of false alarm, then there exists a unique solution $p^{\ast}\in[0,1]$ for which the tightest bound on the probability of missed detection can be obtained. The exponents in the bounds on the probability of false alarm and missed detection, respectively, are (Pahlajani et al., 2014)

\mathcal{E}_{F}=\int_{0}^{T}\left[p^{\ast}\mu^{p^{\ast}}\log\mu-\mu^{p^{\ast}}+1\right]\beta\,\mathrm{d}s=-\log\alpha \qquad (5)
\mathcal{E}_{M}=\log\alpha+\Lambda'(p^{\ast})\,, \qquad (6)

where the dependence of $\mu$ and $\beta$ on time is suppressed for clarity, and the derivative $\Lambda'(p)=\partial\Lambda/\partial p$ is expressed as

\Lambda'(p)=\int_{0}^{T}\left[\mu^{p}\log\mu-\mu+1\right]\beta\,\mathrm{d}s\,. \qquad (7)

For the optimal $p^{\ast}$, the value $\Lambda(p^{\ast})$ and the detection threshold $\gamma$ are related through $\gamma=\exp\big(\Lambda(p^{\ast})\big)$. Suppose now that the distance between target and sensor, $\|y-x\|$, is regulated by a control input $u$; then $\nu$, and consequently $\mu$, depend implicitly on $u$, and an optimal control problem can be formulated as follows:

Problem 1.

Find a control input $u$ that optimizes $\Lambda'(p^{\ast})$ for a given upper limit $\alpha$ on the bound on the probability of false alarm.

Irrespective of whether $\|y-x\|$ is deterministic or stochastic, it can be shown that the optimal sensor management strategy $u$ is to close the distance between source and sensor as quickly as possible (Sun and Tanner, 2015; Yadav and Tanner, 2017).
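As a numerical illustration of (4)-(7), the sketch below computes the exponents and solves the false-alarm constraint (5) for $p^{\ast}$ by bisection, under the assumption that the constraint is active somewhere inside $(0,1)$; it is a sketch of the calculation, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import brentq

def chernoff_design(t_grid, nu, beta, alpha):
    """Compute p*, the detection threshold gamma = exp(Lambda(p*)), and the
    missed-detection exponent (6), given nu(t), beta(t) sampled on t_grid."""
    mu = 1.0 + nu / beta

    def Lambda(p):            # eq. (4)
        return np.trapz((mu**p - p * mu + p - 1.0) * beta, t_grid)

    def Lambda_prime(p):      # eq. (7)
        return np.trapz((mu**p * np.log(mu) - mu + 1.0) * beta, t_grid)

    def false_alarm_exp(p):   # eq. (5)
        return np.trapz((p * mu**p * np.log(mu) - mu**p + 1.0) * beta, t_grid)

    # Enforce E_F(p*) = -log(alpha); assumes a sign change exists inside (0, 1).
    p_star = brentq(lambda p: false_alarm_exp(p) + np.log(alpha), 1e-6, 1.0 - 1e-6)
    gamma = np.exp(Lambda(p_star))
    E_M = np.log(alpha) + Lambda_prime(p_star)
    return p_star, gamma, E_M
```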

4 Quadrotor Dynamics and Control

The mav is modeled as a rigid body moving in $\mathsf{SE}(3)$. Let $m$ and $\mathbf{J}$ denote its mass and moment of inertia, respectively, and denote $\mathsf{x}=(x,y,z)^{\intercal}$ and $\mathsf{v}=\dot{\mathsf{x}}$ its 3d position and velocity in the inertial frame. Let $\mathbf{R}$ be the rotation matrix from the body-fixed frame to the inertial frame, and $\Omega$ the mav's angular velocity in the body-fixed frame. We denote $\hat{\cdot}$ the skew-symmetry operation, and write the gravitational acceleration vector and the unit vector along the inertial vertical as $\mathsf{g}=(0,0,g)^{\intercal}$ and $\mathsf{e}_{3}=(0,0,1)^{\intercal}$, respectively. The control inputs to the mav are the (scalar) thrust $f$ and the (vector) moment $\mathsf{M}$, both of which are considered bounded. The former is bounded in the interval $(f_{\mathrm{min}},f_{\mathrm{max}})$; the constraint on the latter is assumed to translate to a bound on angular velocity, $\|\Omega\|\leq\Omega_{\mathrm{max}}$. With these definitions and assumptions in place, the mav dynamics are described as

\dot{\mathsf{x}}=\mathsf{v},\qquad m\,\dot{\mathsf{v}}=f\,\mathbf{R}\,\mathsf{e}_{3}-m\,\mathsf{g} \qquad (8a)
\dot{\mathbf{R}}=\mathbf{R}\,\hat{\Omega},\qquad\mathbf{J}\,\dot{\Omega}+\Omega\times\mathbf{J}\,\Omega=\mathsf{M}\,. \qquad (8b)

The control law for this system is designed based on a (differential) geometric method (Lee et al., 2010). To examine the particular control design employed here in more detail, consider a smooth position reference trajectory $\mathsf{x}_{d}(t)\in\mathbb{R}^{3}$. Based on this, one can construct (Lee et al., 2010) a desired rotation matrix $\mathbf{R}_{d}$ and angular velocity $\Omega_{d}$ that are consistent with the reference trajectory $\mathsf{x}_{d}(t)$. The desired yaw angle $\psi$ of the mav is used to construct the desired mav direction vector in the form $\mathsf{b}_{1d}=(\cos\psi,\sin\psi,0)^{\intercal}$.

The tracking errors in position $\mathsf{e_x}$, velocity $\mathsf{e_v}$, orientation $\mathsf{e_R}$, and angular rate $\mathsf{e}_{\Omega}$ are expressed as

\mathsf{e_{x}}=\mathsf{x}-\mathsf{x}_{d}\,,\qquad\mathsf{e_{v}}=\dot{\mathsf{x}}-\dot{\mathsf{x}}_{d} \qquad (9a)
\mathsf{e}_{R}=\tfrac{1}{2}\left(\mathbf{R}_{d}^{\intercal}\mathbf{R}-\mathbf{R}^{\intercal}\mathbf{R}_{d}\right)^{\vee}\,,\qquad\mathsf{e}_{\Omega}=\Omega-\mathbf{R}^{\intercal}\mathbf{R}_{d}\,\Omega_{d}\,. \qquad (9b)

Picking positive control gains $k_{x}$, $k_{v}$, $k_{R}$ and $k_{\Omega}$, the control inputs can now be constructed as

f=-k_{x}\mathsf{e_{x}}-k_{v}\mathsf{e_{v}}+m\,\mathsf{g}+m\,\ddot{\mathsf{x}}_{d}\,, \qquad (10a)
\mathsf{M}=k_{R}\,\mathsf{e_{R}}+k_{\Omega}\,\mathsf{e}_{\Omega}+\Omega\times\mathbf{J}\,\Omega\,, \qquad (10b)

allowing one to achieve exponentially stable tracking behavior for initial attitude errors less than $\pi/2$ (cf. Lee et al. (2010)).
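A compact sketch of this controller is given below. The scalar thrust is obtained by projecting the desired force onto the body z-axis, and the gain terms in the moment law carry the signs used by Lee et al. (2010); both choices depend on the sign conventions adopted for the errors in (9), and are stated here as assumptions rather than as the authors' exact implementation.

```python
import numpy as np

def vee(S):
    """Inverse of the hat map: extract a vector from a skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def geometric_control(x, v, R, Omega, xd, vd, ad, Rd, Omegad,
                      m, J, g, kx, kv, kR, kOm):
    """Thrust and moment from the tracking errors (9); see the caveats in the
    text above regarding sign conventions and the thrust projection."""
    e3 = np.array([0.0, 0.0, 1.0])
    ex = x - xd                                   # position error, (9a)
    ev = v - vd                                   # velocity error, (9a)
    F_des = -kx * ex - kv * ev + m * g * e3 + m * ad
    f = float(F_des @ (R @ e3))                   # scalar thrust along the body z-axis
    eR = 0.5 * vee(Rd.T @ R - R.T @ Rd)           # attitude error, (9b)
    eOm = Omega - R.T @ Rd @ Omegad               # angular-rate error, (9b)
    M = -kR * eR - kOm * eOm + np.cross(Omega, J @ Omega)
    return f, M
```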

5 Reactive Receding Horizon Planning

Representing the Free Space

Let $\mathcal{V}\subset\mathbb{R}^{3}$ denote the visible space within the fov of the rgb-d sensor. This fov region contained in $\mathcal{V}$ is assumed to take the shape of a solid pyramid sector with its apex attached to the base frame of the sensor, with the depth direction of the sensor being aligned with the local $x$ (heading) frame axis of the mav. The base of the fov pyramid is outlined by the maximum range $R_{\mathrm{max}}$ of the sensor, while the side boundaries (polar and azimuth angles of the sector) are determined by the maximal viewing angles in the fov of the sensor along the local $y$ and $z$ directions. Denote $\phi_{y}$ and $\phi_{z}$ those maximum viewing angles of the sensor at its apex along the local $y$ and $z$ directions, respectively. The motion planning process also takes as input a user-specified minimum range for perception, $R_{\mathrm{min}}$. Assuming now that the apex of the fov pyramid is at local frame coordinates $(0,0,0)^{\intercal}$, any point within the fov of the sensor can be expressed in polar coordinates as $(r,\theta,\varphi)^{\intercal}$, where $R_{\mathrm{min}}\leq r\leq R_{\mathrm{max}}$, $-\phi_{y}\leq\theta\leq\phi_{y}$ and $-\phi_{z}\leq\varphi\leq\phi_{z}$. By selecting a resolution $\delta r$ on the range and $\delta\theta$ on the viewing angles in both directions, the field of view of the sensor can be discretized and represented as an ensemble of points. Each point in this ensemble represents a potential desired next location for the mav, should a feasible path to this location exist.
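A minimal sketch of this discretization is shown below; the spherical-to-Cartesian convention (x forward, theta toward local y, varphi toward local z) is an assumption made for illustration.

```python
import numpy as np

def fov_ensemble(R_min, R_max, phi_y, phi_z, dr, dtheta):
    """Discretize the pyramidal FOV into candidate end points in the sensor
    frame.  Angles are in radians; dr and dtheta are the range and angular
    resolutions."""
    pts = []
    for r in np.arange(R_min, R_max + 1e-9, dr):
        for theta in np.arange(-phi_y, phi_y + 1e-9, dtheta):       # viewing angle along local y
            for varphi in np.arange(-phi_z, phi_z + 1e-9, dtheta):  # viewing angle along local z
                pts.append((r * np.cos(varphi) * np.cos(theta),
                            r * np.cos(varphi) * np.sin(theta),
                            r * np.sin(varphi)))
    return np.array(pts)
```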

Reference Trajectory Generation

The center of gravity (cog) of the mav is designated as the initial point of any candidate reference trajectory. Using the (constant) transformation between the cog and the sensor frame (at the fov apex), the coordinates of all the points and rays can be expressed relative to the body-fixed cog frame of the vehicle.

Given the ensemble of points within the field of view, a minimum snap trajectory to each of these points is constructed. Note that the dynamics of a quadrotor mav enjoys differential flatness properties (Mellinger and Kumar, 2011), which ensure that all inputs and states can be written as functions of four (flat) outputs and their derivatives. The flat outputs are the Cartesian mav position and its yaw angle, which can be brought together in a vector $(\mathsf{x},\psi)$. Note that yaw $\psi$ is decoupled from $\mathsf{x}$ and can be steered independently.

The derivatives of $\mathsf{x}$ that are of interest are velocity $\mathsf{v}=\dot{\mathsf{x}}$, acceleration $\mathsf{a}=\ddot{\mathsf{x}}$, and jerk $\mathsf{j}=\dddot{\mathsf{x}}$. In the flat output space, the mav position dynamics can therefore be assumed to match those of a quadruple integrator with state vector $\mathsf{p}(t)=\left(\mathsf{x}^{\intercal}(t),\mathsf{v}^{\intercal}(t),\mathsf{a}^{\intercal}(t),\mathsf{j}^{\intercal}(t)\right)^{\intercal}$. The objective of the receding horizon planner now is to generate smooth trajectories $\mathsf{p}_{[N]}$ and $\mathsf{p}_{[N+1]}$, defined over the course of two consecutive planning cycles, each of duration $\delta t$ and indexed $N$ and $N+1$,

\mathsf{p}_{[N]}(t,\,t+\delta t)=\left(\mathsf{x}_{d[N]}^{\intercal},\,\mathsf{v}_{d[N]}^{\intercal},\,\mathsf{a}_{d[N]}^{\intercal},\,\mathsf{j}_{d[N]}^{\intercal}\right)^{\intercal}
\mathsf{p}_{[N+1]}(t+\delta t,\,t+2\delta t)=\left(\mathsf{x}_{d[N+1]}^{\intercal},\,\mathsf{v}_{d[N+1]}^{\intercal},\,\mathsf{a}_{d[N+1]}^{\intercal},\,\mathsf{j}_{d[N+1]}^{\intercal}\right)^{\intercal}\,,

which always remain within the free space $\mathcal{F}$ and satisfy the boundary condition $\mathsf{p}_{[N]}(t+\delta t)=\mathsf{p}_{[N+1]}(t+\delta t)$, while being dynamically feasible, i.e., $f_{\mathrm{min}}\leq f\leq f_{\mathrm{max}}$ and $\|\Omega\|\leq\Omega_{\mathrm{max}}$.

Let $\mathsf{T}$ be the free trajectory terminal time (henceforth referred to as the planning horizon) and denote $\mathsf{p}_{0}=\left(\mathsf{x}_{0}^{\intercal},\mathsf{v}_{0}^{\intercal},\mathsf{a}_{0}^{\intercal},\mathsf{j}_{0}^{\intercal}\right)^{\intercal}$ and $\mathsf{p}_{\mathsf{T}}=\left(\mathsf{x}_{\mathsf{T}}^{\intercal},\mathsf{v}_{\mathsf{T}}^{\intercal},\mathsf{a}_{\mathsf{T}}^{\intercal},\mathsf{j}_{\mathsf{T}}^{\intercal}\right)^{\intercal}$ the trajectory boundary conditions at $t=0$ and $t=\mathsf{T}$, respectively. Then let $\mathsf{u}(t)=\frac{\mathrm{d}^{4}\mathsf{x}(t)}{\mathrm{d}t^{4}}$ denote snap, and treat it as the control input for the flat space dynamics. For a relative weight parameter $k\in\mathbb{R}_{+}$, the free terminal time optimal control problem whose solutions are the desired reference trajectories is formulated as:

\begin{cases}\min\int_{0}^{\mathsf{T}}\left(k+\frac{1}{2}\lVert\mathsf{u}(t)\rVert^{2}\right)\mathrm{d}t\quad\text{subject to}\\ \dot{\mathsf{x}}(t)=\mathsf{v}(t),\enspace\dot{\mathsf{v}}(t)=\mathsf{a}(t),\enspace\dot{\mathsf{a}}(t)=\mathsf{j}(t),\enspace\dot{\mathsf{j}}(t)=\mathsf{u}(t)\\ \mathsf{p}(0)=\mathsf{p}_{0},\enspace\mathsf{p}(\mathsf{T})=\mathsf{p}_{\mathsf{T}}\,.\end{cases} \qquad (11)

The cost function of the optimal control problem (11) is a linear combination of two performance factors: (i) the incremental cost associated with the terminal time (time optimality), captured by the constant integrand term; and (ii) the incremental cost that penalizes the cumulative magnitude of snap $\mathsf{u}$ along the trajectory. By tuning $k$, one can adjust how aggressive the reference trajectory obtained is. Smaller values of $k$ place a smaller penalty on tracking time and therefore result in slower motion. Gazebo simulation data at various $k$ and speeds were collected to fit a relation between the maximum speed along a candidate trajectory and the parameter $k$. This relation comes out to be an interpolated curve of the form $\mathsf{v}_{\mathrm{candidate}}=\alpha\,k^{1/\beta}$, and has been found to work well in practical scenarios to guide the selection of the cost weight $k$ based on the maximum robot speed afforded for the mission at hand. The cost weight $k$ is particular to the candidate trajectory and varies across the different candidate trajectories in the ensemble, since the maximum speed afforded along a candidate trajectory, $\mathsf{v}_{\mathrm{candidate}}$, itself varies in proportion to the ratio of the candidate trajectory's length to the length of the longest candidate trajectory. This leads to shorter trajectories having lower top speeds, making the mav automatically slow down in the vicinity of dense obstacles.

Denoting $t$ the time elapsed since the start of the whole planned maneuver, $d$ the vehicle's remaining distance to its goal, $r$ the distance of the point in the ensemble from the camera focal point, $v_{\mathrm{max}}$ the desired maximum mav speed, and $k_{t}$ and $k_{d}$ positive tuning parameters, the velocity used to calculate the weighting factor $k$ in (11) to generate a particular candidate trajectory is given by

\mathsf{v}_{\mathrm{candidate}}=\operatorname{erf}(k_{t}\,t)\,\operatorname{erf}(k_{d}\,d)\left(\tfrac{r}{R_{\mathrm{max}}}\right)v_{\mathrm{max}}\,. \qquad (12)

Compared to alternative trapezoidal velocity profiles (Mellinger and Kumar, 2011), (12) produces a velocity profile for the entire remaining quadrotor trajectory that is also effective in scenarios involving moving target interception, in which the vehicle needs to adjust its speed to match that of its target while in the vicinity of the latter. A more detailed discussion of the effect of this velocity profile on tracking dynamic targets follows in the section on target tracking.
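The sketch below evaluates the profile (12) and inverts the fitted relation $\mathsf{v}_{\mathrm{candidate}}=\alpha\,k^{1/\beta}$ to obtain the per-trajectory weight $k$; the fitted constants $\alpha$ and $\beta$ are not reported above, so they appear here as parameters.

```python
import numpy as np
from scipy.special import erf

def candidate_speed(t, d, r, R_max, v_max, kt, kd):
    """Velocity profile of eq. (12) for a candidate end point at range r."""
    return erf(kt * t) * erf(kd * d) * (r / R_max) * v_max

def cost_weight(v_candidate, alpha_fit, beta_fit):
    """Invert v_candidate = alpha_fit * k**(1/beta_fit) for the time-penalty
    weight k used in (11)."""
    return (v_candidate / alpha_fit) ** beta_fit
```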

To solve (11), one can utilize Pontryagin's minimum principle (Athans and Falb, 1966, Chapter 6) as follows. Let $\lambda_{\mathsf{x}}$, $\lambda_{\mathsf{v}}$, $\lambda_{\mathsf{a}}$ and $\lambda_{\mathsf{j}}$ be the costate vectors. Each such vector has three components, one for each spatial direction $x$, $y$, and $z$. Let $\langle\cdot,\cdot\rangle$ denote the standard inner product between two vectors, and express the Hamiltonian $\mathsf{H}$ for this problem as

\mathsf{H}=k+\frac{1}{2}\lVert\mathsf{u}\rVert^{2}+\langle\lambda_{\mathsf{x}},\mathsf{v}\rangle+\langle\lambda_{\mathsf{v}},\mathsf{a}\rangle+\langle\lambda_{\mathsf{a}},\mathsf{j}\rangle+\langle\lambda_{\mathsf{j}},\mathsf{u}\rangle\,. \qquad (13)

The optimal solution is derived from the condition $\mathsf{H}(\mathsf{x}^{*},\mathsf{u}^{*},\lambda^{*},t)\leq\mathsf{H}(\mathsf{x}^{*},\mathsf{u},\lambda^{*},t)$, which, since the Hamiltonian is quadratic in the control inputs, leads to

\dfrac{\partial\mathsf{H}}{\partial\mathsf{u}}=0\implies\begin{cases}\mathsf{u}_{x}=-\lambda_{\mathsf{j}x}\\ \mathsf{u}_{y}=-\lambda_{\mathsf{j}y}\\ \mathsf{u}_{z}=-\lambda_{\mathsf{j}z}\,.\end{cases} \qquad (14)

The costate dynamics are now

\dot{\lambda}_{\mathsf{x}}=-\dfrac{\partial\mathsf{H}}{\partial\mathsf{x}}=\mathsf{0}\,,\qquad\dot{\lambda}_{\mathsf{v}}=-\dfrac{\partial\mathsf{H}}{\partial\mathsf{v}}=-\lambda_{\mathsf{x}}
\dot{\lambda}_{\mathsf{a}}=-\dfrac{\partial\mathsf{H}}{\partial\mathsf{a}}=-\lambda_{\mathsf{v}}\,,\qquad\dot{\lambda}_{\mathsf{j}}=-\dfrac{\partial\mathsf{H}}{\partial\mathsf{j}}=-\lambda_{\mathsf{a}}\,.

Fortunately, this problem affords an independent solution along each direction. The following equation set, where $c=(c_{x0},\cdots,c_{x7})$ denotes an $8\times 1$ vector of constant coefficients, illustrates the solution along the $x$ direction; the other two directions feature identical polynomials:

\left.\begin{array}{l}\lambda_{\mathsf{x}x}=c_{x7}\\ \lambda_{\mathsf{v}x}=-c_{x7}t+c_{x6}\\ \lambda_{\mathsf{a}x}=c_{x7}\frac{t^{2}}{2}-c_{x6}t+c_{x5}\\ \mathsf{u}_{x}=c_{x7}\frac{t^{3}}{6}-c_{x6}\frac{t^{2}}{2}+c_{x5}t-c_{x4}\\ \mathsf{j}_{x}=c_{x7}\frac{t^{4}}{24}-c_{x6}\frac{t^{3}}{6}+c_{x5}\frac{t^{2}}{2}-c_{x4}t+c_{x3}\\ \mathsf{a}_{x}=c_{x7}\frac{t^{5}}{120}-c_{x6}\frac{t^{4}}{24}+c_{x5}\frac{t^{3}}{6}-c_{x4}\frac{t^{2}}{2}+c_{x3}t+c_{x2}\\ \mathsf{v}_{x}=c_{x7}\frac{t^{6}}{720}-c_{x6}\frac{t^{5}}{120}+c_{x5}\frac{t^{4}}{24}-c_{x4}\frac{t^{3}}{6}+c_{x3}\frac{t^{2}}{2}+c_{x2}t+c_{x1}\\ \mathsf{x}_{x}=c_{x7}\frac{t^{7}}{5040}-c_{x6}\frac{t^{6}}{720}+c_{x5}\frac{t^{5}}{120}-c_{x4}\frac{t^{4}}{24}+c_{x3}\frac{t^{3}}{6}+c_{x2}\frac{t^{2}}{2}+c_{x1}t+c_{x0}\,.\end{array}\right\} \qquad (15)

The optimal trajectory, therefore, is a 7th-order polynomial in time. Enforcing the boundary conditions at $t=0$ gives $c_{x0}=\mathsf{p}_{x0}$, $c_{x1}=\mathsf{v}_{x0}$, $c_{x2}=\mathsf{a}_{x0}$ and $c_{x3}=\mathsf{j}_{x0}$, while the remaining coefficients, $c_{x4}$ through $c_{x7}$, are obtained from the boundary conditions at $t=\mathsf{T}$:

\begin{bmatrix}c_{x7}\\ c_{x6}\\ c_{x5}\\ c_{x4}\end{bmatrix}=\begin{bmatrix}\frac{\mathsf{T}^{7}}{5040}&-\frac{\mathsf{T}^{6}}{720}&\frac{\mathsf{T}^{5}}{120}&-\frac{\mathsf{T}^{4}}{24}\\ \frac{\mathsf{T}^{6}}{720}&-\frac{\mathsf{T}^{5}}{120}&\frac{\mathsf{T}^{4}}{24}&-\frac{\mathsf{T}^{3}}{6}\\ \frac{\mathsf{T}^{5}}{120}&-\frac{\mathsf{T}^{4}}{24}&\frac{\mathsf{T}^{3}}{6}&-\frac{\mathsf{T}^{2}}{2}\\ \frac{\mathsf{T}^{4}}{24}&-\frac{\mathsf{T}^{3}}{6}&\frac{\mathsf{T}^{2}}{2}&-\mathsf{T}\end{bmatrix}^{-1}\begin{bmatrix}\mathsf{p}_{x\mathsf{T}}-\left(\mathsf{p}_{x0}+\mathsf{v}_{x0}\mathsf{T}+\frac{1}{2}\mathsf{a}_{x0}\mathsf{T}^{2}+\frac{1}{6}\mathsf{j}_{x0}\mathsf{T}^{3}\right)\\ \mathsf{v}_{x\mathsf{T}}-\left(\mathsf{v}_{x0}+\mathsf{a}_{x0}\mathsf{T}+\frac{1}{2}\mathsf{j}_{x0}\mathsf{T}^{2}\right)\\ \mathsf{a}_{x\mathsf{T}}-\left(\mathsf{a}_{x0}+\mathsf{j}_{x0}\mathsf{T}\right)\\ \mathsf{j}_{x\mathsf{T}}-\mathsf{j}_{x0}\end{bmatrix}

expressing the optimal trajectory coefficients as a function of the (yet unknown) free terminal time $\mathsf{T}$.
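In code, the per-axis coefficient computation reduces to a single 4-by-4 linear solve once $\mathsf{T}$ is known; the sketch below follows the boundary-condition equations above (the function and variable names are illustrative).

```python
import numpy as np

def axis_coefficients(p0, v0, a0, j0, pT, vT, aT, jT, T):
    """Coefficients c0..c7 of the 7th-order polynomial (15) along one axis.
    c0..c3 come from the initial conditions; c4..c7 from the terminal ones."""
    A = np.array([[T**7/5040, -T**6/720, T**5/120, -T**4/24],
                  [T**6/720,  -T**5/120, T**4/24,  -T**3/6 ],
                  [T**5/120,  -T**4/24,  T**3/6,   -T**2/2 ],
                  [T**4/24,   -T**3/6,   T**2/2,   -T      ]])
    b = np.array([pT - (p0 + v0*T + 0.5*a0*T**2 + j0*T**3/6.0),
                  vT - (v0 + a0*T + 0.5*j0*T**2),
                  aT - (a0 + j0*T),
                  jT - j0])
    c7, c6, c5, c4 = np.linalg.solve(A, b)
    return np.array([p0, v0, a0, j0, c4, c5, c6, c7])
```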

The free terminal time $\mathsf{T}$ can be determined as follows. With the $c_{x}$ trajectory coefficients explicitly expressed in terms of $\mathsf{T}$, one substitutes and writes the control input at the terminal time as

\mathsf{u}_{x\mathsf{T}}=\frac{840(\mathsf{p}_{x0}-\mathsf{p}_{x\mathsf{T}})}{\mathsf{T}^{4}}+\frac{360\,\mathsf{v}_{x0}}{\mathsf{T}^{3}}+\frac{60\,\mathsf{a}_{x0}}{\mathsf{T}^{2}}+\frac{4\,\mathsf{j}_{x0}}{\mathsf{T}}

(the control inputs at $\mathsf{T}$ in the $y$ and $z$ directions are similarly obtained). Velocity, acceleration and jerk at time $\mathsf{T}$ are all set to zero, while the transversality condition (Athans and Falb, 1966) at $\mathsf{T}$ requires $\mathsf{H}(\mathsf{T})=0$. Combining these conditions with (14) results in

k+\frac{1}{2}\lVert\mathsf{u}_{\mathsf{T}}\rVert^{2}+\langle\lambda_{\mathsf{T}},\mathsf{u}_{\mathsf{T}}\rangle=0\;\Longrightarrow\;\lVert\mathsf{u}_{\mathsf{T}}\rVert^{2}=2k\,.

This is essentially an 8th-degree polynomial equation which can now be solved for $\mathsf{T}$. Indeed, if for the sake of brevity we set

l_{1}\triangleq 840(\mathsf{p}_{x0}-\mathsf{p}_{x\mathsf{T}})\,,\qquad m_{1}\triangleq 360\,\mathsf{v}_{x0}\,,\qquad n_{1}\triangleq 60\,\mathsf{a}_{x0}\,,\qquad o_{1}\triangleq 4\,\mathsf{j}_{x0}

(the $y$ and $z$ directions are treated similarly, using indices $2$ and $3$, respectively), then substituting back we obtain the polynomial equation

-2k\mathsf{T}^{8}+\sum_{i=1}^{3}o^{2}_{i}\,\mathsf{T}^{6}+2\sum_{i=1}^{3}n_{i}o_{i}\,\mathsf{T}^{5}+\sum_{i=1}^{3}(n^{2}_{i}+2m_{i}o_{i})\,\mathsf{T}^{4}+2\sum_{i=1}^{3}(l_{i}o_{i}+m_{i}n_{i})\,\mathsf{T}^{3}+\sum_{i=1}^{3}(m^{2}_{i}+2l_{i}n_{i})\,\mathsf{T}^{2}+2\sum_{i=1}^{3}l_{i}m_{i}\,\mathsf{T}+\sum_{i=1}^{3}l^{2}_{i}=0\,. \qquad (16)

This equation can be efficiently solved numerically, returning three pairs of complex conjugate roots and a single pair of real roots, of which one is negative and the other is positive—the only acceptable root.
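Numerically, the terminal time can be recovered by forming the coefficient vector of (16) and taking the positive real root, e.g. as sketched below (assuming such a root exists, as argued above).

```python
import numpy as np

def optimal_terminal_time(p0, pT, v0, a0, j0, k):
    """Positive real root of the degree-8 polynomial (16); p0, pT, v0, a0, j0
    are 3-vectors of the per-axis boundary data, k the time-penalty weight."""
    l = 840.0 * (np.asarray(p0, float) - np.asarray(pT, float))
    m = 360.0 * np.asarray(v0, float)
    n = 60.0 * np.asarray(a0, float)
    o = 4.0 * np.asarray(j0, float)
    # Coefficients ordered from T^8 down to the constant term, as np.roots expects.
    coeffs = [-2.0 * k, 0.0,
              np.sum(o * o), 2.0 * np.sum(n * o),
              np.sum(n * n + 2.0 * m * o), 2.0 * np.sum(l * o + m * n),
              np.sum(m * m + 2.0 * l * n), 2.0 * np.sum(l * m), np.sum(l * l)]
    roots = np.roots(coeffs)
    real_positive = roots[(np.abs(roots.imag) < 1e-9) & (roots.real > 0)].real
    return float(real_positive.min())
```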

The vehicle's yaw angle $\psi$ is dynamically decoupled from its position. A reference trajectory for yaw can be constructed in a similar way, given that the relative degree of the flat output $\psi$ is two. In the flat output space, therefore, the yaw dynamics can be expressed as a double integrator. Let us define the yaw state vector $\Psi\triangleq(\psi,\dot{\psi})^{\intercal}$, treating $w\triangleq\ddot{\psi}$ as the virtual control input for this dynamical subsystem. For the yaw trajectory we have boundary conditions on both sides, denoted $\Psi(0)=\Psi_{0}\triangleq(\psi_{0},\dot{\psi}_{0})$ and $\Psi(\mathsf{T})=\Psi_{\mathsf{T}}\triangleq(\psi_{\mathsf{T}},\dot{\psi}_{\mathsf{T}})$. The reference yaw trajectory is obtained as a solution of

\begin{cases}\min\frac{1}{2}\int_{0}^{\mathsf{T}}w(s)^{2}\,\mathrm{d}s\quad\text{subject to}\\ \Psi=(\psi,\dot{\psi})^{\intercal}\,,\enspace\ddot{\psi}(t)=w(t)\\ \Psi(0)=\Psi_{0}\,,\enspace\Psi(\mathsf{T})=\Psi_{\mathsf{T}}\,,\end{cases} \qquad (17)

whose solution is a 3rd-order polynomial of the form

\psi(t)=\gamma_{0}+\gamma_{1}\,t+\gamma_{2}\,t^{2}+\gamma_{3}\,t^{3}\,,

with coefficients given as

\gamma_{0}=\psi_{0}\,,\qquad\gamma_{1}=\dot{\psi}_{0}\,,\qquad\gamma_{2}=\frac{1}{2}\left(\frac{6(\psi_{\mathsf{T}}-\psi_{0})}{\mathsf{T}^{2}}-\frac{2(\dot{\psi}_{\mathsf{T}}+2\dot{\psi}_{0})}{\mathsf{T}}\right)\,,\qquad\gamma_{3}=\frac{1}{6}\left(\frac{6(\dot{\psi}_{\mathsf{T}}+\dot{\psi}_{0})}{\mathsf{T}^{2}}-\frac{12(\psi_{\mathsf{T}}-\psi_{0})}{\mathsf{T}^{3}}\right)\,.

Figure 3 shows a sample evolution of the cost functional in (11) for a particular set of boundary conditions. The cost associated with the terminal time (dotted curve) increases, while the integral of the snap (dashed curve) decreases with time. The vertical line represents the positive real root $\mathsf{T}$ of (16), which minimizes the total cost (solid curve).

Refer to caption
Figure 3: Temporal evolution of the optimization cost functional, showing how the variation of its two cost components contributes to its final value and yields a convex curve with respect to time.

In the context of the particular case study considered here, the best radiation detection performance has been shown to be achieved when the robot closes the distance between its sensor and the source as fast as possible (Sun and Tanner, 2015; Yadav and Tanner, 2019). This implies that aggressive reference trajectories are preferable, motivating us to select $k$ based on the maximum speed limitations of the mav. With this in mind, the methodology outlined above can thus provide motion plans that are (a) feasible given the dynamic constraints of the vehicle, (b) as aggressive as possible, and (c) high-performance in terms of radiation detection accuracy.

While the reference trajectory is dynamically feasible by design, its conformity to actuation constraints is verified after its generation (Mueller et al., 2015) (see Fig. 4). Here, we have actuation (upper) bounds on the magnitude of the input thrust $f=m\lVert\mathsf{a}-\mathsf{g}\rVert$ and on the square of the sum of roll and pitch angular velocities, expressed in terms of jerk and thrust as $\lVert\mathsf{j}\rVert^{2}/f^{2}$. These actuation constraints stem from (a) equipment limitations, and (b) the maximum deceleration that the vehicle can undergo during an emergency stopping maneuver. In the reactive navigation architecture described here, emergency stopping maneuvers are engaged when the mav cannot find a safe path in its free workspace. Finally, a linear velocity constraint is imposed in order to reduce motion blur, which would otherwise affect the robot's visual inertial odometry (vio).
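A sketch of this feasibility check on a sampled candidate trajectory is given below; the limits and the sampling scheme are assumptions, and the gravity vector follows the convention of Section 4.

```python
import numpy as np

def actuation_feasible(a_traj, j_traj, v_traj, m, g, f_min, f_max,
                       omega_sq_max, v_limit):
    """Check thrust f = m*||a - g||, the roll/pitch rate proxy ||j||^2 / f^2, and
    speed along (N, 3) arrays of sampled acceleration, jerk and velocity."""
    g_vec = np.array([0.0, 0.0, g])                 # inertial gravity vector, as in Section 4
    f = m * np.linalg.norm(a_traj - g_vec, axis=1)
    if np.any(f < f_min) or np.any(f > f_max):
        return False
    if np.any(np.sum(j_traj**2, axis=1) / f**2 > omega_sq_max):
        return False
    return not np.any(np.linalg.norm(v_traj, axis=1) > v_limit)
```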

Refer to caption
Figure 4: The decision block diagram for the reference trajectory generation process. If the actuation limit check fails, the terminal time $\mathsf{T}$ is increased slightly by some $\delta\mathsf{T}$ and the trajectory is regenerated.

Local Goal and Collision Costs

Once the ensemble of candidate reference trajectories is generated, the candidate trajectories that intersect with any obstacle are excluded from the ensemble. The methodology is motivated by earlier work of Yadav and Tanner (2020), and has been modified to suit the new trajectory generation methodology.

The obstacle point cloud $\mathcal{P}$ is first converted into a kd-tree, and each trajectory is discretized and represented in the form of a finite (size $n$) sequence of points. Thereafter, a query is run to find points of the kd-tree that lie within a ball of radius $r$ from each of these points on the candidate trajectory. Parameter $r$ is chosen so that the ball fully encloses the quadrotor, with a safety margin selected based on how conservative one wishes to be with respect to the practical risk of collision due to uncertainty or disturbances. A candidate trajectory is said to intersect an obstacle if any point (among the $n$ points into which it is discretized) has a point of the kd-tree within this ball of radius $r$; such a trajectory is excluded from the ensemble.
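The pruning step can be realized with a standard kd-tree library, e.g. along the lines of the sketch below (a sketch only; the actual implementation is in the open-source repository referenced later in this section).

```python
import numpy as np
from scipy.spatial import cKDTree

def prune_colliding(trajectories, point_cloud, r):
    """Discard any candidate trajectory (an (n, 3) array of sample points) that
    has an obstacle point within a ball of radius r around any of its samples."""
    tree = cKDTree(point_cloud)                        # obstacle point cloud P
    safe = []
    for traj in trajectories:
        neighbours = tree.query_ball_point(traj, r)    # one index list per sample point
        if all(len(nb) == 0 for nb in neighbours):
            safe.append(traj)
    return safe
```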

Among the collision-free candidate trajectories, the optimal one should strike an acceptable balance between safety (against collision) and speed of convergence to the goal point. An illustration of the subsequent process of selecting this optimal reference trajectory is given in Fig. 5 (for details on the associated computational requirements, see Section 7). In lieu of a global planner, which would otherwise sketch a complete path from initial to final configurations, the reactive local planner defines an intermediate point as the point in the (collision-free) ensemble closest to the final goal (denoted IP in Fig. 5(a)). It then assigns a cost to each trajectory, in the form of a linear weighted sum of two cost components: the first component is based on the distance of each trajectory end-point to the intermediate point, normalized over the maximum distance; the second component is a normalized collision cost that captures how close the trajectory comes to $\mathcal{P}$.

Denote $p$ and $d_{i}$, respectively, the total number of collision-free trajectories and the Euclidean distance between the end point of the $i$th trajectory and the intermediate point. Set $d_{\mathrm{max}}\triangleq\max_{i}d_{i}$, and let $\hat{r}\geq r$ be an additional safety margin (over the radius around detected obstacles). The minimum distance $\rho_{i}$ of trajectory $i$ to obstacles is found by querying a kd-tree over $\mathcal{P}$ and minimizing over the query response. With these data at hand, the collision cost for trajectory $i$ is found as

c_{\mathrm{coll}_{i}}=\begin{cases}\frac{1+\hat{r}^{4}}{\hat{r}^{4}}\cdot\frac{\left[(\rho_{i}-r)^{2}-\hat{r}^{2}\right]^{2}}{1+\left[(\rho_{i}-r)^{2}-\hat{r}^{2}\right]^{2}}&\text{if}\enspace\rho_{i}-r\leq\hat{r}\\ 0&\text{otherwise.}\end{cases}

The collision cost function normalizes the cost of each trajectory into the $[0,1]$ interval. Thus, any trajectory that touches the ball around its nearest obstacle is assigned a collision cost of 1, while any trajectory that stays at least $\hat{r}$ away from every obstacle incurs zero cost. All other trajectories are assigned costs within the $(0,1)$ interval. The end point of the trajectory with the lowest total cost becomes the local goal (i.e., within the fov) for the planner (marked with a green dot in Fig. 5(c)).

Selecting positive weights $k_{1},\,k_{2}\in(0,1)$, the cost contributions of trajectory $i\in\{0,\ldots,p\}$ due to its proximity to the intermediate point and to obstacles are combined into an aggregate cost expression

c_{i}=k_{1}\,\tfrac{d_{i}}{d_{\mathrm{max}}}+k_{2}\,c_{\mathrm{coll}_{i}}\,.

The trajectory with the least such cost is selected as the reference trajectory between the robot’s current position and the local goal.
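The selection step can then be written compactly as below; the collision-cost expression follows the definition above, and the helper names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def select_reference(trajectories, point_cloud, intermediate, r, r_hat, k1, k2):
    """Pick the collision-free candidate minimizing the aggregate cost c_i."""
    tree = cKDTree(point_cloud)
    d = np.array([np.linalg.norm(t[-1] - intermediate) for t in trajectories])
    d_max = d.max()
    costs = []
    for i, traj in enumerate(trajectories):
        rho = tree.query(traj)[0].min()          # minimum distance of trajectory i to obstacles
        if rho - r <= r_hat:
            s = (rho - r) ** 2 - r_hat ** 2
            c_coll = (1 + r_hat**4) / r_hat**4 * s**2 / (1 + s**2)
        else:
            c_coll = 0.0
        costs.append(k1 * d[i] / d_max + k2 * c_coll)
    return trajectories[int(np.argmin(costs))]
```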

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 5: Assigning costs to collision-free trajectories. The trajectories with high costs are plotted as darker green lines. The trajectory with minimum total cost is selected as the final trajectory through which the robot traverses. The corresponding end point is the local goal.

The mav tracks a small portion of the reference trajectory, hereafter referred to as the control horizon. The length of the control horizon depends on the vehicle's speed, its computational capabilities, and its sensor update rate. Each time the mav receives a sensor update, it generates a new reference trajectory (segment) and appends it to the end of the segment it is currently tracking. By design, the transition between subsequent reference trajectory segments is smooth. The process in between sensor updates constitutes a replanning cycle. This process is illustrated in Fig. 7, which shows the different trajectories generated from the starting point until the end point. The complete implementation for planning, control and state estimation is open-source (https://github.com/indsy123/quad_navigation_and_target_tracking).

6 Target Tracking

Object Detection based on ssd-MobileNetV2

The motivating application for this work is the detection of weak radiological material on moving ground vehicles; for this to be achieved via aerial means of sensing and measurement in a timely manner, the mavs need to get close to their target as quickly as possible. To this end, the methodology of Section 5 is extended to dynamic target interception and tracking scenarios. Necessary extensions include the capability of the mav to autonomously detect its target within its fov, and to estimate its target's relative position and speed based on visual means. For this task, an open-source implementation of the MobileNetV2 (Sandler et al., 2018) deep neural network in TensorFlow (Abadi et al., 2016) is utilized and combined with ssd (Liu et al., 2016) for object detection.

Refer to caption
(a)
Refer to caption
(b)
Figure 6: (a) The mobile target in the experimental study. The detection box is shown in green, while the image matching between two successive images is shown in the inset. (b) Workflow of the target tracker operating at 15 Hz.

This implementation of ssd-MobileNetV2 has been trained on the standard coco dataset. The desired neural network is then trained via transfer learning on a dataset of approximately 500 images of the Jackal mobile robot from Clearpath Robotics (Fig. 6(a)). These images were collected from an Intel RealSense D435 camera onboard the mav in different indoor and outdoor environments, under different lighting conditions and varying backgrounds. The network utilizes a $300\times 300$ fixed image resizer to increase the inference speed. The Adam optimizer was used with an initial learning rate of $2\times 10^{-4}$, reduced to $10^{-4}$, $8\times 10^{-5}$ and $4\times 10^{-5}$ after 4500, 7000 and 10000 steps, respectively, for a total of $2\times 10^{4}$ steps. All other parameters are kept at their default values. The network is trained on an nvidia gtx 1060 gpu, and the trained model is subsequently converted into a TensorRT model to run fast inference on the nvidia Jetson Nano gpu onboard the mav.

Target 3D Position Estimation

Given a bounding box obtained from the dnn, the onboard rgb-d sensor data allow direct measurement of the relative position of the target with respect to the quadrotor. First, sift features are extracted from two consecutive detection boxes and matched using a flann-based matcher with Lowe's ratio test (Lowe, 2004), followed by ransac-based outlier rejection. Utilizing the disparity image from the rgb-d sensor, the $(u,v)$ position of each inlier (i.e., each feature on the target) in the focal plane can be converted to the 3d position of that feature with respect to the camera. The average of these 3d positions provides an aggregate measurement of the relative position of the target with respect to the mav. This aggregate relative position measurement is used by a Kalman filter, which, based on a constant-acceleration motion model for the target, returns 3d position estimates of the target with respect to the mav's cog and feeds them to the motion planning algorithm. The entire target position estimation workflow is showcased in Fig. 6(b).
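The sketch below illustrates the measurement step using OpenCV; the crop/indexing details, the camera intrinsics, and the 0.7 ratio-test threshold are assumptions, and the ransac outlier rejection and Kalman filter mentioned above are omitted for brevity.

```python
import cv2
import numpy as np

def target_relative_position(img1, img2, box1, box2, depth2, fx, fy, cx, cy):
    """Match SIFT features between two consecutive detection boxes (FLANN +
    Lowe's ratio test), back-project the surviving features through the depth
    image, and average to get the target position in the camera frame."""
    sift = cv2.SIFT_create()
    crop = lambda im, b: im[b[1]:b[3], b[0]:b[2]]        # box = (x1, y1, x2, y2)
    k1, d1 = sift.detectAndCompute(crop(img1, box1), None)
    k2, d2 = sift.detectAndCompute(crop(img2, box2), None)
    matcher = cv2.FlannBasedMatcher()
    matches = matcher.knnMatch(d1, d2, k=2)
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < 0.7 * n.distance]            # Lowe's ratio test
    pts = []
    for g in good:
        u, v = k2[g.trainIdx].pt                         # feature in the latest box
        u, v = int(u) + box2[0], int(v) + box2[1]        # back to full-image pixels
        z = depth2[v, u]                                 # metric depth from the RGB-D sensor
        if z > 0:
            pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.mean(pts, axis=0)
```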

The effectiveness of the velocity profile prescribed in (12) is pronounced in the case of intercepting a moving target. Note that in the case of a static navigation end goal, the first factor (the time-dependent erf) increases slowly from 0 and converges to 1, while the second factor (the distance-dependent erf) starts from 1 and converges to 0; this steers the mav so that it starts at low speed, progresses to near maximum allowed speeds during most of the navigation maneuver, and then smoothly slows down near its goal, thus mimicking a trapezoidal velocity profile. In contrast, in the case of moving target interception, the distance-dependent erf in (12) converges instead to a fixed strictly positive value, allowing the mav to align its velocity with that of the target and maintain a fixed distance to it.

7 Implementation Results

Refer to caption
Figure 7: Receding horizon planning in a cylindrical obstacle Poisson forest.

Numerical Testing

The reported planning and control framework was first tested in simulations based on the RotorS package (Furrer et al., 2016), within a Poisson forest-like environment with densities of 18 and 36 obstacles per 100 m² area. A typical rviz visualization of the resulting algorithm execution is shown in Fig. 7. As the mav flies from left to right, the figure marks in green color the different trajectories generated within each planning cycle, while the fov grid point ensemble is shown in white toward the end of the vehicle's path. Figure 8 shows the probability of mission success as a function of mav speed at the two different obstacle densities (cf. Karaman and Frazzoli (2012)).

Refer to caption
Figure 8: Mission (target interception with collision avoidance) success probability in Poisson forests as a function of obstacle density and mav speed. The solid line corresponds to the left axis (0.18 obstacles/m²), while the dashed line corresponds to the right axis (0.36 obstacles/m²).

Experimental Testing

Refer to caption
Figure 9: A pair of quadrotors fitted with cots gm counters. The one on the left carries a gm-10 counter (Blackcat Systems), while the one on the right features the more sensitive gm-90 counter from the same product line.

The custom-built mavs used for experimental testing, shown in Fig. 9, are based on a dji Flamewheel F450 frame. Computationally, one features an onboard Intel nuc with a quad-core i7-8650U cpu at 1.9 GHz, while the other has an Intel nuc with a dual-core i5-7300U cpu at 2.6 GHz. Both carry 16 GB of ram, a 128 GB ssd, and a Pixhawk flight controller, which receives desired thrust magnitude and body-rate commands and tracks them with its onboard rate controller. A point cloud is generated by an Intel RealSense D435 rgb-d camera (640×480 pixels at 30 Hz), while the RealSense T265 vi-sensor (two 848×800-pixel cameras at 30 Hz with a 200 Hz imu) is used for inertial odometry. This lightweight sensor package provides reliable depth information up to 5 m. A voxel filter reduces the density of the generated point cloud to a uniform leaf size of 0.125 m, which is adequate for typical obstacle avoidance purposes. OpenVINS provides state estimation and real-time odometry at 30 Hz. Ultimately, the pipeline's computational bottleneck is the inference script, which can only run at 15 Hz on the cpu of the Intel nuc; the entire receding horizon planning and target tracking software is therefore constrained to 15 Hz.
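The voxel filtering step can be reproduced, for illustration, with any point cloud library; the sketch below uses Open3D and a placeholder input file, and is not necessarily the library used onboard.

```python
import open3d as o3d

# Downsample the RealSense point cloud to a uniform 0.125 m voxel grid
# before handing it to the receding horizon planner.
cloud = o3d.io.read_point_cloud("frame.pcd")        # placeholder file name
filtered = cloud.voxel_down_sample(voxel_size=0.125)
```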

Over five different runs, each with an overall trajectory length of 25 m, in both indoor and outdoor environments, the 95th percentile of the execution time is well below 0.02 seconds (Fig. 10). These execution times are almost half of those obtained on the same systems in earlier studies (Yadav and Tanner, 2020) that used an optimization-based trajectory generation methodology. In this configuration, the mav flies safely through moderately dense outdoor obstacle environments at speeds of 4.5–5 m/s. These speeds surpass or are comparable to those reported in recent literature (Gao and Shen, 2016; Sikang Liu et al., 2016; Fragoso et al., 2018) without using any high-end, expensive sensors; their achievement is attributed to the ability to replan faster.

Refer to caption
Figure 10: Replanning execution time for the quadrotor with the nuc i7-8650U cpu at 1.9 GHz, averaged over five different experimental runs. The number of trajectories generated in each planning cycle varied between 50 and 300, while the input point cloud contained 800–1400 points. Boxes span the 5th–95th percentiles and the median is marked in red.

The video attachment that accompanies this paper features a number of experiments, the first two of which show the mav navigating at speeds of 2 m/s and 4.5 m/s in an outdoor environment, for a total of 40 m and 60 m respectively. The video showcases first- and third-person views of the cluttered scene traversed by the mav, alongside an rviz representation of the generated trajectories amongst the sensed obstacles. The octomap featured in the video is provided for illustration purposes only, to mark obstacle locations along the path of the vehicle, and is not used for planning. The third experiment included in the video attachment demonstrates the mav's capability to avoid moving obstacles in its field of view, while the fourth showcases receding horizon navigation with combined obstacle avoidance and target tracking abilities. The latter utilizes the neural network with feature matching between two consecutive images.

Ultimately, the mav's speed is limited primarily by the available computational capabilities and the need to implement a safe emergency stop in the absence of an effective motion plan, a possibility that cannot be eliminated in purely reactive and local planning methods. Since purely reactive approaches rely exclusively on local information, convergence to the navigation goal cannot be guaranteed for all possible scenarios. (The reported reactive planning architecture can, however, be integrated with an exact global planner and overcome these limitations at the price of using global information (Yadav and Tanner, 2021b).) This investigation naturally exposes the limits of purely reactive motion planning approaches. Knowledge of those limits can guide the development of hybrid (local and global (Yadav and Tanner, 2019)) mav motion planning methodologies destined for deployment in environments where uncertainty is reasonably well characterized, so that the two components complement each other and operate robustly in real-world scenarios.

While the decoupling of perception from trajectory generation can induce latencies, their co-design using powerful onboard computation and the incorporation of pre-trained neural networks for trajectory generation can further boost vehicle speeds to impressive levels (Loquercio et al., 2021). To reach speeds up to 10 m/s, however, one would also require a very favorable power-to-weight ratio (e.g., ∼4.0 (Loquercio et al., 2021)), which may be challenging to maintain depending on the mission-mandated sensor payload. In this paper, the mavs featured power-to-weight ratios in the 1.55–1.75 range. Besides an effort to achieve a more favorable power-to-weight ratio, we postulate that additional speed improvements can be achieved with the use of event cameras (Falanga et al., 2020).

Radiation Detection

The radiation detection algorithm is based on a Neyman-Pearson, fixed-time-interval binary hypothesis test (refer to Pahlajani et al. (2014) for a more detailed exposition). At the heart of the test is a likelihood ratio statistic L_T (2), calculated from the history of the relative distance between the airborne radiation sensor and the hypothesized source, together with the counts aggregated over the sensor's (predetermined) integration time interval T. This likelihood ratio is compared against a fixed threshold γ that also depends on the relative distance and on the acceptable bound on the probability of false alarm P_FA. The optimal value p* is obtained by solving (5), and the threshold is then calculated by evaluating (7) at p*. The remaining parameters of the test, including the naturally occurring background radiation rate and the sensor characteristics, are determined through computational and experimental calibration.
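The structure of such a test statistic can be sketched as follows for a counting process with background rate β and a distance-dependent, source-induced rate; the expressions (2), (5), and (7) are not reproduced here, and the inverse-square sensor model used below is an assumption for illustration only.

```python
import numpy as np

def log_likelihood_ratio(count_times, distances, dt, beta, source_rate):
    """Sketch of a fixed-interval log-likelihood ratio for a Poisson
    counting process: the sum of log(1 + nu/beta) over recorded counts
    minus the integral of the source-induced rate nu over [0, T].

    count_times : arrival times of the recorded counts in [0, T]
    distances   : sensor-source distance sampled every dt seconds
    beta        : background count rate (counts/s)
    source_rate : assumed model of the source-induced rate vs. distance
    """
    nu = source_rate(np.asarray(distances))
    t_grid = np.arange(len(nu)) * dt
    idx = np.clip(np.searchsorted(t_grid, count_times) - 1, 0, len(nu) - 1)
    return np.sum(np.log1p(nu[idx] / beta)) - np.trapz(nu, dx=dt)

# Illustrative (uncalibrated) inverse-square sensor model.
source_rate = lambda d: 20.0 / (1.0 + d**2)

# A source is declared present when the statistic exceeds log(gamma), with
# gamma chosen from the Chernoff bound on the false-alarm probability.
```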

The detection tests involve a sequence of controlled experiments in which the efficiency of each radiation sensor and the variation of its expected count rate as a function of its distance to the source are estimated. The mavs featured in Fig. 9 were deployed in both indoor and outdoor experiments, where their task was to locate and intercept a ground target (the remotely controlled Clearpath Robotics Jackal in Fig. 1) moving along an unspecified path with unknown but bounded speed. The ground robot carried an approximately 8 μCi radioactive source which the mavs had to detect.

Each experiment involved the mav tracking the ground vehicle for a certain time T. To minimize radiation exposure risk, Monte Carlo simulations in gazebo were performed to complement the experimental validation, with counts generated using a thinning algorithm (Pasupathy, 2009). These simulations were used to estimate the minimum length T of the radiation measurement collection interval that is likely to yield a confident and accurate classification of the target with respect to its radioactivity. This process suggested a sensor integration window of T = 100 seconds for the gm-10 counter (the median of the distribution, with the 5% and 95% percentiles estimated at 71 and 136 seconds, respectively) for a radiation source of activity around 8.2 μCi. During that fixed time interval, the distance between sensor and source should remain within about 2.5–3 m; otherwise the source emissions blend completely into the background, rendering detection extremely unlikely given the mav's operating time constraints imposed by onboard power. For the same source and sensor-to-source range, the gm-90 counter appeared to need approximately T = 70 seconds, with the 5% and 95% percentiles at 65 and 96 seconds, respectively.
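The thinning step admits the following minimal sketch (Lewis-Shedler style); the rate function and its upper bound are assumptions that, in the Monte Carlo studies, would combine the background rate with the distance-dependent source contribution along a simulated chase.

```python
import numpy as np

def thinned_poisson_counts(rate_fn, T, lam_max, rng=None):
    """Sample arrival times of a nonhomogeneous Poisson process with
    time-varying rate rate_fn(t) <= lam_max on [0, T] by thinning a
    homogeneous process of rate lam_max."""
    rng = np.random.default_rng() if rng is None else rng
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)          # next candidate arrival
        if t > T:
            return np.array(arrivals)
        if rng.uniform() < rate_fn(t) / lam_max:     # accept with prob. rate/lam_max
            arrivals.append(t)
```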

The receding horizon planning and control strategy of Section 5 ensures that the mav closes this distance as fast as possible, thus enabling the onboard Geiger counters to collect informative measurements. The mav has to maintain a certain minimum distance from the target to keep it within its camera's limited fov; as a result, the relative distance cannot be reduced arbitrarily. Variations in the relative distance between sensor and source can be attributed to motion perturbations as the (remotely operated) target performs avoidance maneuvers. Although generally robust, the neural network does not furnish guarantees against false target identification, and this can contribute to outliers in the relative distance estimate (see, e.g., Fig. 11 around the 85th second).

Refer to caption
Figure 11: Detection parameters for the GM-10 sensor as a function of decision time T. Bound on the probability of false alarm: dashed blue; bound on the probability of missed detection: dashed green; ratio log(L_T/γ): solid red; sensor-source distance: solid magenta.

Figure 11 presents the results of one radiation detection experiment conducted in an indoor environment (an abandoned warehouse) using the mav carrying the gm-10 counter. It shows the evolution of the estimate of the relative distance d between the mav and the ground robot as the latter moves with unknown and time-varying speed. The relative distance is estimated in real time via the target tracking pipeline described in the section on target tracking. The dashed curves in Fig. 11 indicate the evolution of the Chernoff bounds on the probability of false alarm, P_FA, and the probability of missed detection, P_M (Pahlajani et al., 2014). The bound on the probability of false alarm drops below the acceptable upper limit approximately 50 seconds into the experimental run, after which the bound on the probability of missed detection P_M also starts to decrease slowly and monotonically; the latter is a decreasing function of the sensor integration time and of the distance between sensor and source (Pahlajani et al., 2014). The graph of the logarithm of the likelihood ratio L_T over the detection threshold γ is marked in red; this process is stochastic because it depends directly on the arrival times of gamma rays at the sensor. The initial segment of the red curve corresponds to the period during which the constraint on P_FA had not yet been satisfied and log(L_T/γ) was kept at 0. The experiment concluded at 95.54 seconds, with the likelihood ratio L_T exceeding its threshold at 89.8 seconds, indicating the presence of the radiation source on the ground target (marked with a black circle in the plot). The likelihood ratio had actually crossed the threshold before that time, but the experiment was continued because that event occurred significantly earlier than the recommended sensor integration window.

Figures 12(a) and 12(b) showcase two different runs of the mav featuring the gm-90 counter. The experimental run of Fig. 12(a) shows an instance where the mav did not have enough time to detect the source. This experiment was performed in the same indoor facility as the run of Fig. 11. Here, the radiation sensor integration window is 56 seconds. The bound on the probability of missed detection is then still around 0.6, comparable to the conditions under which the detection of Fig. 11 was achieved, but this T is below the 5th percentile of the recommended exposure time.

Refer to caption
(a)
Refer to caption
(b)
Figure 12: Detection parameters for the GM-90 sensor as a function of decision time T. Bound on the probability of false alarm: dashed blue; bound on the probability of missed detection: dashed green; ratio log(L_T/γ): solid red; sensor-source distance: solid magenta. (a) Detection time interval 56 seconds. (b) Detection time interval 206 seconds.

Figure 12(b) depicts the results of a longer chase by the mav carrying the gm-90 counter, conducted outdoors. This time, the integration window was extended to more than 200 seconds. In addition to the effect of the sensor integration window length on detection accuracy, Fig. 12(b) shows more clearly the evolution of the bounds on the decision test's error probabilities P_FA and P_M. At the time of decision, the bound on the probability of miss, P_M, is almost zero, indicating a very high probability of accurate decision-making. Although the statistic log(L_T/γ) becomes positive for the first time shortly after 70 seconds, at that time the bound on P_M is still around 0.3. It is of interest that toward the end of the integration window the statistic log(L_T/γ) decreases, most likely because the target was able to open up its distance to the pursuing mav, which by that time was experiencing a drop in its power reserves; the same trend can be noticed in Fig. 12(a).

The accompanying video attachment includes two experiments of target tracking with radiation detection (experiments #5 and #6). These are cases where the Jackal robot is steered manually while carrying an 8.2 μCi source. The plot at the bottom left of the video graphically represents the evolution of the detection parameters depicted in Figs. 11 and 12(b). To reduce the size of the video file, only selected portions of these experimental runs are included, and the video is played at four times normal speed.

8 Conclusions

The challenges that a completely autonomous mav equipped with short-range sensors faces when tasked to navigate in a completely unknown and cluttered environment are more pronounced when the complete data and signal processing pipeline (such as radiation detection) must run onboard, and the motion of the vehicle can adversely affect the quality and quantity of the data. Under these conditions, a motion planner that aims at operating the robot within a reasonable safety envelope has to strike a balance between safety, platform limitations, and mission-informed active sensing. In this context, an adaptive, purely reactive receding horizon motion planning and control strategy has been developed that co-designs the planning, safe navigation, target tracking, and decision-making components. Not only can such a navigation strategy be remarkably effective even though the absence of global environment and platform information precludes formal completeness guarantees, but it can also be integrated with an exact global planner, when prior knowledge permits, to furnish formal performance guarantees. The work thus pushes the envelope of what is achievable with purely reactive but deliberate mav navigation in cluttered environments, particularly at the low end of the technology and sensor sophistication spectrum.

{acks}

Special thanks to Paul Huang and his RPNG group, specifically to Kevin Eckenhoff and Patrick Geneva, for their contributions on perception. {funding} This work was made possible in part by DTRA grant #HDTRA1-16-1-0039 and ARL grant #W911NF-20-2-0098.

References

  • Abadi et al. (2016) Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y and Zheng X (2016) Tensorflow: Large-scale machine learning on heterogeneous distributed systems.
  • Arslan and Koditschek (2019) Arslan O and Koditschek DE (2019) Sensor-based reactive navigation in unknown convex sphere worlds. The International Journal of Robotics Research 38(2-3): 196–223.
  • Athans and Falb (1966) Athans M and Falb PL (1966) Optimal control : an introduction to the theory and its applications. McGraw-Hill New York ; Sydney.
  • Cadena et al. (2016) Cadena C, Carlone L, Carrillo H, Latif Y, Scaramuzza D, Neira J, Reid I and Leonard JJ (2016) Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics 32(6): 1309–1332.
  • Chen et al. (2016) Chen J, Liu T and Shen S (2016) Tracking a moving target in cluttered environments using a quadrotor. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 446–453. 10.1109/IROS.2016.7759092.
  • Chojnacki and Indelman (2018) Chojnacki M and Indelman V (2018) Vision-based dynamic target trajectory and ego-motion estimation using incremental light bundle adjustment. International Journal of Micro Air Vehicles 10(2): 157–170.
  • Falanga et al. (2020) Falanga D, Kleber K and Scaramuzza D (2020) Dynamic obstacle avoidance for quadrotors with event cameras. Science Robotics 5(40): eaaz9712.
  • Fragoso et al. (2018) Fragoso AT, Cigla C, Brockers R and Matthies LH (2018) Dynamically feasible motion planning for micro air vehicles using an egocylinder. In: Hutter M and Siegwart R (eds.) Field and Service Robotics. Cham: Springer International Publishing. ISBN 978-3-319-67361-5, pp. 433–447.
  • Furrer et al. (2016) Furrer F, Burri M, Achtelik M and Siegwart R (2016) Robot Operating System (ROS): The Complete Reference (Volume 1), chapter RotorS—A Modular Gazebo MAV Simulator Framework. Cham: Springer International Publishing. ISBN 978-3-319-26054-9, pp. 595–625.
  • Gao and Shen (2016) Gao F and Shen S (2016) Online quadrotor trajectory generation and autonomous navigation on point clouds. In: proceedings of IEEE International Symposium on Safety, Security, and Rescue Robotics. pp. 139–146.
  • Geneva et al. (2020) Geneva P, Eckenhoff K, Lee W, Yang Y and Huang G (2020) Openvins: A research platform for visual-inertial estimation. In: Proceedings of the IEEE International Conference on Robotics and Automation. Paris, France.
  • Giusti et al. (2016) Giusti A, Guzzi J, Cireşan DC, He F, Rodríguez JP, Fontana F, Faessler M, Forster C, Schmidhuber J, Caro GD, Scaramuzza D and Gambardella LM (2016) A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters 1(2): 661–667.
  • Hornung et al. (2013) Hornung A, Wurm KM, Bennewitz M, Stachniss C and Burgard W (2013) Octomap: an efficient probabilistic 3d mapping framework based on octrees. Autonomous Robots 34(3): 189–206.
  • Karaman and Frazzoli (2012) Karaman S and Frazzoli E (2012) High-speed flight in an ergodic forest. CoRR abs/1202.0253.
  • Landry et al. (2016) Landry B, Deits R, Florence PR and Tedrake R (2016) Aggressive quadrotor flight through cluttered environments using mixed integer programming. In: proceedings of IEEE International Conference on Robotics and Automation. pp. 1469–1475.
  • Lee et al. (2010) Lee T, Leoky M and McClamroch NH (2010) Geometric tracking control of a quadrotor uav on se(3). In: Proceedings of 49th IEEE Conference on Decision and Control. pp. 5420–5425.
  • Liu et al. (2017) Liu S, Watterson M, Mohta K, Sun K, Bhattacharya S, Taylor CJ and Kumar V (2017) Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments. IEEE Robotics and Automation Letters 2(3): 1688–1695.
  • Liu et al. (2016) Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY and Berg AC (2016) Ssd: Single shot multibox detector. Lecture Notes in Computer Science : 21–37.
  • Loquercio et al. (2020) Loquercio A, Kaufmann E, Ranftl R, Dosovitskiy A, Koltun V and Scaramuzza D (2020) Deep drone racing: From simulation to reality with domain randomization. IEEE Transactions on Robotics 36(1): 1–14.
  • Loquercio et al. (2021) Loquercio A, Kaufmann E, Ranftl R, Müller M, Koltun V and Scaramuzza D (2021) Learning high-speed flight in the wild. Science Robotics 6(59): eabg5810.
  • Loquercio et al. (2018) Loquercio A, Maqueda AI, del-Blanco CR and Scaramuzza D (2018) Dronet: Learning to fly by driving. IEEE Robotics and Automation Letters 3(2): 1088–1095.
  • Lowe (2004) Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2): 91–110.
  • Mellinger and Kumar (2011) Mellinger D and Kumar V (2011) Minimum snap trajectory generation and control for quadrotors. In: Proceedings of IEEE International Conference on Robotics and Automation. pp. 2520–2525.
  • Mohta et al. (2018) Mohta K et al. (2018) Fast, autonomous flight in gps-denied and cluttered environments. Journal of Field Robotics 35(1): 101–120.
  • Mueller et al. (2015) Mueller MW, Hehn M and D’Andrea R (2015) A computationally efficient motion primitive for quadrocopter trajectory generation. IEEE Transactions on Robotics 31(6): 1294–1310.
  • Nemzek et al. (2004) Nemzek RJ, Dreicer JS, Torney DC and Warnock TT (2004) Distributed sensor networks for detection of mobile radioactive sources. IEEE Transactions on Nuclear Science 51(4): 1693–1700.
  • Oleynikova et al. (2016) Oleynikova H, Burri M, Taylor Z, Nieto J, Siegwart R and Galceran E (2016) Continuous-time trajectory optimization for online uav replanning. In: proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5332–5339.
  • Oleynikova et al. (2017) Oleynikova H, Taylor Z, Fehr M, Siegwart R and Nieto J (2017) Voxblox: Incremental 3d euclidean signed distance fields for on-board mav planning. In: proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 1366–1373.
  • Pahlajani et al. (2014) Pahlajani CD, Sun J, Poulakakis I and Tanner HG (2014) Error probability bounds for nuclear detection: Improving accuracy through controlled mobility. Automatica 50(10): 2470–2481.
  • Pasupathy (2009) Pasupathy R (2009) Generating nonhomogeneous Poisson processes. In: Wiley Encyclopedia of Operations Research and Management Science. Wiley.
  • Penin et al. (2018) Penin B, Giordano PR and Chaumette F (2018) Vision-based reactive planning for aggressive target tracking while avoiding collisions and occlusions. IEEE Robotics and Automation Letters 3(4): 3725–3732.
  • Redmon et al. (2016) Redmon J, Divvala S, Girshick R and Farhadi A (2016) You only look once: Unified, real-time object detection. In: proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. 779–788.
  • Richter et al. (2016) Richter C, Bry A and Roy N (2016) Polynomial Trajectory Planning for Aggressive Quadrotor Flight in Dense Indoor Environments. Cham: Robotics Research: The 16th International Symposium ISRR. Springer International Publishing. ISBN 978-3-319-28872-7, pp. 649–666.
  • Rimon and Koditschek (1992) Rimon E and Koditschek DE (1992) Exact robot navigation using artificial potential functions. IEEE Transactions on Robotics and Automation 8(5): 501–518. 10.1109/70.163777.
  • Ryll et al. (2019) Ryll M, Ware J, Carter J and Roy N (2019) Efficient trajectory planning for high speed flight in unknown environments. In: proceedings of International Conference on Robotics and Automation. pp. 732–738.
  • Sandler et al. (2018) Sandler M, Howard A, Zhu M, Zhmoginov A and Chen L (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4510–4520.
  • Sikang Liu et al. (2016) Sikang Liu, Watterson M, Tang S and Kumar V (2016) High speed navigation for quadrotors with limited onboard sensing. In: proceedings of IEEE International Conference on Robotics and Automation. pp. 1484–1491.
  • Smolyanskiy et al. (2017) Smolyanskiy N, Kamenev A, Smith J and Birchfield S (2017) Toward low-flying autonomous mav trail navigation using deep neural networks for environmental awareness.
  • Snyder (1975) Snyder DL (1975) Random point processes. Wiley New York. ISBN 0471810215.
  • Sun and Tanner (2015) Sun J and Tanner HG (2015) Constrained decision making for low-count radiation detection by mobile sensors. Autonomous Robots 39(4): 519–536.
  • Tanner and Piovesan (2010) Tanner HG and Piovesan JL (2010) Randomized receding horizon navigation. IEEE Transactions on Automatic Control 55(11): 2640–2644.
  • Thomas et al. (2017) Thomas J, Welde J, Loianno G, Daniilidis K and Kumar V (2017) Autonomous flight for detection, localization, and tracking of moving targets with a small quadrotor. IEEE Robotics and Automation Letters 2(3): 1762–1769. 10.1109/LRA.2017.2702198.
  • Vasilopoulos et al. (2020) Vasilopoulos V, Pavlakos G, Bowman SL, Caporale JD, Daniilidis K, Pappas GJ and Koditschek DE (2020) Reactive semantic planning in unexplored semantic environments using deep perceptual feedback. IEEE Robotics and Automation Letters 5(3): 4455–4462.
  • Wang et al. (2007) Wang CC, Thorpe C, Thrun S, Hebert M and Durrant-Whyte H (2007) Simultaneous localization, mapping and moving object tracking. The International Journal of Robotics Research 26(9): 889–916.
  • Yadav and Tanner (2017) Yadav I and Tanner HG (2017) Controlled mobile radiation detection under stochastic uncertainty. IEEE Control Systems Letters 1(1): 194–199.
  • Yadav and Tanner (2019) Yadav I and Tanner HG (2019) Mobile radiation source interception by aerial robot swarms. In: 2019 International Symposium on Multi-Robot and Multi-Agent Systems. pp. 63–69.
  • Yadav and Tanner (2020) Yadav I and Tanner HG (2020) Reactive receding horizon planning and control for quadrotors with limited on-board sensing. In: proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 7058–7063.
  • Yadav and Tanner (2021a) Yadav I and Tanner HG (2021a) Exact decentralized receding horizon planning for multiple aerial vehicles. In: Proceedings of the IEEE Conference on Decision and Control. (to appear).
  • Yadav and Tanner (2021b) Yadav I and Tanner HG (2021b) Exact decentralized receding horizon planning for multiple aerial vehicles. In: Conference on Decision and Control. (Accepted).
  • Yuan et al. (2019) Yuan L, Reardon C, Warnell G and Loianno G (2019) Human gaze-driven spatial tasking of an autonomous mav. IEEE Robotics and Automation Letters 4(2): 1343–1350.