
Department of Mechanical Engineering and Department of Computer Science
Yale University, New Haven CT 06511, USA
Email: {henry.berger,ian.abraham}@yale.edu

RAnGE: Reachability Analysis
for Guaranteed Ergodicity

Henry Berger    Ian Abraham
Abstract

This paper investigates performance guarantees on coverage-based ergodic exploration methods in environments containing disturbances. Ergodic exploration methods generate trajectories for autonomous robots such that time spent in each area of the exploration space is proportional to the utility of exploring in the area. We find that it is possible to use techniques from reachability analysis to solve for optimal controllers that guarantee ergodic coverage and are robust against disturbances. We formulate ergodic search as a differential game between the controller optimizing for ergodicity and an external disturbance, and we derive the reachability equations for ergodic search using an extended-state Bolza-form transform of the ergodic problem. Contributions include the computation of a continuous value function for the ergodic exploration problem and the derivation of a controller that provides guarantees for coverage under disturbances. Our approach leverages neural-network-based methods to solve the reachability equations; we also construct a robust model-predictive controller for comparison. Simulated and experimental results demonstrate the efficacy of our approach for generating robust ergodic trajectories for search and exploration on a 1D system with an external disturbance force.

1 Introduction

Autonomous mobile robots have been used for a number of search and exploration applications, including search and rescue [1, 2], mapping [3, 4], and extraterrestrial exploration [5]. One challenge is that many real-world applications of autonomous search occur in areas with hazards that impede search efforts. Hazards for autonomous mobile robots can be broadly categorized into two classes: static obstacles, such as walls or objects, which do not vary with time; and dynamic disturbances, such as wind and moving obstacles, which can vary over time. It is advantageous to find control solutions that provide guarantees on effective coverage, even in the presence of hazards. For our code, see https://github.com/ialab-yale/RAnGE.

Recent advances in ergodic coverage-based exploration algorithms have demonstrated their efficacy for generating trajectories that provide coverage of a search area [6]. Ergodic exploration methods optimize trajectories such that, over time, a robot spends time in each area of a search space in proportion to the information to be gained in that area [7], providing sparser coverage in low-information areas and more meticulous coverage in high-information areas. However, current ergodic controllers have no formal performance guarantees in the presence of disturbances. Therefore, this paper investigates whether it is possible to provide performance guarantees on coverage-based ergodic control methods under dynamic disturbances.

Figure 1: Overview of RAnGE. An agent explores an area with distributed information, subject to a disturbance (a), using a precomputed value function (b) to achieve comprehensive ergodic coverage (c). Note that in (b), higher values denote less desirable states.

Methods in reachability analysis offer a path toward providing such performance guarantees on ergodic control problems. Reachability analysis is a method for computing control policies with performance guarantees in worst-case disturbance scenarios [8, 9, 10]. Unfortunately, the prerequisite problem structure for using reachability analysis is not present in the canonical ergodic control problem formulation. In this paper, we investigate suitable formulations to construct reachability-based controllers for ergodic search that guarantee ergodic coverage of an area in the presence of dynamic external disturbances.

Our approach structures the ergodic control problem as a differential game between the control and an external disturbance force, and we employ a Bolza-form augmented-state transform of the ergodic metric that has also been used in other ergodicity-related work [7, 11, 12]. We show that the problem transform is equivalent to the original ergodic control problem and demonstrate compatibility with existing Hamilton-Jacobi-Isaacs (HJI) reachability methods [9]. We take advantage of computational advancements in HJI methods [13] to solve for controllers capable of generating robust ergodic trajectories, although the limitations of these computational methods restrict the complexity of problems for which our controller can currently be computed. In summary, our main contributions are as follows:

  • Derivation of Hamilton-Jacobi-Isaacs reachability conditions for the problem of ergodic exploration;

  • Computation of a robust ergodic controller under dynamic disturbances;

  • Demonstration of the effectiveness of robust reachability-based ergodic controllers for 1D coverage on a double-integrator system (illustrated in Figure 1); and

  • Derivation of a robust ergodic model-predictive controller (reMPC) for ergodic exploration in the presence of disturbance.

This paper is structured as follows: Section 2 outlines the related work, Section 3 provides an overview of preliminary background information on ergodic exploration and reachability analysis, Section 4 derives the main contribution of this work, Section 5 demonstrates performance results of the proposed method, and Section 6 provides a discussion and conclusion. The appendices provide implementation details, proofs, and an additional derivation.

2 Related Work

2.1 Reachability Analysis

Robots operating in the real world are often subject to disturbances, so it is desirable to have guarantees of performance for robotic systems in worst-case conditions. Reachability analysis is a verification technique for robotic systems under such worst-case conditions. For a bounded, time-varying disturbance, these analysis methods can determine the worst-case disturbance and provide a lower bound on performance in all possible conditions. Reachability has already been used for a variety of autonomous robotics applications, including control and obstacle avoidance for drones [10, 14], collision avoidance for airplanes [13, 15], and safe navigation for autonomous vehicles [13, 14].

Reachability analysis requires finding the viscosity solution to the Hamilton-Jacobi-Isaacs (HJI) partial differential equation [8, 16]; for most applications, this cannot be done analytically and must be done numerically. For any complex problem where an analytical solution is not feasible, the numerical solution is only an approximation of the value function. In some cases, it is possible to guarantee that all inaccuracies are conservative, which maintains the performance guarantees [10, 17]. In other cases, the reachability-based controllers cannot quite provide the formal guarantees promised by reachability analysis, but as the accuracy of the numerical solution improves, a violation of the guarantees becomes less likely.

Much research on reachability analysis has focused on low-dimensional systems. Additionally, due to the difficulty of learning the value function, reachability is often used modularly to provide safety guarantees (e.g., collision avoidance), without providing performance guarantees relating to the robot’s main objective [18, 19]. In this work, we generate approximate solutions to the HJI problem for the case of ergodic exploration, a high-dimensional problem where reachability is used to guarantee performance, not merely safety.

The Curse of Dimensionality.

The HJI equation is often solved by discretizing the state space on a grid. However, the computational cost of this approach scales exponentially with the dimensionality of the problem's state space, so grid-based solutions to the HJI equation are limited to low-dimensional problems. State decomposition methods [10, 20] can mitigate this scaling, but many high-dimensional problems remain difficult to solve on a grid. Furthermore, decomposition methods compute individual level sets of the value function, not a complete, differentiable value function.

Recent advances using neural networks for solving differential equations [21] have provided new methods for solving the reachability equations [17, 22, 13]. For a neural network, the computational time required to find a satisfactory solution depends on the complexity of the solution, rather than strictly on the dimensionality of the state space. Higher-dimensional problems will often have more complex value functions and therefore take longer to learn, but the scaling is better than exponential [13]. Neural-network-based solvers have demonstrated solutions of the reachability equations for 10D problems with disturbance [13] and 30D problems without disturbance [22].

2.2 Ergodic Exploration

In this paper, we apply reachability to an optimization problem that is fundamentally trajectory-based: coverage-based exploration. In general, the goal of exploration is to maximize coverage of a search area, and for problems involving a single mobile robot, coverage is inherently a property of an entire trajectory, not of an instantaneous state.

For discretized exploration problems, a variety of methods already exist to provide formal guarantees on performance. In discrete time and space, exploration can be represented as a traveling salesperson problem, enabling the application of fast, locally optimal algorithms [23] or slower, formally guaranteed globally optimal algorithms [24]. Voronoi methods can be used to guarantee maximal coverage of a continuous exploration area by a set of agent locations [25, 26]; these locations can be treated as the static locations of a swarm of agents or as the discretized trajectory of a single agent, but Voronoi methods cannot generate continuous trajectories. The motions of most robotic systems are continuous in time and space, so discrete-time and discrete-space methods necessarily introduce simplifications, which can lead to non-optimal results.

Ergodic exploration is a commonly-used technique for continuous-time, continuous-space exploration. Ergodic controllers seek to minimize the difference between the (expected) spatial distribution of information and the time-averaged spatial distribution of an agent’s trajectory [7]. Dressel and Kochenderfer [6] proved that ergodic exploration is an optimal strategy for gathering information if, as is often the case in real-world applications, the amount of information in a location is depleted as the agent spends time in that location.

Ergodic exploration has been combined with formally guaranteed static obstacle avoidance, using methods such as vector fields [27] and control barrier functions [28]. In this work, we introduce formal performance guarantees with time-varying disturbances into ergodic exploration, broadening the set of situations in which ergodic search can safely and effectively be used.

3 Problem Setup and Definitions

We consider a general system with state $x \in \mathcal{X} \subseteq \mathbb{R}^n$, control input $u \in \mathcal{U} \subseteq \mathbb{R}^{n_u}$, disturbance input $d \in \mathcal{D} \subseteq \mathbb{R}^{n_d}$, and (possibly nonlinear) dynamics

$$\dot{x}(t) = f(x(t), u(t), d(t)). \tag{1}$$

In this work, we consider the system over a finite time interval $t \in [t_0, t_f]$. The control input is generated by a control policy $u(\cdot): \mathcal{X} \times [t_0, t_f] \to \mathcal{U}$, also called the controller; the disturbance input is generated by a disturbance policy $d(\cdot): \mathcal{X} \times [t_0, t_f] \times \mathcal{U} \to \mathcal{D}$, also called the disturbance.

We assume that the disturbance has an instantaneous information advantage, meaning that the disturbance can respond instantaneously to the controller, although it cannot anticipate the controller's future actions [9]. More concretely, at any time $t$, the control is a function of $x$ and $t$ only, whereas the disturbance can be a function of $x$, $t$, and $u$.

3.1 Ergodic Exploration

Ergodicity is a measure of the effectiveness of a trajectory in covering distributed information in a known, bounded space. We consider exploration of a $v$-dimensional space $\mathcal{M} = [0, L_0] \times [0, L_1] \times \cdots \times [0, L_{v-1}] \subset \mathbb{R}^v$, where the $L_i$ are bounds on the exploration space. We define a function $M: \mathcal{X} \to \mathcal{M}$ that maps a state $x \in \mathcal{X}$ to a point $m \in \mathcal{M}$ in the exploration space. (Footnote 1: Often, the exploration task only involves a subset of the agent's state (e.g., pose), though the state includes additional variables (e.g., velocity). The mapping $M$ between $\mathcal{X}$ and $\mathcal{M}$ accounts for this distinction; for example, $M$ could be a selection matrix that selects only the pose components of the state.)

Definition 3.1 (Ergodicity).

Given a function $\phi: \mathcal{M} \to \mathbb{R}_{>0}$, an ergodic trajectory $x(\cdot): [t_0, t_f] \to \mathcal{X}$ is one that spends time in each region of $\mathcal{M}$ in proportion to the integral of $\phi$ over that region, such that

$$\lim_{t \to \infty} \frac{1}{t - t_0} \int_{t_0}^{t} F(M \circ x(\tau))\, d\tau = \int_{\mathcal{M}} F(m)\, \phi(m)\, dm \tag{2}$$

for all Lebesgue integrable functions $F: \mathcal{M} \to \mathbb{R}$ [7].

The definition of $\phi$ is problem-specific. For search and exploration applications, $\phi$ is an information density function that represents the probability of discovering information at a location. Following [7], ergodicity is often measured by a comparison in the Fourier domain between the information density $\phi$ and the time-averaged trajectory statistics, which are defined as follows.

Definition 3.2 (Time-averaged trajectory statistics).

The time-averaged location statistics of the trajectory are given by

$$c(x(\cdot), t) \coloneqq \frac{1}{t - t_0} \int_{t_0}^{t} \delta[M(x(\tau))]\, d\tau, \tag{3}$$

where $\delta$ is the Dirac delta function.

In order to measure whether a trajectory is ergodic, we define a metric on ergodicity.

Definition 3.3 (Spectral ergodic metric).

Following [7], the ergodic metric is given by

$$\mathcal{E}(x(\cdot), t) \coloneqq \sum_{k \in \mathcal{K}} \Lambda_k \left( c_k(x(\cdot), t) - \phi_k \right)^2, \tag{4}$$

where the time-averaged trajectory statistics $c$ and the information density function $\phi$ are encoded in the coefficients $c_k$ and $\phi_k$, respectively, calculated using the Fourier transform

$$c_k(x(\cdot), t) \coloneqq \frac{1}{t - t_0} \int_{t_0}^{t} F_k(M \circ x(\tau))\, d\tau, \qquad \phi_k \coloneqq \int_{\mathcal{M}} F_k(m)\, \phi(m)\, dm, \tag{5}$$

for modes $k \in \mathcal{K} = \{0, 1, 2, \dots, N-1\}^v$. Here, the set $\{\Lambda_k \in \mathbb{R}\}$ is a sequence of weighting coefficients defined for each $k \in \mathcal{K}$ as $\Lambda_k \coloneqq (1 + \lVert k \rVert_2^2)^{-s}$, where $s \coloneqq (v+1)/2$.

We use the cosine basis function $F_k(m) \coloneqq \frac{1}{h_k} \prod_{i=0}^{v-1} \cos\left( \frac{k_i \pi m_i}{L_i} \right)$, where $\frac{1}{h_k}$ is a normalizing constant [7], but other orthonormal bases would also suffice. The ergodic metric measures how far a trajectory is from ergodicity. The canonical ergodic trajectory-optimization problem is the minimization of the ergodic metric with respect to a given information density $\phi$ [7, 29] (see Problem 2 below).
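To make the metric concrete, the following minimal sketch (our own illustration, for the 1D case $v = 1$; the function names, grid resolution, and trajectory are assumptions, not the paper's implementation) computes $c_k$, $\phi_k$, and $\mathcal{E}$ for a sampled trajectory:

```python
import numpy as np

def fourier_basis(m, k, L, h_k):
    """Cosine basis F_k(m) on [0, L] for the 1D case (v = 1)."""
    return np.cos(k * np.pi * m / L) / h_k

def ergodic_metric(traj, phi_vals, L=1.0, N=6):
    """Spectral ergodic metric (4) for a 1D trajectory sampled uniformly in time.

    traj     : array of positions x(t_i) in [0, L]
    phi_vals : information density sampled on a uniform grid over [0, L]
    """
    grid = np.linspace(0.0, L, len(phi_vals))
    phi_vals = phi_vals / (phi_vals.mean() * L)          # normalize so phi integrates to 1
    metric = 0.0
    for k in range(N):
        h_k = np.sqrt(L) if k == 0 else np.sqrt(L / 2)   # orthonormalizing constant
        c_k = fourier_basis(traj, k, L, h_k).mean()      # time-averaged coefficient (5)
        phi_k = (fourier_basis(grid, k, L, h_k) * phi_vals).mean() * L  # spatial coefficient (5)
        Lambda_k = (1.0 + k**2) ** (-1.0)                # s = (v+1)/2 = 1 when v = 1
        metric += Lambda_k * (c_k - phi_k) ** 2
    return metric

# Example: a uniform sweep across [0, 1] against a uniform density is near-ergodic.
print(ergodic_metric(np.linspace(0.0, 1.0, 500), np.ones(200)))
```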

3.2 Reachability Analysis

Reachability analysis is a technique for solving differential games via the Hamilton-Jacobi-Isaacs (HJI) partial differential equation, which can be used to model safety requirements in control systems. We first consider a problem in which, subject to dynamic constraints, the controller and disturbance seek to minimize and maximize, respectively, a cost function with Bolza form (Problem 1).

Problem 1 (Bolza-form optimal control with disturbance)

Given a terminal cost $h: \mathcal{X} \to \mathbb{R}_{\geq 0}$ and a running cost $g: \mathcal{X} \times \mathcal{U} \to \mathbb{R}_{\geq 0}$, for some choice of $x_0$ and $t_0$, find

$$u^*(\cdot) \coloneqq \arg\min_{u(\cdot)} \max_{d(\cdot)} \left\{ J(x_0, t_0, u(\cdot), d(\cdot)) \right\}, \tag{6a}$$

where the cost function $J$ is defined as

$$J(x_0, t_0, u(\cdot), d(\cdot)) \coloneqq h(x(t_f)) + \int_{t_0}^{t_f} g(x(\tau), u(\tau))\, d\tau, \tag{6b}$$

subject to the following conditions ($\forall t \in [t_0, t_f]$, where applicable):

$$u(t) = u(x(t), t) \tag{6c}$$
$$d(t) = d(x(t), t, u(t)) \tag{6d}$$
$$\dot{x}(t) = f(x(t), u(t), d(t)) \tag{6e}$$
$$x(t_0) = x_0. \tag{6f}$$
Definition 3.4 (Value function).

The value function $V: \mathcal{X} \times [t_0, t_f] \to \mathbb{R}$, also called the optimal cost-to-go function, is the cost of the remainder of the trajectory, starting at the given time and state, ending at time $t_f$, with optimal control and disturbance:

$$V(x, t) = \min_{u(\cdot)} \max_{d(\cdot)} \left\{ J(x, t, u(\cdot), d(\cdot)) \right\}. \tag{7}$$

As proved in Theorem 4.1 of [16], the value function for a Bolza-form optimization problem (Problem 1) is a viscosity solution to the Hamilton-Jacobi-Isaacs optimality conditions.

Definition 3.5 (Hamilton-Jacobi-Isaacs optimality conditions).
For the Bolza-form optimization problem (Problem 1), the value function $V: \mathcal{X} \times [t_0, t_f] \to \mathbb{R}$ satisfies the conditions

$$-\frac{\partial V}{\partial t} = H(x, t), \quad \forall t \in [t_0, t_f],\ \forall x \in \mathcal{X} \tag{8a}$$
$$V(x, t_f) = h(x), \quad \forall x \in \mathcal{X}, \tag{8b}$$

where the Hamiltonian $H$ is defined as

$$H(x, t) \coloneqq \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ g(x, u) + \left( \frac{\partial V(x, t)}{\partial x} \right)^{\intercal} f(x, u, d) \right\}. \tag{8c}$$

Once the value function $V$ is known, the HJI equation (8a) and the definition of the Hamiltonian (8c) entail that the optimal controller $u^*(\cdot)$ and worst-case disturbance $d^*(\cdot)$ can be calculated directly as

$$u^*(x, t) = \arg\min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ \hat{H}(x, t, u, d) \right\} \tag{9a}$$
$$d^*(x, t, u) = \arg\max_{d \in \mathcal{D}} \left\{ \hat{H}(x, t, u, d) \right\}, \tag{9b}$$

where $\hat{H}$ denotes the argument of the minimax operation in the Hamiltonian $H$ [30]. This optimal controller is one of the main results of reachability analysis, since the controller can be used as a robust policy against worst-case disturbance. We seek to find such an optimal controller for exploration problems, in order to provide sufficient coverage guarantees for exploration with disturbance.
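As a concrete illustration of (9), the sketch below extracts $u^*$ and $d^*$ from the value-function gradient for one common special case. The control-affine dynamics $f(x, u, d) = a(x) + Bu + Cd$, diagonal quadratic control cost $u^{\intercal} R u$, and box input bounds are assumptions made for this example only; the formulation above keeps $f$ and $g$ general.

```python
import numpy as np

def optimal_inputs(V_x, B, C, R, u_max, d_max):
    """Sketch of (9) under assumed control-affine dynamics f = a(x) + B u + C d,
    quadratic control cost u^T R u (R diagonal), and box input constraints.

    V_x : value-function gradient dV/dx at the current (x, t)
    """
    # min over u of u^T R u + V_x^T B u: the clipped unconstrained minimizer
    # is optimal because the objective separates per coordinate.
    u_star = np.clip(-0.5 * np.linalg.solve(R, B.T @ V_x), -u_max, u_max)
    # max over d of V_x^T C d on a box is attained at a corner (bang-bang).
    d_star = d_max * np.sign(C.T @ V_x)
    return u_star, d_star
```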

4 Reachability Analysis for Guaranteed Ergodic Exploration

We now consider the problem of ergodic control subject to disturbance, as formulated in Problem 2.

Problem 2 (Ergodic control with disturbance)

Solve an optimization problem of the same form as the Bolza optimization problem with disturbance (Problem 1), but with the modified cost function

$$J(x_0, t_0, u(\cdot), d(\cdot)) \coloneqq \mathcal{E}(x(\cdot), t_f) + h(x(t_f)) + \int_{t_0}^{t_f} g(x(\tau), u(\tau))\, d\tau. \tag{10}$$

This problem is not strictly compatible with Bolza form, because the ergodic cost depends on the entire trajectory $x(\cdot)$, not just on the terminal state $x(t_f)$. Consequently, we reformulate the ergodic control problem in order to apply reachability analysis. To do so, we first derive an equivalent form of the ergodic metric via an augmented-state ergodic variable.

Lemma 4.1 (Augmented-state ergodic metric formulation [7, 11]).

The ergodic metric defined in (4) can be equivalently defined as

$$\mathcal{E}(x(\cdot), t_f) = \sum_{k \in \mathcal{K}} \Lambda_k \left( c_k(x(\cdot), t_f) - \phi_k \right)^2 = \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k(t_f)^2, \tag{11}$$

where $z_k(\cdot)$ is the solution to the differential equation

$$\dot{z}_k(t) = F_k(M \circ x(t)) - \phi_k \tag{12a}$$
$$z_k(t_0) = 0 \quad \forall k \in \mathcal{K}. \tag{12b}$$
Proof 4.2.

See Appendix B.1. ∎

This formulation of the ergodic metric (11) can be used to rewrite the ergodic control problem with disturbance (Problem 2) in Bolza form.

Problem 3 (Bolza-form ergodic control with disturbance)

Given a terminal cost $h: \mathcal{X} \to \mathbb{R}_{\geq 0}$ and a running cost $g: \mathcal{X} \times \mathcal{U} \to \mathbb{R}_{\geq 0}$, for some choice of $x_0$ and $t_0$, find

$$u^*(\cdot) \coloneqq \arg\min_{u(\cdot)} \max_{d(\cdot)} \left\{ J(x_0, t_0, u(\cdot), d(\cdot)) \right\}, \tag{13a}$$

where the cost function $J$ is defined as

$$J(t_0, x_0, u(\cdot), d(\cdot)) = \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k(t_f)^2 + h(x(t_f)) + \int_{t_0}^{t_f} g(x(\tau), u(\tau))\, d\tau, \tag{13b}$$

subject to the following conditions ($\forall t \in [t_0, t_f]$ and $\forall k \in \mathcal{K}$, where applicable):

$$u(t) = u(x(t), z(t), t), \qquad d(t) = d(x(t), z(t), t, u(t)), \tag{13c}$$
$$\dot{x}(t) = f(x(t), u(t), d(t)), \qquad x(t_0) = x_0, \tag{13d}$$
$$\dot{z}_k(t) = F_k(M \circ x(t)) - \phi_k, \qquad z_k(t_0) = 0. \tag{13e}$$

Note that $z \in \mathcal{Z} = \mathbb{R}^{N^v}$ is the vector of all $z_k$, $k \in \mathcal{K}$.
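The augmented dynamics (13e) are simple to simulate. As an illustration (our own sketch, not the paper's implementation), the $z_k$ can be integrated alongside a 1D trajectory with explicit Euler, taking $M$ as the identity:

```python
import numpy as np

def rollout_z(positions, basis_fns, phi_k, dt):
    """Integrate z_k' = F_k(x(t)) - phi_k (13e) along a 1D trajectory
    with explicit Euler, starting from z(t_0) = 0 (12b).

    positions : trajectory samples in the exploration space
    basis_fns : list of callables F_k, one per mode
    phi_k     : array of information coefficients, one per mode
    """
    z = np.zeros(len(phi_k))
    for m in positions:
        z += (np.array([F(m) for F in basis_fns]) - phi_k) * dt
    return z

# By Lemma 4.1, with Lam the weights and t_f - t_0 = len(positions) * dt:
#   E = (Lam * z**2).sum() / (len(positions) * dt)
```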

Problem 3 is a canonical Bolza-form problem (Problem 1), which enables the application of reachability analysis.

Theorem 4.3 (Ergodic Hamilton-Jacobi-Isaac PDE).

The solution to the ergodic control problem with disturbance (Problem 3) is given by the value function $V: \mathcal{X} \times \mathcal{Z} \times [t_0, t_f] \to \mathbb{R}$, optimal controller $u^*(\cdot): \mathcal{X} \times \mathcal{Z} \times [t_0, t_f] \to \mathcal{U}$, and optimal disturbance $d^*(\cdot): \mathcal{X} \times \mathcal{Z} \times [t_0, t_f] \times \mathcal{U} \to \mathcal{D}$ such that $V$ satisfies

$$-\frac{\partial V(x, z, t)}{\partial t} = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ \hat{H}(x, z, t, u, d) \right\}, \tag{14a}$$

where

$$\hat{H}(x, z, t, u, d) = g(x, u) + \left( \frac{\partial V(x, z, t)}{\partial x} \right)^{\intercal} f(x, u, d) + \sum_{k \in \mathcal{K}} \frac{\partial V(x, z, t)}{\partial z_k} \left( F_k(M(x)) - \phi_k \right), \tag{14b}$$

with terminal condition

$$V(x, z, t_f) = \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k^2 + h(x), \tag{14c}$$

and such that $u^*(\cdot)$ and $d^*(\cdot)$ satisfy

$$u^*(x, z, t) = \arg\min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ \hat{H}(x, z, t, u, d) \right\}, \tag{15a}$$
$$d^*(x, z, t, u) = \arg\max_{d \in \mathcal{D}} \left\{ \hat{H}(x, z, t, u, d) \right\}. \tag{15b}$$
Proof 4.4.

(Informal) The proof follows [31] and Section 4.2 of [30]. Using the definition of the value function (Def. 3.4) with the ergodic cost function (13b), the value function at the terminal time can be shown to satisfy (14c). We then break the value function into Isaacs' canonical principle of optimality (Eq. 2.7 of [31]), which is a minimax variant of the Bellman principle of optimality. From there, following the standard reachability derivation [30, 31] yields the remaining conditions for $V$, $u^*$, and $d^*$. A more detailed proof is provided in Appendix B.2. ∎

We additionally prove in Appendix C that the canonical and extended-state formulations of the ergodic control problem (Problems 2 and 3, respectively) yield the same HJI equation (14). This demonstrates that the presence of extended states in the HJI equation (14) is a feature of the problem structure, not an artifact of the choice of problem formulation.

The optimal controller $u^*$ from Theorem 4.3 is robust in the sense that it satisfies the following remarks, which are direct consequences of (13a).

Remark 4.5 (Cost guarantees).

Applying $u^*(\cdot)$ starting at $(x, z, t_0)$ guarantees that, despite any possible disturbance $d \in \mathcal{D}$, the trajectory cost will be upper-bounded by $V(x, z, t_0)$; furthermore, no controller can guarantee an upper bound lower than $V(x, z, t_0)$.

Remark 4.6 (Reachable coverage set).

If $g(x, t) = 0$ and $h(x) = 0$, then the set $\{(x, z) \in \mathcal{X} \times \mathcal{Z} \mid V(x, z, t_0) \leq \epsilon\}$ is precisely the set of starting states from which the optimal controller can guarantee achieving an ergodic metric of at most $\epsilon$.

We solve the HJI equation using a deep neural network (DNN), following [13], and we call the resulting controller RAnGE. Implementation details are provided in Appendix A.

4.1 Practical Considerations Regarding Scalability and Robustness

The HJI PDE (14) can only be solved numerically, which raises two concerns. First, the computed value function is only an approximation, and errors in the approximation could invalidate the guarantees of Theorem 4.3. This is a common issue for reachability methods; one common solution is to make the value function deliberately conservative, which is possible even when using neural networks [17], but this compromises the accuracy of the value function. We do not provide such conservative guarantees here, but as numerical HJI solvers improve, approximation errors and their consequences should diminish.

Second, scalability is a major limitation for HJI solvers, and the extra dimensionality from the augmented state $z$ limits the scope of ergodic problems that can be solved. The number of added dimensions is $N^v$, which scales polynomially with respect to the resolution $N$ of the basis functions and exponentially with respect to the dimensionality $v$ of the exploration space $\mathcal{M}$. The curse of dimensionality is common to all reachability applications, and developing new solvers with better scalability is an active area of research.

Alternative formulations of reachability for ergodic search may be possible, but we do not anticipate that any such alternative formulation could avoid the issue of increased dimensionality. The ergodic metric depends on the history of the trajectory, so an optimal controller must have a way to respond to information about the history of the trajectory up to the current time. Whether this information is stored in the augmented state variable $z$ (as in this paper, as well as [7, 11, 12]) or in some other way, it will necessarily constitute an additional input to the value function, thus increasing the dimensionality of the problem. Indeed, we show in Appendix C that extended state variables emerge from the application of reachability analysis to ergodic exploration, even when starting from an ergodic formulation without extended states (Problem 2).

4.2 An Alternative Robust Ergodic Model-Predictive Controller

We additionally develop a robust ergodic model-predictive controller, which we call reMPC. This controller uses first-order gradient descent to solve a discretized version of the minimax condition of the Bolza-form ergodic optimization problem (13). (Footnote 2: Here, the dynamics are discretized using the explicit Euler method, but any numerical integration method could be used.)

Problem 4 (Discretized Bolza-form ergodic control with disturbance)

Given a start state $(x, z) \in \mathcal{X} \times \mathcal{Z}$, for some timestep $\Delta t \in \mathbb{R}_{>0}$ and time horizon $T \in \mathbb{N}$, solve for the optimal control $\mathbf{u}^* \in \mathcal{U}^T$ such that

$$\mathbf{u}^* = \arg\min_{\mathbf{u} \in \mathcal{U}^T} \max_{\mathbf{d} \in \mathcal{D}^T} \left\{ \sum_{\tau=0}^{T} g(\mathbf{x}_\tau, \mathbf{u}_\tau) \Delta t + \frac{1}{T \Delta t} \sum_{k \in \mathcal{K}} \Lambda_k (\mathbf{z}_T)_k^2 + h(\mathbf{x}_T) \right\}, \tag{16}$$

subject to

$$\mathbf{x}_\tau = \mathbf{x}_{\tau-1} + f(\mathbf{x}_{\tau-1}, \mathbf{u}_{\tau-1}, \mathbf{d}_{\tau-1}) \Delta t, \quad \forall \tau \in [1, 2, \dots, T] \tag{17a}$$
$$(\mathbf{z}_\tau)_k = (\mathbf{z}_{\tau-1})_k + \left( F_k(M(\mathbf{x}_{\tau-1})) - \phi_k \right) \Delta t, \quad \forall \tau \in [1, 2, \dots, T],\ \forall k \in \mathcal{K} \tag{17b}$$
$$(\mathbf{x}_0, \mathbf{z}_0) = (x, z). \tag{17c}$$

The complete reMPC algorithm is given as Alg. 1. The first-order gradient-descent inner loop of Alg. 1 produces only local optima, so reMPC has no performance guarantees. However, given a long enough time horizon, if reMPC finds solutions close to the global optimum, then its performance should be similar to that of RAnGE.

Input: Starting state $(x_0 \in \mathcal{X}, z_0 \in \mathcal{Z})$
Input: Dynamics $f: \mathcal{X} \times \mathcal{U} \times \mathcal{D} \to \mathcal{T}_{\mathcal{X}}$
Input: Basis functions $\{F_k: \mathcal{M} \to \mathbb{R}, \forall k \in \mathcal{K}\}$, information coefficients $\{\phi_k \in \mathbb{R}, \forall k \in \mathcal{K}\}$
Input: Running cost $g: \mathcal{X} \times \mathcal{U} \to \mathbb{R}_{\geq 0}$, terminal cost $h: \mathcal{X} \to \mathbb{R}_{\geq 0}$

function reMPC($x_0 \in \mathcal{X}$, $z_0 \in \mathcal{Z}$):
    $\mathbf{u}, \mathbf{d} \leftarrow \mathbf{0}$
    while controller is running:
        $\mathbf{u}_{0,\dots,T-2} \leftarrow \mathbf{u}_{1,\dots,T-1}$;  $\mathbf{u}_{T-1} \leftarrow \mathbf{0}$        (warm-start by shifting)
        $\mathbf{d}_{0,\dots,T-2} \leftarrow \mathbf{d}_{1,\dots,T-1}$;  $\mathbf{d}_{T-1} \leftarrow \mathbf{0}$
        for $N$ iterations:
            $\mathbf{u} \leftarrow \mathbf{u} - \lambda_u \left(\partial \text{PredRollout}/\partial \mathbf{u}\right)$
            $\mathbf{d} \leftarrow \mathrm{clip}\left(\mathbf{d} + \lambda_d \left(\partial \text{PredRollout}/\partial \mathbf{d}\right), \mathcal{D}\right)$        (constrain $d \in \mathcal{D}$)
        apply $\mathbf{u}_0$

function PredRollout($x_0 \in \mathcal{X}$, $z_0 \in \mathcal{Z}$, $\mathbf{u} \in \mathcal{U}^T$, $\mathbf{d} \in \mathcal{D}^T$):
    $\mathbf{x}_0 \leftarrow x_0$;  $\mathbf{z}_0 \leftarrow z_0$
    for $\tau$ in $[1, 2, \dots, T]$:
        $\mathbf{x}_\tau \leftarrow \mathbf{x}_{\tau-1} + f(\mathbf{x}_{\tau-1}, \mathbf{u}_{\tau-1}, \mathbf{d}_{\tau-1}) \Delta t$
        $(\mathbf{z}_\tau)_k \leftarrow (\mathbf{z}_{\tau-1})_k + \left(F_k(M(\mathbf{x}_{\tau-1})) - \phi_k\right) \Delta t, \ \forall k \in \mathcal{K}$
    return $\sum_{\tau=0}^{T} g(\mathbf{x}_\tau, \mathbf{u}_\tau) \Delta t + \frac{1}{T \Delta t} \sum_{k \in \mathcal{K}} \Lambda_k (\mathbf{z}_T)_k^2 + h(\mathbf{x}_T)$

Algorithm 1: Robust Ergodic Model-Predictive Control (reMPC)
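A runnable counterpart to Alg. 1 is sketched below using PyTorch autograd for the gradients of PredRollout. The step sizes, iteration count, and the callable signatures for f, F, g, and h are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def pred_rollout(x, z, u, d, f, F, phi_k, Lam, g, h, dt, T):
    """Predicted cost (16) under the explicit-Euler rollout (17)."""
    cost = 0.0
    for tau in range(T):
        cost = cost + g(x, u[tau]) * dt
        z = z + (F(x) - phi_k) * dt           # augmented-state update (17b)
        x = x + f(x, u[tau], d[tau]) * dt     # dynamics update (17a)
    return cost + (Lam * z**2).sum() / (T * dt) + h(x)

def rempc_iteration(x, z, u, d, f, F, phi_k, Lam, g, h, dt, T,
                    n_iters=30, lr_u=0.1, lr_d=0.1, d_max=0.4):
    """Inner loop of Alg. 1: gradient descent on u, clipped ascent on d."""
    u = u.clone().requires_grad_(True)
    d = d.clone().requires_grad_(True)
    for _ in range(n_iters):
        cost = pred_rollout(x, z, u, d, f, F, phi_k, Lam, g, h, dt, T)
        gu, gd = torch.autograd.grad(cost, (u, d))
        with torch.no_grad():
            u -= lr_u * gu                    # controller minimizes the cost
            d += lr_d * gd                    # disturbance maximizes the cost
            d.clamp_(-d_max, d_max)           # keep d in the disturbance set D
    return u.detach(), d.detach()             # apply u[0], then warm-start by shifting
```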

5 Results

In this section, we demonstrate the RAnGE controller for the double-integrator system moving along one axis.

Due to the computational complexity of the value function, we found that training the value function was only possible for a one-second time horizon. The results in this section were achieved by running the controller with a receding one-second time horizon, always inputting $t \approx t_0$ into the controller. (Footnote 3: Because $t = t_0$ was on the boundary of the time interval over which the DNN was trained, a slightly larger time ($t = 0.02\,\mathrm{s}$) in the interior of the interval was found to provide better behavior.) This modification violates the preconditions for the guarantees of Theorem 4.3 to apply, but if the HJI equation could be solved over a longer time horizon, the guarantees would apply over that time horizon.

We generate trajectories from RAnGE and compare against reMPC (Section 4.2). We also compare against a standard model-predictive controller (MPC), a type of controller that does not explicitly account for disturbance but has often been shown to perform well under external disturbance [27, 32]. We implement a conventional MPC that minimizes the Bolza-form ergodic cost function (13b) without considering disturbance.

The goal for this section is to answer the following questions:

1. Q1: Can RAnGE generate robust ergodic trajectories in the presence of a disturbance?

2. Q2: How does the value function evolve over the course of a trajectory?

3. Q3: How does the performance of RAnGE compare to the performance of MPC and reMPC?

4. Q4: Regarding its effects on controller performance, how does the worst-case disturbance generated by RAnGE compare to other types of disturbance?

The following subsections provide an overview of the problem setup and answers to the questions stated above; each subsection A$n$ in Section 5.2 below answers the corresponding question Q$n$ above.

5.1 Problem Setup: One-Axis Double Integrator

We use a single-axis double-integrator dynamical system to study the proposed method. The system has state $x = [x_1\ x_2]^{\intercal}$, control $u = [u]$, disturbance $d = [d]$, and dynamics $\dot{x} = [x_2,\ u + d]^{\intercal}$. Here, we subject the system to the control and disturbance constraints $u \in [-u_{\text{max}}, u_{\text{max}}] \subset \mathbb{R}$, $d \in [-d_{\text{max}}, d_{\text{max}}] \subset \mathbb{R}$. For all results in this paper, we use $d_{\text{max}} = 0.4\, u_{\text{max}}$; performance was found to decline significantly for larger disturbance magnitudes.

We set $N = 6$, leading to an 8-dimensional system: one dimension for time, two dimensions for the double-integrator state, and five dimensions for the augmented state. (Footnote 4: $N = 6$ only necessitates five augmented state variables, because $z_{k=0}(t) = 0$ for all $t$, so there is no need to compute and track this term.) Note that while the plant dynamics are linear, the dynamics of the augmented state variables (12) are nonlinear, so the augmented system as a whole is nonlinear.
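For reference, a minimal implementation of these plant dynamics might look as follows; the specific value of $u_{\text{max}}$ is an illustrative assumption (the paper fixes only the ratio $d_{\text{max}} = 0.4\, u_{\text{max}}$).

```python
import numpy as np

def double_integrator(x, u, d, u_max=1.0, d_max=0.4):
    """Single-axis double-integrator dynamics of Sec. 5.1, x = [position, velocity];
    the bounded control and disturbance both enter the acceleration."""
    u = np.clip(u, -u_max, u_max)
    d = np.clip(d, -d_max, d_max)
    return np.array([x[1], u + d])
```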

The ergodic problem setup was that of (14), with running cost

$$g(x, z, u) \coloneqq q \sum_{k \in \mathcal{K}} \Lambda_k z_k^2 + u^{\intercal} R u + \text{barr}(x), \tag{18}$$

where $q$ is a nonnegative scalar weighting factor, $R \in \mathbb{R}^{1 \times 1}$ is a positive semi-definite matrix representing a weight on the control input, and $\text{barr}(x)$ is a barrier function that penalizes the trajectory for leaving the position bounds of the search area. We introduce the augmented state $z$ into the running cost through the running ergodic cost term $q \sum_{k \in \mathcal{K}} \Lambda_k z_k^2$, which was found to improve convergence by helping the solver avoid a local minimum where $V = 0$ everywhere except close to $t = t_f$. Other than the ergodic metric, there is no additional terminal cost $h(x)$.

Refer to caption
Figure 2: Trajectories generated by RAnGE in simulation, with worst-case disturbance, for bimodal and uniform information distributions. For each pair of plots, the left plot shows the trajectory and the direction of the disturbance, and the right plot shows the coverage achieved by the trajectory.

5.2 Simulated Results

5.2.1 A1: Robust ergodic trajectories.

A robust controller must produce ergodic trajectories even with worst-case disturbance, and with the controller from RAnGE, it is possible to calculate this worst-case disturbance (15b). In this particular problem setup, the control and disturbance are both affine and enter the dynamics in exactly the same way; it follows that the worst-case disturbance is always in the opposite direction of the optimal control. Thus, the worst-case disturbance can be calculated as

$$d^* = -\mathrm{sign}(u^*)\, d_{\mathrm{max}}. \tag{19}$$
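As a check of (19), one can roll out the closed loop under this disturbance. A minimal sketch (our own, reusing the explicit-Euler double-integrator step from Section 5.1; `controller` stands for any policy $u(x, t)$, e.g. the one from (15a)):

```python
import numpy as np

def rollout_worst_case(x, controller, d_max, dt, steps):
    """Closed-loop rollout of the double integrator under the worst-case
    disturbance (19), which reacts instantaneously to the applied control."""
    traj = [x.copy()]
    for i in range(steps):
        u = controller(x, i * dt)
        d = -np.sign(u) * d_max               # worst case directly opposes u
        x = x + np.array([x[1], u + d]) * dt  # explicit Euler step
        traj.append(x.copy())
    return np.array(traj)
```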
Figure 3: Time-Evolution of a Trajectory and the Value Function. Top row: the time-averaged spatial distribution and the information distribution. As time increases, the time-averaged trajectory statistics approach the information distribution, which is the goal of ergodic exploration. Bottom row: evolution of the trajectory through phase space, superimposed on a cross-section of the value function with respect to position and velocity, where the $z_k$ are fixed to their values at that instant in the trajectory.

We demonstrate that the controller can generate robust ergodic trajectories with worst-case disturbance, as shown in Figure 2. More trajectory results are summarized in Figure 4, discussed below.

5.2.2 A2: Value function evolution.

Figure 3 shows the evolution of a trajectory, as well as a cross-section of the value function at times during the trajectory. Qualitatively, properties of the value function can be related to the problem setup. Recall that the controller seeks states where the value function is lowest. At the top right and lower left corners, the value function is high, indicating that the controller should avoid these states; this is logical, because the agent is about to leave the exploration bounds and incur a boundary cost. Elsewhere, the value function is lowest in states where the agent is at or moving toward positions it has not adequately explored, and highest in states where the agent is at or moving toward positions it has already over-explored.

5.2.3 A3: Comparison of RAnGE, MPC and reMPC.

The performance of the three controllers for two information distributions and four disturbance types is summarized in Figure 4. There is considerable variation in the performances within each trial, due in part to the limitations of the controllers’ short time horizons, but some trends emerge.

Figure 4: Comparison of MPC, reMPC, and RAnGE. The performance measures are the ergodic metric (11) and the cost function (13b). Note that the ergodic metric is not normalized; see Figure 3 for an intuition of the scale of $\mathcal{E}$. The MPC and reMPC results each aggregate 64 runs with varying initial conditions; the RAnGE results aggregate 64 runs each from two controllers trained with different random seeds. The worst-case disturbance came from RAnGE according to (15b), the uniform disturbance was sampled from $[-d_{\mathrm{max}}, d_{\mathrm{max}}]$, and the normal disturbance was sampled from a Gaussian with $\mu = 0$, $\sigma = d_{\mathrm{max}}/2$. All trials lasted 20 s, and all controllers had a 1 s time horizon, so the running cost was divided by 20 when computing the trajectory cost $J$, to preserve the relative weighting of the running and terminal costs.

As expected, the worst performances by RAnGE are better than the worst performances by either of the other two methods. This is the primary theoretical advantage of RAnGE: it should have the highest performance floor of any controller. By contrast, the best performances by the other controllers rival or exceed those of RAnGE, particularly with non-worst-case noise. This is consistent with the formulation of RAnGE, which is designed to optimize its worst-case performance, not its best-case performance. In situations where average performance under non-worst-case disturbance matters most, the MPC could be a better choice.

Interestingly, the reMPC controller performs its best—both absolutely and relative to the other methods—with worst-case disturbance. This is because the reMPC controller optimizes its controls under the assumption of a near-worst-case disturbance, so it is not optimized for non-worst-case disturbances. Consequently, reMPC outperforms MPC with worst-case disturbance but underperforms it otherwise.

5.2.4 A4: Comparison of Disturbances

Just as the RAnGE controller optimizes its worst-case performance, the RAnGE-generated disturbance is optimally designed to impede best-case performance by any controller. Across both information distributions, the best performance by any controller with worst-case disturbance is worse than the best performance by any controller with each of the other types of noise.

The effects of the two types of random disturbance on controller performance are very similar to the effect of no disturbance at all. This is unsurprising, because the random disturbances average to zero and thus tend to cancel out.

Figure 5: Experimental setup and results. (a) We used a Bitcraze Crazyflie 2.1 quadrotor drone, and two Lighthouse cameras (not pictured) to assist with position and velocity measurement [33]. The information distribution was a hard-coded bimodal distribution, and the blocks show the modes and bounds for visualization purposes only. Bottom row: trajectories recorded from the drone, with the fans (b) turned on at constant low speed, (c) turned off, and (d) turned on at varying speeds from low to high over the course of the trajectory.

5.3 Experimental Results

We tested the controller on a Bitcraze Crazyflie 2.1 quadrotor drone [33] with fans for disturbance; the experimental setup is shown in Figure 5a. The drone was tested with no disturbance; with the fans at constant low speed; and with the fans varying between low, medium, and high speeds. Trajectories from these tests are shown in subfigures b-d of Figure 5. We were unable to implement a worst-case disturbance in the experiment because the worst-case disturbance must react instantaneously to the trajectory of the drone.

In all cases, the drone executed ergodic trajectories, as shown with the reconstructed time-averaged statistics in Figure 5. This demonstrates that the RAnGE controller is robust to disturbance and can be implemented on physical systems. The trajectory was most ergodic in the case of no disturbance, which is consistent with the results of Figure 4.

6 Discussion

In conclusion, we showed that it is possible to guarantee coverage using reachability analysis on ergodic search problems despite dynamic disturbances. We demonstrated a Bolza-form formulation for ergodic exploration using an augmented state that allows us to derive the ergodic HJI conditions of optimality. To compute solutions to the ergodic HJI conditions, we leveraged recent deep learning methods to approximate the value function. The controller was demonstrated to provide robust ergodic trajectories in the presence of dynamic disturbances, both in simulation and in experiments.

One limitation of this work is that, due to the complexity of the value function, the HJI equation could not be solved for more complex problems (e.g., time horizons much greater than one second, or exploration spaces with multiple dimensions). This limitation is a result of the numerical implementation, not the underlying theory. With advances in HJI solving techniques, future work could extend these results to longer time horizons, extending the horizon over which ergodic coverage can be guaranteed. Furthermore, we seek to extend this work to higher-dimensional search spaces and to introduce dynamic obstacles into the formulation. This paper provides the theoretical foundation for performing reachability analysis on ergodic exploration, and improved solving techniques will broaden its applicability.

Appendix A Implementation for Solving the HJI Equation

Solving the Hamilton-Jacobi-Isaacs problem (14) for the value function $V$ is not feasible analytically, but it can be done numerically. Following [13], we employ a deep neural network (DNN) with the sine function as the activation function to compute a numerical solution for $V$. The sine activation function provides smooth gradients of the network output [21], which is necessary for computing the controller and disturbance (15). The input to the DNN is the full state $(x, z, t)$, and the output is trained to be the scalar value function $V$.

The training loss $\ell$ is defined as $\ell(\theta) = \ell_B(\theta) + \lambda \ell_D(\theta)$, where

$$\ell_B(\theta) \coloneqq \frac{1}{|S_{t_f}|} \sum_{(x, z, t_f) \in S_{t_f}} \left\lVert V_\theta(x, z, t_f) - \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k^2 - h(x) \right\rVert, \tag{20a}$$

$$\ell_D(\theta) \coloneqq \frac{1}{|S|} \sum_{(x, z, t) \in S} \left\lVert \frac{\partial V_\theta}{\partial t} + \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ \hat{H}(x, z, t, u, d) \right\} \right\rVert, \tag{20b}$$

where $V_\theta$ is the output of the DNN with parameters $\theta$, $S$ is a sampled set of state data points of the form $(x, z, t)$, $S_{t_f} \subseteq S$ is the subset of $S$ of all state data points with $t = t_f$, and $\hat{H}$ is defined in (14b).

In the loss function, $\ell_B$ is the loss associated with the terminal boundary condition (14c), so it is only considered for state data points where $t = t_f$. By contrast, $\ell_D$ is the loss associated with the HJI differential equation (14a), so it is considered for all sampled state points. The tuning parameter $\lambda$ adjusts the relative weighting of the terminal and differential loss components.

The training of the DNN was performed as in [13]: the terminal condition (14c) was learned first, and then the solution was learned by expanding backwards from $t = t_f$ to $t = t_0$, following the differential equation (14a). The DNN was trained for a total of $1.5 \times 10^6$ iterations, which took an average of 247 minutes on an NVIDIA GeForce RTX 3080 GPU.
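A compact sketch of this training setup is given below: a sine-activated network for $V_\theta$ and the two loss terms of (20), with autograd supplying $\partial V_\theta / \partial t$. The layer sizes and the signatures of the problem-specific `hamiltonian` and `terminal_value` callables are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SineNet(nn.Module):
    """Value-function approximator with sine activations, following [13, 21].
    The input is the stacked state (x, z, t); layer sizes are illustrative."""
    def __init__(self, in_dim=8, hidden=64):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(in_dim, hidden),
                                     nn.Linear(hidden, hidden),
                                     nn.Linear(hidden, 1)])
    def forward(self, s):
        for layer in self.layers[:-1]:
            s = torch.sin(layer(s))
        return self.layers[-1](s)

def hji_loss(net, s, is_terminal, terminal_value, hamiltonian, lam=1.0):
    """Training loss (20): terminal residual l_B plus differential residual l_D.
    `terminal_value(s)` implements the right side of (14c) and `hamiltonian(s, dV)`
    evaluates min_u max_d H_hat from (14b); both are problem-specific assumptions."""
    s = s.clone().requires_grad_(True)
    V = net(s)
    dV = torch.autograd.grad(V.sum(), s, create_graph=True)[0]
    dV_dt = dV[:, -1:]                        # time is the last input coordinate
    l_D = (dV_dt + hamiltonian(s, dV)).abs().mean()
    l_B = (V[is_terminal] - terminal_value(s[is_terminal])).abs().mean()
    return l_B + lam * l_D
```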

Appendix B Proofs

B.1 Proof of Lemma 4.1

Proof B.1.

Integrating the differential equation for $z_k$ (12) and recognizing that $\phi_k$ is constant produces

$$z_k(x(\cdot), t_f) = \int_{t_0}^{t_f} F_k(M \circ x(t))\, dt - (t_f - t_0)\, \phi_k. \tag{21}$$

The definition of $c_k$ (5) can be used to simplify this to

$$z_k(x(\cdot), t_f) = (t_f - t_0) \left( c_k(x(\cdot), t_f) - \phi_k \right). \tag{22}$$

Finally, substituting this into (11) yields

$$\frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k^2 = \sum_{k \in \mathcal{K}} \Lambda_k \left( c_k(x(\cdot), t_f) - \phi_k \right)^2. \tag{23}$$

This completes the proof. ∎

B.2 Proof of Theorem 4.3

Proof B.2.

For an alternative proof, consult [16]. In (7), repeated here, we define the value function as the optimal cost of the remainder of the trajectory:

$$V(x, z, t_1) = \min_{u(\cdot)} \max_{d(\cdot)} \left\{ J(x, t_1, u(\cdot), d(\cdot)) \right\}. \tag{24}$$

Using the ergodic cost function (13b), this becomes

$$V(x, z, t_1) = \min_{u(\cdot)} \max_{d(\cdot)} \left\{ \int_{t_1}^{t_f} g(x(\tau), u(\tau))\, d\tau + \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k(t_f)^2 + h(x(t_f)) \right\}. \tag{25}$$

Note that in this and the following equations, $x(t)$, $z(t)$, and $u(t)$ are governed by the constraints (13c)-(13e). Substituting $t_1 = t_f$ into the definition of $V$ (25) directly yields the HJI boundary condition

$$V(x, z, t_f) = \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k(t_f)^2 + h(x(t_f)). \tag{26}$$

Next, we define $t \in (t_1, t_f)$ and split the integral in (25), yielding

$$V(x, z, t_1) = \min_{u(\cdot)} \max_{d(\cdot)} \left\{ \int_{t_1}^{t} g(x(\tau), u(\tau))\, d\tau + \int_{t}^{t_f} g(x(\tau), u(\tau))\, d\tau + \frac{1}{t_f - t_0} \sum_{k \in \mathcal{K}} \Lambda_k z_k(t_f)^2 + h(x(t_f)) \right\} \tag{27}$$

$$= \min_{u(\cdot)} \max_{d(\cdot)} \left\{ \int_{t_1}^{t} g(x(\tau), u(\tau))\, d\tau + V(x(t), z(t), t) \right\}. \tag{28}$$

Equation (28) is the Isaacs recursive principle of optimality (cf. Eq. 2.7 of [31]). Taking the derivative with respect to $t$ yields

$$0 = \min_{u(t)} \max_{d(t)} \left\{ g(x(t), u(t)) + \frac{\partial V}{\partial x} \frac{dx(t)}{dt} + \frac{\partial V}{\partial z} \frac{dz(t)}{dt} + \frac{\partial V}{\partial t} \right\} \tag{29}$$

$$-\frac{\partial V}{\partial t} = \min_{u(t)} \max_{d(t)} \left\{ g(x(t), u(t)) + \frac{\partial V}{\partial x} \frac{dx(t)}{dt} + \frac{\partial V}{\partial z} \frac{dz(t)}{dt} \right\}. \tag{30}$$

Since all terms are evaluated at time $t$, we can replace $x(t)$ with $x$, and likewise for $z$, $u$, and $d$. We can now substitute the equations for $\frac{dx}{dt}$ (1) and $\frac{dz}{dt}$ (12), yielding the HJI differential condition

$$-\frac{\partial V(x, z, t)}{\partial t} = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \left\{ g(x, u) + \frac{\partial V(x, z, t)}{\partial x} f(x, u, d) + \sum_{k \in \mathcal{K}} \frac{\partial V(x, z, t)}{\partial z_k} \left( F_k(M \circ x) - \phi_k \right) \right\}. \tag{31}$$

By inspection, it is clear that for any $(x, z, t)$, the optimal control $u$ and disturbance $d$ are precisely those that satisfy the minimax condition in (31). ∎

Appendix C Derivation of the Ergodic HJI Equation

This section presents an alternative derivation of the HJI equation for ergodic exploration, presented above as (14). Notably, although we begin from the canonical formulation of the ergodic metric, which does not include extended state variables, the resulting HJI equation will nevertheless contain extended state variables.

C.1 Problem Setup

We start with the simple cost function

$$J(x(\cdot), u(\cdot)) \coloneqq \int_{t_0}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \mathcal{E}(x(\cdot), t_f) \tag{32}$$

$$= \int_{t_0}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k} \Lambda_k \left( \frac{1}{t_f - t_0} \int_{t_0}^{t_f} F_k(M \circ x(t))\, dt - \phi_k \right)^2. \tag{33}$$

First, we bring $\phi_k$ into the integral of the ergodic metric, yielding

$$J(x(\cdot), u(\cdot)) = \int_{t_0}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k} \Lambda_k \left( \frac{1}{t_f - t_0} \int_{t_0}^{t_f} \left( F_k(M \circ x(t)) - \phi_k \right) dt \right)^2. \tag{34}$$

For notational ease, we define $\epsilon_k(t) \coloneqq F_k(M \circ x(t)) - \phi_k$. The ergodic metric is the sum of a single squared term for each value of $k$, so it can be rewritten as

$$\mathcal{E}(x(\cdot), t_f) = \sum_{k} \frac{\Lambda_k}{(t_f - t_0)^2} \left( \int_{t_0}^{t_f} \epsilon_k(t)\, dt \right)^2. \tag{35}$$

C.2 Solving for the HJI Equation

Define times $t_1, t_2$ such that $t_0 < t_1 < t_2 < t_f$. We expand the expression for the cost function (34) by substituting in the definition of the ergodic metric (35) and recognizing that $\int_{t_0}^{t_f} \epsilon_k(t)\, dt = \int_{t_0}^{t_1} \epsilon_k(t)\, dt + \int_{t_1}^{t_2} \epsilon_k(t)\, dt + \int_{t_2}^{t_f} \epsilon_k(t)\, dt$, yielding

$$J(x(\cdot), u(\cdot)) = \int_{t_0}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt + \int_{t_1}^{t_2} \epsilon_k(t)\, dt + \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right)^2. \tag{36}$$

Expanding yields

$$J(x(\cdot), u(\cdot)) = \int_{t_0}^{t_1} g(x(t), u(t))\, dt + \int_{t_1}^{t_2} g(x(t), u(t))\, dt + \int_{t_2}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \Bigg[ \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt \right)^2 + \left( \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right)^2 + 2 \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt \right) \left( \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right) + 2 \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt + \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right) \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right) + \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right)^2 \Bigg]. \tag{37}$$

We define the cost-to-go $V(x_{[t_0, \tau]}, \tau)$ as the optimal value of the remaining running cost plus the ergodic metric of the entire trajectory, where the portion of the trajectory prior to $t = \tau$ is held fixed. We use the subscript $[t_a, t_b]$ to denote the portion of the trajectory, control, or disturbance function limited to the time range $t \in [t_a, t_b]$. Applying this definition of $V$ to (37) produces

$$V(x_{[t_0, t_2]}, t_2) = \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \left( \int_{t_0}^{t_2} \epsilon_k(t)\, dt \right)^2 + \min_{u_{[t_2, t_f]}} \max_{d_{[t_2, t_f]}} \Bigg\{ \int_{t_2}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \Bigg[ 2 \left( \int_{t_0}^{t_2} \epsilon_k(t)\, dt \right) \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right) + \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right)^2 \Bigg] \Bigg\} \tag{38}$$

and

$$V(x_{[t_0, t_1]}, t_1) = \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt \right)^2 + \min_{u_{[t_1, t_f]}} \max_{d_{[t_1, t_f]}} \Bigg\{ \int_{t_1}^{t_2} g(x(t), u(t))\, dt + \int_{t_2}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \Bigg[ \left( \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right)^2 + 2 \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt \right) \left( \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right) + 2 \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt + \int_{t_1}^{t_2} \epsilon_k(t)\, dt \right) \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right) + \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right)^2 \Bigg] \Bigg\} \tag{39}$$

$$= \min_{u_{[t_1, t_2]}} \max_{d_{[t_1, t_2]}} \Bigg\{ \int_{t_1}^{t_2} g(x(t), u(t))\, dt + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \left( \int_{t_0}^{t_2} \epsilon_k(t)\, dt \right)^2 + \min_{u_{[t_2, t_f]}} \max_{d_{[t_2, t_f]}} \Bigg\{ \int_{t_2}^{t_f} g(x(t), u(t))\, dt + h(x(t_f)) + \sum_{k \in \mathcal{K}} \frac{\Lambda_k}{(t_f - t_0)^2} \Bigg[ 2 \left( \int_{t_0}^{t_2} \epsilon_k(t)\, dt \right) \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right) + \left( \int_{t_2}^{t_f} \epsilon_k(t)\, dt \right)^2 \Bigg] \Bigg\} \Bigg\} \tag{40}$$

$$= \min_{u_{[t_1, t_2]}} \max_{d_{[t_1, t_2]}} \Big\{ \int_{t_1}^{t_2} g(x(t), u(t))\, dt + V(x_{[t_0, t_2]}, t_2) \Big\}. \tag{41}$$

To obtain (40) from (39), we split the optimization at $t_2$, move the constant leading term $\sum_{k} \Lambda_k \left( \int_{t_0}^{t_1} \epsilon_k(t)\, dt \right)^2 / (t_f - t_0)^2$ inside the optimization, and combine it with the first two bracketed terms of (39) to form $\left( \int_{t_0}^{t_2} \epsilon_k(t)\, dt \right)^2$; substituting (38) then gives (41). The final result (41) is the recursive Isaacs principle of optimality for a problem with running cost. Note that aside from the different notation for the arguments of $V$, it is equivalent to (28), from the proof of Theorem 4.3 above.
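To make the recursion (41) concrete, the following sketch runs backward induction for a discretized min-max problem, with the controller minimizing and the disturbance maximizing at each stage. The scalar dynamics, running cost, bounds, and terminal cost are illustrative assumptions, not the system studied in this paper.

```python
# Discrete-time illustration of the min-max recursion (41) by backward induction.
import numpy as np

xs = np.linspace(-2.0, 2.0, 81)          # state grid
us = np.linspace(-1.0, 1.0, 11)          # admissible controls (set U)
ds = np.linspace(-0.3, 0.3, 7)           # admissible disturbances (set D)
dt, N = 0.05, 40                         # time step and number of stages

V = np.zeros_like(xs)                    # assumed terminal cost h(x) = 0
for _ in range(N):                       # sweep backward from tf toward t0
    Vnew = np.empty_like(V)
    for i, x in enumerate(xs):
        best_u = np.inf
        for u in us:                     # outer minimization over u, as in (41)
            worst_d = -np.inf
            for d in ds:                 # inner maximization over d
                xn = np.clip(x + (u + d) * dt, xs[0], xs[-1])
                cost = (x**2 + 0.1 * u**2) * dt + np.interp(xn, xs, V)
                worst_d = max(worst_d, cost)
            best_u = min(best_u, worst_d)
        Vnew[i] = best_u
    V = Vnew
print("V at x=0, t=t0:", np.interp(0.0, xs, V))
```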

As written in (38), the cost-to-go $V$ at time $\tau$ depends on the entire trajectory up to that time, $x_{[t_0, \tau]}$. However, inspection shows that $x_{[t_0, \tau]}$ is used in only two ways:

  • The entire history $x_{[t_0, \tau]}$ is used only in the calculation of $\int_{t_0}^{\tau} \epsilon_k(t)\, dt$ for all $k \in \mathcal{K}$, which can be maintained incrementally (see the sketch after this list).

  • The state $x(\tau)$ is necessary for rolling out the dynamics at times greater than $\tau$.
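The sketch below illustrates this reduction: maintaining the running integral $z_k$ incrementally (so that $\dot{z}_k = \epsilon_k$) reproduces $\int_{t_0}^{\tau} \epsilon_k(t)\, dt$ at every $\tau$ without storing the trajectory history. The basis mode and trajectory are the same illustrative assumptions used earlier.

```python
# The running integral z_k makes the trajectory history unnecessary.
import numpy as np

t = np.linspace(0.0, 10.0, 1001)
x = 0.5 + 0.4 * np.sin(0.7 * t)          # illustrative trajectory, as before
dt = t[1] - t[0]
k, phi_k = 3, 0.0                        # one assumed mode and target coefficient
eps = np.cos(k * np.pi * x) / np.sqrt(0.5) - phi_k   # epsilon_k(t)

z = 0.0                                  # running integral z_k, O(1) memory
for i in range(len(t) - 1):
    z += eps[i] * dt                     # incremental update: zdot_k = epsilon_k
    from_history = np.sum(eps[:i + 1]) * dt   # recomputed from the full history
    assert abs(z - from_history) < 1e-9  # identical at every tau
print("z_k summarizes the whole history:", z)
```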

Thus, $V$ at time $\tau$ can be written as $V\left( x(\tau), \left\{ \int_{t_0}^{\tau} \epsilon_k(t)\, dt, \forall k \in \mathcal{K} \right\}, \tau \right)$. Approximating (41) to first order in the small time step $\Delta t \coloneqq t_2 - t_1$ results in

\begin{aligned}
V & \left( x(t_1), \left\{ \int_{t_0}^{t_1} \epsilon_k(t)\, dt, \forall k \in \mathcal{K} \right\}, t_1 \right) \\
& = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \Bigg\{ g(x(t_1), u(t_1))\, \Delta t \\
& \qquad + V \left( x(t_1) + f(x, u, d)\, \Delta t, \left\{ \int_{t_0}^{t_1} \epsilon_k(t)\, dt + \epsilon_k(t_1)\, \Delta t, \forall k \in \mathcal{K} \right\}, t_1 + \Delta t \right) \Bigg\} .
\end{aligned}    (42)

We Taylor-expand the value function on the right-hand side, resulting in

\begin{aligned}
V & \left( x(t_1), \left\{ \int_{t_0}^{t_1} \epsilon_k(t)\, dt, \forall k \in \mathcal{K} \right\}, t_1 \right) \\
& = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \Bigg\{ g(x(t_1), u(t_1))\, \Delta t + V \left( x(t_1), \left\{ \int_{t_0}^{t_1} \epsilon_k(t)\, dt, \forall k \in \mathcal{K} \right\}, t_1 \right) \\
& \qquad + \frac{\partial V}{\partial x} f(x, u, d)\, \Delta t + \sum_{k \in \mathcal{K}} \left( \frac{\partial V}{\partial \left( \int_{t_0}^{t} \epsilon_k(t)\, dt \right)} \epsilon_k \right) \Delta t + \frac{\partial V}{\partial t} \Delta t + O(\Delta t^2) \Bigg\} ,
\end{aligned}    (43)

where $O(\Delta t^2)$ collects higher-order terms that approach $0$ faster than $\Delta t$ as $\Delta t \to 0$. Subtracting $V\left( x(t_1), \left\{ \int_{t_0}^{t_1} \epsilon_k(t)\, dt, \forall k \in \mathcal{K} \right\}, t_1 \right)$ from both sides (it is independent of $u$ and $d$, so it passes through the $\min$ and $\max$), dividing by $\Delta t$, and taking the limit as $\Delta t \to 0$ yields

0 = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \Bigg\{ g(x, u) + \frac{\partial V}{\partial x} f(x, u, d) + \sum_{k \in \mathcal{K}} \left( \frac{\partial V}{\partial \left( \int_{t_0}^{t} \epsilon_k(t)\, dt \right)} \epsilon_k \right) + \frac{\partial V}{\partial t} \Bigg\} .    (44)
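The passage from (43) to (44) is just the statement that the difference quotient of $V$ along the flow converges to the total derivative as $\Delta t \to 0$. As a sanity check under assumed values of $f$ and $\epsilon_k$, the following sketch verifies this convergence for an arbitrary smooth test function $V(x, z, t)$ with one $z$ coordinate:

```python
# Numeric check of the limit step from (43) to (44) for a smooth test V.
import numpy as np

V  = lambda x, z, t: x**2 * z + 0.5 * z**2 + np.sin(t) * x   # arbitrary smooth V
Vx = lambda x, z, t: 2 * x * z + np.sin(t)                   # analytic dV/dx
Vz = lambda x, z, t: x**2 + z                                # analytic dV/dz
Vt = lambda x, z, t: np.cos(t) * x                           # analytic dV/dt

x, z, t = 0.3, -0.7, 1.2
f, eps = 0.9, -0.4                       # assumed values of f(x,u,d), epsilon_k
exact = Vx(x, z, t) * f + Vz(x, z, t) * eps + Vt(x, z, t)
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = (V(x + f * dt, z + eps * dt, t + dt) - V(x, z, t)) / dt
    print(f"dt={dt:.0e}  quotient={quotient:+.6f}  exact={exact:+.6f}")
```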

We now recognize that $\int_{t_0}^{\tau} \epsilon_k(t)\, dt$ is precisely $z_k(\tau)$. We refrained from making this substitution earlier in order to emphasize that this derivation does not depend on any properties of $z_k$ proved elsewhere in the paper. Making this substitution, expanding $\epsilon_k$, and rearranging produces the differential condition

-\frac{\partial V}{\partial t} = \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \Bigg\{ g(x, u) + \frac{\partial V}{\partial x} f(x, u, d) + \sum_{k \in \mathcal{K}} \frac{\partial V}{\partial z_k} \big( F_k(M(x)) - \phi_k \big) \Bigg\} .    (45)

Finally, evaluating the non-recursive formulation of the value function (38) at $t_2 = t_f$, so that the remaining optimization interval is empty, yields the terminal condition

V(x, z, t_f) = \sum_{k \in \mathcal{K}} \left( \frac{\Lambda_k}{(t_f - t_0)^2} z_k^2 \right) + h(x), \quad \forall x \in \mathcal{X},\ \forall z \in \mathcal{Z} .    (46)

Taken together, (45) and (46) constitute a terminal-value PDE that defines the value function, and they are identical to the HJI PDE (14) derived from the extended-state formulation of the ergodic control problem (Problem 3).
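As a closing illustration, the sketch below evaluates the residual of (45) for a candidate value function using finite-difference derivatives; a true solution of the PDE would drive this residual to zero, which is the principle behind the neural solver used in the main text. The single-integrator dynamics $\dot{x} = u + d$, control cost $g = \tfrac{c}{2} u^2$, bounds, basis, and the stand-in $V$ are illustrative assumptions under which the inner $\min$ and $\max$ have closed forms.

```python
# Hedged sketch: evaluate the HJI residual of (45) for a candidate V(x, z, t).
import numpy as np

umax, dmax, c = 1.0, 0.3, 0.5            # assumed bounds and control-cost weight
K = 5
phi = np.zeros(K)                        # assumed target coefficients

def F(k, x):                             # assumed normalized cosine basis
    return np.cos(k * np.pi * x) / (1.0 if k == 0 else np.sqrt(0.5))

def V(x, z, t):                          # stand-in candidate value function
    return np.sum(z**2) + 0.1 * x**2 * (2.0 - t)

def hji_residual(x, z, t, h=1e-5):
    # central finite differences for Vt, Vx, and each Vz_k
    Vt = (V(x, z, t + h) - V(x, z, t - h)) / (2 * h)
    Vx = (V(x + h, z, t) - V(x - h, z, t)) / (2 * h)
    Vz = np.array([(V(x, z + h * e, t) - V(x, z - h * e, t)) / (2 * h)
                   for e in np.eye(K)])
    u = np.clip(-Vx / c, -umax, umax)    # closed-form inner minimization over u
    d = dmax * np.sign(Vx)               # closed-form inner maximization over d
    eps = np.array([F(k, x) for k in range(K)]) - phi
    # residual of (45), with g(x, u) taken as the assumed control cost (c/2) u^2;
    # it vanishes when V solves the PDE
    return Vt + 0.5 * c * u**2 + Vx * (u + d) + Vz @ eps

print("HJI residual at a sample point:", hji_residual(0.4, np.zeros(K), 1.0))
```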

  • [33] J. A. Preiss*, W. Hönig*, G. S. Sukhatme, and N. Ayanian, “Crazyswarm: A large nano-quadcopter swarm,” in IEEE International Conference on Robotics and Automation (ICRA), pp. 3299–3304, IEEE, 2017. Software available at https://github.com/USC-ACTLab/crazyswarm.