Fabrics: A Foundationally Stable Medium for Encoding Prior Experience
Abstract
Most physical systems have dynamics functions that are just a nuisance to policies. Torque policies, for instance, usually have to effectively invert the natural classical mechanical dynamics to get their job done. Because of this, we often use controllers to make things easier on policies. For instance, inverse dynamics controllers wipe out the physical dynamics so the policy starts from a clean slate. That makes learning easier, but the policy still needs to learn everything about the problem, including aspects of a solution which are common to many other problems, such as how to make the end-effector move in a straight line, how to avoid joint limits and self-collisions, how to avoid obstacles, etc. Over the past few years it’s become standard to formulate learning not in C-space, but in end-effector space, and to use controllers such as Operational Space Control (OSC) to capture some of these commonalities. These controllers, whether inverse dynamics or OSC, reshape the natural dynamics of the system into a different second-order dynamical system whose behavior is more useful. And the trend is, the more useful behavior we can pack into these reshaped systems, the easier it is to learn policies.
However, OSC is from the 1980s, and captures only straight line end-effector motion. There’s a lot more behavior we could and should be packing into these systems. Earlier work [15, 16, 19] developed a theory that generalized these ideas and constructed a broad and flexible class of second-order dynamical systems which was simultaneously expressive enough to capture substantial behavior (such as that listed above), and maintained the types of stability properties that make OSC and controllers like it a good foundation for policy design and learning. This paper, motivated by the empirical success of the types of fabrics used in [20], reformulates the theory of fabrics into a form that’s more general and easier to apply to policy learning problems. We focus on the stability properties that make fabrics a good foundation for policy synthesis. Fabrics create a fundamentally stable medium within which a policy can operate; they influence the system’s behavior without preventing it from achieving tasks within its constraints. When a fabric is geometric (path consistent) we can interpret the fabric as forming a road network of paths that the system wants to follow at constant speed absent a forcing policy, giving geometric intuition to its role as a prior. The policy operating over the geometric fabric acts to modulate speed and steer the system from one road to the next as it accomplishes its task.
We reformulate the theory of fabrics here rigorously and develop theoretical results characterizing system behavior and illuminating how to design these systems, while also emphasizing intuition throughout.
I Introduction
Policies all operate on underlying system dynamics: what the robot wants to do absent external control. These dynamics can be as straightforward as the underlying classical mechanical dynamics of the robot, where the system’s inertia defines its Riemannian geometry, the network of paths the system would travel along absent gravity and frictions (see [16] for a discussion of the connection between classical mechanics and Riemannian geometry). That mechanical system of paths, though, is often irrelevant to tasks and a hindrance to achieving a desired behavior.
Therefore, control systems often work to reshape that geometry into something more relevant, or at least less disruptive. For instance, inverse dynamics control [17] removes the geometry entirely, replacing it with a Euclidean geometry in C-space (a blank slate), such that additional controllers can generate a desired behavior without competing with the native system geometry.
Operational space control [9] builds upon this idea and not only clears the geometry, but also reshapes it into something more relevant to the task. Specifically, it replaces the physical geometry with a different Riemannian geometry where geodesics move the end-effector in straight lines. Policies then build off that more useful geometry to define useful task behavior. Since tasks are often more easily described in the end-effector space, this starting geometry is highly relevant to many problems—it encodes useful prior information.
However, operational space control captures only a small fraction of the commonalities among tasks. Most tasks, for instance, require some form of obstacle awareness, such as avoidance or attraction toward a surface (e.g. grasping). Moreover, robots generally avoid joint limits and self-collisions, and approach targets from a particular direction (e.g. approach a table orthogonal to a surface when touching it). Many of these behavioral elements can and should be factored out and encoded into the reshaping controller itself, and ideally we should extract these common behavioral components from data.
In this work, we formalize this concept of an underlying behavior shaping controller into what we call fabrics. We define fabrics rigorously as conservative autonomous second-order differential equations, show how to construct them by energizing a generating system, and thoroughly characterize their intrinsic stability properties. This theory supports the type of fabric design used in recent fabric learning work such as [20] and gives the theoretical foundations for encoding prior information into an underlying fabric and training policies over the top of it.
When the fabric has a particular geometric path consistency property, we call it a geometric fabric. This path consistency gives the fabric a speed invariant road network of paths that guide the system around obstacles and other constraints and broadly encode important prior information. Policies navigating these fabrics generally follow the network of paths and need only choose when and how strongly to push the system from one path to another and how to regulate energy along the way. We provide a number of theoretical statements characterizing energy regulation and system convergence, including convergence to desired goals (the zero set of a forcing term). We tailor the theory to providing insight into the design and training of fabrics and the policies residing on them.
Importantly, these shared fabrics, especially those learned from data [20], constitute well-informed priors on behavior. We discuss this perspective throughout this work. Generally, when we use the term prior, we mean it in a broader sense than the purely Bayesian probabilistic one. We simply mean it acts as a way to inject prior experience into the system to improve the sample efficiency of training policies (including the manual design of policies which is, itself, an information theoretic learning process: an iterative process resulting in a policy expected to generalize to novel situations). Note that fabrics can be used to develop probabilistic priors by running stochastic policies over them to generate distributions of trajectories. But the underlying fabric itself, which captures the essence of the encoded information in its geometry, is not probabilistic.
I-A Related work
Since robots are physical systems, their dynamics are well-understood and governed by the Euler-Lagrange equation, as characterized in any number of introductory robotics textbooks [7, 17]. The mathematics of these systems is sophisticated, with Lagrangian symmetries giving rise to conserved quantities such as energy [18], and control theorists have exploited those mathematical properties thoroughly for the stable design of complex nonlinear controllers by reshaping the systems into different, more favorable, dynamical systems [1, 4]. These fundamental equations have also been used in the creation of modern robot simulators like [11, 12], which are critical tools for designing or learning robot control policies.
In classical systems, the Euler-Lagrange equations decompose into two parts. One part is the closed system’s inertial equations. This component can be shown to be geometric in nature in the sense that it produces speed-invariant paths through space [4, 16]. The system, under the influence of only its inertia, follows the same path regardless of speed (or more generally accelerations along the direction of motion). These geometries are Riemannian, and the system’s mass matrix is the Riemannian metric (see [16] for a derivation). The second part includes additional forcing and damping terms. Both of these components are required to accurately model the complex, nonlinear physical phenomena in the real world. Reshaping these models into useful behavioral systems within the same class of classical mechanical systems (Riemannian geometries) is common [2, 3, 4, 9], and interestingly, Riemannian geodesics have also been shown to model large segments of human motion in [10, 13], indicating that energy efficient human motion can be largely captured by following geometric paths. However, these Riemannian systems are fundamentally limited in their expressivity for two reasons: their metrics can only be a function of position (no velocity), and the metric plays a double role of both defining the geometry of paths itself and specifying how one sub-system weights together with another. Broader classes of second-order differential equations, such as Riemannian Motion Policies (RMPs) for motion generation in [14], aren’t limited in these ways and have been shown empirically to have high-capacity for representing intricate behaviors, but are less well understood.
Recently, [19] generalized classical mechanical models to what are called geometric fabrics, building off a type of system termed bent Finsler systems, to expand the modeling capacity of these types of systems, specifically to remove the limitations above. Geometric fabrics capture the flexibility of RMPs while being provably stable and maintaining a form of (non-Riemannian) geometric path consistency, building off the mathematics of Finsler and Spray geometry [16]. With bent Finsler systems, behavioral designers were able to engineer policies that can outperform both the classical mechanical systems of geometric control and RMPs. These systems were also shown to outperform linear dynamical systems (such as Dynamic Movement Primitives, DMPs), RMPs, and a variety of baseline neural architectures like Long Short-Term Memory (LSTM) networks in learning contexts [20]. These systems superficially resemble artificial potential fields [8], but are built to enable the design of the geometry rather than the potential function, which improves their regularity and dramatically boosts performance in practice. Behavior can be written directly into the underlying road network of paths, reshaping the system geometry, rather than relying on the potential function to push the system (fight) against a less relevant natural geometry.
Independently, Bylard et al. [5] developed what are called Pullback Bundle Dynamical Systems (PBDS) as a rigorously covariant version of the Geometric Dynamical Systems (GDS) developed earlier in [6]. They approached the problem as developing Riemannian metrics on the tangent bundle (space of positions and velocities) of a given manifold. While rigorous, those systems are analogous in their representational capacity to the Lagrangian fabrics outlined in [15], and from the analysis in [19] it’s now known that they lack the flexibility to represent the geometry of paths and the metric independently of one another. For that reason, PBDS rely heavily on non-geometric potential functions similar to the standard potential shaping techniques of geometric control [4]. The perspective we develop here builds from the results on nonlinear (spray) geometries of paths detailed in [16], and requires less technical machinery than developing metrics directly on the tangent bundle. Finsler fabrics and more broadly Lagrangian fabrics are analogous to PBDS, but geometric fabrics are a fundamentally more expressive class of systems.
All of these earlier works, while elegant in their generalization of classical mechanical or Riemannian systems, often require complex tensor calculations, such as evaluating the Euler-Lagrange equation, to fully follow the theory. This complexity introduces challenges, especially when learning is involved. Here, we reformulate these systems in a way that’s more intuitive, easier to handle both conceptually and implementationally, and emphasizes the role they play as fundamentally stable mediums for guiding policies.
I-B A note on generalized notation
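Often a classical mechanical, (bent) Finsler, RMP system, etc. takes the form $M\ddot{x} + f = 0$, and when forced by some potential function $\psi(x)$, and damped by a dissipating term $B\dot{x}$, we get
$$M\ddot{x} + f = -\partial_x\psi - B\dot{x}. \tag{1}$$
The resulting acceleration is
\begin{align}
\ddot{x} &= -M^{-1}f - M^{-1}\big(\partial_x\psi + B\dot{x}\big) \tag{2}\\
&= \tilde{h}(x, \dot{x}) + \pi(x, \dot{x}), \tag{3}
\end{align}
where $\tilde{h} = -M^{-1}f$ and $\pi = -M^{-1}(\partial_x\psi + B\dot{x})$. That first term is conservative and often geometric (and/or unbiased in the sense that it won’t push the system away from rest) and the second term both forces away from $\tilde{h}$ and regulates the injection and dissipation of energy.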
In this work, we address general decomposed systems of the form given in Equation 3. We use $\tilde{h}$ to denote a fabric (conservative term) and $\pi$ to denote a forcing term that pushes against the fabric and regulates energy. The tilde denotes that the term is conservative (a fabric), to distinguish it from a generator $h$ that creates a fabric through energization (see Definition II.9 and Lemma II.10).
This decomposition is very general and covers many systems, including the ones above. If $\tilde{h}$ has an associated system metric $M$, it’s often useful to think of forcing policies as force functions such as $-\partial_x\psi - B\dot{x}$ which are transformed by the inverse of the system metric into the forcing term $\pi$, as in Equation 2. Note that strong Eigen-directions of $M$ trim away components of the force. In that sense, $M$ defines the system priorities, intuitively defining which directions in space are important to the fabric and which aren’t.
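To make the "trimming" effect of the metric concrete, here is a minimal sketch. It assumes a diagonal 2-D system metric and uses hypothetical names (`apply_inverse_metric`, `metric_diag`, `force`) and illustrative values that do not come from the original work. It maps a force to an acceleration through the inverse metric, so the direction with the strong metric eigenvalue receives only a small share of the push.

```python
def apply_inverse_metric(metric_diag, force):
    """Map a force to an acceleration through a diagonal system metric.

    Hypothetical helper: computes M^{-1} f for M = diag(metric_diag).
    """
    return [f_i / m_i for m_i, f_i in zip(metric_diag, force)]


# Illustrative values only: direction 0 has a strong metric eigenvalue
# (10.0), direction 1 a weak one (1.0).
metric_diag = [10.0, 1.0]
force = [1.0, 1.0]  # the force pushes equally in both directions

accel = apply_inverse_metric(metric_diag, force)
print(accel)  # [0.1, 1.0]
```

The force asks for equal motion in both directions, but the resulting acceleration is ten times smaller along the direction the metric prioritizes.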
I-C Overview
We begin in Section II with a series of results characterizing the fundamental stability of fabrics as a medium for policies to operate on. Throughout, we define and develop the theory of fabrics generally but also detail the important role path consistency plays in the more specific case of geometric fabrics which form a concrete road network of paths for policies to operate across.
We start by defining fabrics to be conservative second-order autonomous differential equations in Definition II.1 and show that energy conservation, by itself, gives the fabric important stability properties. Terminologically, in Definition II.4, we decompose the full system into $\ddot{x} = \tilde{h}(x, \dot{x}) + \pi(x, \dot{x})$, where $\tilde{h}$ is the fabric and $\pi$ is the policy, and together they form a forced system. This terminology derives from the force form $M\ddot{x} + f = \pi_f$ where $M$ is a symmetric positive definite system metric and we have the relations $\tilde{h} = -M^{-1}f$ and $\pi = M^{-1}\pi_f$. Here $\pi_f$ is called a force policy. Intuitively, a system traveling along a fabric will always maintain constant energy if the policy does nothing, and the policy can always simply dissipate energy to come to a stop. Over a bounded period of time, a bounded policy can only inject a finite amount of energy into the system, so it always has the means to easily bring the system back to rest. Fabrics, in that sense, innately form a fundamentally stable medium across which the policy operates.
We show in Lemma II.10 that we can always transform any given second-order autonomous system into a fabric simply by speeding up or slowing down along the direction of motion, and Definition II.9 gives a specific energization transform that does that. Importantly, Proposition II.13 then shows that if the underlying generating system is geometric (path consistent), energization stabilizes it without changing the collection of paths (since it operates entirely by accelerating along the direction of motion which is known to leave paths unchanged in geometric systems).
Then Proposition II.16 shows that any policy operating across a geometric fabric can be decomposed into a zero-work energy-preserving term which bends (or steers) the paths without changing the energy, and an energy regulation term which modulates speed along the direction of motion without changing the path. All policies thereby act to simply modulate the underlying fabric’s energy while steering the system. When training policies, one can potentially exploit this observation to define data efficient policy parameterizations.
Section II finishes with a discussion of convergence to the zero set of the forcing policy. Broadly, there are many cases where goals can be characterized by zero sets of some vector field. For instance, the local minima of a potential lie in the zero set of its gradient. A forcing policy is a vector field that vanishes when it no longer wants to move the system, so the zero set of the forcing policy is a good characterization of the policy’s goals. Proposition II.17 presents some general conditions under which the forced system converges to the zero set. One of those conditions is the practical statement that if the system (with bounded accelerations) converges, it must converge to the zero set. That comes from the simple observation that the fabric is conservative and therefore wouldn’t itself push the system from rest (zero energy). So if it comes to rest at the zero set, neither the fabric nor the policy wants it to move from there. It’s often straightforward to design convergent systems that dissipate energy properly to bring the system to rest at a zero set, so even if we can’t otherwise prove global convergence of the system, we can design practically convergent systems which are guaranteed to be at the policy’s zero set when they converge. Moreover, these observations suggest that given a goal, we can parameterize the policy to ensure the policy is zero if and only if it’s at the goal. Then a training system needs only learn how to modulate energy effectively to converge nicely to that zero set.
Section III moves into a more complete discussion of theoretical conditions on energy regulation. Propositions III.1 and III.3 give some policy parameterizations for which we can guarantee bounded energy and a natural form of energy regulation. The main result of this section is Theorem III.5, which gives a specific energy regulation formula under which any forced fabric can be guaranteed to converge to the zero set of the policy provided there exists what we call a compatible potential which we use to guide the energy regulation.
Section IV gives a final stability analysis for a common case where the fabric has a corresponding system metric and is being forced by a damped potential function. This setting is similar to the geometric fabric setting of [19], but more general. Importantly, we allow the underlying geometric fabric to be arbitrary and paired with any system metric. It’s typically much easier to design and implement such systems than the bent Finsler geometries described in [19].
Finally, we summarize the takeaways in Section VI.
II The Fundamental Stability of Fabrics
Fabrics are stable autonomous second-order differential equations that can form well-informed priors on policies by encoding behavioral information common across many tasks. Individual control policies use fabrics by navigating across them. In this section, we define fabrics and characterize their utility and fundamental stability.
Throughout this work we use the multivariate calculus notational conventions outlined in [15]. Note that in earlier work we built in specific conditions to handle boundary conformance for manifolds with a boundary. Those boundary conditions often require systems to be unbounded (e.g. accelerations or metrics approach infinity), which is impractical for real-world implementation and numerical integration. More recently, [19] described how to integrate explicit hard constraints into the definition of systems like the ones we consider here; constraint forces effectively fold into the forcing policy making them conceptually simpler. In many cases practical implementations use terms that are softer and better conditioned to smoothly avoid constraints (e.g. added potential function in the framework of [19]). We, therefore, cover only the unconstrained setting here, and refer the reader to [19] for details on how to incorporate hard constraints.
Definition II.1 (Fabrics).
An autonomous differential equation $\ddot{x} = \tilde{h}(x, \dot{x})$ is a fabric if it conserves a Finsler energy $\mathcal{L}_e(x, \dot{x})$.
This definition states that a fabric is simply a conservative second-order autonomous differential equation. That conservation property is what makes the fabric a nice stable medium for policy design. The following Lemma shows that the fabric itself doesn’t attempt to push a system from rest. This property will enable policies to reliably navigate the fabric and converge to any given desired goal.
Lemma II.2.
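If $\ddot{x} = \tilde{h}(x, \dot{x})$ is a fabric, then $\tilde{h}(x, 0) = 0$.
Proof.
Let $\mathcal{L}_e$ be the fabric’s conserved Finsler energy. Finsler energies can be written $\mathcal{L}_e = \frac{1}{2}\dot{x}^\top M_e\dot{x}$ where $M_e = \partial^2_{\dot{x}\dot{x}}\mathcal{L}_e$ (see [16]), so $\mathcal{L}_e = 0$ if and only if $\dot{x} = 0$. If $\tilde{h}(x, 0) \neq 0$ for a system at rest at time $t$, by continuity, there exists an $\epsilon > 0$ such that $\dot{x} \neq 0$ at time $t + \epsilon$. But that would mean the energy changes, which contradicts the fabric’s conservation property. Therefore, $\tilde{h}(x, 0) = 0$. ∎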
Remark II.3.
Lemma II.2 shows that fabrics as defined in Definition II.1 are unbiased in the sense that they can influence the system’s behavior while in motion, but vanish when the system stops. In other words, a system at rest remains at rest, allowing convergence regions to be entirely governed by the zero sets of other forcing terms (see Definition III.6 in [15] for a precise description.)
Definition II.4 (Navigating across fabrics).
Let $\ddot{x} = \pi(x, \dot{x})$ be a finite second-order differential equation. $\pi$ is called a navigation policy when added to a fabric $\tilde{h}$ to form the system
$$\ddot{x} = \tilde{h}(x, \dot{x}) + \pi(x, \dot{x}). \tag{4}$$
We often say $\pi$ navigates across $\tilde{h}$. When the context is clear, we often refer to it simply as the policy.
We often describe the system in Equation 4 as a forced system because of its relation to forcing policies as defined next.
Definition II.5 (Forcing policies).
In many cases, there is a relevant positive-definite system metric $M(x, \dot{x})$ that can be used to shape the navigating term (see Equation 2 for the intuition). In that case, we usually write the system in its force form
$$M\ddot{x} + f = \pi_f \tag{5}$$
where $f = -M\tilde{h}$ and $\pi_f$ is an external force. The navigation term is then constructed using a forcing policy denoted $\pi_f(x, \dot{x})$, matching standard policy notation. Since the metric is invertible, there is a one-to-one correspondence between forcing policy and navigation term with $\pi = M^{-1}\pi_f$. Again, when the context is clear, we often refer to it simply as the policy.
The following lemma collects together some previously proven results that characterize the energy conservation properties of fabrics.
Lemma II.6 (Properties of fabric energies).
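Let $\ddot{x} = \tilde{h}(x, \dot{x})$ be a fabric with Finsler energy $\mathcal{L}_e$. The Hamiltonian $\mathcal{H}_e = \partial_{\dot{x}}\mathcal{L}_e^\top\dot{x} - \mathcal{L}_e$ has the property $\mathcal{H}_e = \mathcal{L}_e$ and its time derivative takes the form $\dot{\mathcal{H}}_e = \dot{x}^\top(M_e\ddot{x} + f_e)$ where $M_e\ddot{x} + f_e = 0$ are the Euler-Lagrange equations of $\mathcal{L}_e$ with $M_e = \partial^2_{\dot{x}\dot{x}}\mathcal{L}_e$ and $f_e = \partial_{\dot{x}x}\mathcal{L}_e\,\dot{x} - \partial_x\mathcal{L}_e$. The fabric conserves $\mathcal{L}_e$ so it has the property $\dot{x}^\top(M_e\tilde{h} + f_e) = 0$.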
Proof.
These results are proven in [19], with the final fabric property following from conservation of energy. ∎
We use these properties to prove the following theorem which shows that fabrics are fundamentally stable in the sense that the energy of a forced system is bounded at any given time and can always be dissipated to bring the system to rest.
Theorem II.7 (Fundamental stability of fabrics).
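Let $\tilde{h}$ be a fabric with Finsler energy $\mathcal{L}_e$ and energy metric $M_e = \partial^2_{\dot{x}\dot{x}}\mathcal{L}_e$. If $\pi$ is a finite navigation policy, the corresponding forced system $\ddot{x} = \tilde{h} + \pi$ has finite energy after a finite time and will come to rest if the navigating term is set to $\pi = -M_e^{-1}B\dot{x}$, where $B$ is any positive-definite damping matrix.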
Proof.
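By Lemma II.6 $\dot{\mathcal{L}}_e = \dot{x}^\top(M_e\ddot{x} + f_e)$, where $M_e$ and $f_e$ are the Euler-Lagrange terms of $\mathcal{L}_e$, so for our system $\ddot{x} = \tilde{h} + \pi$ we have
\begin{align}
\dot{\mathcal{L}}_e &= \dot{x}^\top\big(M_e\ddot{x} + f_e\big) \tag{6}\\
&= \dot{x}^\top\big(M_e(\tilde{h} + \pi) + f_e\big) \tag{7}\\
&= \dot{x}^\top\big(M_e\tilde{h} + f_e\big) + \dot{x}^\top M_e\pi \tag{8}\\
&= \dot{x}^\top M_e\pi \tag{9}
\end{align}
since $\dot{x}^\top(M_e\tilde{h} + f_e) = 0$. This is the rate at which $\pi$ does work on the system. The total work gives the energy after $T$ seconds as
$$\mathcal{L}_e(T) = \mathcal{L}_e(0) + \int_0^T \dot{x}^\top M_e\pi\,dt \tag{10}$$
which is finite.
Choosing $\pi = -M_e^{-1}B\dot{x}$ after $T$ seconds gives energy change
\begin{align}
\dot{\mathcal{L}}_e &= \dot{x}^\top M_e\big(-M_e^{-1}B\dot{x}\big) \tag{11}\\
&= -\dot{x}^\top B\dot{x} \le 0. \tag{12}
\end{align}
Since $\mathcal{L}_e$ is lower bounded, $\dot{\mathcal{L}}_e \to 0$ which means both $\dot{x} \to 0$ and $\ddot{x} \to 0$ as $t \to \infty$. ∎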
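The two claims of the theorem can be checked numerically on a toy system. The sketch below is an illustration under stated assumptions, not the construction used in the paper: the fabric is a hypothetical 2-D acceleration perpendicular to the velocity (so it conserves the Euclidean energy $\frac{1}{2}\|\dot{x}\|^2$, for which the energy metric is the identity), the navigation policy is a bounded constant push for `push_time` seconds, and afterwards the policy is pure damping with $B$ equal to `damping` times the identity. All names, gains, and the explicit Euler integrator are choices made for this example.

```python
def fabric(x, v, curl=1.0):
    """Toy fabric: acceleration perpendicular to v, so it does no work and
    conserves the Euclidean energy 0.5 * |v|^2."""
    return [-curl * v[1], curl * v[0]]


def energy(v):
    """Euclidean energy 0.5 * |v|^2 (energy metric is the identity)."""
    return 0.5 * (v[0] ** 2 + v[1] ** 2)


def simulate(push_time=1.0, total_time=10.0, dt=0.01, damping=1.0):
    """Explicit Euler rollout: bounded push for push_time, then pure damping."""
    x, v = [0.0, 0.0], [1.0, 0.0]
    energies = [energy(v)]
    for k in range(int(round(total_time / dt))):
        if k * dt < push_time:
            policy = [0.5, 0.0]                            # bounded navigation policy
        else:
            policy = [-damping * v[0], -damping * v[1]]    # damping policy, B = damping * I
        h = fabric(x, v)
        a = [h[0] + policy[0], h[1] + policy[1]]
        x = [x[0] + dt * v[0], x[1] + dt * v[1]]
        v = [v[0] + dt * a[0], v[1] + dt * a[1]]
        energies.append(energy(v))
    return energies


energies = simulate()
print("peak energy:", max(energies))
print("final energy:", energies[-1])
```

The peak energy stays finite because the push is bounded and lasts a finite time, and once damping takes over the energy decreases at every step until the system is effectively at rest.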
Remark II.8.
Given any (finite) autonomous second-order differential equation $\ddot{x} = h(x, \dot{x})$, we can always accelerate along the direction of motion strategically to ensure any given Finsler energy is conserved. The following definition characterizes how to do that.
Definition II.9 (Energization).
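Let $\ddot{x} = h(x, \dot{x})$ be a finite autonomous second-order differential equation, and let $\mathcal{L}_e$ be an energy with Euler-Lagrange equations $M_e\ddot{x} + f_e = 0$. The energized system is the transformed system $\ddot{x} = \tilde{h}(x, \dot{x})$ defined as
\begin{align}
\tilde{h}(x, \dot{x}) &= h(x, \dot{x}) + \alpha\,\dot{x}, \tag{13}\\
\alpha &= -\frac{\dot{x}^\top\big(M_e h + f_e\big)}{\dot{x}^\top M_e\dot{x}}. \tag{14}
\end{align}
This system transformation, which we call energization, turns any $h$ into a fabric by making it conservative. Note that the energy can be any Lagrangian, although it’s common for that energy to be more specifically a Finsler energy.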
Lemma II.10.
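Let $\ddot{x} = h(x, \dot{x})$ be a finite autonomous second-order differential equation, and let $\mathcal{L}_e$ be a Finsler energy. The energized system $\ddot{x} = \tilde{h}(x, \dot{x}) = h + \alpha\dot{x}$ conserves $\mathcal{L}_e$ and is therefore a fabric. We call $h$ the generator of a fabric constructed in this way.
Proof.
We show that the energized system conserves $\mathcal{L}_e$. By Lemma II.6 $\dot{\mathcal{L}}_e = \dot{x}^\top(M_e\ddot{x} + f_e)$, so after energization with $\alpha = -\dot{x}^\top(M_e h + f_e)/(\dot{x}^\top M_e\dot{x})$ the time rate of change of $\mathcal{L}_e$ is
$$\dot{\mathcal{L}}_e = \dot{x}^\top\big(M_e(h + \alpha\dot{x}) + f_e\big) = \dot{x}^\top\big(M_e h + f_e\big) + \alpha\,\dot{x}^\top M_e\dot{x} = 0.$$
∎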
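As a rough sketch of how energization might look in code, the block below assumes the energized acceleration is the generator plus a multiple $\alpha$ of the velocity, with $\alpha$ chosen so that the energy rate $\dot{x}^\top(M_e\ddot{x} + f_e)$ vanishes. The function names, the constant metric, and the spring generator are all hypothetical choices for this illustration; the zero-velocity guard is likewise an assumption of the sketch, motivated by Lemma II.2.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def energy_rate(system, metric, force, x, v):
    """Energy rate v^T (M_e * acc + f_e) along acc = system(x, v)."""
    M_acc = matvec(metric(x, v), system(x, v))
    return dot(v, [ma + fe for ma, fe in zip(M_acc, force(x, v))])


def energize(generator, metric, force):
    """Return the generator plus alpha * v, with alpha zeroing the energy rate."""
    def fabric(x, v):
        v_M_v = dot(v, matvec(metric(x, v), v))
        if v_M_v == 0.0:
            # Assumption of this sketch: at rest, return zero acceleration,
            # since a fabric does not push a system from rest (Lemma II.2).
            return [0.0] * len(v)
        alpha = -energy_rate(generator, metric, force, x, v) / v_M_v
        return [a + alpha * vi for a, vi in zip(generator(x, v), v)]
    return fabric


# Illustrative energy 0.5 * v^T M v with a constant metric, so f_e = 0.
def constant_metric(x, v):
    return [[2.0, 0.0], [0.0, 1.0]]


def zero_force(x, v):
    return [0.0, 0.0]


def spring(x, v):
    """A generator that does not conserve the energy above."""
    return [-x[0], -x[1]]


fabric = energize(spring, constant_metric, zero_force)
x0, v0 = [1.0, -0.5], [0.3, 0.7]
rate_before = energy_rate(spring, constant_metric, zero_force, x0, v0)
rate_after = energy_rate(fabric, constant_metric, zero_force, x0, v0)
print(rate_before, rate_after)
```

At the sampled state the raw spring changes the energy, while the energized system leaves it unchanged up to floating-point error.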
In general, energization may change the behavior of a system since the path traced by a system often changes when the system speeds up or slows down. (E.g. an orbiting satellite will fall to earth if it slows and shoot out to space if it speeds up.) The following proposition characterizes the class of systems whose behavior is unaffected by energization.
Definition II.11.
A system $\ddot{x} = h(x, \dot{x})$ is Homogeneous of Degree 2 (HD2) if $h(x, \lambda\dot{x}) = \lambda^2 h(x, \dot{x})$ for $\lambda > 0$.
Remark II.12.
An HD2 system modulates its accelerations in just the right way to maintain its path, independent of speed. If the system were constrained to follow a given path, speeding up by a factor of $\lambda$ would induce accelerations $\lambda^2$ times higher to maintain the path. An HD2 system has this scaling property built in to make its integral curves trace speed invariant paths. This speed invariance is a defining property of geometries [16].
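One way to see the speed invariance is a short time-rescaling argument, sketched here under the assumption that the system is written $\ddot{x} = h(x, \dot{x})$ with $h(x, \lambda\dot{x}) = \lambda^2 h(x, \dot{x})$ for $\lambda > 0$. Take any solution $x(t)$ and play it back $\lambda$ times faster:

```latex
\begin{align}
y(t) &= x(\lambda t), \qquad \lambda > 0 \text{ constant},\\
\dot{y}(t) &= \lambda\,\dot{x}(\lambda t),\\
\ddot{y}(t) &= \lambda^2\,\ddot{x}(\lambda t)
  = \lambda^2\, h\big(x(\lambda t), \dot{x}(\lambda t)\big)
  = h\big(x(\lambda t), \lambda\,\dot{x}(\lambda t)\big)
  = h\big(y(t), \dot{y}(t)\big).
\end{align}
```

So $y$ is again a solution: starting at the same position with velocity scaled by $\lambda$, the system traces the same path, only $\lambda$ times faster.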
The next proposition characterizes the class of path consistent fabrics constructed by HD2 generators.
Proposition II.13 (Geometric Fabrics).
Let $h$ be an HD2 generator, and let $\mathcal{L}_e$ be a Finsler energy. The paths traced by the fabric $\tilde{h} = h + \alpha\dot{x}$ obtained by energizing $h$ with $\mathcal{L}_e$ match those of $h$. Moreover, the energized system is also HD2, so $\ddot{x} = \tilde{h}(x, \dot{x}) + \gamma(t)\dot{x}$ traces the same paths as its HD2 generator for any time varying $\gamma(t)$. Fabrics constructed this way are called geometric fabrics and the class of geometric fabrics is the unique class of path consistent fabrics.
Proof.
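A property of HD2 systems is that they can accelerate along the direction of motion arbitrarily without changing the system’s path [16]. The energization transformation is defined as $\tilde{h} = h + \alpha\dot{x}$ for a particular choice of $\alpha$. Therefore, the energized system is path consistent. Similarly, adding another term $\gamma(t)\dot{x}$ is also an acceleration along the direction of motion, so the paths remain consistent.
Examining the system under the specific energization coefficient, we see
\begin{align}
\ddot{x} &= h - \frac{\dot{x}^\top\big(M_e h + f_e\big)}{\dot{x}^\top M_e\dot{x}}\,\dot{x} \tag{15}\\
&= \big(I - R_e M_e\big)h - R_e f_e \tag{16}
\end{align}
where $R_e = \frac{\dot{x}\dot{x}^\top}{\dot{x}^\top M_e\dot{x}}$ and $M_e\ddot{x} + f_e = 0$ are the Euler-Lagrange equations of $\mathcal{L}_e$. $M_e$ is HD0 (independent of velocity norm), and $f_e$ is HD2 (see [16] for a discussion of these properties). $R_e$ is also HD0 since $M_e$ is and there are two factors of $\dot{x}$ in both the numerator and denominator. Therefore, the energized system is HD2 since $h$ is HD2.
We prove uniqueness by contradiction. Suppose $\tilde{h}$ is a geometric fabric but is not HD2. (If it is HD2, then it can be constructed as described above.) Then there exists a state $(x, \dot{x})$ where $\tilde{h}(x, \lambda\dot{x}) \neq \lambda^2\tilde{h}(x, \dot{x})$ for some $\lambda > 0$. That means for that state and that $\lambda$, the path of the integral curve starting at $(x, \lambda\dot{x})$ will deviate from the path of the integral curve starting at $(x, \dot{x})$ after some finite time. Therefore, it can’t be geometric, which is a contradiction since it’s a geometric fabric. ∎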
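The HD2 claim for the energized system can be spot-checked numerically. The sketch below is only an illustration: it assumes a hypothetical HD2 generator $h(x, \dot{x}) = -\|\dot{x}\|^2 x$ and the Euclidean energy $\frac{1}{2}\|\dot{x}\|^2$ (identity energy metric, no energy force term), for which energization reduces to removing the component of the acceleration along the velocity. The helper `hd2_gap` and the sample state are invented for this example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def generator(x, v):
    """Illustrative HD2 generator: -|v|^2 * x scales quadratically with speed."""
    speed_sq = dot(v, v)
    return [-speed_sq * xi for xi in x]


def energized(x, v):
    """Energize `generator` with the Euclidean energy 0.5 * |v|^2, i.e.
    remove the acceleration component along v (assumes v != 0)."""
    h = generator(x, v)
    alpha = -dot(v, h) / dot(v, v)
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def hd2_gap(system, x, v, lam):
    """Largest componentwise gap between system(x, lam*v) and lam^2 * system(x, v)."""
    scaled = system(x, [lam * vi for vi in v])
    base = system(x, v)
    return max(abs(s - lam ** 2 * b) for s, b in zip(scaled, base))


x, v = [0.4, -1.2], [0.9, 0.3]
for lam in (0.5, 2.0, 3.0):
    print(lam, hd2_gap(generator, x, v, lam), hd2_gap(energized, x, v, lam))
```

Both gaps stay at floating-point noise for every tested scale, which is what one would expect if energization preserves the degree-2 homogeneity of the generator.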
Remark II.14.
The bent Finsler systems described in [19], which can be characterized as generalizations of classical mechanical systems, are geometric fabrics as defined in Proposition II.13. The definition here, though, is broader and easier to work with in practice than the earlier definition. In bent Finsler systems, metrics must be defined by Finsler energies, requiring the application of Euler-Lagrange equations which can be computationally complex and challenging to implement. Our definition here allows metrics to be arbitrary HD0 positive semi-definite matrices, dramatically simplifying design. The Finsler energy is still used for energization of the HD2 geometry generator, but it can remain simple since it needs only define the desired measure of speed, not the metrics. This simpler setup was already used in [20] and is especially helpful where automatic differentiation is involved.
Corollary II.15.
A forced geometric fabric of the form $\ddot{x} = \tilde{h}(x, \dot{x}) - \beta\dot{x}$ with positive real valued $\beta$ asymptotically comes to rest without deviating from the paths of the underlying HD2 system $h$.
Proof.
We can think of geometric fabrics as forming a road network of paths through space. Without a navigation policy the system simply follows the nominal paths of the underlying fabric. The navigation policy then operates over the top of that nominal behavior, pushing the system from its current path to neighboring paths as needed. Similar to long-distance travel, when traveling to a distant goal, if the network of roads is well-designed, the navigation policy needs only set the system onto the right road up front, potentially do some minor switching of roads en route, and then, once close, pull the system off the major road networks to converge locally to the goal. Well-designed geometric fabrics can therefore significantly simplify the navigation policy. In that sense, they constitute a well-informed prior on behavior.
The following Proposition shows that with geometric fabrics we can always view a navigation policy as a combination of zero-work steering (where the path bends but the energy remains constant) and speeding up or slowing down along the direction of motion (more precisely, path invariant energy regulation).
Proposition II.16.
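Let $\tilde{h}$ be a geometric fabric with Finsler energy $\mathcal{L}_e$, whose Euler-Lagrange equations are $M_e\ddot{x} + f_e = 0$, and let $\pi$ be a navigation policy. Then the forced equation $\ddot{x} = \tilde{h} + \pi$ can be written
$$\ddot{x} = \big(\tilde{h} + \pi + \alpha_\pi\dot{x}\big) + \beta\,\dot{x}, \qquad \alpha_\pi = -\frac{\dot{x}^\top\big(M_e(\tilde{h} + \pi) + f_e\big)}{\dot{x}^\top M_e\dot{x}}, \tag{17}$$
for $\beta = \frac{\dot{x}^\top M_e\pi}{\dot{x}^\top M_e\dot{x}}$. The first term, the energization of $\tilde{h} + \pi$, is a fabric which we call the steered fabric, and the second term is an energy regulator. We can also write this system as
$$\ddot{x} = \tilde{h} + P_e\pi + \beta\,\dot{x} \tag{18}$$
where $P_e = I - R_e M_e$ with $R_e = \frac{\dot{x}\dot{x}^\top}{\dot{x}^\top M_e\dot{x}}$ and $P_e$ is a projection matrix.
Proof.
Equation 17 can be proven by expansion. Since $\dot{x}^\top(M_e\tilde{h} + f_e) = 0$, we have $\alpha_\pi = -\beta$, so
$$\big(\tilde{h} + \pi + \alpha_\pi\dot{x}\big) + \beta\,\dot{x} = \tilde{h} + \pi - \frac{\dot{x}^\top M_e\pi}{\dot{x}^\top M_e\dot{x}}\,\dot{x} + \beta\,\dot{x} = \tilde{h} + \pi. \tag{19}$$
Equation 18 follows since $\pi + \alpha_\pi\dot{x} = \pi - R_e M_e\pi = P_e\pi$. ∎
The system in Equation 17 shows that we can absorb the navigation term into the fabric thereby exposing a separate energy regulation term $\beta\dot{x}$. With $\beta = 0$ the system conserves energy while $\pi$ acts to steer the fabric, hence the name. With $\beta < 0$ the system will slow to a stop. The second form given in Equation 18 shows we can also view the energy regulation as applying to the original fabric. Since the fabric is geometric and the energy regulation is an acceleration along the direction of motion, the energy increases or decreases but the path doesn’t change. On top of that, the term $P_e\pi$ steers the system without affecting the energy.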
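The split into zero-work steering and energy regulation can be illustrated with a few lines of code. This sketch assumes one concrete form of the split that is consistent with the description above: given a symmetric positive-definite energy metric $M_e$, velocity $\dot{x}$ and navigation acceleration $\pi$, take $\beta = \dot{x}^\top M_e\pi / \dot{x}^\top M_e\dot{x}$ and the steering part $\pi - \beta\dot{x}$. The function name `split_policy` and the numeric values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def split_policy(M, v, pi):
    """Split a navigation acceleration pi into a zero-work steering part and
    an energy-regulating part beta * v, relative to the energy metric M.

    Assumes M is symmetric positive definite and v is nonzero.
    """
    Mv = matvec(M, v)
    beta = dot(Mv, pi) / dot(Mv, v)
    steer = [p - beta * vi for p, vi in zip(pi, v)]
    return steer, beta


# Illustrative values only.
M = [[2.0, 0.5], [0.5, 1.0]]
v = [1.0, -2.0]
pi = [0.3, 0.4]

steer, beta = split_policy(M, v, pi)
steering_power = dot(matvec(M, v), steer)            # rate of work done by the steering part
recombined = [s + beta * vi for s, vi in zip(steer, v)]
print("beta:", beta)
print("steering power:", steering_power)
print("recombined policy:", recombined)
```

In this example $\beta$ is negative, so the policy is removing energy while its steering part bends the path without doing any work; adding the two parts back together recovers the original policy.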
This section shows that fabrics form a stable medium for system navigation. Navigation across the fabric can be described either as a navigation policy operating directly on the fabric accelerations or, when there is a system metric, as a force policy pushing against the fabric. Geometric fabrics, in particular, can be viewed as a road network of paths the system can travel along without any effort. We can view navigation across the fabric as a combination of energy regulation (injecting or dissipating energy) and energy invariant steering. In the case of geometric fabrics, the energy regulation doesn’t change the network of paths.
The following proposition gives insight into the behavior of forced fabrics and helps guide design. It states that if we can get the system to converge with sufficient finite damping, it will converge to the zero set (goal) of the navigation policy. Strategically, we can increase the damping to slow the system as needed to give the navigation policy more influence over the behavior. And if we can prove the navigation policy converges on its own (or equivalently when forcing the Euclidean fabric), then we can construct a modified navigation policy that’s guaranteed to converge to the desired goal.
These results can guide policy design, although they’re far from a complete characterization of convergence or stable policies. In many cases, we might learn a policy over a given fabric, for instance using RL. Convergence and stability are more complex in this setting, but fabrics make it easier to explore safely and find performant, stable, and convergent solutions.
Proposition II.17 (Convergence).
Let be a fabric and let be a bounded navigation policy with zero set . Let denote the forced system. Then if converges, it converges to .
Proof.
By Lemma II.2, for all . Therefore, at convergence, , which implies . ∎
Proposition II.17 shows that training navigation policies can be a powerful design choice. If we enforce through structural choices the desired zero set of the navigation policy and train the policy to successfully converge, then we’re guaranteed that it converges to the correct goal.
When does not necessarily converge on its own (e.g. it may require additional damping), Theorem III.5 gives an explicit class of energy regulators that will guarantee convergence to in the case where there exists a compatible potential.
III Energy Regulation
This next proposition characterizes how to regulate energy within a given range using an energy regularizer while using a navigation policy to both modulate system energy and steer. When driven by , the system increases energy (speeds up) to a maximum energy level then maintains that energy as long as is pushing the system forward. If the system is moving against it removes energy (slows down). Examples of when this second case may occur are (1) the system is moving the wrong way, e.g. away from a goal; (2) the system is approaching a goal and includes sufficient damping to bring it to rest at the goal. In both cases, the energy regularization is removed and acts to slow the system.
Proposition III.1 (Energy Capping 1).
Let be a fabric with Finsler energy and let be a navigation policy. Design a regularized system of the form
(20) |
where is the energy tensor of and is positive definite, and choose
(21) |
where , , and is a desired energy cap. Then the regularized system has the following energy properties:
1. Bounded energy: .
2. Energy increases when moving with : When , we have with equality only when either or .
3. Energy decreases when moving against : When , we have .
4. Energy rates of change are instantaneously the same with and without the fabric, and the regularizing damper only decreases energy: .
Proof.
The energy derivative is
(22) | ||||
since by the conservation property of . Choosing per Equation 21, we have two cases. If , then and
(23) |
This case proves property 3.
The second case is, if , then
(24) |
and
(25) | ||||
(26) |
We can make two observations:
1. When , and , so .
2. When , and , so
(27) so .
To prove property 2, we note only when and . And when either factor in Equation 26 is zero, which means either or . The latter condition implies and .
Property 1 follows by noting that at so would be a contradiction.
Finally, property 4 derives from the simple observation that the contribution from to drops out in Equation 22 because is conservative. And only removes energy with its contribution being since is positive definite.
∎
Remark III.2.
The use of in the denominator of Equation 21 makes it robust at . The specific profile of defines how moves between (to fully cap the energy with when ) and when (equiv. ).
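The qualitative behavior described by Proposition III.1 can be illustrated with a toy simulation. The sketch below does not reproduce Equation 21; it assumes a Euclidean energy, no fabric term, a constant navigation push `f`, and a hypothetical capping coefficient built from a linear switch `ramp`, a small constant `eps` in the denominator, and a scalar damper `b`. All names and numeric values are illustrative assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def energy(v):
    # Euclidean energy (assumption): E = 1/2 |v|^2
    return 0.5 * dot(v, v)

def ramp(e, e_start, e_max):
    """Hypothetical switch: 0 below e_start, 1 at or above e_max, linear between."""
    return min(1.0, max(0.0, (e - e_start) / (e_max - e_start)))

def capped_accel(v, f, e_max=0.5, e_start=0.25, eps=1e-6, b=0.05):
    """Navigation push f, a capping coefficient beta along v, and a small damper b."""
    push = max(0.0, dot(v, f))  # only cancel work done when moving with f
    beta = ramp(energy(v), e_start, e_max) * push / (dot(v, v) + eps)
    return [fi - (beta + b) * vi for fi, vi in zip(f, v)]

def simulate(f, steps=20000, dt=1e-3):
    """Explicit Euler integration from rest; returns (peak energy, final energy)."""
    v = [0.0, 0.0]
    peak = 0.0
    for _ in range(steps):
        a = capped_accel(v, f)
        v = [vi + dt * ai for vi, ai in zip(v, a)]
        peak = max(peak, energy(v))
    return peak, energy(v)

peak_energy, final_energy = simulate([1.0, 0.0])
# Energy rate v . a when the particle moves against the push
against_rate = dot([-1.0, 0.0], capped_accel([-1.0, 0.0], [1.0, 0.0]))
```

Under these assumptions the particle speeds up, then levels off just below the cap `e_max`, and the energy rate is negative whenever the velocity opposes the push. The `eps` term keeps the coefficient finite at rest, mirroring the role the remark above assigns to the denominator.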
Proposition III.3 (Energy Capping 2).
Let be a fabric with energy and let be a navigation policy. Design a regularized system of the form
(28) |
and choose
(29) |
where , , , , and
(30) |
Such a system will have bounded energy for all time.
Proof.
The energy time derivative is
(31) | ||||
(32) | ||||
(33) |
since by the conservation property of . In general, can perform work on the system, changing its energy levels. However, system energy will ultimately be bounded given that can become equal to 1 arbitrarily, and certainly by design. Whenever , the energy time derivative becomes
(34) | ||||
(35) |
If , then system energy is conserved, and if , then energy is dissipated. In essence, can monitor the system energy and decide how much work can be done by , which results in shifting energy levels that are ultimately bounded by . ∎
Within the preset boundary conditions, can behave arbitrarily, fluctuating the system energy. can therefore be learned from experience, enabling it to modulate system energy advantageously. In parallel, can also be learned, promoting dynamic braking. Note, if and persists, then system energy will decrease resulting in as . Note, this does not imply that as well, but rather, the system can controllably come to rest regardless of . Finally, robustness to numerical issues when leveraging this design for when can be obtained via the strategies in Section V.
To effectively regulate the energy of a navigation fabric to guarantee convergence to the navigation policy’s zero set, we need a measure of progress toward that zero set. That measure of progress can be given by a potential function that’s compatible with the navigation policy in the sense that its negative gradient generally points in the same direction as the policy’s vector field and is (locally) minimized at the policy’s zero set.
Definition III.4 (Compatible potential).
Let be a navigation policy. We say a potential function is compatible with if if and only if and wherever (equiv. ).
The next theorem prescribes how to regulate the energy of a navigation fabric given a compatible potential.
Theorem III.5.
Let be a fabric with generator and Finsler energy , and let be a navigation policy with compatible potential . Denote the total energy by . The system with energy regulator
(36) |
converges to the zero set of for .
Proof of Theorem III.5.
The total energy of the energized system is conserved by definition, and we will show that with damping it’s minimized and the system comes to rest. We then show that the compatibility conditions between potential and perturbation field ensure that at convergence .
The time derivative of the total energy is:
(37) |
where are the equations of motion of defined by the Euler-Lagrange equation (see [19] for a derivation). We assume is bounded in a finite region and strictly positive definite everywhere; in particular, it doesn’t vanish or reduce rank as . To derive energization, we take the system
(38) |
and solve for the which makes (i.e. calculate the acceleration along the direction of motion needed to conserve energy). Plugging Eq. 38 into Eq. 37, setting to zero, and solving for gives:
(39) | ||||
(40) | ||||
(41) |
The of Equation 41 by definition makes the undamped equations in Equation 38 conserve the Hamiltonian , therefore the damped equations
(42) |
for decrease energy at a rate
(43) | ||||
(44) |
Since is strictly positive definite, this final expression is less than for all and 0 for . Since is always decreasing but also lower bounded, we know that its rate of decrease must converge to zero (it stops decreasing at some point). Therefore, which means and hence .
Plugging from Equation 40 into the system in Equation 42 and taking the limit with gives
(45) | ||||
(46) |
Here collects the terms in parentheses from the second line which vanish in the limit with as , and we write because it’s the negative gradient that has positive inner product with per the compatibility conditions. On the left-hand side we have , so it’s the rest of the terms in Equation 46 we need to analyze in the limit as . Note that has two factors of in both the numerator and the denominator. Since is bounded and doesn’t vanish in the limit, it converges to a projection operator
(47) |
where is the limiting direction of motion as the system comes to a stop. This notation allows us to write Equation 46 as
(48) | ||||
(49) |
The matrix has nullspace since
(50) | |||
(51) |
Likewise, is rank-1 with column space spanned by , so and must be linearly independent when they’re both nonzero.
We’ll prove by contradiction. and are orthogonal so for Equation 49 to hold, they must both be zero. If , then since we must have . And since , we must have either that or which implies . Both of these contradict the compatibility conditions. Therefore, . ∎
One simple way to leverage Theorem III.5 is to choose a potential whose zero set characterizes the goal and then define so that it’s compatible with by construction. For instance, the following would be compatible:
(52) |
The first term is the soft normalized negative gradient, and the second is a damper.
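A policy of this shape can be sketched and its compatibility checked directly. The example below is a hedged illustration rather than the exact form of Equation 52: it assumes a quadratic potential with a hypothetical goal `GOAL`, a softening constant `eps` added to the gradient norm, and a scalar damping gain `beta`. The function names are illustrative.

```python
import math

GOAL = (1.0, -2.0)  # hypothetical goal, for illustration only

def grad_psi(x):
    """Gradient of the example potential psi(x) = 0.5 * |x - GOAL|^2."""
    return [xi - gi for xi, gi in zip(x, GOAL)]

def nav_policy(x, v, eps=1e-3, beta=2.0):
    """Soft-normalized negative gradient plus a linear damper."""
    g = grad_psi(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [-gi / (norm + eps) - beta * vi for gi, vi in zip(g, v)]

def alignment(x):
    """Inner product of -grad psi with the policy evaluated at rest."""
    g = grad_psi(x)
    p = nav_policy(x, [0.0] * len(x))
    return sum(-gi * pi for gi, pi in zip(g, p))

at_goal = nav_policy(GOAL, [0.0, 0.0])
away = alignment([3.0, 0.5])
```

At rest the policy vanishes exactly where the gradient does, and away from the goal the negative gradient has positive inner product with the policy, which are the two conditions of Definition III.4. The soft normalization keeps the push bounded far from the goal and smooth near it.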
IV Forcing energized fabrics
Here we analyze forcing an arbitrary fabric term using a forcing term pushing against a system metric of the type described in Section I-B Equation 3. This case is more specific than the general energy regulation settings discussed in Section III, but it’s an important and common one used, for instance, in [20]. The forcing term in this case takes the form
(53) |
where is an arbitrary positive definite system metric and is an arbitrary positive semi-definite damping matrix.
can be an arbitrary fabric. For instance, we may construct a transform tree, populate its spaces with arbitrary specs, and pull them back into the root. The resulting spec defines a differential equation with acceleration . That can then be used to generate the fabric by energization. The matrix defines the system metric which we use to define the forcing term given in Equation 53. If the individual specs on the transform tree are themselves geometric (the metrics are HD0 and the policies are HD2), the resulting fabric is a geometric fabric. Importantly, the metrics don’t need to be Finsler (deviating from the theory of [19]), just HD0.
The following theorem shows that these systems are stable and converge to a local minimum of a potential function given an appropriate choice of damping.
Theorem IV.1.
Let be a fabric with positive-definite system metric , and let be a potential function. Then we can always find a finite positive definite damping matrix such that the system
(54) |
converges. And at convergence, by Proposition II.17 is at a local minimum.
Proof.
Suppose our system is
(55) |
with as given by Equation 53, , and where is the energization coefficient with respect to some energy . Our proof follows a standard Lyapunov analysis. We design our Lyapunov function as
(56) |
The time derivative of the Lyapunov function is
(57) |
Plugging in from Equation 55 above yields
(58) |
Rearranging and canceling terms reduces the expression to
(59) |
We now write as the sum of a term designed to remove and and a residual . I.e. with
(60) |
so that
(61) |
We assume that , , and are designed such that the residual is bounded. Substituting into Equation 59 gives
(62) |
Regrouping yields
(63) | ||||
The first group of terms vanishes by the design of , so we get
(64) |
We combine the two damping terms to produce
(65) |
This equation can now be upper-bounded via the Rayleigh-Ritz theorem as
(66) |
where is the maximum eigenvalue of and is the minimum eigenvalue of . Via the design of and a sufficiently large , we can enforce that yielding
(67) |
where . We now invoke LaSalle’s invariant set theorem to give as . This implies as , and consequently, as well. This ultimately guarantees that the system will come to rest at a minimum of . ∎
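The Lyapunov argument can be illustrated numerically in its simplest special case. The sketch below assumes an identity system metric, a quadratic potential with its minimum at the origin, a scalar damping gain, and a hypothetical fabric term `c * |v| * J v` that does no work under that metric, so no extra damping is needed to dominate it. It is a toy check of the conclusion, not the general construction in the proof; all names and values are illustrative.

```python
import math

def fabric(v, c=0.5):
    """Toy term c*|v|*J v: orthogonal to v, so it does no work on the particle."""
    speed = math.hypot(v[0], v[1])
    return (-c * speed * v[1], c * speed * v[0])

def lyapunov(x, v):
    # V = psi(x) + 1/2 |v|^2 with psi(x) = 1/2 |x|^2 (minimum at the origin)
    return 0.5 * (x[0] ** 2 + x[1] ** 2) + 0.5 * (v[0] ** 2 + v[1] ** 2)

def simulate(x, v, damping=1.5, dt=1e-3, steps=30000):
    """Integrate x'' = fabric(v) - grad psi(x) - damping * v with semi-implicit Euler."""
    for _ in range(steps):
        fx, fy = fabric(v)
        ax = fx - x[0] - damping * v[0]
        ay = fy - x[1] - damping * v[1]
        v = (v[0] + dt * ax, v[1] + dt * ay)
        x = (x[0] + dt * v[0], x[1] + dt * v[1])
    return x, v

x0, v0 = (2.0, 0.0), (0.0, 1.0)
x_end, v_end = simulate(x0, v0)
```

The fabric term bends the trajectory on the way in, but the system still comes to rest at the minimum of the potential and the Lyapunov function ends far below its initial value.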
V Numerical Considerations
The mathematical definition of energization given in Definition II.9 has a numerical instability at . The following definition gives two robust variants that can be used for practical implementation. The choice of which to use depends on the properties of the generator being energized as discussed below.
Definition V.1.
Let be an autonomous second-order differential equation, and let be an energy. The vanishing energization transform is defined as
(68) | |||
(69) |
for where . This variant smoothly reduces to zero as avoiding numerical instability and ambiguity at . Another variant which we call the robust energization transform additionally preserves the unbiased property of energization while resolving numerical issues:
(70) |
where is some function that diminishes to zero as with length scale . For instance, is a common choice.
The vanishing energization transform is the same as the standard energization transform aside from the in the denominator. When the generator is unbiased (zero at ), this transformed system is also unbiased. The robust energization transform is useful when energizing a biased generator to create an unbiased system. It explicitly includes the term to ensure the resulting system is zero at (unbiased).
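The effect of the regularized denominator can be seen in a few lines. The sketch below assumes a Euclidean energy, for which energization adds an acceleration `alpha * v` along the direction of motion that cancels the work done by the generator `h`; the small constant `eps` added to the denominator and the function name `vanishing_energize` are illustrative assumptions, and the robust variant is not covered.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def vanishing_energize(h, v, eps=1e-6):
    """Return (h + alpha * v, alpha) with alpha = -(v . h) / (v . v + eps).

    Euclidean energy is assumed, so the energy rate is v . (h + alpha * v).
    """
    alpha = -dot(v, h) / (dot(v, v) + eps)
    return [hi + alpha * vi for hi, vi in zip(h, v)], alpha

# Moving: the energized acceleration is (nearly) orthogonal to the velocity
acc_moving, alpha_moving = vanishing_energize([1.0, 1.0], [3.0, 4.0])
rate_moving = dot([3.0, 4.0], acc_moving)

# At rest: no division by zero, and the coefficient vanishes
acc_rest, alpha_rest = vanishing_energize([1.0, 1.0], [0.0, 0.0])
```

Away from rest the energy rate is zero up to a term of order `eps`, and at rest the coefficient is zero and the output equals the generator. That is why this variant is unbiased only when the generator itself is zero at rest.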
VI Conclusions
This paper reformulates fabrics to focus on their fundamental stability as a medium for policies to operate across. The fabric creates a nominal prior behavior which guides the policy. The policy then steers across the system and regulates energy. When the fabric is geometric, it forms a well-defined road network of paths that the system wants to follow. This reformulation is more intuitive than previous formulations, while subsuming those formulations, making the fabrics both flexible and easier to use in practice, particularly for learning applications.
References
- Behal et al. [2009] Aman Behal, Warren Dixon, Darren M Dawson, and Bin Xian. Lyapunov-based control of robotic systems. CRC Press, 2009.
- Bloch et al. [2000] A.M. Bloch, N.E. Leonard, and J.E. Marsden. Controlled lagrangians and the stabilization of mechanical systems. i. the first matching theorem. IEEE Transactions on Automatic Control, 45(12):2253–2270, 2000. doi: 10.1109/9.895562.
- Bloch et al. [2001] A.M. Bloch, Dong Eui Chang, N.E. Leonard, and J.E. Marsden. Controlled lagrangians and the stabilization of mechanical systems. ii. potential shaping. IEEE Transactions on Automatic Control, 46(10):1556–1571, 2001. doi: 10.1109/9.956051.
- Bullo and Lewis [2019] Francesco Bullo and Andrew D Lewis. Geometric control of mechanical systems: modeling, analysis, and design for simple mechanical control systems, volume 49. Springer, 2019.
- Bylard et al. [2021] Andrew Bylard, Riccardo Bonalli, and Marco Pavone. Composable Geometric Motion Policies using Multi-Task Pullback Bundle Dynamical Systems. In International conference on Robotics and Automation (ICRA 2021), Xi’an, China, May 2021. URL https://hal.science/hal-03467542.
- Cheng et al. [2018] Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, and Nathan Ratliff. Rmpflow: A computational graph for automatic motion policy generation. In International Workshop on the Algorithmic Foundations of Robotics, pages 441–457. Springer, 2018.
- Craig [2005] John J Craig. Introduction to robotics: mechanics and control. Pearson Educacion, 2005.
- Khatib [1986] Oussama Khatib. Real-time obstacle avoidance for manipulators and mobile robots. The International Journal of Robotics Research, 5(1):90–98, 1986.
- Khatib [1987] Oussama Khatib. A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE Journal on Robotics and Automation, 3(1):43–53, 1987.
- Klein et al. [2022] Holger Klein, Noémie Jaquier, Andre Meixner, and Tamim Asfour. A riemannian take on human motion analysis and retargeting. arXiv preprint arXiv:2208.01372, 2022.
- Macklin et al. [2019] Miles Macklin, Kenny Erleben, Matthias Müller, Nuttapong Chentanez, Stefan Jeschke, and Viktor Makoviychuk. Non-smooth newton methods for deformable multi-body dynamics. ACM Transactions on Graphics (TOG), 38(5):1–20, 2019.
- Makoviychuk et al. [2021] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021.
- Neilson et al. [2015] Peter D Neilson, Megan D Neilson, and Robin T Bye. A riemannian geometry theory of human movement: The geodesic synergy hypothesis. Human movement science, 44:42–72, 2015.
- Ratliff et al. [2018] Nathan D Ratliff, Jan Issac, Daniel Kappler, Stan Birchfield, and Dieter Fox. Riemannian motion policies. arXiv preprint arXiv:1801.02854, 2018.
- Ratliff et al. [2020] Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, and Asif Muhammad Rana. Optimization fabrics for behavioral design. arXiv:2010.15676 [cs.RO], 2020.
- Ratliff et al. [2021] Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, and Muhammad Asif Rana. Generalized nonlinear and finsler geometry for robotics. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 10206–10212, 2021. doi: 10.1109/ICRA48506.2021.9561543.
- Spong et al. [2006] Mark W Spong, Seth Hutchinson, Mathukumalli Vidyasagar, et al. Robot modeling and control, volume 3. Wiley New York, 2006.
- Taylor [2005] J.R. Taylor. Classical Mechanics. G - Reference,Information and Interdisciplinary Subjects Series. University Science Books, 2005. ISBN 9781891389221. URL https://books.google.com/books?id=P1kCtNr-pJsC.
- Van Wyk et al. [2022] Karl Van Wyk, Man Xie, Anqi Li, Muhammad Asif Rana, Buck Babich, Bryan Peele, Qian Wan, Iretiayo Akinola, Balakumar Sundaralingam, Dieter Fox, et al. Geometric fabrics: Generalizing classical mechanics to capture the physics of behavior. IEEE Robotics and Automation Letters, 2022.
- Xie et al. [2022] Mandy Xie, Karl Van Wyk, Ankur Handa, Stephen Tyree, Dieter Fox, Harish Ravichandar, and Nathan D. Ratliff. Neural geometric fabrics: efficiently learning high-dimensional policies from demonstration. In Conference on robot learning, 2022.