
Task Generalization with Stability Guarantees via
Elastic Dynamical System Motion Policies

Tianyu Li
University of Pennsylvania
[email protected]
Nadia Figueroa
University of Pennsylvania
[email protected]
Abstract

Dynamical System (DS) based Learning from Demonstration (LfD) allows learning of reactive motion policies with stability and convergence guarantees from a few trajectories. Yet, current DS learning techniques lack the flexibility to generalize to new task instances as they ignore explicit task parameters that inherently change the underlying trajectories. In this work, we propose Elastic-DS, a novel DS learning and generalization approach that embeds task parameters into the Gaussian Mixture Model (GMM) based Linear Parameter Varying (LPV) DS formulation. Central to our approach is the Elastic-GMM, a GMM constrained to SE(3) task-relevant frames. Given a new task instance/context, the Elastic-GMM is transformed with Laplacian Editing and used to re-estimate the LPV-DS policy. Elastic-DS is compositional in nature and can be used to construct flexible multi-step tasks. We showcase its strength on a myriad of simulated and real-robot experiments while preserving desirable control-theoretic guarantees. Supplementary videos can be found at https://sites.google.com/view/elastic-ds.

Keywords: Stable Dynamical Systems, Reactive Motion Policies, Learning from Demonstrations, Task Parametrization, Task Generalization

1 Introduction

With the advances in robotics and autonomous systems over the past decades, the opportunities and demands for more complex physical human-robot interaction (pHRI) in our everyday unconstrained environments are rising; thus, it is critical for robots to be adaptive, compliant, reactive, safe, and easy to program [1, 2, 3]. In many cases, robots will need to acquire new skills to satisfy task requirements in an ever-changing environment. It is usually difficult for non-experts to program robots for complex motion tasks, and tedious even for experts to reprogram them when task requirements change. A straightforward and intuitive way for robots to develop new skills is through Learning from Demonstration (LfD) [4, 5, 6, 7, 8]. This paradigm allows robots to acquire skills, typically encoded in the literature as action policies, motion policies, or imitation policies, directly from motion examples provided by humans or even other robots, mirroring a teacher-student relationship.

In recent years, significant progress has been made in using LfD to learn complex and diverse motion tasks. However, many approaches focus on learning and executing tasks in static or unchanging scenarios/environments/contexts, which can lead to failures when faced with out-of-distribution cases. From the machine learning perspective, this is the covariate shift issue that exists in many supervised-learning tasks, especially in the behavior cloning (BC) approach [7, 9]. Given a fixed training dataset, the LfD algorithm learns a policy that performs well on that dataset but can fail to generalize to unseen inputs during deployment; the learned policy becomes invalid under the change of distribution. Hence, instead of memorizing human demonstrations for one scenario, the robot should be able to adapt and generalize to novel scenarios with satisfactory performance, given the same task objective.

Figure 1: Task generalization for bookshelf stacking with Elastic-DS. (a) Executing the learned DS; (b) trajectory of the originally learned policy; (c) generalizing the learned policy to a new configuration; (d) trajectory of the new policy. Given a single demonstration (a-b), Elastic-DS efficiently learns and reproduces the motion, and (c-d) efficiently adapts to position and orientation changes in the task parameters, generalizing seamlessly to new task parameter configurations while retaining stability and convergence guarantees.

The Trilemma - Generalization or adaptation abilities are particularly important for robots to perform effectively in dynamic environments. There have been many attempts at generalization with methods like BC [10], Inverse RL (IRL) [11, 12, 13], Meta-Learning [14], Multi-Task Learning [15], Transfer Learning [16], Multi-Task Reinforcement Learning [17], Lifelong Learning [18, 19, 20], and Continual Learning [21]. However, the aforementioned approaches place no emphasis on providing control-theoretic guarantees on the learned policies, such as stability, boundedness, and convergence, all of which are critical for safe pHRI. On the other hand, the Dynamical System-based (DS) motion policy approach [3] offers many advantages, such as reactivity, motion-level adaptation, and, most importantly, stability guarantees, and can be learned from only a handful of demonstrations, ensuring minimal human effort [22, 23]. However, due to the closed-form and offline learning nature of DS-based motion policies, they have no flexibility for generalizing to novel environments, as they ignore explicit task parameters that inherently change the underlying trajectories that shape the DS vector fields. This limits their generalization capability and adoption as low-level policies. Hence, invoking the no free lunch theorem, we posit that the state of the art currently suffers from a generalization vs. stability vs. effort trilemma.

Goal In this work, we seek to alleviate this trilemma by proposing an LfD approach that has i) stability guarantees, ii) the flexibility to generalize motions across novel scenarios, while iii) requiring minimal human effort during learning and adaptation/generalization, as depicted in Figure 1.

Related Work The body of techniques for LfD/IL is vast [6, 4, 7, 8]. This work follows the BC approach, which learns a policy that maps states (state-action pairs, trajectories, and other contexts are also used) to control inputs [24]. DAgger [25] addresses distribution shift with online interactions/corrections, whereas the generative adversarial imitation learning framework [26] explores randomly for corrections that bring the policy closer to the demonstrated distribution. These approaches offer generalization in terms of distribution shift, but require either constant human effort or large amounts of data and computation, and hold no control-theoretic guarantees on the learned policy. The recently introduced TaSIL (Taylor Series IL) framework adds a simple augmentation to the BC loss such that the trained policy is robust to distribution deviations by ensuring incremental input-to-state stability, also benefiting from reduced sample complexity [27]. Nevertheless, it cannot generalize to novel task instances/environments not seen during training. A Probably Approximately Correct (PAC)-Bayes IL framework was introduced in [10] that computes upper bounds on the expected cost of policies in novel environments. Prior works also focus on explicit skill generalization for novel environments or task instances, such as multi-task learning [28, 29, 30, 31] and meta-learning [14, 32]. While capable of generalizing learned tasks to different environments, these works require considerable offline/online training and excessively large DNNs, which cannot be used in a reactive manner and offer no form of control-theoretic guarantee.

A significant body of work tackles the generalization problem by emphasizing task parametrization (TP) as relevant task frames in $SE(3)$ assigned to relevant objects in demonstrations, like TP-GMM [33, 34], TP-DMP [35], task invariants [36, 37], and environmental constraints [38]. Other works focus on the motion policy parameter perspective, such as adapting explicit start and goal positions in movement primitives [39], conditioning on probability distributions as in Probabilistic Movement Primitives (ProMPs) for different via-points [40], and geometric descriptors [41]. While such works have demonstrated the ability for task generalization, few provide real-time reactive motion and stability guarantees, and most rely on the availability of demonstrations in different contexts or environments to extract the relevant task parameters for generalization.

Approach We propose a zero-shot approach for generalizing motion policies to novel scenarios while guaranteeing control-theoretic properties for safe deployment in pHRI. To achieve stability, we adopt the DS-based LfD paradigm [3], which learns motion policies as time-invariant nonlinear DS with Lyapunov stability guarantees. To achieve generalization, we follow the task-parametrization perspective [33, 34, 35] and propose to embed relevant task frames directly into the DS policy. While several neural network (NN) based formulations for stable DS motion policies exist, such as neurally imprinted vector fields [42], DNNs via contrastive learning [43], diffusion models [44], and Euclideanizing flows [45], given their black-box nature it is not straightforward to embed such task parameters into these formulations. Further, NN approaches need multiple demonstrations to encode the stable DS properly. To achieve minimal data, compute, and human effort during learning, we adopt the Gaussian Mixture Model (GMM) based Linear Parameter Varying (LPV-DS) formulation [22, 3], which has been shown to be computationally efficient and capable of learning stable vector fields for complex motions from a single demonstration [23]. Finally, we take inspiration from elastic bands [46] and trajectory editing [47] and propose a novel approach to generalize the GMM-based LPV-DS to novel scenarios without new demonstrations, referred to as Elastic-DS.

Contributions We introduce the Elastic-DS formulation as a solution to the LfD trilemma (see Figure 2). The Elastic-DS is constrained to a set of task parameters described as geometric descriptors representing the invariant features of a task (e.g., object, via-point, or target configurations). It is capable of efficiently generating novel DS policies upon task parameter changes without requiring new data or human input, for single and multi-step tasks, and for composing new tasks via DS stitching.

2 Problem Statement

Let $\mathcal{D}:=\{\{\xi_{t,n},\dot{\xi}_{t,n}\}_{t=1}^{T_{n}}\}_{n=1}^{N}$ be a set of $N$ demonstration trajectories collected from kinesthetic teaching for a task, where $\xi_{t,n}\in\mathbb{R}^{d}$ and $\dot{\xi}_{t,n}\in\mathbb{R}^{d}$ represent the kinematic robot state and velocity vectors at time $t$, respectively, for the $n$-th trajectory with length $T_{n}$. In this work, we consider $\xi_{t,n}\in\mathbb{R}^{d}$ to be the end-effector Cartesian position. Let $\dot{\xi}=f(\xi)$ be a first-order DS that describes a motion policy in the $\mathbb{R}^{d}$ state space. Given $\mathcal{D}$, the goal is to infer $f(\xi):\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ such that any point $\xi$ in the state space leads to a stable attractor $\xi^{*}$, with $f(\xi)$ described by a set of parameters $\Theta$ and $\xi^{*}$,

$\dot{\xi}=f(\xi;\Theta,\xi^{*})\;\Rightarrow\;\lim_{t\to\infty}\|\xi-\xi^{*}\|=0.$ (1)

Usually, such DS motion policies are learned in an offline manner and fixed for the execution phase [23]. The main contribution of this work is to introduce an update stage that parametrizes the system further, allowing the motion policies to adapt and generalize to new tasks, as shown in Figure 2.

Task Parameters A task is defined as a combination of multiple trajectories, each grounded on a geometric constraint descriptor set $O_{i}=\{o_{enter},o_{exit}\}\in SE(3)^{2}$ describing the two endpoint poses. Let $\beta_{i}\in\mathbb{R}^{d\times(k+1)}$ be generalization parameters in the state space conditioned on the geometric descriptors $O_{i}$. As $O_{i}$ changes, $\beta_{i}$ changes accordingly to generate a new set of DS that reach $O_{i}$ with the correct poses. In this work, the geometric descriptors $O_{i}$ are assumed to be known or given during demonstrations; however, they could come from various upstream sources such as human specifications [23] or generative segmentation algorithms [48].

Motion Policy We propose the following motion policy for task-parameterized generalization:

$\dot{\xi}=\sum_{i=1}^{M}\delta(\xi,O_{i})\,f_{i}(\xi;\Theta_{i},\beta_{i},\xi_{i}^{*})$ (2)

where $\delta(\xi,O_{i})$ is an activation function determining the sequence of execution for the $M$ DSs describing a multi-step sequential task. Hence, given $\mathcal{D}$ with the same behavior and $M$ geometric descriptor $O_{i}$ configurations, our approach finds $\beta_{i}$ that generates new DS motion policies with i) stability guarantees with respect to their corresponding attractors $\xi_{i}^{*}$, and ii) the flexibility to achieve the same task under new geometric constraint descriptor configurations.

Figure 2: (a) Elastic-DS reproducing a stable LPV-DS vector field on the originally demonstrated data. (b-d) Elastic-DS generating stable LPV-DS motion policies under peg attractor changes (Adapt 1-3: translation and rotation) without new demonstrations. (e) Flowchart of the Elastic-DS learning and control approach.

3 Preliminaries: $\mathcal{PC}$-GMM and LPV-DS Motion Policy [22]

The GMM-based LPV-DS [22] motion policy has the following formulation,

$\dot{\xi}=f(\xi)=\sum_{k=1}^{K}\gamma_{k}(\xi)\left(A^{k}\xi+b^{k}\right)\quad\text{s.t.}\;\left\{\begin{array}{l}\left(A^{k}\right)^{T}P+PA^{k}=Q^{k},\;Q^{k}=\left(Q^{k}\right)^{T}\prec 0\\ b^{k}=-A^{k}\xi^{*}\end{array}\right.$ (3)

where $\gamma_{k}(\xi)$ is the state-dependent mixing function that quantifies the weight of each linear time-invariant (LTI) system $(A^{k}\xi+b^{k})$. $\mathcal{N}(\xi|\theta_{k})$ denotes the probability of observation $\xi$ from the $k$-th Gaussian component parametrized by $\theta_{k}=\{\mu_{k},\Sigma_{k}\}$, and $\pi_{k}$ represents the prior probability of an observation from this particular component; the a posteriori probability is

$\gamma_{k}(\xi)=\frac{\pi_{k}\mathcal{N}(\xi|\theta_{k})}{\sum_{j=1}^{K}\pi_{j}\mathcal{N}(\xi|\theta_{j})}\quad\text{from}\quad p\left(\xi|\{\pi_{k},\theta_{k}\}\right)=\sum_{k=1}^{K}\pi_{k}\mathcal{N}(\xi|\mu_{k},\Sigma_{k}).$ (4)

Intuitively, Eq. 3 fits a mixture of linear DS to a complex non-linear trajectory, with $\gamma_{k}(\xi)$ ensuring the smoothness of the reproduced trajectories. Hence, each Gaussian component must be placed on quasi-linear segments of $\mathcal{D}$. With $\mathcal{PC}$-GMM, the optimal number of linear DS $A^{k}$ for a given $\mathcal{D}$ can be automatically inferred.

Stability To guarantee global asymptotic stability of Eq. 3, a Lyapunov function $V(\xi)=(\xi-\xi^{*})^{T}P(\xi-\xi^{*})$ with $P=P^{T}\succ 0$ is used to derive the stability constraints in Eq. 3. Minimizing the fitting error of Eq. 3 with respect to the demonstrations, subject to these constraints, yields a non-linear DS with a stability guarantee via a Semi-Definite Program (SDP) [22]. Implementation details are provided in Appendix A.
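For concreteness, the following is a minimal sketch of evaluating the learned policy in Eqs. (3)-(4), assuming the GMM parameters and the LTI matrices $A^{k}$ have already been estimated (variable names are ours, not from the authors' implementation):

```python
# Minimal sketch of evaluating the policy in Eqs. (3)-(4); pi (K,), mu (K, d),
# Sigma (K, d, d), A (K, d, d) are assumed already estimated, and names are
# ours rather than the authors' implementation.
import numpy as np
from scipy.stats import multivariate_normal

def lpv_ds_velocity(xi, pi, mu, Sigma, A, xi_star):
    K = len(pi)
    # a posteriori mixing weights gamma_k(xi), Eq. (4)
    p = np.array([pi[k] * multivariate_normal.pdf(xi, mu[k], Sigma[k])
                  for k in range(K)])
    gamma = p / p.sum()
    # convex combination of LTI systems, with b^k = -A^k xi*
    return sum(gamma[k] * A[k] @ (xi - xi_star) for k in range(K))
```

At runtime, this function is queried at the current state to produce the commanded velocity $\dot{\xi}$.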

4 Elastic-DS

4.1 Elastic-GMM

To adapt the generalization parameters $\beta_{i}$ to spatial changes in the geometric descriptors $\mathcal{O}_{i}$, we introduce the Elastic-GMM as the core component that augments the LPV-DS (Eq. 3) into the Elastic-DS (Eq. 2). Figure 2e shows the Elastic-DS pipeline. During the training stage, we use $\mathcal{PC}$-GMM [22] to obtain a set of initial Gaussian parameters $\{\pi_{k},\theta_{k}^{0}\}$ as well as the initial $\beta_{i}^{0}$. The update stage produces the updated Gaussian parameters $\{\pi_{k},\theta_{k}\}$ as well as the updated $\beta_{i}$, which are key waypoints in the state space. After the update, a trajectory is generated through these waypoints to specify the velocity for the DS. With the new Gaussian parameters and the new velocity information, an updated LPV-DS can be learned. If further changes happen in the environment, we can directly update the LPV-DS using the transform without re-estimating the GMM, avoiding this time-consuming stage. During execution, a passive-DS controller [3] takes the latest LPV-DS output velocity $\dot{\xi}$ and generates the corresponding joint torques $\tau$ for the robot. The upcoming sections discuss each key component in detail; the full algorithm is in Appendix G.

Figure 3: (a) $\mathcal{PC}$-GMM fitted on trajectory data; (b) the GMM chain transform from Section 4.1.1; (c) a closer look (in 2D) at the Gaussian joints introduced in Section 4.1.3.

4.1.1 GMM Chain

Following the LPV-DS pipeline, the demonstration trajectory is encoded into a GMM $\{\pi_{k},\theta_{k}^{0}\}$ using $\mathcal{PC}$-GMM [22]. As shown in Figure 3a, the trajectory is extracted and simplified into a chain of Gaussian links. In the update stage (Figure 2e), we transform the spatial relationships among the Gaussians to achieve task adaptation. Imagine, as an analogy, the Gaussian chain being a robot arm that can be rotated around each joint to achieve a specific geometric configuration, as shown in Figure 3b. Note that the robot arm analogy refers to the end-effector trajectory, not the actual robot arm. The generalization parameters $\beta_{i}$ are the joints between each pair of neighboring Gaussians, describing the spatial relationship between the neighbors. After the $\mathcal{PC}$-GMM step, we obtain the initial $\beta_{i}^{0}$, which is later updated to $\beta_{i}$ to achieve the transform. The joint between two Gaussians, i.e., $\beta_{i}$, is the mean of their product, as depicted on the left of Figure 3c, and is obtained by

$\Sigma_{t}=(\Sigma_{1}^{-1}+\Sigma_{2}^{-1})^{-1}\qquad\beta_{i,12}=\Sigma_{t}(\Sigma_{1}^{-1}\mu_{1}+\Sigma_{2}^{-1}\mu_{2})$ (5)

where $\mu$ and $\Sigma$ are the means and covariances of the Gaussians $\{\pi_{1},\theta_{1}^{0}\}$ and $\{\pi_{2},\theta_{2}^{0}\}$. To complete the robot arm analogy, we also need to determine the links' positions and orientations $T_{GMM}\in SE(3)^{K}$ with respect to the joints. Figure 3c depicts the Gaussian mean position $\mu_{j}$ and orientation (described by the eigenvectors $\hat{e}_{j}$ of the covariance matrix $\Sigma_{j}$) with respect to the frame at the last joint, whose x-axis points towards the next joint (in the direction of the demonstration). All of the above is constructed as the initial condition, before any update is involved. Later, when the generalization parameters $\beta_{i}$ change, we recover the same transformation of the mean and covariance with respect to the corresponding $\beta_{i}$. Before introducing the approach for obtaining the new joint positions $\beta_{i}$, we provide a brief summary of the Laplacian editing approach [49, 47]; a minimal sketch of the joint computation in Eq. (5) is shown below.
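```python
# Minimal NumPy sketch of the joint between two neighboring Gaussians, Eq. (5)
# (function name is ours).
import numpy as np

def gaussian_joint(mu1, Sigma1, mu2, Sigma2):
    """Mean of the product of two Gaussians, used as the joint beta."""
    S1_inv, S2_inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    Sigma_t = np.linalg.inv(S1_inv + S2_inv)
    return Sigma_t @ (S1_inv @ mu1 + S2_inv @ mu2)
```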

4.1.2 Laplacian Editing Primer

Laplacian Editing allows directly modifying an existing trajectory defined by $m$ waypoints $\boldsymbol{r}\in\mathbb{R}^{d\times m}$ while capturing local properties. First, we convert the waypoints $\boldsymbol{r}$ in Cartesian space into Laplacian coordinates $\Delta$ with the graph Laplacian matrix $L\in\mathbb{R}^{m\times m}$ [47],

$L_{ij}=\begin{cases}1 & \text{if }i=j,\\ -\frac{w_{ij}}{\sum_{j\in\mathbf{N}_{i}}w_{ij}} & \text{if }j\in\mathbf{N}_{i},\\ 0 & \text{otherwise.}\end{cases}$ (6)

where $\mathbf{N}_{i}$ is the set of neighboring waypoints $r_{j}$ of waypoint $r_{i}$, and $w_{ij}$ is a weight, set to 1 in this work. One obtains $\Delta=L\boldsymbol{r}$, where $\Delta$ is the concatenation of the Laplacian coordinates of the waypoints, $\delta_{i}=\sum_{j\in\mathbf{N}_{i}}\frac{w_{ij}}{\sum_{j\in\mathbf{N}_{i}}w_{ij}}\left(r_{i}-r_{j}\right)$. The matrix $L$ can be singular, so one imposes constraints on the system $L\boldsymbol{r}=\Delta$ when solving for new waypoints $\boldsymbol{r}$ to achieve editing [47].
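To make this concrete, below is a minimal sketch of Laplacian editing on a chain graph with uniform weights $w_{ij}=1$, imposing the constraints by hard substitution of the fixed waypoints (helper names are ours; the implementation in [47] may differ):

```python
# Minimal sketch of Laplacian editing on a chain graph with uniform weights
# w_ij = 1; constraints are imposed by hard substitution of fixed waypoints.
import numpy as np

def graph_laplacian(m):
    L = np.eye(m)
    for i in range(m):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < m]
        for j in nbrs:
            L[i, j] = -1.0 / len(nbrs)     # -w_ij / sum_j w_ij, Eq. (6)
    return L

def laplacian_edit(r, fixed_idx, fixed_pos):
    """r: (m, d) waypoints; fixed_idx: constrained indices; fixed_pos: targets."""
    m = r.shape[0]
    L = graph_laplacian(m)
    Delta = L @ r                          # Laplacian coordinates of the path
    free = [i for i in range(m) if i not in fixed_idx]
    r_new = r.copy()
    r_new[fixed_idx] = fixed_pos
    # keep the free rows of L r = Delta, substituting the fixed waypoints
    A = L[np.ix_(free, free)]
    b = Delta[free] - L[np.ix_(free, list(fixed_idx))] @ r_new[fixed_idx]
    r_new[free] = np.linalg.lstsq(A, b, rcond=None)[0]
    return r_new
```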

4.1.3 Transform Gaussians with Constraints

The initial joints $\beta_{i}^{0}$ are converted into Laplacian coordinates, which construct a least-squares objective as in Section 4.1.2. Then, we align the first link (formed by $\beta_{0}$ and $\beta_{1}$) and the last link (formed by $\beta_{n-1}$ and $\beta_{n}$) with the geometric descriptor $\mathcal{O}_{i}$, forming the constraints of the least-squares formulation. When solving this optimization, the other $\beta_{i}$ adjust based on the Laplacian objective, softly preserving local position properties,

$\min_{\beta_{i}}J\left(\beta_{i}\right)=\|L\beta_{i}-\Delta\|_{2}^{2}\quad\text{subject to}\quad\left\{\begin{array}{l}T_{0,1}(\beta_{i,0},\beta_{i,1})=O_{start}\\ T_{n-1,n}(\beta_{i,n-1},\beta_{i,n})=O_{end}\end{array}\right.$ (7)

where $L\in\mathbb{R}^{n\times n}$ and $\Delta$ are as in Section 4.1.2, and $T_{0,1}(\beta_{i,0},\beta_{i,1})$ represents the frame transformation from $\beta_{i,0}$ to $\beta_{i,1}$. The solution of this optimization produces the new joint positions $\beta_{i}$. After that, the link positions and orientations (i.e., the Gaussians' means and covariances), as well as the scales, are recovered using the $T_{GMM}$ recorded in the previous section. The orientation of each Gaussian is determined by the eigenvectors shown in red and green in Figure 3c. Each orientation remains fixed with respect to the last joint frame ($\beta_{12}$ in Figure 3c): if a rotation happens at the last joint frame, the Gaussian frame associated with it moves along in the global frame but remains unchanged in the last joint frame. The scale is determined by the change in distance between each pair of neighboring joints, which scales the eigenvalues of the Gaussians' covariances. Referring back to the flowchart in Figure 2e, this section outputs the updated joint positions $\beta_{i}$ and updated GMM parameters $\theta_{k}$ (the $\pi_{k}$ stay unchanged).

Figure 4: After obtaining the joints between neighboring Gaussians, a piecewise-linear trajectory is formed. The green polygon is the geometric descriptor at the exit of the trajectory. We set a constraint on the last segment to align with the pose of the geometric descriptor. After solving (7), we obtain the final transformation (depicted in the rightmost image). Note: this illustration only shows the constraint at the exit; in general, constraints can be placed at both the entry and the exit.
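For illustration, a hedged 2D sketch of the recovery step described above, under the assumption that each Gaussian is re-anchored to its link frame, rotated with the link, and its covariance eigenvalues scaled by the squared link-length ratio (helper names are ours, not the paper's code):

```python
# Hedged 2D sketch of recovering a Gaussian after its two anchor joints move
# (b0, b1 -> b0_new, b1_new). Assumptions: the mean is re-expressed in the old
# link frame and mapped to the new one; the covariance is rotated with the
# link and its eigenvalues scaled by the squared link-length ratio.
import numpy as np

def link_frame(b0, b1):
    """Frame at b0 whose x-axis points toward b1 (2D)."""
    x = (b1 - b0) / np.linalg.norm(b1 - b0)
    R = np.column_stack([x, [-x[1], x[0]]])   # x-axis and its 90-degree normal
    return R, b0

def transform_gaussian(mu, Sigma, b0, b1, b0_new, b1_new):
    R_old, t_old = link_frame(b0, b1)
    R_new, t_new = link_frame(b0_new, b1_new)
    s = np.linalg.norm(b1_new - b0_new) / np.linalg.norm(b1 - b0)
    mu_new = R_new @ (s * (R_old.T @ (mu - t_old))) + t_new   # re-anchored mean
    R_rel = R_new @ R_old.T                                   # link rotation
    Sigma_new = (s ** 2) * R_rel @ Sigma @ R_rel.T            # rotate + rescale
    return mu_new, Sigma_new
```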

4.2 Create Velocity Profile

Depending on the new task constraints, the velocity requirements may differ from the original demonstration. Our approach therefore offers the opportunity to modify the velocity by regenerating a trajectory along the Gaussian joints, used as waypoints. There are many ways to achieve this with known waypoints, such as splines or minimum-jerk trajectories. We provide a simple example using Laplacian editing to generate a new trajectory. First, we connect $\beta_{i,0}$ and $\beta_{i,n}$ to form a linear trajectory $\zeta\in\mathbb{R}^{d\times p}$, where $p$ is the number of points on this trajectory. We then force this trajectory to pass through $\beta$, the Gaussian joints, with Laplacian editing,

$\min_{\zeta}J\left(\zeta\right)=\|L_{\zeta}\zeta-\Delta_{\zeta}\|_{2}^{2}\quad\text{subject to}\quad\zeta_{j}=\beta_{i,q}$ (8)

where $j\in\{0,\ldots,p-1\}$ is the index in $\zeta$ matching the corresponding $\beta_{i}$. More details can be found in Appendix B. The velocity is determined by finite differences between neighboring data points of the edited trajectory, divided by the $dt$ collected from the demonstrations. After this step, the velocity information and the updated GMM become the input for learning a new DS motion policy following Section 3, which by construction preserves the stability guarantees.

4.3 Multiple Segments

Multiple Elastic-DS can be stitched together to achieve a via-point trajectory and even long-horizon multi-segment tasks. The index $i$ in the task parameters $\beta_{i}$ denotes the segments, and consequently the multiple DSs in (2) when $M>1$. To allow adding spatial constraints in the middle, one can split the task into multiple segments and process them in a divide-and-conquer manner. There are two possible task-specific cases for stitching the segments: (i) the activation function $\delta(\xi,O_{i})$ is in charge of switching between multiple DSs; the next DS is activated by $\delta$ when the previous DS reaches its attractor (a minimal sketch of such a switch is shown below). (ii) To create a smooth movement, this case first connects all the Elastic-GMMs from the different segments and then learns a single DS; the $\delta$ function is not used in this case. As mentioned in Section 2, the separation points described by the geometric descriptors are specified by upstream sources. For more information about the split and stitching process, please refer to Appendix C. The flexibility of composing and regrouping different transformed DS with new constraint poses opens the possibility for multi-task scenarios.
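```python
# A possible sketch of the one-hot activation in case (i): delta selects a
# single active DS and advances to the next once the current attractor is
# reached within a tolerance eps (class and parameter names are ours).
import numpy as np

class SequentialDS:
    def __init__(self, ds_list, attractors, eps=1e-2):
        self.ds_list = ds_list        # list of callables xi -> xi_dot
        self.attractors = attractors  # list of attractor positions xi_i*
        self.eps = eps
        self.active = 0               # index i with delta(xi, O_i) = 1

    def step(self, xi):
        reached = np.linalg.norm(xi - self.attractors[self.active]) < self.eps
        if reached and self.active < len(self.ds_list) - 1:
            self.active += 1          # activate the next segment's DS
        return self.ds_list[self.active](xi)
```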

5 Experimental Results

5.1 2D Experiments

This section shows 2D examples of using Elastic-DS. The learned DS is plotted as streamplots describing a velocity vector field (in blue), with the GMM (in orange), geometric descriptors (in green), and rollout trajectory (in black) overlaid on top. The 2D simulation in Figure 5 shows the atomic case of a one-segment trajectory conditioned on a geometric descriptor with constraints at the endpoints. The two ends of the geometric descriptor (green polygons) are shifted and rotated to show different configurations and the resulting changes in the DS vector field. Figure 6 shows an example of using a via-point to modify the policy in the middle of a single DS, corresponding to case (ii) in Section 4.3. The DS motion policy adapts to the changes. For more details about stitching the trajectory, please refer to Appendix C. Figure 7 and Table 1 show a comparison to TP-GMM-DS [50, 51], TP-GPR-DS [52], and TP-proMP [52, 40]. Appendix E shows the failure cases of TP-GMM with fewer demonstrations. With a single demonstration, the other methods fail to generalize, while Elastic-DS achieves satisfactory performance. For more comparison details, please refer to Appendix F.

Figure 5: Elastic-DS motion policy generalizes based on changes of start and goal poses: (a) demonstration, (b) reverse, (c) S shape, (d) checkmark.
Figure 6: Elastic-DS motion policy generalizes based on changes of the via-point: (a) demonstration, (b) set via-point, (c) shifted, (d) rotated.
Figure 7: (a) Elastic-DS (Ours) is able to handle the new situation, while the benchmark methods (b) TP-GPR-DS, (c) TP-GMM-DS, and (d) TP-proMP fail with a single demonstration.
Metric                  | Elastic-DS (Ours) | TP-GPR-DS | TP-GMM-DS | TP-proMP
Start Cosine Similarity | 0.9843            | -0.3981   | 0.9405    | 0.8061
Goal Cosine Similarity  | 0.9998            | 0.5453    | 0.6324    | 0.9070
Endpoints Distance      | 0.0008            | 0.835     | 0.0764    | 1.102
Table 1: The metrics include the orientation alignment at the start and goal, as well as the position distance to the geometric descriptors in the new instance. The reported values correspond to the orange trajectories in Figure 7. With both ends moving, Elastic-DS maintains good performance on all three metrics. More details about the metrics are in Appendix F.
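For reference, the metrics in Table 1 amount to the following computations (a minimal sketch; function names are ours, see Appendix F for the exact definitions):

```python
# Hedged sketch of the Table 1 metrics: cosine similarity between rollout
# direction and descriptor orientation at the start/goal, and the Euclidean
# distance between rollout endpoints and descriptor positions.
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def endpoint_distance(rollout_end, descriptor_pos):
    return float(np.linalg.norm(rollout_end - descriptor_pos))
```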

5.2 Robot Experiments

We validate our approach through four different real-robot experiments on the Franka Emika Panda robot: Bookshelf, Pick and Place, Tunnel, and Combination; see Figures 2 and 8, and Appendix D. In these experiments, the geometric descriptors are detected from a motion capture system: by attaching motion capture markers to objects, we specify geometric descriptors anchored on the objects of interest. Three experiments start with a human performing kinesthetic teaching by moving the end-effector, followed by execution of the DS learned from the original demonstration; the objects of interest are then shifted and rotated into different configurations. The last experiment shows the ability to compose new tasks. Without any new demonstration, the robot still achieves the required tasks. Please refer to the video and Appendix D.

Figure 8: Example of Elastic-DS adaptation in a tunnel-passing task: (a) executing the learned DS, (b) original trajectory, (c) generalizing to a rotation, (d) adapted trajectory.

6 Conclusions and Limitations

We proposed Elastic-DS, which allows modifying a DS with task parameters conditioned on geometric features to achieve task generalization. As the core component, we introduced the Elastic-GMM, which augments the original LPV-DS to create more flexible task-specific motions from as little as one demonstration. Through both 2D simulations and several 3D robot experiments, we validated the ability of Elastic-DS to perform task generalization, as well as its potential for multi-task and long-horizon motion policies. In the following, we discuss the limitations of our approach.

Limitations First, this work only considers end-effector motions in Cartesian position space; the task constraints are likewise only on translational motion. To achieve a wider variety of tasks and extend to more possible poses, the orientation space has to be considered; further directions could adopt works like [53] and [54] to produce DS motion policies that meet task constraints in the full pose space. Going even further, the full pose could also include the gripper state, which could be achieved with a coupled-DS approach such as [55] and [3]; motion policies in joint space should also be considered [56]. Second, the geometric descriptors are assumed to be given by human specification in this work. To address this limitation, we could utilize object-tracking methods like BundleTrack [57] to identify the geometric interactions between the robot demonstration and objects. Finally, the task adaptation in this work is fast yet not real-time ($\approx 1$s for 2D and 3D data on a typical laptop with an Intel i7-12700H and 16GB memory, depending on the complexity of the task); a computation time analysis is provided in Appendix H. Given the dynamic nature of physical human-robot interaction, it is important to provide continuous adaptation on the fly to create a seamless experience. Hence, our immediate next step is to accelerate adaptation to the millisecond scale.

References

  • Lasota et al. [2017] P. A. Lasota, T. Fong, and J. A. Shah. A survey of methods for safe human-robot interaction. Foundations and Trends® in Robotics, 5(4):261–349, 2017. ISSN 1935-8253.
  • Sanneman et al. [2021] L. Sanneman, C. Fourie, and J. A. Shah. The state of industrial robotics: Emerging technologies, challenges, and key research directions. Foundations and Trends® in Robotics, 8(3):225–306, 2021. ISSN 1935-8253.
  • Billard et al. [2022] A. Billard, S. Mirrazavi, and N. Figueroa. Learning for Adaptive and Reactive Robot Control: A Dynamical Systems Approach. MIT Press, 2022.
  • Argall et al. [2009] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey of robot learning from demonstration. Robotics and autonomous systems, 57(5):469–483, 2009.
  • Schaal [1996] S. Schaal. Learning from demonstration. Advances in neural information processing systems, 9, 1996.
  • Billard et al. [2016] A. G. Billard, S. Calinon, and R. Dillmann. Learning from humans. Springer handbook of robotics, pages 1995–2014, 2016.
  • Osa et al. [2018] T. Osa, J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel, J. Peters, et al. An algorithmic perspective on imitation learning. Foundations and Trends® in Robotics, 7(1-2):1–179, 2018.
  • Ravichandar et al. [2020] H. Ravichandar, A. S. Polydoros, S. Chernova, and A. Billard. Recent advances in robot learning from demonstration. Annual review of control, robotics, and autonomous systems, 3:297–330, 2020.
  • Khansari-Zadeh and Billard [2011] S. M. Khansari-Zadeh and A. Billard. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Transactions on Robotics, 27(5):943–957, 2011.
  • Ren et al. [2021] A. Ren, S. Veer, and A. Majumdar. Generalization guarantees for imitation learning. In Conference on Robot Learning, pages 1426–1442. PMLR, 2021.
  • Ziebart et al. [2008] B. D. Ziebart, A. L. Maas, J. A. Bagnell, A. K. Dey, et al. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438, Chicago, IL, USA, 2008.
  • Sadigh et al. [2017] D. Sadigh, A. D. Dragan, S. Sastry, and S. A. Seshia. Active preference-based learning of reward functions. In Robotics: Science and Systems (RSS), 2017.
  • Zhao et al. [2022] Z. Zhao, Z. Wang, K. Han, R. Gupta, P. Tiwari, G. Wu, and M. J. Barth. Personalized car following for autonomous driving with inverse reinforcement learning. In 2022 International Conference on Robotics and Automation (ICRA), pages 2891–2897, 2022. doi:10.1109/ICRA46639.2022.9812446.
  • Finn et al. [2017] C. Finn, T. Yu, T. Zhang, P. Abbeel, and S. Levine. One-shot visual imitation learning via meta-learning. In Conference on robot learning, pages 357–368. PMLR, 2017.
  • Wang et al. [2021] H. Wang, H. Zhao, and B. Li. Bridging multi-task learning and meta-learning: Towards efficient training and effective adaptation. In International Conference on Machine Learning, pages 10991–11002. PMLR, 2021.
  • Pan and Yang [2010] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
  • Sun et al. [2022] L. Sun, H. Zhang, W. Xu, and M. Tomizuka. Paco: Parameter-compositional multi-task reinforcement learning. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=LYXTPNWJLr.
  • Thrun and Mitchell [1995] S. Thrun and T. M. Mitchell. Lifelong robot learning. Robotics and autonomous systems, 15(1-2):25–46, 1995.
  • Ruvolo and Eaton [2013] P. Ruvolo and E. Eaton. Ella: An efficient lifelong learning algorithm. In International conference on machine learning, pages 507–515. PMLR, 2013.
  • Mendez et al. [2022] J. A. Mendez, H. van Seijen, and E. Eaton. Modular lifelong reinforcement learning via neural composition. arXiv preprint arXiv:2207.00429, 2022.
  • Aljundi et al. [2019] R. Aljundi, K. Kelchtermans, and T. Tuytelaars. Task-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11254–11263, 2019.
  • Figueroa and Billard [2018] N. Figueroa and A. Billard. A physically-consistent bayesian non-parametric mixture model for dynamical system learning. In A. Billard, A. Dragan, J. Peters, and J. Morimoto, editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 927–946. PMLR, 29–31 Oct 2018. URL https://proceedings.mlr.press/v87/figueroa18a.html.
  • Wang et al. [2022] Y. Wang, N. Figueroa, S. Li, A. Shah, and J. Shah. Temporal logic imitation: Learning plan-satisficing motion policies from demonstrations. In 6th Annual Conference on Robot Learning, 2022. URL https://openreview.net/forum?id=ndYsaoyzCWv.
  • Bain and Sammut [1995] M. Bain and C. Sammut. A framework for behavioural cloning. In Machine Intelligence 15, pages 103–129, 1995.
  • Ross et al. [2011] S. Ross, G. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceedings, 2011.
  • Ho and Ermon [2016] J. Ho and S. Ermon. Generative adversarial imitation learning. Advances in neural information processing systems, 29, 2016.
  • Pfrommer et al. [2022] D. Pfrommer, T. T. Zhang, S. Tu, and N. Matni. TaSIL: Taylor series imitation learning. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=jqzoJw7xamd.
  • Zentner et al. [2022] K. Zentner, U. Puri, Y. Zhang, R. Julian, and G. S. Sukhatme. Efficient multi-task learning via iterated single-task transfer. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10141–10146. IEEE, 2022.
  • Mudrakarta et al. [2019] P. K. Mudrakarta, M. Sandler, A. Zhmoginov, and A. Howard. K for the price of 1: Parameter-efficient multi-task and transfer learning, 2019.
  • Rahmatizadeh et al. [2018] R. Rahmatizadeh, P. Abolghasemi, L. Bölöni, and S. Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. In 2018 IEEE international conference on robotics and automation (ICRA), pages 3758–3765. IEEE, 2018.
  • Jang et al. [2022] E. Jang, A. Irpan, M. Khansari, D. Kappler, F. Ebert, C. Lynch, S. Levine, and C. Finn. Bc-z: Zero-shot task generalization with robotic imitation learning, 2022.
  • Fu et al. [2022] H. Fu, S. Yu, S. Tiwari, G. Konidaris, and M. Littman. Meta-learning transferable parameterized skills. arXiv preprint arXiv:2206.03597, 2022.
  • Calinon [2017] S. Calinon. Robot learning with task-parameterized generative models. In Robotics Research: Volume 2, pages 111–126. Springer, 2017.
  • Zhu et al. [2022] J. Zhu, M. Gienger, and J. Kober. Learning task-parameterized skills from few demonstrations. IEEE Robotics and Automation Letters, 7(2):4063–4070, 2022.
  • Pervez and Lee [2018] A. Pervez and D. Lee. Learning task-parameterized dynamic movement primitives using mixture of gmms. Intelligent Service Robotics, 11(1):61–78, 2018.
  • Ureche et al. [2015] A. L. P. Ureche, K. Umezawa, Y. Nakamura, and A. Billard. Task parameterization using continuous constraints extracted from human demonstrations. IEEE Transactions on Robotics, 31(6):1458–1471, 2015. doi:10.1109/TRO.2015.2495003.
  • Figueroa et al. [2016] N. Figueroa, A. L. P. Ureche, and A. Billard. Learning complex sequential tasks from demonstration: A pizza dough rolling case study. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 611–612, 2016. doi:10.1109/HRI.2016.7451881.
  • Li and Brock [2022] X. Li and O. Brock. Learning from demonstration based on environmental constraints. IEEE Robotics and Automation Letters, 7(4):10938–10945, 2022.
  • Ijspeert et al. [2013] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal. Dynamical movement primitives: learning attractor models for motor behaviors. Neural computation, 25(2):328–373, 2013.
  • Paraschos et al. [2013] A. Paraschos, C. Daniel, J. R. Peters, and G. Neumann. Probabilistic movement primitives. Advances in neural information processing systems, 26, 2013.
  • Freymuth et al. [2022] N. Freymuth, N. Schreiber, P. Becker, A. Taranovic, and G. Neumann. Inferring versatile behavior from demonstrations by matching geometric descriptors. arXiv preprint arXiv:2210.08121, 2022.
  • Neumann et al. [2013] K. Neumann, A. Lemme, and J. J. Steil. Neural learning of stable dynamical systems based on data-driven lyapunov candidates. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1216–1222. IEEE, 2013.
  • Pérez-Dattari and Kober [2023] R. Pérez-Dattari and J. Kober. Stable motion primitives via imitation and contrastive learning. arXiv preprint arXiv:2302.10017, 2023.
  • Chi et al. [2023] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137, 2023.
  • Rana et al. [2020] M. A. Rana, A. Li, H. Ravichandar, M. Mukadam, S. Chernova, D. Fox, B. Boots, and N. Ratliff. Learning reactive motion policies in multiple task spaces from human demonstrations. In Conference on Robot Learning, pages 1457–1468. PMLR, 2020.
  • Quinlan and Khatib [1993] S. Quinlan and O. Khatib. Elastic bands: connecting path planning and control. In [1993] Proceedings IEEE International Conference on Robotics and Automation, pages 802–807 vol.2, 1993. doi:10.1109/ROBOT.1993.291936.
  • Nierhoff et al. [2016] T. Nierhoff, S. Hirche, and Y. Nakamura. Spatial adaption of robot trajectories based on laplacian trajectory editing. Autonomous Robots, 40:159–173, 2016.
  • Kirillov et al. [2023] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick. Segment anything. arXiv:2304.02643, 2023.
  • Lipman et al. [2005] Y. Lipman, O. Sorkine, M. Alexa, D. Cohen-Or, D. Levin, C. Rössl, and H.-P. Seidel. Laplacian framework for interactive mesh editing. International Journal of Shape Modeling, 11(01):43–61, 2005.
  • Calinon et al. [2014] S. Calinon, D. Bruno, and D. G. Caldwell. A task-parameterized probabilistic model with minimal intervention control. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 3339–3344. IEEE, 2014.
  • Calinon et al. [2012] S. Calinon, Z. Li, T. Alizadeh, N. G. Tsagarakis, and D. G. Caldwell. Statistical dynamical systems for skills acquisition in humanoids. In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages 323–329. IEEE, 2012.
  • Calinon [2016] S. Calinon. A tutorial on task-parameterized movement learning and retrieval. Intelligent service robotics, 9:1–29, 2016.
  • Figueroa et al. [2020] N. Figueroa, S. Faraji, M. Koptev, and A. Billard. A dynamical system approach for adaptive grasping, navigation and co-manipulation with humanoid robots. In 2020 IEEE International conference on robotics and automation (ICRA), pages 7676–7682. IEEE, 2020.
  • Urain et al. [2022] J. Urain, D. Tateo, and J. Peters. Learning stable vector fields on lie groups. IEEE Robotics and Automation Letters, 7(4):12569–12576, 2022.
  • Shukla and Billard [2012] A. Shukla and A. Billard. Coupled dynamical system based arm–hand grasping model for learning fast adaptation strategies. Robotics and Autonomous Systems, 60(3):424–440, 2012.
  • Khansari-Zadeh and Khatib [2017] S. M. Khansari-Zadeh and O. Khatib. Learning potential functions from human demonstrations with encapsulated dynamic and compliant behaviors. Autonomous Robots, 41:45–69, 2017.
  • Wen and Bekris [2021] B. Wen and K. Bekris. Bundletrack: 6d pose tracking for novel objects without instance or category-level 3d models. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8067–8074. IEEE, 2021.
  • Kronander and Billard [2016] K. Kronander and A. Billard. Passive interaction control with dynamical systems. IEEE Robotics and Automation Letters, 1(1):106–113, 2016. doi:10.1109/LRA.2015.2509025.

Appendix

Appendix A LPV-DS Parameter Optimization

GMM-based LPV-DS formulation The formulation is described in Section 3 (Preliminaries: $\mathcal{PC}$-GMM and LPV-DS Motion Policy) and repeated here for convenience:

$\dot{\xi}=f(\xi)=\sum_{k=1}^{K}\gamma_{k}(\xi)\left(A^{k}\xi+b^{k}\right)\quad\text{s.t.}\;\left\{\begin{array}{l}\left(A^{k}\right)^{T}P+PA^{k}=Q^{k},\;Q^{k}=\left(Q^{k}\right)^{T}\prec 0\\ b^{k}=-A^{k}\xi^{*}\end{array}\right.$ (9)

DS Estimation The set of DS parameters $\theta_{DS}=\{A^{k},b^{k}\}$ for $f(\xi)$ is estimated by minimizing the Mean Square Error (MSE) against the demonstrations [22], subject to the stability constraints in Equation 9:

$\min_{\theta_{DS}}J\left(\theta_{DS}\right)=\sum_{n=1}^{N^{\mathrm{ref}}}\sum_{t=1}^{T_{n}}\left\|\dot{\xi}_{t,n}^{\mathrm{ref}}-f\left(\xi_{t,n}^{\mathrm{ref}}\right)\right\|^{2}\quad\text{s.t.}\quad\left\{\begin{array}{l}\left(A^{k}\right)^{T}P+PA^{k}=Q^{k},\;Q^{k}=\left(Q^{k}\right)^{T}\prec 0\\ b^{k}=-A^{k}\xi^{*}\end{array}\right.\quad\forall k=1,\dots,K$ (10)

which is a constrained non-convex semi-definite program (SDP). Further, when $P$ is known (or estimated beforehand as in [22]), the problem becomes a convex SDP that can be solved highly efficiently with off-the-shelf solvers [22, 3].
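As an illustration, here is a minimal CVXPY sketch of this convex estimation with $P$ fixed, substituting $b^{k}=-A^{k}\xi^{*}$ directly. This is an assumption-laden sketch, not the authors' implementation; a small $\epsilon$ enforces strict negative definiteness:

```python
# Minimal CVXPY sketch of Eq. (10) with P fixed (then the problem is convex).
# gamma must be precomputed from the fitted GMM; all names here are
# illustrative and this is not the authors' implementation.
import numpy as np
import cvxpy as cp

def estimate_lpv_ds(Xi, Xi_dot, gamma, xi_star, P, eps=1e-4):
    """Xi, Xi_dot: (d, T) demonstration states/velocities; gamma: (K, T)
    mixing weights; xi_star: (d,) attractor; P: (d, d) symmetric PD."""
    d, T = Xi.shape
    K = gamma.shape[0]
    A = [cp.Variable((d, d)) for _ in range(K)]
    # With b^k = -A^k xi*, the policy is f(xi) = sum_k gamma_k(xi) A^k (xi - xi*)
    E = Xi - xi_star[:, None]
    pred = sum(cp.multiply(np.tile(gamma[k], (d, 1)), A[k] @ E) for k in range(K))
    constraints = []
    for k in range(K):
        Q = A[k].T @ P + P @ A[k]
        # symmetrize so CVXPY accepts the semidefinite constraint Q^k < -eps I
        constraints.append((Q + Q.T) / 2 << -eps * np.eye(d))
    prob = cp.Problem(cp.Minimize(cp.sum_squares(Xi_dot - pred)), constraints)
    prob.solve(solver=cp.SCS)
    return [Ak.value for Ak in A]
```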

Appendix B Creating Velocity Profile with Laplacian Editing

Figure 9: Connect the endpoints and set constraints along the trajectory progress at the Gaussian joints.

The main goal is to force a linear trajectory $\zeta\in\mathbb{R}^{d\times p}$ to pass through $\beta$, the Gaussian joints, with Laplacian editing,

$\min_{\zeta}J\left(\zeta\right)=\|L_{\zeta}\zeta-\Delta_{\zeta}\|_{2}^{2}\quad\text{subject to}\quad\zeta_{j}=\beta_{i,q}$ (11)

We first calculate the total Euclidean distance along the piecewise-linear path connecting neighboring joints, $D=\sum_{i=0}^{n-1}\|\beta_{i+1}-\beta_{i}\|$. We then compute the percentage $\lambda_{q}$ of progress that each joint position in $\beta_{i}$ makes along the total distance $D$. Using $\lambda_{q}$, the corresponding $\beta_{i}$ is mapped to the index $j=\lfloor\lambda_{q}(p-1)\rfloor$. In practice, also enforcing linearly interpolated constraints between the joints helps the trajectory stay aligned with the GMM links and the geometric descriptors. The velocity is determined by finite differences between neighboring points of the edited trajectory, divided by the $dt$ collected from the demonstrations. One can adjust the velocity by controlling the spacing between neighboring points of the edited trajectory and the number of points $p$.
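A minimal sketch of this velocity-profile construction, reusing the `laplacian_edit` helper from the Section 4.1.2 sketch and assuming the progress indices $j=\lfloor\lambda_{q}(p-1)\rfloor$ are all distinct (names are ours):

```python
# Hedged sketch of the velocity-profile construction above; it reuses the
# laplacian_edit helper from the earlier sketch and assumes the progress
# indices j = floor(lambda_q (p-1)) are all distinct.
import numpy as np

def velocity_profile(beta, p, dt):
    """beta: (n+1, d) Gaussian-joint positions; p: number of trajectory
    points; dt: sampling period of the demonstration."""
    seg = np.linalg.norm(np.diff(beta, axis=0), axis=1)        # segment lengths
    lam = np.concatenate([[0.0], np.cumsum(seg) / seg.sum()])  # progress lambda_q
    idx = np.floor(lam * (p - 1)).astype(int)                  # j = floor(lambda_q (p-1))
    zeta = np.linspace(beta[0], beta[-1], p)                   # linear trajectory
    zeta = laplacian_edit(zeta, list(idx), beta)               # force through joints
    xi_dot = np.gradient(zeta, dt, axis=0)                     # finite differences
    return zeta, xi_dot
```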

Appendix C Stitching from Multiple Segments

As mentioned in the main paper, there are two approaches for stitching multiple segments, with different design trade-offs. Depending on the task, one can choose either to learn a DS for each Elastic-GMM (Sequential DS) or to learn a single DS for the stitched Elastic-GMM (Combined DS). Figure 10a shows the flow for the former case, and Figure 10b for the latter.

Figure 10: Different task-split flows for the design trade-off. (a) This flow ensures that the position geometric constraints will be reached for the task, using multiple DSs. (b) This flow produces smooth motion with a single DS; one can adapt [23] to achieve task satisfaction.

Here is an example of the two cases used to modify a DS with a via-point. By specifying a point of interest in the demonstration (usually by human specification or an upstream computer vision method), the trajectory is split into separate components. Each individual segment is processed with Elastic-GMM to meet the new via-point geometric constraints, depicted as green polygons. By performing these steps, we can pose constraints not just on the endpoints of the demonstration but also on its intermediate points. This via-point experiment shows that Elastic-DS can adapt to changes in the via-point constraints.

C.1 Sequential Elastic-DS

Figure 11: An example of the pipeline for sequential DS. The single demonstration (in red) is separated at the blue spot. Both segments undergo Elastic-GMM to meet the geometric descriptor constraints, and a DS is learned for each individually. The result is two DSs that run sequentially: after the first DS reaches its attractor, the robot starts to follow the second DS. This corresponds to the flowchart in Figure 10a.
Figure 12: Modified DS based on the via-point, as a multi-segment DS without new demonstrations. The plots show the rollout trajectory of the switched DS at (a) t=1 and (b) t=2 in the first DS, and (c) t=1 and (d) t=2 in the second DS. It uses the same GMM as in Figure 14c.

C.2 Combined Elastic-DS

Figure 13: An example of the pipeline for building a single DS. The single demonstration (in red) is separated at the blue spot. After obtaining the GMM for each segment, the segments are stitched together to generate a single DS. This approach generates smooth velocity but, under perturbation, could result in trajectories missing the geometric constraints; one could adapt LTL-DS [23] to alleviate this and achieve task satisfaction. This corresponds to the flowchart in Figure 10b.
Figure 14: Modified DS based on the via-point, as a single DS without new demonstrations: (a) the original DS, (b) original with a via-point constraint, (c) shifted via-point, (d) shifted and rotated via-point.

Appendix D Robot Experiments Details

D.1 Software and Hardware Details

For all of these experiments, we used the 7-DOF Franka Emika Panda robotic arm controlled via ROS and the libfranka C++ interface. The experiment computer ran Ubuntu 20.04 with an Intel i7-11700K 3.6GHz CPU and 32GB memory. To track the geometric descriptors $\mathcal{O}_{b}$ for each task, we used the OptiTrack motion capture system, which provides 6-DoF frames of the rigid bodies at 100Hz. We first attached a set of motion capture markers to the base of the Franka Panda robot arm, which served as the fixed frame. For each experiment, we attached motion capture markers to the task-relevant objects.

To record demonstrations, we published the robot end-effector position to a ROSbag recording. During the demonstration, the orientation $R$ and position $p$ of the task-relevant objects were recorded with the motion capture system. We used finite differences of the collected position data to calculate the trajectory velocity, which forms the training data $\mathcal{D}:=\{\xi_{t,n},\dot{\xi}_{t,n}\}_{t=1}^{T_{n}}$.

In the training phase, we first use the Elastic-GMM, implemented in both MATLAB and Python, to learn a GMM encoding of the training data. Then, during the testing phase, we use the Python Elastic-GMM implementation to modify the encoded data with the geometric descriptors $\mathcal{O}_{i}$. The Elastic-GMM output then becomes the input to the MATLAB code for learning the motion policy. The execution of the Elastic-DS motion policy is implemented in a C++ ROS node, which takes the current end-effector state $\xi\in\mathbb{R}^{3}$ as input. The output of this ROS node is the desired end-effector velocity $\dot{\xi}\in\mathbb{R}^{3}$, which is sent to a low-level Cartesian velocity impedance controller implemented in C++ and running at 1kHz. To achieve the required velocity at the end-effector, it performs torque control at the joints. The stiffness parameter of the controller was set to 180.0. The orientation of the end-effector is fixed for every task, as the Elastic-DS motion policy $\dot{\xi}=f(\xi)$ is only learned in 3D position space.

D.2 Bookshelf Experiment

The goal of this experiment is to teach the robot how to insert a book into a desktop bookshelf. With the bookshelf being moved to different locations and orientations on the table, the robot should be able to generalize and reproduce new motion policies for inserting the book into the bookshelf.

Prepare for the experiment:

  1. We attached a motion capture marker object to the side of the bookshelf. The goal geometric descriptor $\mathcal{O}_{g}$ was at an offset from the marker object so that it was inside one of the slots in the bookshelf. The orientation $R\in SO(3)$ of the geometric descriptor $\mathcal{O}_{g}$ was the same as the opening of the bookshelf. Both the position $p\in\mathbb{R}^{3}$ and the orientation $R$ were with respect to the fixed frame.
  2. We closed the gripper to hold the center of the book vertically.
  3. We calibrated the weight of the book so that the robot would not move in gravity compensation.

Collect data:

  1. We recorded the robot end-effector position $\xi_{t}\in\mathbb{R}^{3}$ to a ROSbag.
  2. At the same time, the motion capture system recorded the bookshelf (geometric descriptor) position and orientation $\mathcal{O}_{g}$.
  3. As shown in the figure below, a person directly moved the robot by hand to perform a kinesthetic teaching demonstration. The ROSbag recording ended when the end-effector position $\xi$ reached the bookshelf slot $p$, the position in $\mathcal{O}_{g}$.
  4. The single demonstration (end-effector position $\xi$ and its time-derivative $\dot{\xi}$ computed numerically from the timestamp data) was then used as the training data $\mathcal{D}:=\{\{\xi_{t,n},\dot{\xi}_{t,n}\}_{t=1}^{T_{n}}\}$. The training data $\mathcal{D}$ was encoded as an Elastic-GMM $\{\pi_{k},\mu_{k}^{\prime},\Sigma_{k}^{\prime}\}_{k=1}^{K}$, as described in Section 4.1. No tuning parameter was required.

Execution:

  1. We moved the robot end-effector (with the book) and the bookshelf $\mathcal{O}_{g}$ to different configurations, as shown in the figures below and in the video. The end-effector orientation with the book was always aligned with the bookshelf opening so that a translational movement could insert the book into the bookshelf.
  2. The motion capture system recorded the new configuration of the bookshelf.
  3. For the updated bookshelf configuration $\mathcal{O}_{g,new}$, we updated the Elastic-GMM $\{\pi_{k},\mu_{k}^{\prime},\Sigma_{k}^{\prime}\}_{k=1}^{K}$ to the new situation and learned the Elastic-DS as described in Section 4.1.
  4. The robot then executed the DS motion policy $\dot{\xi}=f(\xi)$ in task space with a velocity-based impedance controller [58].
  5. The gripper released the book once it reached the attractor, $\xi_{curr}=p$.
  6. We repeated the process with different configurations.

Figure 15: Demonstration for inserting a book into a desktop bookshelf.
Figure 16: Execution of the learned DS in the original configuration.
Figure 17: The bookshelf was shifted closer to the robot (shifting direction indicated by the red arrow). Without any new demonstration, the learned DS adapted to the new configuration.
Figure 18: The bookshelf was shifted up (shifting direction indicated by the red arrow). Without any new demonstration, the Elastic-DS adapted to the new configuration.
Figure 19: The bookshelf was shifted up (shifting direction indicated by the red arrow). Without any new demonstration, the Elastic-DS adapted to the new configuration; the robot still reached the new bookshelf position under human disturbances.
Figure 20: The bookshelf was rotated and shifted to the left side of the robot. Without any new demonstration, the Elastic-DS adapted, and the robot still reached the new bookshelf configuration. We rotated the end-effector to be parallel with the bookshelf beforehand.

D.3 Pick and Place Experiment

In this task, we show the robot how to pick and place a cube in a bin. The cube position $p\in\mathbb{R}^{3}$ can be changed (labeled by the motion capture markers on the box as the geometric descriptors $\mathcal{O}_{b}$), while the bin position is fixed. Based on the nature of this task, we manually set two motion segments; however, the cutoff location between the two segments is determined automatically from the motion capture data. The first segment is the picking motion, with a geometric descriptor $\mathcal{O}_{1}$ at the end of the trajectory (at the cube). The second segment is the placing motion, with a geometric descriptor $\mathcal{O}_{2}$ at the beginning to ensure that the robot, holding the cube, first moves upward to reach enough height to approach the bin from the top. There are thus two geometric descriptors, $\mathcal{O}_{1}$ and $\mathcal{O}_{2}$, at the cube, serving as a via-point. During the demonstration, the gripper open/close is commanded by voice (with the microphone at the bottom right of the snapshots). The gripper state is memorized and associated with each segment; at the end of each segment (reaching the attractor), the robot opens/closes the gripper according to the commands given during the demonstration.

Prepare for the experiment:

  1. We attached a motion capture marker set to a base box for placing the cube.

Collect data:

  1. We recorded the robot end-effector position \xi_{t}\in\mathbb{R}^{3} to a ROSbag.

  2. At the same time, the motion capture system recorded the box (geometric descriptor) positions p_{1} and p_{2} from \mathcal{O}_{1} and \mathcal{O}_{2}; these became the cutoff between the two motion segments. As shown in the figure below, a person performed a kinesthetic pick-and-place demonstration in a single trajectory, guiding the robot directly by hand.

  3. The human used voice commands (with the microphone at the bottom of the figures) to control the gripper state (open/close). The ROSbag recorded the gripper state.

  4. The single demonstration (end-effector positions, timestamps, and gripper state) was then separated into two parts \mathcal{D}:=\{\{\xi_{t,n},\dot{\xi}_{t,n}\}_{t=1}^{T_{n}}\}_{n=1}^{N=2} and used for training. The two segments of training data were encoded as two Elastic-GMMs \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N=2} as described in Section 4.1. No tuning parameters were required.

Execution:

  1. We moved the robot end-effector and the box to different positions, as shown in the figures below. The gripper always pointed downward.

  2. The motion capture system recorded the new positions of the box, which served as the new geometric descriptor positions p_{1} and p_{2} from \mathcal{O}_{1} and \mathcal{O}_{2}, as well as the switch position between the two segments. The first geometric descriptor orientation R_{1}\in SO(3) always pointed down so that the gripper approaches the cube from the top. The second geometric descriptor orientation R_{2}\in SO(3) always pointed up so that the gripper reaches enough height before placing the cube in the bin.

  3. For the updated box (geometric descriptor) configurations \mathcal{O}_{1} and \mathcal{O}_{2}, we updated the Elastic-GMMs \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N=2} to the new situation and learned the Elastic-DSs for the two segments as described in Section 4.1.

  4. The DS motion policies \dot{\xi}=\delta(f_{n}(\xi)) with one-hot activation were executed in order, separated by the via-point at the cube; switching between them happens automatically at the via-point, as described in Appendix C.1 (see the switching sketch after this list).

  5. The gripper released the cube once it reached the attractor of f_{2}(\xi).
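A minimal sketch of the one-hot activation \delta(\xi,o_{i}) from step 4, assuming each segment policy exposes an evaluate method returning \dot{\xi}; the interface is illustrative, while the switching rule at the via-point follows Appendix C.1.

import numpy as np

class SequentialDS:
    """One-hot activation delta(xi, o_i): exactly one segment DS f_i is
    active; activation advances when the current attractor (via-point)
    is reached."""
    def __init__(self, policies, attractors, tol=1e-3):
        self.policies = policies        # list of segment DS objects
        self.attractors = attractors    # list of (3,) attractor positions
        self.tol = tol
        self.i = 0                      # index of the active segment

    def evaluate(self, xi):
        near = np.linalg.norm(xi - self.attractors[self.i]) < self.tol
        if near and self.i < len(self.policies) - 1:
            self.i += 1                 # switch at the via-point
        return self.policies[self.i].evaluate(xi)   # xi_dot = f_i(xi)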

Figure 21: Demonstration for a pick and place task.
Figure 22: Execution of the learned DS in the original configuration.
Figure 23: The cube was shifted further away from the robot (the shifting direction is indicated by the red arrow).
Figure 24: The cube was shifted to the right side of the robot (the shifting direction is indicated by the red arrow).
Figure 25: The cube was shifted slightly to the right side of the robot (the shifting direction is indicated by the red arrow). During the execution, the human held the robot and switched to another cube; afterwards, the robot finished the task.

D.4 Tunnel Experiment

In this experiment, we show the robot how to pass through a tunnel, mimicking a scanning/inspection task. Two different motion capture marker objects label the entrance and the exit of the tunnel. Separated by these two markers, the task consists of three motion segments; it is a task with two via-points.

Prepare for the experiment:

  1. We attached two motion capture objects to the two sides of the tunnel (each object has three markers).

Collect data:

  1. We recorded the robot end-effector position \xi_{t}\in\mathbb{R}^{3} to a ROSbag.

  2. At the same time, the motion capture system recorded the entry and exit positions \{p_{b}\}_{b=1}^{B=4} from \{\mathcal{O}_{b}\}_{b=1}^{B=4}; these became the two cutoffs between the three motion segments. A person then performed a kinesthetic tunnel demonstration in a single trajectory, guiding the robot directly by hand.

  3. The single demonstration (end-effector positions and timestamps) was separated into three segments \mathcal{D}:=\{\{\xi_{t,n},\dot{\xi}_{t,n}\}_{t=1}^{T_{n}}\}_{n=1}^{N=3} for individual training. The three segments of training data were encoded as three Elastic-GMMs \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N=3} as described in Section 4.1. No tuning parameters were required.

Execution:

  1. We moved the robot end-effector and changed the tunnel position and orientation. The gripper always pointed downward.

  2. The motion capture system recorded the new positions \{p_{b}\}_{b=1}^{B=4} of the tunnel entry and exit. They served as the new geometric descriptor positions in \{\mathcal{O}_{b}\}_{b=1}^{B=4}, as well as the switch positions between the three segments. The first segment had a geometric descriptor \mathcal{O}_{1} at its end (at the tunnel entry). The second segment was within the tunnel, so it had two geometric descriptors \mathcal{O}_{2} and \mathcal{O}_{3}, one at each end. The third segment had a geometric descriptor \mathcal{O}_{4} at its beginning (at the tunnel exit). All of the geometric descriptors \{\mathcal{O}_{b}\}_{b=1}^{B=4} were predefined to point along the tunnel movement direction; they change based on the relative position of the entry and exit (a minimal frame-construction sketch follows this list).

  3. We updated the three Elastic-GMMs \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N=3} to the new tunnel configuration and learned a single Elastic-DS \dot{\xi}=f(\xi) as in Appendix C.2, except for the flipped-tunnel case, where we combined the \{\pi_{k},\mu_{k}^{\prime},\Sigma_{k}^{\prime}\}_{k=1}^{K} to learn a sequential Elastic-DS \dot{\xi}=\delta f_{n}(\xi) as in Appendix C.1.

  4. We executed the motion policy \dot{\xi}=\delta f_{n}(\xi) via the Cartesian velocity impedance controller.
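A minimal sketch of how such descriptor frames can be constructed from the entry and exit marker positions. We assume the x-axis is aligned with the tunnel direction and the remaining axes are completed against a nominal up vector; this completion rule is an illustrative assumption (it degenerates if the tunnel direction is vertical), not necessarily our exact implementation.

import numpy as np

def descriptor_frame(p_entry, p_exit, up=np.array([0.0, 0.0, 1.0])):
    """Orientation R in SO(3) with the x-axis pointing along the tunnel
    (entry -> exit), positioned at p_entry."""
    x = (p_exit - p_entry) / np.linalg.norm(p_exit - p_entry)
    y = np.cross(up, x)                 # assumes the tunnel is not vertical
    y /= np.linalg.norm(y)
    z = np.cross(x, y)                  # completes a right-handed frame
    R = np.column_stack([x, y, z])
    return R, p_entry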

Figure 26: The human guides the end-effector for an inspection task, starting from the left side, passing through the tunnel, and stopping on the right side.
Figure 27: Execution of the learned DS in the original configuration from the demonstration.
Figure 28: The tunnel was rotated. We rotated the end-effector to be parallel with the tunnel before the execution.
Figure 29: The tunnel was shifted further away from the robot, as indicated by the red arrow.
Figure 30: The tunnel was flipped, as indicated by the arrow in the tunnel (opposite to the direction in the demonstration). The robot needs to move to the right side to enter the tunnel and exit on the left side before reaching the end pose.

D.5 Combined Experiment (Tunnel + Pick and Place)

There is no demonstration or training in this task. We reuse the Elastic-GMMs (each of the form \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N}) learned in the previous experiments to compose new sequences that perform new tasks. We manually defined the task sequence with the one-hot encoding activation \delta(\xi,o_{i}); in the future, we plan to develop high-level planning algorithms to determine the sequence automatically.

Prepare for the experiment:

  1. We used all the previous components except for the bookshelf. The motion capture markers were placed in the same way as in the previous experiments. This time, we also added markers to the bin for placing the cube. There were a total of 7 geometric descriptors \{\mathcal{O}_{b}\}_{b=1}^{B=7}.

  2. We defined the execution sequence (Pick-Scanning-Place) in \delta(\xi,o_{i}). There were a total of four motion segments in this task, corresponding to \{f_{n}(\xi)\}_{n=1}^{N=4}.

Execution:

  1. We moved the robot end-effector and changed the object positions. The gripper always pointed downward.

  2. The motion capture system recorded the new positions of the objects, and the geometric descriptors \{\mathcal{O}_{b}\}_{b=1}^{B=7} were updated.

  3. We updated the four Elastic-GMMs \{\{\pi_{k,i},\mu_{k,i}^{\prime},\Sigma_{k,i}^{\prime}\}_{k=1}^{K}\}_{i=1}^{N=4} to the new geometric descriptor configurations \{\mathcal{O}_{b}\}_{b=1}^{B=7} based on the motion capture data and learned four Elastic-DSs \{f_{n}(\xi)\}_{n=1}^{N=4}, as described in Section 4.1.

  4. We executed the sequential Elastic-DS motion policy \dot{\xi}=\delta f_{n}(\xi) (Appendix C.1) via the Cartesian velocity impedance controller.

Figure 31: Composing the learned DSs with task transfer parameters from the “pick and place” and “tunnel” tasks. The robot is able to pick up a block, pass through the tunnel for scanning, and place the block in the bin. The entire motion requires no extra demonstration. Note that the object positions are not the same as in the original demonstrations.
Figure 32: We shifted the cube starting platform, the tunnel, and the bin. By reusing and composing the previously learned DSs with task transfer, the robot is able to finish the tasks of picking, scanning, and placing in this new environment configuration without a new demonstration, even with human disturbances during task execution.

Appendix E Failure Cases for TP-GMM Task-Parameterized Learning

This section shows the performance of task-parametrized policy learning under a varying number of demonstrations. Specifically, the method shown here uses the Task-Parameterized Gaussian Mixture Model as the encoding strategy and Gaussian mixture regression for trajectory reproduction [52, 51]. There are two examples in total. Each example starts with four demonstrations (samples) in faded blue, moving from the bottom geometric descriptor (frame) to four different geometric descriptors on top. For the new situation, the two geometric descriptors are placed at new positions (in deep green). The orange trajectory shows the reproduction in the new situation, with the orange ellipses being the GMM encoding. The number of demonstrations decreases in each example to show the generalization ability to the new situation with less training data, as well as the sensitivity to the coverage of the training data.

E.1 Case Example 1

Figure 33 (a: 4 samples, b: 3 samples, c: 2 samples, d: 1 sample): With four demonstrations and the new frames placed among the samples, the new motion policy performs well. As the number of demonstrations decreases to three and two, the new motion policies still reach the goal with reasonable behaviors, though the initial movements are in the opposite direction. It clearly does not perform well with a single demonstration.

E.2 Case Example 2

Figure 34 (a: 4 samples, b: 3 samples, c: 2 samples, d: 1 sample): The new frames are placed further away from the demonstrations' coverage area, but TP-GMM can still generate a correct reproduction with four demonstrations. However, as the number of demonstrations decreases, its performance decays, eventually failing in the single-demonstration case.

In conclusion, TP-GMM does not generalize well when the number of demonstrations is reduced in these two examples. It requires extra effort to determine the appropriate placement of the demonstrations for good generalization, and with a single demonstration it tends to overfit that trajectory. The next section shows that Elastic-DS generalizes well from a single demonstration compared to other TP approaches.

Appendix F Compare to Existing Methods

To show the advantages of our method, we present both qualitative and quantitative comparisons against several benchmark methods from the task-parametrized (TP) approach [52, 50]. Specifically, the benchmarks include Task-Parameterized Gaussian Process Regression with DS-GMR for motion reproduction (TP-GPR-DS) [52], Task-Parameterized Gaussian Mixture Model with DS-GMR for motion reproduction (TP-GMM-DS) [51, 50, 52], and Task-Parameterized Probabilistic Movement Primitives (TP-proMP) [52, 40]. We use the following quantitative metrics to assess task generalization (a minimal sketch of their computation follows the list):

  • Start Cosine Similarity: describes the starting direction of the trajectory and how well it aligns with the entry/starting geometric descriptor. We take the first two data points to create a vector v_{s} and compare it against the pointing direction of the entry/starting geometric descriptor v_{Os}. The closer this value is to one, the better.

    \cos(\theta_{s})=\frac{v_{s}\cdot v_{Os}}{\|v_{s}\|\,\|v_{Os}\|}   (12)

  • Goal Cosine Similarity: describes the goal-reaching direction of the trajectory and how well it aligns with the goal/exit geometric descriptor. We take the last two data points of the trajectory to create a vector v_{g} and compare it against the pointing direction of the goal/exit geometric descriptor v_{Og}. The closer this value is to one, the better.

    \cos(\theta_{g})=\frac{v_{g}\cdot v_{Og}}{\|v_{g}\|\,\|v_{Og}\|}   (13)

  • Endpoints Distance: besides the pointing direction, it is important that the trajectory starts from the center of the starting geometric descriptor P_{Os} and reaches the center of the goal geometric descriptor P_{Og}. Let \xi_{0} be the start of the trajectory and \xi_{T} be its end. The metric is the sum of the two Euclidean distances. The smaller this value, the better.

    D=d(\xi_{0},P_{Os})+d(\xi_{T},P_{Og})   (14)
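As referenced above, a minimal sketch for computing the three metrics on a reproduced trajectory; the array shapes and argument names are assumptions for illustration.

import numpy as np

def cosine(u, v):
    """Cosine similarity between two direction vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def generalization_metrics(traj, v_Os, v_Og, P_Os, P_Og):
    """traj: (T, d) reproduced trajectory; v_Os / v_Og: descriptor pointing
    directions; P_Os / P_Og: descriptor centers. Returns Eqs. (12)-(14)."""
    start_cos = cosine(traj[1] - traj[0], v_Os)       # Eq. (12)
    goal_cos = cosine(traj[-1] - traj[-2], v_Og)      # Eq. (13)
    endpoints = (np.linalg.norm(traj[0] - P_Os)
                 + np.linalg.norm(traj[-1] - P_Og))   # Eq. (14)
    return start_cos, goal_cos, endpoints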

We compare four different trials with the same training data (a single demonstration): Close, Far, Both Ends Shifted, and Both Ends Shifted Far, in increasing order of difficulty. Each subsection below describes a trial with a plot showing, via red arrows, how the geometric descriptors (in green) are changed, followed by four plots comparing our method against the three other methods and a table with the quantitative comparison. The single demonstration data is taken from the library code attached to [52]. The parameters for the benchmark methods remain at their defaults as in the code from [52]. There are no required tuning parameters for Elastic-DS.

F.1 Close

Figure 35: The single demonstration (in blue) goes from the bottom to the top, constrained by the two geometric descriptors. In the new scenario, the goal/exit geometric descriptor is shifted to a closer position, as indicated by the red arrow. The start/entry geometric descriptor remains at the same pose. We need to generate a new motion policy that adapts to this change.
Figure 36 (a: Elastic-DS (Ours), b: TP-GPR-DS, c: TP-GMM-DS, d: TP-proMP): The demonstration and its probabilistic encoding are shown in blue. The new motion policy and its probabilistic encoding for the new situation are in orange. While the benchmark methods fail to meet the constraints given only one demonstration, Elastic-DS generalizes to the geometric descriptor constraints on both ends.
Metric                    Elastic-DS (Ours)   TP-GPR-DS   TP-GMM-DS   TP-proMP
Start Cosine Similarity   0.9857              -0.8222     0.9794      0.9483
Goal Cosine Similarity    0.9999              0.7397      0.5422      0.9923
Endpoints Distance        0.0008              0.4298      0.7564      0.8939
Table 2: Elastic-DS outperforms the other methods in meeting the new geometric descriptor constraints: its cosine similarities are the closest to 1, and its endpoints distance is the closest to 0. Both TP-GMM-DS and TP-proMP have large endpoints distances, and TP-GPR-DS starts its movement in the opposite direction.

F.2 Far

Figure 37: In the new scenario, the goal/exit geometric descriptor is shifted to a further position with rotation, as indicated by the red arrow. The start/entry geometric descriptor remains at the same pose.
Figure 38 (a: Elastic-DS (Ours), b: TP-GPR-DS, c: TP-GMM-DS, d: TP-proMP): Only Elastic-DS generates a new motion policy that meets the new geometric descriptor constraints.
Metric                    Elastic-DS (Ours)   TP-GPR-DS   TP-GMM-DS   TP-proMP
Start Cosine Similarity   0.9971              -0.9999     0.7872      0.8987
Goal Cosine Similarity    0.9997              0.5451      0.6724      0.9675
Endpoints Distance        0.0009              0.8459      1.552       1.677
Table 3: As the goal geometric descriptor is moved further away, the performance of the other methods starts to decay: the Start Cosine Similarities for TP-GMM-DS and TP-proMP decrease. Elastic-DS maintains good performance on all three metrics.

F.3 Both Ends Shifted

Figure 39: The new situation includes translation and rotation for both the starting and goal geometric descriptors, as indicated by the red arrows.
Figure 40 (a: Elastic-DS (Ours), b: TP-GPR-DS, c: TP-GMM-DS, d: TP-proMP): Elastic-DS is able to handle the new situation while the other benchmark methods fail.
Metric                    Elastic-DS (Ours)   TP-GPR-DS   TP-GMM-DS   TP-proMP
Start Cosine Similarity   0.9843              -0.3981     0.9405      0.8061
Goal Cosine Similarity    0.9998              0.5453      0.6324      0.9070
Endpoints Distance        0.0008              0.835       0.0764      1.102
Table 4: With both geometric descriptors moving, Elastic-DS maintains good performance on all three metrics.

F.4 Both Ends Shifted Far

Figure 41: In the new scenario, the goal/exit geometric descriptor is shifted to an even further position with rotation, as indicated by the red arrow.
Figure 42 (a: Elastic-DS (Ours), b: TP-GPR-DS, c: TP-GMM-DS, d: TP-proMP): Only Elastic-DS generates a new motion policy that meets the new geometric descriptor constraints.
Metric                    Elastic-DS (Ours)   TP-GPR-DS   TP-GMM-DS   TP-proMP
Start Cosine Similarity   0.9869              -0.3827     0.8540      0.9244
Goal Cosine Similarity    0.9997              0.9378      0.7151      0.9815
Endpoints Distance        0.0012              1.265       1.143       1.323
Table 5: In this most challenging scenario, the performance of the benchmark methods decays even further: the endpoints distances increase for all the TP approaches. Elastic-DS maintains good performance on all three metrics.

Appendix G Transforming Elastic-GMM

Input: \{\mu_{k},\Sigma_{k}\}_{k=1}^{K}, \xi_{t=1}, \xi_{t=T}, O_{start}, O_{end}
Output: \{\mu_{k}^{\prime},\Sigma_{k}^{\prime}\}_{k=1}^{K}
n \leftarrow 2;
k \leftarrow 1;
while k \leq K-1 do
       \Sigma_{n}=(\Sigma_{k}^{-1}+\Sigma_{k+1}^{-1})^{-1}; {Eq. (5) in the main text}
       \beta_{n}=\Sigma_{n}(\Sigma_{k}^{-1}\mu_{k}+\Sigma_{k+1}^{-1}\mu_{k+1}); {Eq. (5) in the main text}
       \lambda_{i},\hat{e}_{i}=\mathrm{eig}(\Sigma_{k});
       M_{n-1} = create a frame at \beta_{n-1} using \beta_{n}-\beta_{n-1} as the x-axis;
       \zeta = create a frame with \hat{e}_{i}, \mu_{k};
       \Gamma_{k,n-1} = the transformation from M_{n-1} to \zeta;
       n = n+1;
       k = k+1;

end while
\beta_{1}=\xi_{t=1}, \beta_{N}=\xi_{t=T};
Construct L via Eq. (6) in the main text;
\Delta=L\beta;
T_{0,1} = create a frame with \beta_{0} and \beta_{1} at \beta_{0};
T_{n-1,n} = create a frame with \beta_{n-1} and \beta_{n} at \beta_{n};
\beta^{\prime}=\arg\min_{\beta}J(\beta)=\|L\beta-\Delta\|_{2}^{2} subject to the constraints in Eq. (7) in the main text;
n \leftarrow 2;
k \leftarrow 1;
while k \leq K do
       Recover \mu_{k}^{\prime}, \hat{e}_{ki}^{\prime} with \Gamma_{k,n-1} w.r.t. \beta_{n-1}^{\prime};
       Scale \lambda_{ki}^{\prime}, \mu_{kx}^{\prime} according to the new distance between neighboring \beta;
       Reconstruct \Sigma_{k}^{\prime} with \lambda_{ki}^{\prime}, \hat{e}_{ki}^{\prime};
       n = n+1;
       k = k+1;
end while
Return \{\mu_{k}^{\prime},\Sigma_{k}^{\prime}\}_{k=1}^{K};
Algorithm 1: Transform Elastic-GMM for Generalization
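A minimal numerical sketch of two core computations in Algorithm 1: the Gaussian product of Eq. (5) that places the anchor points \beta_{n}, and a Laplacian-editing solve in the spirit of the \arg\min step. Note that the hard constraints of Eq. (7) are approximated here with heavily weighted soft constraints, an illustrative simplification rather than our exact formulation; pinning, e.g., the first and last anchors to the new descriptor positions plays the role of the endpoint constraints.

import numpy as np

def gaussian_product(mu_a, Sigma_a, mu_b, Sigma_b):
    """Eq. (5): anchor point between two neighboring Gaussians, i.e. the
    mean of their (unnormalized) product."""
    Si_a, Si_b = np.linalg.inv(Sigma_a), np.linalg.inv(Sigma_b)
    Sigma_n = np.linalg.inv(Si_a + Si_b)
    beta_n = Sigma_n @ (Si_a @ mu_a + Si_b @ mu_b)
    return beta_n, Sigma_n

def laplacian_edit(beta, pinned, w=1e6):
    """Minimize ||L b - L beta||^2 with selected points (softly) pinned to
    new positions. beta: (N, d) anchors; pinned: {index: (d,) new position}."""
    beta = np.asarray(beta, dtype=float)
    N, _ = beta.shape
    L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)  # path-graph Laplacian
    L[0, :2] = [1.0, -1.0]                                  # endpoint rows
    L[-1, -2:] = [-1.0, 1.0]
    Delta = L @ beta                                        # local shape detail
    A, b = [L], [Delta]
    for idx, pos in pinned.items():
        row = np.zeros((1, N)); row[0, idx] = 1.0
        A.append(w * row)                                   # soft pin on point idx
        b.append(w * np.asarray(pos, dtype=float)[None, :])
    sol, *_ = np.linalg.lstsq(np.vstack(A), np.vstack(b), rcond=None)
    return sol                                              # edited anchors beta'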

Appendix H Training and Adaptation Computation Times

Training and adaptation of Elastic-DS are performed on a laptop with an Intel i7-12700H CPU and 16GB of memory. The initial training time for a single demonstration with roughly T_{n}=200 datapoints is:

  • The original PC-GMM implementation in Matlab [22] takes around 2-4 seconds.

  • An improved parallelized PC-GMM implementation in C++ takes around 100-200 ms.

For parameter adaptation, the recorded computation times are as follows (for 3-4 Gaussians):

  • Elastic-GMM parameter transfer takes around 30-80 ms in Python.

  • DS parameter learning (SDP optimization) takes approximately 800 ms in Matlab.

Hence, for a single demonstration with T_{n}=200 datapoints, initial training takes under 1 s, whereas generating a new policy from task parameter changes takes around 1-2 s. The robot experiments presented in this section contain between T_{n}=500 and T_{n}=1000 datapoints, with \xi\in\mathbb{R}^{3}. For such datasets, the average computation time to generate a new policy is approximately 4 s. In Fig. 43 we plot the trend of these computation times as a function of increasing T_{n}.

Figure 43: Trend of the Elastic-DS computation time for generating a new policy under different demonstration lengths T_{n} in 2D and 3D. Each data point is the average of 5 runs.