
BiRP: Learning Robot Generalized Bimanual Coordination using
Relative Parameterization Method on Human Demonstration

Junjia Liu, Hengyi Sim, Chenzui Li, and Fei Chen. This work was supported in part by the Research Grants Council of the Hong Kong SAR under Grants 24209021, 14222722, and C7100-22GF, and in part by the CUHK Direct Grant for Research under Grant 4055140. Junjia Liu, Hengyi Sim, Chenzui Li, and Fei Chen are with the Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong SAR (e-mail: [email protected], [email protected], [email protected], [email protected]). Corresponding authors.
Abstract

Human bimanual manipulation can perform more complex tasks than a simple combination of two single arms, which is credited to the spatio-temporal coordination between the arms. However, describing bimanual coordination is still an open topic in robotics. This makes it difficult to give an explainable coordination paradigm, let alone apply one to robots. In this work, we divide the main bimanual tasks in human daily activities into two types: leader-follower and synergistic coordination. We then propose a relative parameterization method to learn these types of coordination from human demonstration. It encodes the coordination in bimanual demonstrations as Gaussian mixture models, probabilistically describing how the importance of coordination changes throughout the motion. The learned coordination representation can be generalized to new task parameters while ensuring spatio-temporal coordination. We demonstrate the method using synthetic motions and human demonstration data and deploy it on a humanoid robot to perform generalized bimanual coordination motions. We believe that this easy-to-use bimanual learning from demonstration (LfD) method has the potential to serve as a data augmentation plugin for training large robot manipulation models. The corresponding code is open-sourced at https://github.com/Skylark0924/Rofunc.

I Introduction

Humanoid robots with high redundancy are expected to perform complex manipulation tasks with human-like behavior. However, ensuring coordination between multiple degrees of freedom is still an open problem in robotics, and such coordination is often the key to the success of most human daily activities, like stir-frying, pouring water, sweeping the floor, and putting away clothes. Thus, it is necessary to provide an explainable paradigm to describe and learn coordination. Learning humanoid robot manipulation by observing human motion and behavior is a straightforward idea [1], but the technology behind it is still challenging: it requires understanding human motion data and designing a bridge connecting humans and robots. In this work, we focus on learning and generalizing bimanual coordination motions from human demonstration.


Figure 1: The two main bimanual coordination manners in human daily activities are leader-follower and synergistic coordination. Learning these coordination manners from human demonstration requires the ability to extract the implicit coordination information from motion data and deploy it into new situations with different task parameters.

Learning from demonstration (LfD) is a type of machine-learning approach that allows robots to learn tasks or skills from human demonstrations. Instead of programming robot motions with explicit instructions that are defined manually for each task [2][3], LfD enables robots to learn skills by observing human performance [4]. It is implemented by the following processes: recording human demonstration data, learning the representation of multiple demonstrations, transferring the data to the workspace of robots, and finally designing a controller for generating the smooth trajectory and its corresponding control commands. LfD has become an increasingly popular approach for training robots, as it can be faster and more efficient than traditional programming methods. It also allows robots to learn tasks that may be difficult to program explicitly, such as those that involve complex movements or interactions with a dynamic environment. Meanwhile, another important feature of LfD is that it enables robots to adapt to new or changing environments [5], as they can learn from demonstrations in different settings and apply that knowledge to new situations.


Figure 2: The whole framework is illustrated with a pouring task, a specific leader-follower example. It starts with the collection process, where the coordinated motion of the human arms and the displacement of objects are recorded by Optitrack. The coordinated motions are then represented by TP-GMM with relative parameterization, and the task-specific GMM is reconstructed in a leader-follower manner (a pre-generated leader motion is given to construct the corresponding follower motion representation). The follower motion is finally generated by LQT (with the coordination matrix) for robot execution.

Bimanual robots are much more complex to teach from demonstration than single-arm robots, which can be taught by kinesthetic teaching [6]. Some previous works tried to combine trajectories taught multiple times to realize kinesthetic teaching of highly redundant robots [7]. However, this also makes the demonstration data less reliable. Recently, several works proposed feasible frameworks for learning directly from human demonstration. Krebs et al. provided a taxonomy of human bimanual manipulation in daily activities by focusing on different types of coordination [8]. Liu et al. regarded leader-follower coordination as sequence transduction and designed a coordination mechanism based on the Transformer model to achieve a human-level stir-fry task [9]. Besides, offline reinforcement learning algorithms have been used to let robots learn bimanual coordination tasks from offline demonstration datasets [10], allowing the robot to learn the most efficient and effective ways to coordinate its arms for a given task.

In this work, we aim to propose an explainable paradigm for learning generalized coordination from demonstration. The main contributions can be summarized as follows:

  • Coordination parameterization: We propose a relative parameterization method (BiRP) for extracting the coordination relationship from human demonstration and embedding it into the motion generation of each arm.

  • Leader-follower motion generation: We provide conditional coordinated motion generation for bimanual tasks with different roles in arms, allowing us to generate the follower’s motion according to the leader.

  • Synergistic motion generation: For tasks where there is no obvious role difference between arms, we also provide a motion generation method that enables both arms to adapt to new situations synergistically.

II Construct Bimanual Coordination by Relative Parameterization

Relative parameterization is defined as a way of parameterizing the relative relationship between the two arms and embedding this relationship into the representation of each arm. The relative relationship can take many forms, depending on the task-specific coordination characteristics. For example, if both arms are asked to grasp the same object simultaneously and hold it until it is placed, the relative relationship can be the relative displacement of the end-effectors. The definitions of symbols are listed in Table I.

TABLE I: Definition of Symbols
Symbol: Definition
$D$: State dimensions
$O$: Order of the controller
$P$: Number of reference frames
$H$: Number of arms; $h$ refers to the left or right arm
$T$: Time horizon, $t \in [0, T]$
$K$: Number of Gaussian components in a mixture model
$\boldsymbol{\xi}$: Demonstration motion, $\boldsymbol{\xi} \in \mathbb{R}^{DT}$
$\boldsymbol{\zeta}_j$: Motion in frame $j$, $\boldsymbol{\zeta} = [\zeta_1^{\top}, \ldots, \zeta_T^{\top}] \in \mathbb{R}^{DOT}$
$\boldsymbol{u}$: Control command, $\boldsymbol{u} = [u_1^{\top}, \ldots, u_{T-1}^{\top}] \in \mathbb{R}^{D(T-1)}$
$\boldsymbol{x}$: Robot motion, $\boldsymbol{x} = [x_1^{\top}, \ldots, x_T^{\top}] \in \mathbb{R}^{DT}$
$\boldsymbol{Q}$: Required tracking precision matrix
$\boldsymbol{R}$: Cost matrix on the control command

In this section, we first briefly introduce the fundamental learning from demonstration method used in uni-manual scenarios (Sec. II-A), which consists of two parts: demonstration representation and motion reproduction or generation. We then add the concept of relative parameterization to these two parts so that both the representation learning process (Sec. II-B) and the control process (Sec. II-C) take the bimanual coordination characteristics in the demonstration data into account. These two methods can be used independently or jointly. A feasible weighting approach is also proposed to increase the importance of the coordination characteristics in representation and control (Sec. II-D). The whole framework, illustrated with a leader-follower example, is shown in Fig. 2.

II-A Demonstration Representation and Motion Generation

The learning from demonstration method is a bridge between humans and robots: it must be able to extract the characteristics of human skills, plan trajectories, and control the robot to perform similar skills. Thus, it is necessary to combine human skill learning with robot motion planning and control in the same encoding approach. A popular way is to link them in the form of probability, like the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). Besides, considering that the application scenarios of service robots are unstructured and require adaptation to changing situations, a class of task-parameterized models has been proposed to address this problem [11]. The task parameters are variables describing the task-specific situation, like the position of an object in a pick-and-place task. By contrast, some task-independent information can also be extracted from the demonstration data, which reflects the nature of the skill itself, namely the skill parameters. The concept of task-parameterized models is to observe the skill in multiple frames, e.g., from starting points and ending points, and to describe the impedance of the system by variations and correlations with a linear quadratic regulator, which can then be used to control the robot.

The Task-parameterized Gaussian Mixture Model (TP-GMM) is a typical method that probabilistically encodes datapoints and their relevance to $P$ candidate frames using mixture models, and it has good generalization capability [12]. Formally, if we define the task parameters as $\{\boldsymbol{b}_j, \boldsymbol{A}_j\}_{j=1}^{P}$, the demonstrations $\boldsymbol{\xi}$ can be observed as $\boldsymbol{\zeta}_j = \boldsymbol{A}_j^{-1}(\boldsymbol{\xi} - \boldsymbol{b}_j)$ in each frame $j$. These transformed demonstrations are then represented as a GMM $\{\pi^{(k)}, \{\boldsymbol{\mu}_j^{(k)}, \boldsymbol{\Sigma}_j^{(k)}\}_{j=1}^{P}\}_{k=1}^{K}$ by log-likelihood maximization, where $\pi^{(k)}$ is the prior probability of the $k$-th Gaussian component, and $\boldsymbol{\mu}_j^{(k)}$ and $\boldsymbol{\Sigma}_j^{(k)}$ are the mean and covariance matrix of the $k$-th Gaussian in frame $j$. We can regard these Gaussian components in multiple frames as skill parameters that can be transferred following changes of the task parameters. For instance, if a new situation is given by task parameters $\{\hat{\boldsymbol{b}}_j, \hat{\boldsymbol{A}}_j\}_{j=1}^{P}$, a new task-specific GMM can be generated by a Product of Experts (PoE):

\mathcal{N}\left(\hat{\boldsymbol{\nu}}^{(k)}, \hat{\boldsymbol{\Gamma}}^{(k)}\right) \propto \prod_{j=1}^{P}\mathcal{N}\left(\boldsymbol{\nu}_{j}^{(k)}, \boldsymbol{\Gamma}_{j}^{(k)}\right)  (1)

where $\boldsymbol{\nu}_{j}^{(k)} = \boldsymbol{A}_{j}\boldsymbol{\mu}_{j}^{(k)} + \boldsymbol{b}_{j}$ and $\boldsymbol{\Gamma}_{j}^{(k)} = \boldsymbol{A}_{j}\boldsymbol{\Sigma}_{j}^{(k)}\boldsymbol{A}_{j}^{\top}$. The result of the Gaussian product is given analytically by

\hat{\boldsymbol{\Gamma}}^{(k)} = \left(\sum_{j=1}^{P}{\boldsymbol{\Gamma}_{j}^{(k)}}^{-1}\right)^{-1}, \quad \hat{\boldsymbol{\nu}}^{(k)} = \hat{\boldsymbol{\Gamma}}^{(k)}\sum_{j=1}^{P}{\boldsymbol{\Gamma}_{j}^{(k)}}^{-1}\boldsymbol{\nu}_{j}^{(k)}  (2)
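To make this adaptation concrete, the following is a minimal numpy sketch of Eqs. (1)-(2), assuming the per-frame Gaussian components $\{\boldsymbol{\mu}_j^{(k)}, \boldsymbol{\Sigma}_j^{(k)}\}$ have already been fitted; the function names and the example task parameters are illustrative, not part of the Rofunc API.

```python
import numpy as np

def transform_gaussian(mu, sigma, A, b):
    """Map a frame-local Gaussian to the global frame using task parameters (A, b)."""
    nu = A @ mu + b
    gamma = A @ sigma @ A.T
    return nu, gamma

def product_of_gaussians(nus, gammas):
    """Analytic product of frame-wise Gaussians (Eq. 2)."""
    lambdas = [np.linalg.inv(g) for g in gammas]               # precision matrices
    gamma_hat = np.linalg.inv(sum(lambdas))                    # fused covariance
    nu_hat = gamma_hat @ sum(l @ n for l, n in zip(lambdas, nus))
    return nu_hat, gamma_hat

# Illustrative use for one Gaussian component observed in P = 2 frames
mu_j = [np.zeros(2), np.array([1.0, 0.5])]                     # frame-local means
sigma_j = [np.eye(2) * 0.1, np.eye(2) * 0.2]                   # frame-local covariances
A_j = [np.eye(2), np.eye(2)]                                   # new task parameters
b_j = [np.array([5.0, 5.0]), np.array([4.0, 5.5])]

nus, gammas = zip(*[transform_gaussian(m, s, A, b)
                    for m, s, A, b in zip(mu_j, sigma_j, A_j, b_j)])
nu_hat, gamma_hat = product_of_gaussians(nus, gammas)
```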

For generating robot motion from the GMM, optimal control methods like the Linear Quadratic Regulator (LQR) and Linear Quadratic Tracking (LQT) can be used for planning and control. Here we give the classical form of LQT as follows:

\mathrm{cost} = \left(\hat{\boldsymbol{\nu}} - \boldsymbol{x}\right)^{\top}\boldsymbol{Q}\left(\hat{\boldsymbol{\nu}} - \boldsymbol{x}\right) + \boldsymbol{u}^{\top}\boldsymbol{R}\boldsymbol{u}  (3)

where $\hat{\boldsymbol{\nu}}$ denotes the stacked means of the task-specific GMM obtained by the preceding PoE process.

Assume that the system evolution is linear,

x_{t+1} = \boldsymbol{A}_{s}x_{t} + \boldsymbol{B}_{s}u_{t}  (4)

where $\boldsymbol{A}_{s}, \boldsymbol{B}_{s}$ are the coefficients of this system. Then, the relationship between the control commands and the robot states can be described in matrix form as $\boldsymbol{x} = \boldsymbol{S}_{x}x_{1} + \boldsymbol{S}_{u}\boldsymbol{u}$, where $\boldsymbol{S}_{x} \in \mathbb{R}^{DT \times D}$ and $\boldsymbol{S}_{u} \in \mathbb{R}^{DT \times D(T-1)}$ are the matrix-form combinations of $\boldsymbol{A}_{s}$ and $\boldsymbol{B}_{s}$. More details can be found in the appendix of [12].
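For reference, one possible construction of the transfer matrices $\boldsymbol{S}_x$ and $\boldsymbol{S}_u$ from $\boldsymbol{A}_s, \boldsymbol{B}_s$ is sketched below; it follows the standard batch form of the linear dynamics, and the function name is an illustrative choice rather than the Rofunc API.

```python
import numpy as np

def batch_transfer_matrices(A_s, B_s, T):
    """Stack x_{t+1} = A_s x_t + B_s u_t over a horizon T into the batch form
    x = S_x x_1 + S_u u used by the open-loop LQT."""
    D, Du = A_s.shape[0], B_s.shape[1]
    S_x = np.zeros((D * T, D))
    S_u = np.zeros((D * T, Du * (T - 1)))
    powers = [np.eye(D)]
    for _ in range(T - 1):
        powers.append(A_s @ powers[-1])                 # powers[t] = A_s^t
    for t in range(T):
        S_x[t * D:(t + 1) * D, :] = powers[t]           # x_{t+1} depends on A_s^t x_1
        for k in range(t):                               # ... and on u_1, ..., u_t
            S_u[t * D:(t + 1) * D, k * Du:(k + 1) * Du] = powers[t - 1 - k] @ B_s
    return S_x, S_u
```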

Here we only consider an open-loop controller, whose solution can be given analytically by

\hat{\boldsymbol{u}} = \left(\boldsymbol{S}_{u}^{\top}\boldsymbol{Q}\boldsymbol{S}_{u} + \boldsymbol{R}\right)^{-1}\boldsymbol{S}_{u}^{\top}\boldsymbol{Q}\left(\hat{\boldsymbol{\nu}} - \boldsymbol{S}_{x}x_{1}\right)  (5)

with the residual covariance $\hat{\boldsymbol{\Sigma}}_{u} = \left(\boldsymbol{S}_{u}^{\top}\boldsymbol{Q}\boldsymbol{S}_{u} + \boldsymbol{R}\right)^{-1}$.
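A minimal sketch of the open-loop solution of Eq. (5) is given below, reusing the $\boldsymbol{S}_x$, $\boldsymbol{S}_u$ construction above; $\boldsymbol{Q}$ and $\boldsymbol{R}$ are assumed to be given as full matrices of matching sizes, and the function name is illustrative.

```python
import numpy as np

def open_loop_lqt(S_x, S_u, Q, R, nu_hat, x1):
    """Analytic open-loop LQT solution (Eq. 5) with its residual covariance.
    nu_hat stacks the task-specific GMM means along the time horizon."""
    M = S_u.T @ Q @ S_u + R
    Sigma_u = np.linalg.inv(M)                              # residual covariance
    u_hat = Sigma_u @ S_u.T @ Q @ (nu_hat - S_x @ x1)       # optimal command sequence
    x_hat = S_x @ x1 + S_u @ u_hat                          # resulting trajectory
    return u_hat, Sigma_u, x_hat
```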

II-B Representation with Relative Parameterization

In the bimanual setting, coordination is reflected at the data level as certain characteristics of the relative motion of the arms. For instance, in a bimanual box-lifting task, this characteristic manifests itself as the arms moving from free motion into a fixed relative relationship and maintaining it for a certain period. For a leader-follower task like stir-fry [9], the characteristic is that the motion of the following arm (holding the spoon) and its periodicity are determined with reference to the leading arm (holding the pot). In this work, instead of pre-defining the roles of the arms (as leader or follower), we aim to describe the relative relationship between the arms in a more general way: letting the arms parameterize each other.

Formally, we define an additional frame that takes the trajectory of the other arm as dynamic task parameters and likewise represents the relative relationship as a GMM. Different from the observation perspectives built from a fixed pose, the transformation matrices $\boldsymbol{A}_{c,t}, \boldsymbol{b}_{c,t}$ are dynamic and change with the motion of the other arm. The relative motion is described as $\boldsymbol{\zeta}_c = \boldsymbol{A}_{c,t}^{-1}\left(\boldsymbol{\xi} - \boldsymbol{b}_{c,t}\right)$ and represented by $\{\pi^{(k)}, \boldsymbol{\mu}_c^{(k)}, \boldsymbol{\Sigma}_c^{(k)}\}_{k=1}^{K}$. For each arm $h$, the task-specific GMM is obtained by the PoE

\mathcal{N}\left(\hat{\boldsymbol{\nu}}^{(k)}, \hat{\boldsymbol{\Gamma}}^{(k)}\right) \propto \prod_{j=1}^{P}\mathcal{N}\left(\boldsymbol{\nu}_{j}^{(k)}, \boldsymbol{\Gamma}_{j}^{(k)}\right)\cdot\mathcal{N}\left(\boldsymbol{\nu}_{c}^{(k)}, \boldsymbol{\Gamma}_{c}^{(k)}\right)  (6)

where $\boldsymbol{\nu}_{c}^{(k)} = \boldsymbol{A}_{c,t}\boldsymbol{\mu}_{c}^{(k)} + \boldsymbol{b}_{c,t}$ and $\boldsymbol{\Gamma}_{c}^{(k)} = \boldsymbol{A}_{c,t}\boldsymbol{\Sigma}_{c}^{(k)}\boldsymbol{A}_{c,t}^{\top}$.
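As an illustration of this relative parameterization, the sketch below observes one arm's demonstration in the dynamic frame attached to the other arm, i.e., it computes $\boldsymbol{\zeta}_c$. Using the identity rotation for $\boldsymbol{A}_{c,t}$ (pure translation by the other arm's position) is a simplifying assumption of this sketch; in general $\boldsymbol{A}_{c,t}$ can also encode the other arm's orientation.

```python
import numpy as np

def relative_frame_observation(xi_self, xi_other, A_c=None):
    """Observe one arm's demonstration in the dynamic frame of the other arm:
    zeta_c[t] = A_{c,t}^{-1} (xi_self[t] - b_{c,t}), with b_{c,t} taken as the
    other arm's position at time t. A_{c,t} defaults to the identity here."""
    T, D = xi_self.shape
    zeta_c = np.zeros_like(xi_self)
    for t in range(T):
        A_ct = np.eye(D) if A_c is None else A_c[t]
        b_ct = xi_other[t]
        zeta_c[t] = np.linalg.solve(A_ct, xi_self[t] - b_ct)
    return zeta_c

# zeta_c can then be fitted with a GMM (e.g. sklearn's GaussianMixture) to obtain
# {pi, mu_c, Sigma_c}, the coordination component used in Eq. (6).
```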

Such a relative parameterization entangles the representations of the two arms, letting them consider each other through time-varying mutual observation perspectives. This brings two useful functions:

  • Generate the motion of one arm based on a given motion of the other one in a leader-follower manner.

  • Generate bimanual motions to adapt to new situations simultaneously in a synergistic manner.


Figure 3: The upper and lower rows show the effect of the proposed relative parameterization method on bimanual coordination learning with 2-D and 3-D synthetic data, respectively. In each example, three synthetic bimanual motions are given as demonstrations, and the relative parameterization method is used to extract and construct the coordination relationship. The right column shows the motions generated under new task parameters by the proposed method, compared with motion generation without coordination (trajectories in light colors).

For instance, if the left arm motion $\boldsymbol{\xi}_l$ is pre-defined or adjusted to new situations by other methods, such as the Dynamic Movement Primitive (DMP) in [9], a corresponding right arm motion that respects the spatio-temporal coordination implicit in the demonstration can be generated by obtaining the dynamic relative parameters $\boldsymbol{A}_{c,t}, \boldsymbol{b}_{c,t}$ from $\boldsymbol{\xi}_l$. We then obtain a task-and-coordination-specific GMM of the right arm for further motion generation and control, as sketched below.
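The following is a hedged sketch of this leader-follower construction, reusing the `transform_gaussian` and `product_of_gaussians` helpers from the PoE sketch in Sec. II-A. Attaching a single leader pose $(\boldsymbol{A}_{c,t}, \boldsymbol{b}_{c,t})$ to each Gaussian component is a simplification for illustration, and the helper name is not part of the Rofunc API.

```python
import numpy as np

# Reuses transform_gaussian / product_of_gaussians from the PoE sketch above.

def follower_component(frame_params, mu_c, sigma_c, A_ct, b_ct):
    """One follower GMM component under Eq. (6): the static task-frame Gaussians
    are multiplied with the coordination Gaussian mapped through the leader's
    pose (A_{c,t}, b_{c,t}). frame_params is a list of (mu_j, Sigma_j, A_j, b_j)
    tuples for the static frames of this component."""
    nus, gammas = zip(*[transform_gaussian(m, s, A, b)
                        for (m, s, A, b) in frame_params])
    nu_c, gamma_c = transform_gaussian(mu_c, sigma_c, A_ct, b_ct)
    return product_of_gaussians(list(nus) + [nu_c],
                                list(gammas) + [gamma_c])
```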

For generating bimanual motions synergistically, since both arm motions are unknown at the beginning, the relative parameterization cannot be established directly. Thus, we first use the product of GMMs in the other reference frames to generate independent motions for each arm, and then iteratively use these motions as the relative frame of the other arm to embed the learned coordination, as sketched below.
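The iterative scheme can be summarized by the following sketch; `generate_arm` is an illustrative callback that wraps the PoE and LQT steps for one arm given (optionally) the other arm's current trajectory, and the fixed iteration count is an assumption of this sketch rather than a prescribed stopping rule.

```python
def synergistic_generation(generate_arm, n_iters=3):
    """Synergistic generation: start from motions generated without the
    coordination frame, then repeatedly regenerate each arm while conditioning
    on the other arm's latest motion to embed the learned coordination."""
    left = generate_arm('left', other_traj=None)        # independent initial motions
    right = generate_arm('right', other_traj=None)
    for _ in range(n_iters):
        left = generate_arm('left', other_traj=right)   # left observes right
        right = generate_arm('right', other_traj=left)  # right observes left
    return left, right
```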

II-C Control with Relative Parameterization

Coordination relationships can also be embedded when generating trajectories and corresponding control commands from the GMM. Let the cost function of the vanilla LQT controller in Eq. (3) be $\mathcal{C}_{vanilla}$. The composite cost function that takes coordination into account can then be written as

\mathcal{C} = \sum_{h}^{H}\mathcal{C}_{vanilla}^{h} + \left(\boldsymbol{\nu}_{c} - \boldsymbol{x}_{c}\right)^{\top}\boldsymbol{Q}_{c}\left(\boldsymbol{\nu}_{c} - \boldsymbol{x}_{c}\right)  (7)

By setting up a linear system similar to Eq. (4), the composite cost function can be rewritten as

\mathcal{C} = \sum_{h}^{H}\Big[\left(\hat{\boldsymbol{\nu}}^{h}_{u} - \boldsymbol{u}^{h}\right)^{\top}\boldsymbol{\Omega}_{u}^{h}\left(\hat{\boldsymbol{\nu}}^{h}_{u} - \boldsymbol{u}^{h}\right) + {\boldsymbol{u}^{h}}^{\top}\boldsymbol{R}^{h}\boldsymbol{u}^{h}\Big] + \left(\boldsymbol{\nu}_{u,c} - \boldsymbol{u}_{c}\right)^{\top}\boldsymbol{\Omega}_{u,c}\left(\boldsymbol{\nu}_{u,c} - \boldsymbol{u}_{c}\right)  (8)

where $\hat{\boldsymbol{\nu}}^{h}_{u} = \boldsymbol{S}_{u}^{-1}\left(\hat{\boldsymbol{\nu}}^{h} - \boldsymbol{S}_{x}x_{1}\right)$ and $\boldsymbol{\Omega}_{u} = \boldsymbol{S}_{u}^{\top}\boldsymbol{Q}\boldsymbol{S}_{u}$; $\boldsymbol{\nu}_{u,c}$ and $\boldsymbol{\Omega}_{u,c}$ are obtained by the same transformations.

Since multiple variables ($\boldsymbol{u}^{h}$, $\boldsymbol{u}_{c}$) are involved, we cannot directly convert this sum of quadratic error terms into a PoE. Thus, we define a unified vector $\boldsymbol{U} \in \mathbb{R}^{DTH}$ representing the control command of the whole system, and a binary coordination matrix $\boldsymbol{C} \in \mathbb{R}^{DT \times DTH}$, $\boldsymbol{C} = [\boldsymbol{C}^{1}, \ldots, \boldsymbol{C}^{H}]$. For convenience, we write $[\boldsymbol{C}^{h}] = [\boldsymbol{0}, \ldots, \boldsymbol{C}^{h}, \ldots, \boldsymbol{0}]$; then we can continue to rewrite the cost function as

\mathcal{C} = \sum_{h}^{H}\Big[\left(\hat{\boldsymbol{\nu}}^{h}_{u} - [\boldsymbol{C}^{h}]\boldsymbol{U}\right)^{\top}\boldsymbol{\Omega}_{u}^{h}\left(\hat{\boldsymbol{\nu}}^{h}_{u} - [\boldsymbol{C}^{h}]\boldsymbol{U}\right) + \boldsymbol{U}^{\top}[\boldsymbol{C}^{h}]^{\top}\boldsymbol{R}^{h}[\boldsymbol{C}^{h}]\boldsymbol{U}\Big] + \left(\boldsymbol{\nu}_{u,c} - \boldsymbol{C}\boldsymbol{U}\right)^{\top}\boldsymbol{\Omega}_{u,c}\left(\boldsymbol{\nu}_{u,c} - \boldsymbol{C}\boldsymbol{U}\right)  (9)

Setting $\boldsymbol{\Omega}_{U}^{h} = [\boldsymbol{C}^{h}]^{\top}\boldsymbol{\Omega}_{u}^{h}[\boldsymbol{C}^{h}]$, $\boldsymbol{R}_{U}^{h} = [\boldsymbol{C}^{h}]^{\top}\boldsymbol{R}^{h}[\boldsymbol{C}^{h}]$, $\hat{\boldsymbol{\nu}}^{h}_{U} = [\boldsymbol{C}^{h}]^{-1}\hat{\boldsymbol{\nu}}^{h}_{u}$, and $\boldsymbol{\nu}_{U,c} = \boldsymbol{C}^{-1}\boldsymbol{\nu}_{u,c}$ (the inverses here denote pseudo-inverses, since $[\boldsymbol{C}^{h}]$ and $\boldsymbol{C}$ are not square), the composite cost function is simplified as

\mathcal{C} = \sum_{h}^{H}\Big[\left(\hat{\boldsymbol{\nu}}^{h}_{U} - \boldsymbol{U}\right)^{\top}\boldsymbol{\Omega}_{U}^{h}\left(\hat{\boldsymbol{\nu}}^{h}_{U} - \boldsymbol{U}\right) + \boldsymbol{U}^{\top}\boldsymbol{R}^{h}_{U}\boldsymbol{U}\Big] + \left(\boldsymbol{\nu}_{U,c} - \boldsymbol{U}\right)^{\top}\boldsymbol{\Omega}_{U,c}\left(\boldsymbol{\nu}_{U,c} - \boldsymbol{U}\right)  (10)

Then we can finally convert this sum of quadratic error terms into a PoE:

\mathcal{N}\left(\hat{\boldsymbol{U}}, \hat{\boldsymbol{\Sigma}}_{U}\right) \propto \prod_{h}^{H}\left[\mathcal{N}\left(0, {\boldsymbol{R}_{U}^{h}}^{-1}\right)\mathcal{N}\left(\hat{\boldsymbol{\nu}}^{h}_{U}, {\boldsymbol{\Omega}_{U}^{h}}^{-1}\right)\right]\mathcal{N}\left(\boldsymbol{\nu}_{U,c}, {\boldsymbol{\Omega}_{U,c}}^{-1}\right)  (11)

Figure 4: This figure shows the effect of the proposed relative parameterization method in real bimanual robot manipulation, using a palletizing example. The task parameters in the robot execution process differ from those in the human demonstrations, which reflects the generalizability of the proposed method. Meanwhile, the ability to maintain synergistic coordination in the generalized motions is the core contribution of this work.

The result can be written as

\hat{\boldsymbol{\Sigma}}_{U} = \left(\sum_{h}^{H}\left[\boldsymbol{\Omega}_{U}^{h} + \boldsymbol{R}_{U}^{h}\right] + \boldsymbol{\Omega}_{U,c}\right)^{-1}, \quad \hat{\boldsymbol{U}} = \hat{\boldsymbol{\Sigma}}_{U}\left(\sum_{h}^{H}\boldsymbol{\Omega}_{U}^{h}\hat{\boldsymbol{\nu}}^{h}_{U} + \boldsymbol{\Omega}_{U,c}\boldsymbol{\nu}_{U,c}\right)  (12)

By using the binary coordination matrix $\boldsymbol{C}$, we can extract the coordinated control commands and motions from $\hat{\boldsymbol{U}}$.
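The sketch below assembles Eqs. (9)-(12) for $H = 2$ arms: it stacks both arms' command variables into one vector, builds the selection matrices $[\boldsymbol{C}^h]$, and fuses all quadratic terms analytically. Treating the coordination variable as the signed difference of the two arms' commands ($\boldsymbol{C} = [-\boldsymbol{I}, \boldsymbol{I}]$) is an assumption made for illustration; the actual coordination matrix depends on the task-specific relative relationship, and the function name is not the Rofunc API.

```python
import numpy as np

def unified_bimanual_command(Omega_u, R_u, nu_u, Omega_uc, nu_uc, n, H=2):
    """Fuse per-arm tracking terms and a coordination term into one command
    vector U of size n*H (n = per-arm command dimension). Omega_u, R_u, nu_u
    are length-H lists of per-arm precision matrices, control-cost matrices,
    and reference commands; Omega_uc, nu_uc describe the coordination term."""
    I = np.eye(n)
    Ch = [np.hstack([I if g == h else np.zeros((n, n)) for g in range(H)])
          for h in range(H)]                          # [C^h] selects arm h from U
    C = np.hstack([-I, I])                            # assumed relative relationship (H = 2)

    precision = np.zeros((n * H, n * H))
    info = np.zeros(n * H)
    for h in range(H):
        precision += Ch[h].T @ (Omega_u[h] + R_u[h]) @ Ch[h]   # Omega_U^h + R_U^h
        info += Ch[h].T @ Omega_u[h] @ nu_u[h]                 # Omega_U^h nu_U^h
    precision += C.T @ Omega_uc @ C                            # Omega_{U,c}
    info += C.T @ Omega_uc @ nu_uc                             # Omega_{U,c} nu_{U,c}

    Sigma_U = np.linalg.inv(precision)                # fused covariance (Eq. 12)
    U_hat = Sigma_U @ info                            # fused command (Eq. 12)
    u_per_arm = [Ch[h] @ U_hat for h in range(H)]     # extract each arm's command
    return U_hat, Sigma_U, u_per_arm
```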

II-D Weighted Relative Parameterization

A feasible variant of the above methods is to introduce a weight coefficient $\sigma$ to adjust the influence of the coordination relationship in representation and control.

For the GMM representation,

\mathcal{N}\left(\hat{\boldsymbol{\nu}}^{(k)}, \hat{\boldsymbol{\Gamma}}^{(k)}\right) \propto \prod_{j=1}^{P}\mathcal{N}\left(\boldsymbol{\nu}_{j}^{(k)}, \boldsymbol{\Gamma}_{j}^{(k)}\right)\cdot\left[\mathcal{N}\left(\boldsymbol{\nu}_{c}^{(k)}, \boldsymbol{\Gamma}_{c}^{(k)}\right)\right]^{\sigma}  (13)

For the LQT controller,

\hat{\boldsymbol{\Sigma}}_{U} = \left(\sum_{h}^{H}\left[\boldsymbol{\Omega}_{U}^{h} + \boldsymbol{R}_{U}^{h}\right] + \sigma\cdot\boldsymbol{\Omega}_{U,c}\right)^{-1}, \quad \hat{\boldsymbol{U}} = \hat{\boldsymbol{\Sigma}}_{U}\left(\sum_{h}^{H}\boldsymbol{\Omega}_{U}^{h}\hat{\boldsymbol{\nu}}^{h}_{U} + \sigma\cdot\boldsymbol{\Omega}_{U,c}\boldsymbol{\nu}_{U,c}\right)  (14)

III Experiments

III-A Setup

The effectiveness of the proposed method is illustrated by learning from both synthetic motions and real demonstration motions. The pre-designed synthetic motions exhibit coordination explicitly, which makes the performance of the method easy to assess.

Synthetic motions: The synthetic motions were created via Bézier curves, where the two arms start from distant positions and meet at the same point. This kind of motion often occurs in daily activities that require both arms to grasp, carry, or pick up something simultaneously. We provide both two-dimensional and three-dimensional data to show the dimensional scalability, as shown in Fig. 3.
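For reproducibility, a minimal sketch of how such synthetic demonstrations could be generated with Bézier curves is shown below; the specific control-point placement is an illustrative assumption, not the exact curves used in Fig. 3.

```python
import numpy as np
from math import comb

def bezier(points, T=100):
    """Evaluate a Bezier curve defined by control points (n_ctrl x D)."""
    n = len(points) - 1
    t = np.linspace(0.0, 1.0, T)[:, None]
    return sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * points[i]
               for i in range(n + 1))

def synthetic_bimanual_demo(start_l, start_r, meet_point, T=100):
    """Two arms depart from different start points and meet at the same point,
    as in the synthetic examples of Fig. 3; the intermediate control points are
    an illustrative choice."""
    start_l, start_r, meet = map(np.asarray, (start_l, start_r, meet_point))
    ctrl_l = np.stack([start_l, 0.5 * (start_l + meet) + 1.0, meet])
    ctrl_r = np.stack([start_r, 0.5 * (start_r + meet) - 1.0, meet])
    return bezier(ctrl_l, T), bezier(ctrl_r, T)

left, right = synthetic_bimanual_demo([0, 0], [10, 0], [5, 5])
```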

Real demonstration motions: We also provide demonstrations of two real tasks to show the effect in bimanual robot manipulation. The palletizing example shown in Fig. 4 represents a class of synergistic coordinated motions and tasks, while the pouring example shown in Fig. 2 is a typical bimanual coordination task in the leader-follower manner.

III-B Demonstration collection

The human demonstration data were collected via Optitrack. The demonstrator attached two groups of markers to his hands for detection by Optitrack. Each group contains four individual markers, which are required to determine the pose of each arm. These markers are detected by six Optitrack cameras to record the two end-effector trajectories with both position and orientation. We use the pose at the center of each marker group to represent the human bimanual demonstration motions. In addition, the box and the two cups each carry a set of four markers for recording object motions. The raw data were pre-processed by our open-source toolbox [13] to extract the useful information and visually separate it into multiple demonstrations. Each demonstration contains seven pose values (3-D position and quaternion orientation) for each marker group.

III-C Coordination learning performance analysis

The goal of the synthetic motions is that the two arms meet at the same pose, in both the 2-D and the 3-D case. As shown in the left column of Fig. 3, we provide three bimanual motions as demonstrations for each synthetic example. These motions start from and end at different positions but share a similar style. The middle column, with multiple small figures, shows the process of applying the proposed relative parameterization method. We use three observation frames to parameterize the motion of each arm: the start points, the end points, and a dynamic relative observation frame depending on the other arm. From this parameterization, we can extract and construct the coordination relationship from the demonstration data. The parameterized coordination is then used for motion generation and control in new situations with different task parameters. Maintaining the same coordination relationship in these generalized motions is required to achieve such bimanual tasks. The generalized motion generation results are shown in the right column of Fig. 3. In the 2-D example, the bimanual motions are required to meet at a new position, (5, 5). In the 3-D example, this new meeting point is set to (5, 8, 5). The generated motions with learned coordination are shown in red and blue, while we also provide a comparison with generated motions without coordination (in light red and blue). The comparison shows that simply treating a bimanual system as a combination of two single arms is insufficient for bimanual tasks. It is necessary to parameterize the coordination relationship, whether in a leader-follower or a synergistic manner; this is the key to achieving most bimanual tasks.

III-D Real robot experiment

We adopt the self-designed humanoid CURI robot to perform the bimanual motions in the real robot experiments. Since this work focuses on learning and generalizing coordinated motion, task parameters such as the start and end points and object poses are obtained through the Optitrack system. As shown in Fig. 4, we attach four markers to the box to be transported and to the destination box to obtain their poses in the world coordinate system. Meanwhile, four rigidly connected markers are also fixed on the back of the CURI robot. The coordinated human hand motions are learned by relative parameterization. We then use this parameterized coordination model to generate motions that adapt to new object poses and destinations. It is worth mentioning that, unlike the observation frames used for the synthetic data, we set five observation frames for this palletizing task, namely the start points, the end points, the center poses of the transported box, and the center pose of the destination box. This allows the robot to move from an initial pose with its arms outstretched to the sides of the box, carry the box and place it at the target position, and then release the box. Besides, the result of the pouring example can be found in Fig. 2. The execution on the CURI robot is supported by a self-designed impedance controller, and the trajectories are converted to joint-space commands via its inverse kinematics model.

IV Discussion

This work still has some limitations. First, the proposed relative parameterization method is only applied to trajectories in Cartesian space, without considering joint-space coordination. Learning joint-space bimanual coordination, or even whole-body coordination, from human demonstrations remains an open problem; some related prior work can be found in [14]. Besides, the Gaussian mixture model-based method takes considerable time when processing demonstration data sampled at high frequency, which might affect real-time usage. Improvements that use tensor representations instead of large sparse matrices can be found in [15].

V Conclusion

In this work, we propose a method that parameterizes the coordination in bimanual tasks by probabilistically modeling the relative motion of the two arms from human demonstration, and uses it to guide robot motion generation in new situations. By embedding this relative motion relationship, bimanual motions can be generated in both a leader-follower and a synergistic manner. We provide a detailed derivation of the formulation and demonstrate the effectiveness of the proposed method in coordination learning with synthetic data featuring prominent coordination characteristics. We also deploy the method on a real humanoid robot to perform coordinated motions and show its generalization to new situations. We believe that this easy-to-use bimanual LfD method can serve as a robust demonstration data augmentation method for training large robot manipulation models [16], and we will explore this potential in future work.

References

  • [1] K. Yao, D. Sternad, and A. Billard, “Hand pose selection in a bimanual fine-manipulation task,” Journal of Neurophysiology, vol. 126, no. 1, pp. 195–212, 2021.
  • [2] J. Lee and P. H. Chang, “Redundancy resolution for dual-arm robots inspired by human asymmetric bimanual action: Formulation and experiments,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 6058–6065, IEEE, 2015.
  • [3] L. Shi, S. Kayastha, and J. Katupitiya, “Robust coordinated control of a dual-arm space robot,” Acta Astronautica, vol. 138, pp. 475–489, 2017.
  • [4] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and autonomous systems, vol. 57, no. 5, pp. 469–483, 2009.
  • [5] P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in 2009 IEEE International Conference on Robotics and Automation, pp. 763–768, IEEE, 2009.
  • [6] L. P. Ureche and A. Billard, “Constraints extraction from asymmetrical bimanual tasks and their use in coordinated behavior,” Robotics and autonomous systems, vol. 103, pp. 222–235, 2018.
  • [7] E. Gribovskaya and A. Billard, “Combining dynamical systems control and programming by demonstration for teaching discrete bimanual coordination tasks to a humanoid robot,” in Proceedings of the 3rd ACM/IEEE international conference on Human robot interaction, pp. 33–40, 2008.
  • [8] F. Krebs and T. Asfour, “A bimanual manipulation taxonomy,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11031–11038, 2022.
  • [9] J. Liu, Y. Chen, Z. Dong, S. Wang, S. Calinon, M. Li, and F. Chen, “Robot cooking with stir-fry: Bimanual non-prehensile manipulation of semi-fluid objects,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5159–5166, 2022.
  • [10] Z. Sun, Z. Wang, J. Liu, M. Li, and F. Chen, “Mixline: A hybrid reinforcement learning framework for long-horizon bimanual coffee stirring task,” in International Conference on Intelligent Robotics and Applications, pp. 627–636, Springer, 2022.
  • [11] S. Calinon, T. Alizadeh, and D. G. Caldwell, “On improving the extrapolation capability of task-parameterized movement models,” in 2013 IEEE/RSJ international conference on intelligent robots and systems, pp. 610–616, IEEE, 2013.
  • [12] S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent service robotics, vol. 9, no. 1, pp. 1–29, 2016.
  • [13] J. Liu, C. Li, D. Delehelle, Z. Li, and F. Chen, “Rofunc: The full process python package for robot learning from demonstration and robot manipulation,” June 2023.
  • [14] J. Silvério, S. Calinon, L. Rozo, and D. G. Caldwell, “Bimanual skill learning with pose and joint space constraints,” in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pp. 153–159, IEEE, 2018.
  • [15] S. Shetty, J. Silvério, and S. Calinon, “Ergodic exploration using tensor train: Applications in insertion tasks,” IEEE Transactions on Robotics, vol. 38, no. 2, pp. 906–921, 2021.
  • [16] J. Liu, Z. Li, S. Calinon, and F. Chen, “Softgpt: Learn goal-oriented soft object manipulation skills by generative pre-trained heterogeneous graph transformer,” arXiv preprint arXiv:2306.12677, 2023.