
Variable-Frequency Model Learning and Predictive Control for Jumping Maneuvers on Legged Robots

Chuong Nguyen, Abdullah Altawaitan, Thai Duong, Nikolay Atanasov, and Quan Nguyen. Manuscript received: July 18, 2024; Revised: October 20, 2024; Accepted: November 19, 2024. This paper was recommended for publication by Editor Jaydev P. Desai upon evaluation of the Associate Editor and Reviewers' comments. Chuong Nguyen and Quan Nguyen are with the Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA 90007, USA, e-mails: {vanchuong.nguyen,quann}@usc.edu. Abdullah Altawaitan, Thai Duong, and Nikolay Atanasov are with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA, e-mails: {aaltawaitan,tduong,natanasov}@ucsd.edu. A. Altawaitan is also affiliated with Kuwait University as a holder of a scholarship. Digital Object Identifier (DOI): see top of this page.
Abstract

Achieving both target accuracy and robustness in dynamic maneuvers with long flight phases, such as high or long jumps, has been a significant challenge for legged robots. To address this challenge, we propose a novel learning-based control approach consisting of model learning and model predictive control (MPC) utilizing a variable-frequency scheme. Compared to existing MPC techniques, we learn a model directly from experiments, accounting not only for leg dynamics but also for modeling errors and unknown dynamics mismatch in hardware and during contact. Additionally, learning the model with variable frequency allows us to cover the entire flight phase and the final jumping target, enhancing the prediction accuracy of the jumping trajectory. Using the learned model, we also design a variable-frequency MPC to effectively leverage different jumping phases and track the target accurately. In a total of 92 jumps on Unitree A1 robot hardware, we verify that our approach outperforms other MPCs using a fixed-frequency scheme or a nominal model, reducing the jumping distance error 2 to 8 times. We also achieve jumping distance errors of less than 3% during continuous jumping on uneven terrain with randomly placed perturbations of random heights (up to 4 cm, or 27% of the robot's standing height). Our approach obtains distance errors of 1-2 cm on 34 single and continuous jumps with different jumping targets and model uncertainties. Code is available at https://github.com/DRCL-USC/Learning_MPC_Jumping.

Index Terms:
Model Learning for Control, Legged Robots, Whole-Body Motion Planning and Control

I Introduction

Aggressive jumping maneuvers with legged robots have received significant attention recently and have been demonstrated successfully using trajectory optimization [QuannICRA19, chuongjump3D, matthew_mit2021_1, ChignoliICRA2021], model-based control [YanranDingTRO, continuous_jump_bipedal, GabrielICRA2021, park2017high, ZhitaoIROS22, fullbody_MPC], and learning-based control [zhuang2023robot, RL_jump_bipedal, yang2023cajun, vassil_drl, jumpingrl, lokesh_ogmp]. Unlike walking or running, aggressive motions are particularly challenging due to (1) the extreme underactuation in the mid-air phase (the robot relies mainly on force control during contact to regulate its global position and orientation), (2) significant dynamics model error and uncertainty, which are inherently hard to characterize accurately, especially with contact and hardware load during extreme motions, and (3) the trade-off between model accuracy and efficient computation for real-time execution. Achieving both target accuracy and robustness for long-flight maneuvers, such as high or long jumps, therefore remains an open challenge. In this work, we address this challenge by developing a real-time MPC for quadruped jumping using a robot dynamics model learned from experiments.

Many control and optimization techniques have been developed for jumping motions. Trajectory optimization (TO) with full-body nonlinear dynamics is normally utilized to generate long-flight trajectories in an offline fashion (e.g., [QuannICRA19, chuongjump3D]). Many MPC approaches sacrifice model accuracy to achieve robust maneuvers by using simplified models that treat the trunk and legs as a unified body [NMPC_3D_hopping, ZhitaoIROS22, GabrielICRA2021, park2017high, YanranDingTRO]. Our recent iterative learning control (ILC) work [chuong_ilc_jump] handles model uncertainty to realize long-flight jumps via multi-stage optimization that refines the control policy offline after each jump until the target is reached accurately; it also relies on a simplified model for computational efficiency. However, that work focuses on target accuracy rather than robustness, e.g., it requires the same initial condition for all trials. Different from existing works, we learn a robot dynamics model from experiments and develop a real-time MPC using the learned dynamics to achieve both target accuracy and robustness in continuous quadruped jumping.

Figure 1: A Unitree A1 robot performs continuous jumps on unknown uneven terrain, achieving both target accuracy and robustness. The target distance for each jump is 0.6 m. The flight phase covers a vertical height of up to $4\times$ the robot's normal standing height. MPC with a nominal single rigid-body model is used to collect data for training a neural-network residual dynamics model with a variable-frequency scheme. The learned model is then used in a variable-frequency MPC to execute jumping motions. The variable-frequency scheme varies the time step sizes in the contact phase ($\Delta t_{\cal C}$) and flight phase ($\Delta t_{\cal F}$) to dedicate more model capacity and more MPC optimization steps to the contact phase, which improves the jumping robustness and accuracy. The green dashed line is the actual robot trajectory. Supplemental video: https://youtu.be/yUqI_MBOC6Q.

Learning robot models from experiments to capture complex dynamics effects and account for model errors on real hardware has become a popular approach [bauersfeld2021neurobem, hewing2019cautious, saviolo2022physics, duong23porthamiltonian]. Many frameworks learn residual dynamics using neural networks [bauersfeld2021neurobem, salzmann2023real] or Gaussian processes [hewing2019cautious, cao2017gaussian] and use MPC to control the learned systems [salzmann2023real, pohlodek2022hilompc] to improve tracking errors. Most existing works, however, primarily investigate dynamical systems without contact. Recently, Sun et al. [sun2021online] proposed a notable method to learn a linear residual model, addressing unknown dynamics during walking stabilization. Pandala et al. [Pandala_robustMPC] proposed to close the gap between reduced- and full-order models by using deep reinforcement learning to learn the unmodeled dynamics, which are then used in MPC to realize robust walking on various terrains. In contrast, we use supervised learning to learn the unmodeled dynamics from real hardware data. We also learn dynamics for maneuvers with long flight periods, which must tackle (1) the switching between multiple dynamics models due to contact, including a flight phase where control actions have very little effect on the body dynamics, (2) disturbances and uncertainty in dynamics modeling due to hard impacts in jumping, and (3) the effect of intermittent control at contact transitions on state predictions.

In this letter, we propose a residual model learning approach that uses a variable-frequency scheme, i.e., varying the coarseness of the integration time step, to address the aforementioned challenges and enhance long-term trajectory predictions over different jumping phases. MPC has been commonly used for jumping by optimizing the control inputs during the contact phase based on future state predictions during the flight phase (e.g., [GabrielICRA2021, fullbody_MPC, park2017high, YanranDingTRO, continuous_jump]). A major challenge in MPC for long-flight maneuvers is to utilize a limited number of prediction steps yet still effectively cover the entire flight phase and, especially, the final jumping target. Another challenge is to obtain a model that remains accurate under complex dynamic maneuvers, unknown dynamics, and model mismatch with the real hardware, improving the jumping accuracy while still ensuring real-time performance. Some recent works have addressed these challenges partially. Many methods use conventional single rigid-body dynamics (SRBD) models, ignoring the leg dynamics, to achieve real-time execution [GabrielICRA2021, continuous_jump_bipedal, YanranDingTRO]. Using the SRBD model in the contact phase can lead to inaccurate predictions of the robot's state at take-off, which is the initial condition of the projectile motion in the flight phase. Thus, the trajectory prediction error can accumulate significantly over long flight periods. Some methods account for leg inertia [ZiyiZhouRAL, He2024, fullbody_MPC]; however, disturbances and uncertainty in dynamics modeling have not been considered. Recently, planning with multi-fidelity models and multi-resolution time steps has emerged as an effective strategy to enhance target accuracy and robustness [heli_cafempc, Norby_adaptivempc, Heli_hierarchympc]. Li et al. [Heli_hierarchympc] adopt a less accurate model in the far horizon and a more accurate but expensive model in the near future, and also design multi-resolution time steps to cover the whole flight phase. Norby et al. [Norby_adaptivempc] adapt the model complexity based on the task complexity along the prediction horizon. While we vary the coarseness of the time step as in [Heli_hierarchympc], we design a novel residual model learning approach combined with variable-frequency MPC to address the aforementioned challenges.

Contributions: The contributions of this letter are summarized as follows.

  • We learn a residual dynamics model directly from a small real experiment dataset. The model accounts for nonlinear leg dynamics, modeling errors, and unknown dynamics mismatch in hardware and during contact.

  • We propose learning the model in a variable-frequency scheme that leverages different time resolutions to capture the entire flight phase, the jumping target, and the contact transitions over a few-step horizon, thereby significantly improving the accuracy of long-term trajectory prediction.

  • We develop variable-frequency MPC using the learned model to synthesize controls that improve both target accuracy and robustness in dynamic robot maneuvers.

  • Extensive hardware experiments validate the effectiveness and robustness of our approach with single and consecutive jumps on uneven terrain. Comparisons with other MPC techniques using a nominal model or a fixed-frequency scheme are also provided.

Figure 2: System Architecture. The learning procedure and MPC execution are paired with the same integration timesteps $\{\Delta t_{\cal C}, \Delta t_{\cal F}\}$, prediction horizon $K$, and predefined contact schedule ($cs$). The MPC and low-level joint PD controllers are updated at 40 Hz and 1 kHz, respectively.

II Problem Statement

Consider a legged robot modelled by a state $\mathbf{x}_k$, consisting of the pose and velocity of the body's center of mass (CoM), foot positions $\mathbf{r}_k$ relative to the body's CoM, and ground force control input $\mathbf{u}_k$ at the legs, sampled at time $t_k$. We augment the nominal SRBD model (e.g., [GabrielICRA2021, park2017high, YanranDingTRO]) with a learned residual term to account for leg dynamics and model mismatch with the real robot hardware, as well as to capture complex dynamic effects in contact:

\mathbf{x}_{k+1} = \boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k\right)\Delta t_k, \qquad (1)

where $\boldsymbol{f}$ represents the nominal SRBD model, $\delta\boldsymbol{f}_{\bm{\theta}}$ with parameters $\bm{\theta}$ approximates the residual dynamics, and $\Delta t_k$ is the sampling interval. Our first objective is to learn the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ using a dataset of jumping trajectories.

Problem 1

Given a set ${\cal D} = \{t_{0:K}^{(i)}, \mathbf{x}_{0:K}^{(i)}, \mathbf{r}_{0:K}^{(i)}, \mathbf{u}_{0:K}^{(i)}\}_{i=1}^{D}$ of $D$ sequences of states, foot positions, and control inputs, find the parameters $\bm{\theta}$ of the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ in (1) by rolling out the dynamics model and minimizing the discrepancy between the predicted state sequence $\tilde{\mathbf{x}}_{1:K}^{(i)}$ and the true state sequence $\mathbf{x}_{1:K}^{(i)}$ in ${\cal D}$:

\begin{aligned}
\min_{\bm{\theta}} \quad & \sum_{i=1}^{D}\sum_{k=1}^{K} \mathcal{L}(\mathbf{x}_k^{(i)}, \tilde{\mathbf{x}}_k^{(i)}) + {\cal L}_{\text{reg}}(\bm{\theta}) \\
\text{s.t.} \quad & \tilde{\mathbf{x}}_{k+1}^{(i)} = \boldsymbol{f}\left(\tilde{\mathbf{x}}_k^{(i)}, \mathbf{u}_k^{(i)}, \mathbf{r}_k^{(i)}, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\tilde{\mathbf{x}}_k^{(i)}, \mathbf{u}_k^{(i)}, \mathbf{r}_k^{(i)}\right)\Delta t_k, \\
& \tilde{\mathbf{x}}_0^{(i)} = \mathbf{x}_0^{(i)}, \quad \forall i = 1, \ldots, D, \qquad (2)
\end{aligned}

where $\mathcal{L}$ is an error function in the state space, and ${\cal L}_{\text{reg}}(\bm{\theta})$ is a regularization term to avoid overfitting. Note that it is not necessary for $\Delta t_k = t_{k+1} - t_k$ to be fixed.
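For concreteness, the following is a minimal PyTorch-style sketch of the rollout objective in (2); the names f_nom and df_theta (the nominal step and the residual network) and the window layout are our illustrative assumptions, not the released code:

    import torch

    def problem1_loss(windows, f_nom, df_theta):
        """Multi-step loss (2); each window is (dt, x, r, u) with possibly
        non-uniform time steps dt: (K,), states x: (K+1, d), and r, u: (K, .)."""
        total = torch.tensor(0.0)
        for dt, x, r, u in windows:
            x_tilde = x[0]                         # start from the true state
            for k in range(len(dt)):               # roll the learned model out
                x_tilde = (f_nom(x_tilde, u[k], r[k], dt[k])
                           + df_theta(x_tilde, u[k], r[k]) * dt[k])
                total = total + ((x_tilde - x[k + 1]) ** 2).sum()
        return total                               # add L_reg(theta) before SGD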

After learning to improve the accuracy of the model prediction, our second objective is to use the learned model (1) in MPC to track a desired state trajectory $\mathbf{x}_{0:K}^*$.

Problem 2

Given the dynamics model (1), a current robot state $\mathbf{x}_0$ and foot positions $\mathbf{r}_0$, a desired trajectory of states $\mathbf{x}_{0:K}^*$, foot positions $\mathbf{r}_{0:K}^*$, and controls $\mathbf{u}_{0:K}^*$, design a control law $\mathbf{u}_0 = \bm{\pi}(\mathbf{x}_0, \mathbf{r}_0, \mathbf{x}_{0:K}^*, \mathbf{r}_{0:K}^*, \mathbf{u}_{0:K}^*; \bm{\theta})$ to achieve accurate and robust tracking of the desired state trajectory via shifting-horizon optimization:

\begin{aligned}
\min_{\mathbf{u}_{0:K-1}} \quad & \sum_{k=1}^{K} \|\mathbf{x}_k - \mathbf{x}_k^*\|_{\mathbf{Q}_k} + \|\mathbf{u}_{k-1} - \mathbf{u}_{k-1}^*\|_{\mathbf{R}_k} \\
\text{s.t.} \quad & \mathbf{x}_{k+1} = \boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k\right)\Delta t_k, \\
& \mathbf{u}_k \in {\cal U}_k, \quad \forall k = 0, \ldots, K-1, \qquad (3)
\end{aligned}

where $\mathbf{Q}_k$ and $\mathbf{R}_k$ are positive definite weight matrices, and ${\cal U}_{0:K-1}$ represents input constraint sets.

Achieving accurate target jumping with MPC requires not only an accurate model but also coverage of the final state upon landing, which determines the accuracy of the jumping target. Using a fixed frequency has been shown to be efficient for locomotion tasks with limited aerial phases; however, it faces two challenges in long-flight maneuvers. On the one hand, a fine time-step discretization enhances the model prediction accuracy but requires a large number of steps to capture the entire flight phase, thereby increasing the optimization size and computational cost. On the other hand, a coarse time-step discretization captures the entire flight phase efficiently but can sacrifice model prediction accuracy. For jumping tasks, different phases may require different model resolutions, e.g., fine time resolution during the contact phase but coarser time resolution during the flight phase, where the model complexity is reduced because force control and contact are absent. Therefore, we propose to learn a model for dynamic maneuvers (Problem 1) with a variable-frequency scheme that uses a coarse time discretization in the flight phase and a fine time discretization in the contact phase. Importantly, this variable-frequency scheme is also synchronized with the MPC control of the learned model (Problem 2) by utilizing the same time steps for both fitting the learned model and performing the MPC optimization. This synchronization ensures that the same discretization errors are leveraged when using the model for MPC predictions. Thus, in our formulation $t_{0:K}$ does not have equal time steps and is a mixture of fine and coarse discretizations capturing the contact and flight phases. Note that we use a constant time step $\Delta t_k$ within each jumping phase, as in the sketch below. It is possible to use $\Delta t_k$ as an input to the residual model neural networks; however, we would then need to collect a substantial amount of data over a wide range of $\Delta t_k$ to avoid overfitting, which would also increase the size of the neural network and the computational cost.
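As an illustration, a small Python sketch of such a variable time-step sequence is given below; the function name and defaults are ours, with the 25 ms and 100 ms values following Table I:

    def make_timesteps(K, K_contact, dt_contact=0.025, dt_flight=0.100):
        """Return [dt_0, ..., dt_{K-1}]: fine steps in contact, coarse in flight."""
        assert 0 <= K_contact <= K
        return [dt_contact] * K_contact + [dt_flight] * (K - K_contact)

    # Example: K = 10 with 6 contact steps covers 6*25 ms + 4*100 ms = 550 ms,
    # spanning the 400 ms flight phase with only 4 prediction steps.
    dts = make_timesteps(K=10, K_contact=6)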

III System Overview

Fig. 2 presents an overview of our system architecture. Our approach consists of two stages: variable-frequency model learning (Sec. IV) and variable-frequency MPC (Sec. V), which solve Problem 1 and Problem 2, respectively.

We synchronize the variable-frequency scheme for both model learning and MPC execution with the same (1) variable prediction timesteps $\Delta t_{\cal C}$ and $\Delta t_{\cal F}$ for the contact and flight phases, respectively, (2) horizon length $K$ for data collection, training, and MPC, and (3) contact schedule. Full-body trajectory optimization (TO) [QuannICRA19] is utilized to generate jumping references for various targets, including body states $\mathbf{x}^*$, joint states $\{\mathbf{q}_{\mathbf{J}}^*, \dot{\mathbf{q}}_{\mathbf{J}}^*\}$, ground contact forces $\mathbf{u}^*$, and foot positions $\mathbf{r}^*$. For data collection, we combine a baseline MPC using a nominal SRBD model and a joint PD controller, generating diverse motions under disturbances. For training, we design a neural network to learn the discretized residual model $\delta\boldsymbol{f}_{\bm{\theta}}$ with variable sampling time via supervised learning. For control, we design a variable-frequency MPC using the learned dynamics to track a desired reference trajectory obtained from full-body TO. The feedback states from the robot include the global body CoM state $\mathbf{x}$, joint states $\mathbf{q}_{\mathbf{J}}, \dot{\mathbf{q}}_{\mathbf{J}} \in \mathbb{R}^4$, and foot positions $\mathbf{r} \in \mathbb{R}^4$.

IV Variable-Frequency Model Learning

In this section, we describe how to learn the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ from data with a variable-frequency scheme that can cover the entire flight phase, the final state upon landing, and the contact transitions between jumping phases.

IV-A Learning Dynamics with Variable-Frequency

We consider a 2D jumping motion of a legged robot with point-foot contact, e.g., a quadruped robot, with generalized coordinates $\bm{\mathfrak{q}} = \begin{bmatrix}\mathbf{p}^\top & \phi\end{bmatrix}^\top \in \mathbb{R}^3$, where $\mathbf{p} \in \mathbb{R}^2$ is the CoM position and $\phi \in \mathbb{R}$ is the body pitch angle. We define the generalized robot velocity as $\bm{\zeta} = \begin{bmatrix}\mathbf{v}^\top & \omega\end{bmatrix}^\top \in \mathbb{R}^3$, where $\mathbf{v} \in \mathbb{R}^2$ and $\omega \in \mathbb{R}$ are the linear and angular velocities. Both $\bm{\mathfrak{q}}$ and $\bm{\zeta}$ are expressed in world-frame coordinates. The robot state is $\mathbf{x} = \begin{bmatrix}\bm{\mathfrak{q}}^\top & \bm{\zeta}^\top & g\end{bmatrix}^\top \in \mathbb{R}^7$, where the (constant) gravity acceleration $g$ is added to obtain a convenient state-space form [Carlo2018]. We define $\mathbf{R}(\phi) = \begin{bmatrix}\cos(\phi) & -\sin(\phi) \\ \sin(\phi) & \cos(\phi)\end{bmatrix} \in \mathbb{R}^{2\times 2}$ as the rotation matrix of the main body, which converts $\mathbf{r}_{i,b}$ (the position of foot $i \in \{1,2\}$ relative to the body's CoM, expressed in the body frame) to the world frame via $\mathbf{r}_i = \mathbf{R}\mathbf{r}_{i,b}$. We denote $\mathbf{r} = \begin{bmatrix}\mathbf{r}_1^\top & \mathbf{r}_2^\top\end{bmatrix}^\top \in \mathbb{R}^4$. With the force control input for the front and rear legs $\mathbf{u} = \begin{bmatrix}\mathbf{u}_f^\top & \mathbf{u}_r^\top\end{bmatrix}^\top \in \mathbb{R}^4$, the nominal discrete-time SRBD model can be written as:

\boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k(\mathbf{r}_k)\mathbf{u}_k, \qquad (4)

where $\mathbf{A}_k = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_k$, $\mathbf{B}_k(\mathbf{r}_k) = \mathbf{B}_{ct}(\mathbf{r}_k)\Delta t_k$, $\Delta t_k$ is the time step (i.e., $\Delta t_{\cal C}$ or $\Delta t_{\cal F}$ for the contact or flight phase, respectively), and $\mathbf{A}_{ct}$ and $\mathbf{B}_{ct}$ are obtained from the continuous-time robot dynamics:

\mathbf{A}_{ct} = \begin{bmatrix} \mathbf{0} & \mathbf{I}_3 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{e}_3 \\ \mathbf{0} & \mathbf{0} & 0 \end{bmatrix}, \quad \mathbf{B}_{ct}(\mathbf{r}) = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{I}_2/m & \mathbf{I}_2/m \\ [\mathbf{r}_1]_\times^\top/J & [\mathbf{r}_2]_\times^\top/J \\ \mathbf{0} & \mathbf{0} \end{bmatrix},

where $\mathbf{A}_{ct} \in \mathbb{R}^{7\times 7}$, $\mathbf{B}_{ct} \in \mathbb{R}^{7\times 4}$, $\mathbf{e}_3 = \begin{bmatrix}0 & -1 & 0\end{bmatrix}^\top$, $m$ and $J$ are the mass and moment of inertia of the body, and $[\mathbf{r}_i]_\times = \begin{bmatrix}r_{iz} & -r_{ix}\end{bmatrix}^\top \in \mathbb{R}^2$. The residual term $\delta\boldsymbol{f}_{\bm{\theta}}(\cdot)$ in (1) is

\delta\boldsymbol{f}_{\bm{\theta}}(\cdot) = \mathbf{h}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_k + \mathbf{G}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_k\,\mathbf{u}_k, \qquad (5)

where $\mathbf{h}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)$ and $\mathbf{G}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)$ are represented by neural networks with learning parameters $\bm{\theta}$. Since $\mathbf{u}_k = \mathbf{0}$ during the flight phase, $\mathbf{G}_{\bm{\theta}}\mathbf{u}_k = \mathbf{0}$. We thus have two separate models for the contact phase (${\cal C}$) and flight phase (${\cal F}$):

\begin{aligned}
({\cal C}):\;\; & \mathbf{x}_{k+1} = \mathbf{A}_{\cal C}\mathbf{x}_k + \mathbf{B}_{\cal C}(\mathbf{r}_k)\mathbf{u}_k + \mathbf{h}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C} + \mathbf{G}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C}\,\mathbf{u}_k, & (6a) \\
({\cal F}):\;\; & \mathbf{x}_{k+1} = \mathbf{A}_{\cal F}\mathbf{x}_k + \mathbf{h}_{\bm{\theta}_2}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal F}, & (6b)
\end{aligned}

where $\mathbf{A}_{\cal C} = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_{\cal C}$, $\mathbf{B}_{\cal C}(\mathbf{r}_k) = \mathbf{B}_{ct}(\mathbf{r}_k)\Delta t_{\cal C}$, and $\mathbf{A}_{\cal F} = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_{\cal F}$. We roll out the dynamics based on (6), starting from an initial state $\mathbf{x}_0$ with a given control input sequence $\mathbf{u}_{0:K-1}$, to obtain a predicted state sequence $\tilde{\mathbf{x}}_{1:K}$. Using the variable-frequency scheme, the state prediction accounts for contact transitions (feet taking off the ground), a long flight phase, and the final robot state upon landing. We define the loss functions in Problem 1 as follows:

\begin{aligned}
\mathcal{L}(\mathbf{x}, \tilde{\mathbf{x}}) &= \|\bm{\mathfrak{q}} - \tilde{\bm{\mathfrak{q}}}\|_2^2 + \|\bm{\zeta} - \tilde{\bm{\zeta}}\|_2^2, \\
\mathcal{L}_{\text{reg}}(\bm{\theta}) &= \alpha_1\|\mathbf{h}_{\bm{\theta}}\| + \alpha_2\|\mathbf{G}_{\bm{\theta}}\|, \qquad (7)
\end{aligned}

The parameters $\bm{\theta} = [\bm{\theta}_1, \bm{\theta}_2]$ for the two phases are updated by gradient descent to minimize the total loss.
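As a concrete illustration, the following PyTorch sketch builds the discrete SRBD matrices, rolls out the phase-specific model (6), and takes one gradient step on the loss (7). The network objects h1, G1, h2 are the MLPs of Table I (a definition sketch appears in Sec. VI-B); the mass and inertia values, dataset layout, and function names are our assumptions rather than the released implementation. Per Table I, the networks output 3 and 12 values, which we assume act on the velocity rows (indices 3:6) of the 7-dimensional state.

    import torch

    # Discrete-time SRBD matrices of Sec. IV-A; m and J are illustrative
    # placeholder values, not the A1's published parameters.
    def A_disc(dt):
        A = torch.eye(7)                    # x = [p(2), phi, v(2), omega, g]
        A[0:3, 3:6] = dt * torch.eye(3)     # pose integrates velocity
        A[4, 6] = -dt                       # v_z integrates -g (e_3 = [0,-1,0])
        return A

    def B_C_fn(r, m=12.0, J=0.17, dt=0.025):
        B = torch.zeros(7, 4)               # u = [u_front(2), u_rear(2)]
        for i in range(2):
            B[3:5, 2*i:2*i+2] = torch.eye(2) / m                     # linear
            B[5, 2*i:2*i+2] = torch.stack([r[2*i+1], -r[2*i]]) / J   # angular
        return dt * B

    A_C, A_F = A_disc(0.025), A_disc(0.100)

    def rollout(x0, r_seq, u_seq, K_contact=6, dt_C=0.025, dt_F=0.100):
        """Roll out the hybrid model (6) from x0 (7,); r_seq, u_seq: (K, 4)."""
        x, preds = x0, []
        for k in range(u_seq.shape[0]):
            z = torch.cat([x[:3], r_seq[k]])    # NN input (q, r), dim 7
            d = torch.zeros(7)                  # residual on velocity rows
            if k < K_contact:                   # contact phase, eq. (6a)
                d[3:6] = (h1(z) + G1(z).reshape(3, 4) @ u_seq[k]) * dt_C
                x = A_C @ x + B_C_fn(r_seq[k]) @ u_seq[k] + d
            else:                               # flight phase, eq. (6b)
                d[3:6] = h2(z) * dt_F
                x = A_F @ x + d
            preds.append(x)
        return torch.stack(preds)

    opt = torch.optim.Adam(list(h1.parameters()) + list(G1.parameters())
                           + list(h2.parameters()), lr=2e-4)  # gamma, Table I
    a1 = a2 = 1e-3                                            # alpha_1, alpha_2

    def train_step(windows):
        """One gradient step on the rollout loss (7) over K-step windows."""
        loss = torch.tensor(0.0)
        for x_true, r_seq, u_seq in windows:    # x_true: (K+1, 7)
            x_pred = rollout(x_true[0], r_seq, u_seq)
            loss = loss + ((x_pred[:, :6] - x_true[1:, :6]) ** 2).sum()
            z = torch.cat([x_true[:-1, :3], r_seq], dim=1)  # batched NN inputs
            loss = loss + a1 * h1(z).norm() + a2 * G1(z).norm()  # L_reg
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()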

IV-B Data Collection

For model learning, we directly collect state-control trajectories from hardware experiments by implementing an MPC controller with the nominal dynamics model (4) and a reference body trajectory $\mathbf{x}^*$ obtained from full-body TO. The TO assumes jumping from flat and hard ground with point-foot contact. We generate various jumps to different targets under different disturbances (e.g., blocks of random heights placed under the robot's feet) to obtain a diverse dataset.

While the MPC aims to track the body reference trajectory $\mathbf{x}_{0:K}^*$, a joint PD controller is used to track the joint reference trajectory $(\mathbf{q}_{\mathbf{J}}^*, \dot{\mathbf{q}}_{\mathbf{J}}^*)$ from the full-body TO via $\bm{\tau}_{pd,setpoint} = \mathbf{K}_p(\mathbf{q}_{\mathbf{J}}^* - \mathbf{q}_{\mathbf{J}}) + \mathbf{K}_d(\dot{\mathbf{q}}_{\mathbf{J}}^* - \dot{\mathbf{q}}_{\mathbf{J}})$. Thus, the evolution of the robot states is governed by the combination of the MPC and the joint PD controller. We collected the trajectory dataset ${\cal D}$ with inputs $\mathbf{u} = \mathbf{u}_{mpc} + \mathbf{u}_{pd}$, where $\mathbf{u}_{pd} = \left(\mathbf{J}(\mathbf{q}_{\mathbf{J}})^\top \mathbf{R}^\top\right)^{-1} \bm{\tau}_{pd,setpoint}$ and $\mathbf{J}(\mathbf{q}_{\mathbf{J}})$ is the foot Jacobian.

The dataset is collected at different time steps for the contact and flight phases, i.e., $\Delta t_{\cal C}$ and $\Delta t_{\cal F}$, respectively. The data for each jump is then chunked by shifting a sliding window of size $K$ by one time step, as in the sketch below. Let $N$ be the number of collected state-control data points for each jump and $H$ be the number of jumps. We then obtain $D = H \times (N - K + 1)$ state-control trajectories in total.
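A small sketch of this chunking step (function and variable names are ours, assuming per-jump arrays):

    def chunk_jump(x, r, u, K=10):
        """x, r, u: per-jump arrays of N samples each. Returns the overlapping
        windows x_{0:K}, r_{0:K}, u_{0:K} obtained by sliding a window of size
        K by one time step (roughly N - K + 1 windows per jump)."""
        N = len(x)
        return [(x[s:s + K + 1], r[s:s + K + 1], u[s:s + K + 1])
                for s in range(N - K)]

    # jumps: assumed list of per-jump (x, r, u) arrays collected on hardware.
    dataset = [w for jump in jumps for w in chunk_jump(*jump)]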

V Variable-Frequency MPC with Learned Dynamics

In this section, we design a variable-frequency MPC controller for the learned dynamics (6) to track a desired jumping reference trajectory obtained from full-body TO. For a given robot state $\mathbf{x}_0$, we formulate the MPC as:

\begin{aligned}
\min_{\mathbf{u}_{0:K-1}} \quad & \sum_{k=1}^{K} \|\mathbf{x}_k - \mathbf{x}_k^*\|_{\mathbf{Q}_k} + \|\mathbf{u}_{k-1} - \mathbf{u}_{k-1}^*\|_{\mathbf{R}_k}, & (8a) \\
\text{s.t.} \quad & ({\cal C}):\; \mathbf{x}_{k+1} = \mathbf{A}_{\cal C}\mathbf{x}_k + \mathbf{B}_{\cal C}(\mathbf{r}_k)\mathbf{u}_k + \mathbf{h}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C} + \mathbf{G}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C}\,\mathbf{u}_k, & (8b) \\
& ({\cal F}):\; \mathbf{x}_{k+1} = \mathbf{A}_{\cal F}\mathbf{x}_k + \mathbf{h}_{\bm{\theta}_2}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal F}, & (8c) \\
& \underline{\mathbf{c}}_k \leq \mathbf{C}_k\mathbf{u}_k \leq \bar{\mathbf{c}}_k, \quad \forall k = 0, \ldots, K-1, & (8d) \\
& \mathbf{D}_k\mathbf{u}_k = \mathbf{0}, \quad \forall k = 0, \ldots, K-1, & (8e)
\end{aligned}

where (8d) represents input constraints related to the friction cone and force limits, and (8e) nullifies the forces on the swing legs based on the contact schedule.

With the MPC horizon including $K_{\cal C}$ steps in the contact phase and $K_{\cal F} = K - K_{\cal C}$ steps in the flight phase, we define $\tilde{\mathbf{U}}_{\cal C} = [\tilde{\mathbf{u}}_0^\top, \tilde{\mathbf{u}}_1^\top, \ldots, \tilde{\mathbf{u}}_{K_{\cal C}-1}^\top]^\top \in \mathbb{R}^{4K_{\cal C}}$ as the concatenation of the control inputs in the contact phase. With this notation, the predicted trajectory is:

\begin{aligned}
({\cal C}):\;\; \tilde{\mathbf{X}}_{\cal C} &= {\cal A}_{\cal C}\mathbf{x}_0 + ({\cal B}_{\cal C} + {\cal B}_{{\cal C},\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + \mathbf{H}_{{\cal C},\bm{\theta}}, & (9a) \\
({\cal F}):\;\; \tilde{\mathbf{X}}_{\cal F} &= {\cal A}_{\cal F}\tilde{\mathbf{x}}_{K_c} + \mathbf{H}_{{\cal F},\bm{\theta}}, & (9b)
\end{aligned}

where $\tilde{\mathbf{X}}_{\cal C} = [\tilde{\mathbf{x}}_1^\top, \tilde{\mathbf{x}}_2^\top, \ldots, \tilde{\mathbf{x}}_{K_c}^\top]^\top$ and $\tilde{\mathbf{X}}_{\cal F} = [\tilde{\mathbf{x}}_{K_c+1}^\top, \tilde{\mathbf{x}}_{K_c+2}^\top, \ldots, \tilde{\mathbf{x}}_K^\top]^\top$ denote the concatenations of predicted states belonging to the contact and flight phases, respectively. The matrices ${\cal A}_{\cal C}$ and ${\cal A}_{\cal F}$ are computed as

{\cal A}_{\cal C} = \begin{bmatrix} (\mathbf{A}_{\cal C})^\top & (\mathbf{A}_{\cal C}^2)^\top & \ldots & (\mathbf{A}_{\cal C}^{K_c})^\top \end{bmatrix}^\top \in \mathbb{R}^{7K_c \times 7}, \qquad {\cal A}_{\cal F} = \begin{bmatrix} (\mathbf{A}_{\cal F})^\top & (\mathbf{A}_{\cal F}^2)^\top & \ldots & (\mathbf{A}_{\cal F}^{K_{\cal F}})^\top \end{bmatrix}^\top \in \mathbb{R}^{7K_{\cal F} \times 7}.

While we use the current robot state for training the neural networks $\{\mathbf{G}_{\bm{\theta}}, \mathbf{h}_{\bm{\theta}}\}$ as presented in Sec. IV, we utilize the reference trajectory $(\mathbf{x}_k^*, \mathbf{r}_k^*)$ over the future MPC horizon to guide the MPC, since the future robot states are not available at each MPC update ([continuous_jump_bipedal, YanranDingTRO, GabrielICRA2021, ZhitaoIROS22, park2017high]). The reference is used to compute the outputs of the neural networks $\mathbf{G}_{\bm{\theta}}$ and $\mathbf{h}_{\bm{\theta}}$ at future horizon steps. In particular, we still use the current robot state and foot position $(\mathbf{x}_0, \mathbf{r}_0)$ to evaluate the residual term $\mathbf{B}_{\bm{\theta},0} = \mathbf{G}_{\bm{\theta}}(\mathbf{x}_0, \mathbf{r}_0)\Delta t_0$ at each MPC update, and then define $\mathbf{B}_{\bm{\theta},k} = \mathbf{G}_{\bm{\theta}}(\mathbf{x}_k^*, \mathbf{r}_k^*)\Delta t_k$ for $k = 1, \ldots, K_{\cal C}$. In (9a), we obtain $[{\cal B}_{\cal C}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n}\mathbf{B}_{n-1}$ and its residual $[{\cal B}_{{\cal C},\bm{\theta}}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n}\mathbf{B}_{\bm{\theta},n-1}$ if $m \geq n$, and $\mathbf{0}$ otherwise.

For ease of notation, we define $\chi_k = (\mathbf{x}_k, \mathbf{r}_k)$ and $\chi_k^* = (\mathbf{x}_k^*, \mathbf{r}_k^*)$ as the actual and reference robot state and foot trajectories. The residual matrix $\mathbf{H}_{{\cal C},\bm{\theta}}$ in (9) is $\mathbf{H}_{{\cal C},\bm{\theta}} = \Delta t_{\cal C}\,\mathbf{S}_{\cal C}\left[\mathbf{h}_{\bm{\theta}}(\chi_0)^\top \;\; \mathbf{h}_{\bm{\theta}}(\chi_1^*)^\top \;\; \ldots \;\; \mathbf{h}_{\bm{\theta}}(\chi_{K_c-1}^*)^\top\right]^\top$, where $\mathbf{S}_{\cal C}$ is a lower block-triangular matrix with entries $[\mathbf{S}_{\cal C}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n} \in \mathbb{R}^{7\times 7}$, $m \geq n$. The residual matrix $\mathbf{H}_{{\cal F},\bm{\theta}} = \Delta t_{\cal F}\,\mathbf{S}_{\cal F}\left[\mathbf{h}_{\bm{\theta}}(\chi_{K_c}^*)^\top \;\; \ldots \;\; \mathbf{h}_{\bm{\theta}}(\chi_K^*)^\top\right]^\top$, where $\mathbf{S}_{\cal F}$ is a lower block-triangular matrix with entries $[\mathbf{S}_{\cal F}]_{\{m,n\}} = \mathbf{A}_{\cal F}^{m-n} \in \mathbb{R}^{7\times 7}$, $m \geq n$. The state prediction at the end of the $K_{\cal C}$ contact steps can be computed as

\tilde{\mathbf{x}}_{K_c} = \mathbf{A}_{\cal C}^{K_c}\mathbf{x}_0 + (\mathbf{S} + \mathbf{S}_{\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + \bm{\phi}_{\bm{\theta}}, \qquad (10)

where $\mathbf{S} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{B}_0 \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{B}_1 \;\; \ldots \;\; \mathbf{B}_{K_c-1}\right]$, $\mathbf{S}_{\bm{\theta}} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{B}_{\bm{\theta},0} \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{B}_{\bm{\theta},1} \;\; \ldots \;\; \mathbf{B}_{\bm{\theta},K_c-1}\right]$, and $\bm{\phi}_{\bm{\theta}} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{h}_{\bm{\theta},0} \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{h}_{\bm{\theta},1} \;\; \ldots \;\; \mathbf{h}_{\bm{\theta},K_c-1}\right]$. Substituting (10) into (9) yields the full state prediction:

\tilde{\mathbf{X}}_{\cal F} = {\cal A}_{\cal F}\mathbf{A}_{\cal C}^{K_c}\mathbf{x}_0 + {\cal A}_{\cal F}(\mathbf{S} + \mathbf{S}_{\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + {\cal A}_{\cal F}\bm{\phi}_{\bm{\theta}} + \mathbf{H}_{{\cal F},\bm{\theta}}, \qquad (11)
\tilde{\mathbf{X}} = \begin{bmatrix} \tilde{\mathbf{X}}_{\cal C} \\ \tilde{\mathbf{X}}_{\cal F} \end{bmatrix} = \mathbf{A}_{\text{qp}}\mathbf{x}_0 + \left(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}}\right)\tilde{\mathbf{U}} + \mathbf{H}_{\bm{\theta}}, \qquad (12)

where $\mathbf{A}_{\text{qp}} = \begin{bmatrix} {\cal A}_{\cal C} \\ {\cal A}_{\cal F}\mathbf{A}_{\cal C}^{K_c} \end{bmatrix}$, $\mathbf{B}_{\text{qp}} = \begin{bmatrix} {\cal B}_{\cal C} & \mathbf{0} \\ {\cal A}_{\cal F}\mathbf{S} & \mathbf{0} \end{bmatrix}$, $\mathbf{B}_{\bm{\theta},\text{qp}} = \begin{bmatrix} {\cal B}_{{\cal C},\bm{\theta}} & \mathbf{0} \\ {\cal A}_{\cal F}\mathbf{S}_{\bm{\theta}} & \mathbf{0} \end{bmatrix}$, $\mathbf{H}_{\bm{\theta}} = \begin{bmatrix} \mathbf{H}_{{\cal C},\bm{\theta}} \\ \mathbf{H}_{{\cal F},\bm{\theta}} + {\cal A}_{\cal F}\bm{\phi}_{\bm{\theta}} \end{bmatrix}$, and $\tilde{\mathbf{U}} = \begin{bmatrix} \tilde{\mathbf{U}}_{\cal C} \\ \tilde{\mathbf{U}}_{\cal F} \end{bmatrix}$.
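To make the stacking explicit, the following numpy sketch assembles ${\cal A}_{\cal C}$, ${\cal A}_{\cal F}$, ${\cal B}_{\cal C}$, and $\mathbf{A}_{\text{qp}}$; the residual blocks ${\cal B}_{{\cal C},\bm{\theta}}$ and $\mathbf{S}_{\bm{\theta}}$ follow the same pattern. This is our reconstruction of the block structure, not the authors' code:

    import numpy as np

    def condensed_matrices(A_C, A_F, B_list, K_c, K_f):
        """B_list holds B_0 .. B_{K_c-1} (7x4 each); returns A_qp, cal_B_C."""
        n = 7
        A_pow = [np.linalg.matrix_power(A_C, j) for j in range(K_c + 1)]
        cal_A_C = np.vstack([A_pow[j] for j in range(1, K_c + 1)])
        cal_A_F = np.vstack([np.linalg.matrix_power(A_F, j)
                             for j in range(1, K_f + 1)])
        # Lower block-triangular input map: [B_C]_{m,n} = A_C^{m-n} B_{n-1}.
        cal_B_C = np.zeros((n * K_c, 4 * K_c))
        for m in range(1, K_c + 1):
            for k in range(1, m + 1):
                cal_B_C[(m-1)*n:m*n, (k-1)*4:k*4] = A_pow[m - k] @ B_list[k - 1]
        A_qp = np.vstack([cal_A_C, cal_A_F @ A_pow[K_c]])
        return A_qp, cal_B_C, cal_A_F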

The objective of the MPC in (8) can be rewritten as:

J(\mathbf{U}) = \|\tilde{\mathbf{X}} - \mathbf{X}^*\|_{\mathbf{P}}^2 + \|\tilde{\mathbf{U}} - \mathbf{U}^*\|_{\mathbf{Q}}^2, \qquad (13)

leading to a quadratic program (QP):

\min_{\mathbf{U}} \;\; \mathbf{U}^\top\bm{\alpha}\mathbf{U} + \bm{\beta}^\top\mathbf{U} \quad \text{s.t.} \;\; \mathbf{U}_k \in {\cal U}_k, \qquad (14)

where

\begin{aligned}
\bm{\alpha} &= (\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}})^\top\mathbf{Q}(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}}) + \mathbf{R}, \\
\bm{\beta} &= 2(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}})^\top\mathbf{Q}\left(\mathbf{A}_{\text{qp}}\mathbf{x}_k + \mathbf{H}_{\bm{\theta}} - \mathbf{X}^*\right) - \mathbf{R}\mathbf{U}^*, \\
{\cal U}_k &= \{\mathbf{u} \in \mathbb{R}^4 \mid \underline{\mathbf{c}}_k \leq \mathbf{C}_k\mathbf{u} \leq \bar{\mathbf{c}}_k, \; \mathbf{D}_k\mathbf{u} = \mathbf{0}\}.
\end{aligned}
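For illustration, a numpy sketch of the condensed QP assembly is given below; the matrix names mirror (12)-(14), and the unconstrained solve stands in for qpOASES, which additionally enforces the constraints in ${\cal U}_k$ on hardware:

    import numpy as np

    def solve_mpc_qp(A_qp, B_qp, B_th_qp, H_th, Q, R, x0, X_ref, U_ref):
        """Minimize U^T alpha U + beta^T U from (14); returns the stacked U."""
        B = B_qp + B_th_qp
        alpha = B.T @ Q @ B + R
        beta = 2.0 * B.T @ Q @ (A_qp @ x0 + H_th - X_ref) - R @ U_ref
        # Stationarity: 2 alpha U + beta = 0 (input constraints omitted here).
        return np.linalg.solve(2.0 * alpha, -beta)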

VI Evaluation

VI-A Experiment Setup

We evaluated our approach using a Unitree A1 robot. To get data for real-time control, we utilized a motion capture system to estimate the position and orientation of the robot trunk at 1 kHz with position errors of about 1 mm. We obtain velocity data from position data using a forward finite-difference method.

We used PyTorch to implement and train the residual dynamics model. The trained PyTorch model was converted to a TorchScript trace, which, in turn, was loaded into our MPC implementation in C++. The MPC is solved by a QP solver (qpOASES). We utilized the optimization toolbox CasADi [casadi] to set up and solve the full-body TO for various jumping targets within $[0.3, 0.8]$ m on flat ground. We considered jumping motions with a contact phase (C) consisting of an all-leg contact period (e.g., 500 ms) and a rear-leg contact period (e.g., 300 ms). With the flight phase (F) scheduled for 400 ms, the front legs and rear legs act as swing legs (SW) for up to 58% and 33% of the entire jumping period, respectively. The jumping maneuvers feature long flight phases, i.e., the robot can jump up to 4 times its normal standing height during the mid-air period. All jumping experiments were executed with a sufficient battery level (>90%).

VI-B Data Collection and Training

TABLE I: Training Parameters
Parameter | Symbol | Value
Variable time steps | $\Delta t_{\cal C}, \Delta t_{\cal F}$ | $\{25, 100\}$ ms
Prediction steps | $K$ | $10$
NN architecture | $\mathbf{h}_{\bm{\theta}_1}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $400$ Tanh - $400$ Tanh - $3$
NN architecture | $\mathbf{G}_{\bm{\theta}_1}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $1000$ Tanh - $1000$ Tanh - $12$
NN architecture | $\mathbf{h}_{\bm{\theta}_2}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $400$ Tanh - $400$ Tanh - $3$
Learning rate | $\gamma$ | $2 \cdot 10^{-4}$
Regularization weights | $\alpha_1, \alpha_2$ | $10^{-3}, 10^{-3}$
Total training steps | $N_{train}$ | $2 \cdot 10^4$
Figure 3: Training loss and testing loss (log scale).

We aim to learn a residual dynamics model that is not trajectory-specific and can generalize to a whole family of forward jumps. To collect a sufficient dataset for training, we utilized a baseline MPC controller with a nominal model (Sec. IV-B) to generate diverse trajectories under a variety of unknown disturbances (a box of random height within $[0, 8]$ cm under the front feet) and zero-height jumping targets. We collected data from $H = 20$ jumps, with 80% used for training and 20% for testing. The data points were sampled with variable sampling times $\Delta t_{\cal C} = 25$ ms and $\Delta t_{\cal F} = 100$ ms and a horizon length of $K = 10$.

The training parameters are listed in Table I. We used fully-connected neural networks with the architectures listed in Table I: the first number is the input dimension, the last number is the output dimension, and the numbers in between are the hidden layers' dimensions and activation functions. The output of $\mathbf{G}_{\bm{\theta}_1}$ is then reshaped into a $3 \times 4$ matrix. The training and testing losses are illustrated in Fig. 3.
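For reference, a PyTorch sketch of these networks as plain MLPs, reconstructed from Table I (layer objects and variable names are our assumptions):

    import torch.nn as nn

    def mlp(inp, hidden, out):
        return nn.Sequential(nn.Linear(inp, hidden), nn.Tanh(),
                             nn.Linear(hidden, hidden), nn.Tanh(),
                             nn.Linear(hidden, out))

    h1 = mlp(7, 400, 3)     # h_theta1: contact-phase residual drift
    G1 = mlp(7, 1000, 12)   # G_theta1: contact-phase input gain, reshaped 3x4
    h2 = mlp(7, 400, 3)     # h_theta2: flight-phase residual drift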

Figure 4: State prediction comparison on a random testing dataset. It starts at 650 ms, consisting of $K_c = 6$ steps in the contact phase and $K_f = 4$ steps covering the flight period of 400 ms. The blue and yellow areas represent the contact and flight periods, respectively. We compare the variable-frequency learned model (our method), the fixed-frequency learned model, the variable-frequency nominal model, and the ground-truth trajectories. The ground-truth data is obtained directly from hardware experiments.

Figure 5: MPC solving time ($t_{MPC}^{sol}$) for horizons $K = 10$ and $K = 22$. The red dashed line denotes the MPC update period ($1/f_{MPC}^{upd}$) of 25 ms. Real-time performance is achieved if $t_{MPC}^{sol} \ll 1/f_{MPC}^{upd}$.
Figure 6: Comparison of final jumping target errors (in cm) for four real-time MPC setups with different models and timestep resolutions, over a total of 92 jumps with robot hardware. Positive values indicate that the jumps fall short of the target. The cross ($\times$) denotes failure to jump onto the box. Supplemental video: https://youtu.be/2QjZkARs1mU.
Figure 7: Jumping from uneven terrain and jumping onto a box: comparison among full-body TO, nominal-model MPC with fixed and variable frequency, and learned-model MPC with fixed and variable frequency. All MPCs have a horizon length of $K = 10$. An unknown 5 cm block (30% of the robot's initial height) was introduced under the robot's front feet (upper figures). With the variable-frequency learned-model MPC (our method), the robot is able to compensate for the disturbance and perform successful jumps. Supplemental video: https://youtu.be/yUqI_MBOC6Q.

VI-C Comparative Analysis

VI-C1 Evaluation on testing dataset

We rolled out the learned dynamics with variable frequency to predict future state trajectories and compared them with (i) those of the variable-frequency nominal model, (ii) those of the fixed-frequency learned model, and (iii) the ground-truth states in Fig. 4. The fixed-frequency learned model is trained with $K = 10$ and $\Delta t = 25$ ms. The figure shows that our proposed learned model (red lines) outperforms the variable-frequency nominal model (green lines) and the fixed-frequency learned model (blue lines). We highlight the large deviation of the variable-frequency nominal model's trajectory prediction from the others. This inaccuracy can be explained by the use of the conventional SRBD model for the entire prediction horizon and the coarse Euler integration for the flight phase. Our variable-frequency learned model addresses the inaccuracies introduced by the nominal model and maintains sufficient trajectory prediction accuracy while allowing real-time MPC by keeping the number of decision variables small.

VI-C2 Effect of prediction horizon

Fig. 5 compares the MPC solving time for different horizons. With our variable-frequency scheme, we can use a short prediction horizon, e.g., $K = 10$ (6 steps for contact and 4 for the flight phase), allowing a QP solving time for each MPC update (see (14)) of only 2 ms on average. This efficient computation enables real-time performance for MPC. Using a fixed-frequency scheme with a sample time of 25 ms requires the MPC to use a large number of steps, e.g., $K = 22$ (6 for contact and 16 for the whole flight phase). This long horizon yields a large optimization problem, which takes about 30 ms for each MPC update and prevents real-time performance.

VI-C3 Execution

We verified that our proposed approach enables both robust and accurate jumping to various targets. We compared (a) a nominal model & fixed-frequency MPC, (b) a nominal model & variable-frequency MPC, (c) a fixed-frequency learned model & fixed-frequency MPC, and (d) a variable-frequency learned model & variable-frequency MPC (our approach). The fixed-frequency scheme uses $(\Delta t_{\cal C}, \Delta t_{\cal F}) = (25, 25)$ ms, and the variable-frequency one utilizes $(\Delta t_{\cal C}, \Delta t_{\cal F}) = (25, 100)$ ms. All MPCs use the prediction horizon $K = 10$; thus the fixed-frequency MPC does not cover the entire flight phase.

For each combination, we performed 23 jumps consisting of (i) 3 single jumps for each of five targets: $x = 40$ cm, $x = 50$ cm, $x = 60$ cm, $(x, z) = (60, 20)$ cm for jumping onto a box, and $x = 70$ cm, and (ii) 8 continuous jumps of 60 cm. This yields a total of 92 jumps for comparison. Fig. 6 shows the average final jumping target errors of the four MPC combinations across the different jumping tasks. Our approach outperforms the methods that adopt a fixed frequency or a nominal model, reducing the jumping distance error by up to 8 times. Our method demonstrates successful jumping onto a box with landing angle errors $< 15^\circ$, while the fixed-frequency MPC and the nominal-model MPC fail, as shown in Fig. 7. Note that the box-jumping task is not executed during data collection, verifying the ability of our method to generalize to unseen reference trajectories and unseen tasks. With our method, the robot also successfully jumps under model uncertainty, e.g., an unknown 5 cm block (30% of the robot's standing height) placed under the front feet (Fig. 7). This task uses a reference trajectory designed for flat ground and demonstrates a case where the robot deviates significantly from the reference trajectory.

We also studied the effect of model uncertainty by evaluating the jumping performance when only a joint PD controller is used to track the joint reference from full-body TO [QuannICRA19], via $\bm{\tau}_{pd} = \bm{\tau}_{ff} + \mathbf{K}_p(\mathbf{q}_{\mathbf{J}}^* - \mathbf{q}_{\mathbf{J}}) + \mathbf{K}_d(\dot{\mathbf{q}}_{\mathbf{J}}^* - \dot{\mathbf{q}}_{\mathbf{J}})$. The full-order model can be overly idealized, assuming rigid feet, point-to-surface contact, and hard ground. These assumptions, however, are not valid for the Unitree A1 robot equipped with deformable feet [Zachary_RAL_online_calibration_2022]. Uncertainties due to DC motors working at extreme conditions or motor deficiency are difficult to model and are normally ignored in the full-body model. These factors affect the model accuracy and prevent the robot from reaching the jumping target if we rely solely on the joint PD controller, as can be seen in Fig. 7.

Figure 8: Hardware experiments: we conduct 2 rounds with 4 jumps in each round for each method. Figures (a)-(d) show selected snapshots of the first and the last jumps as the robot executes four consecutive jumps with different MPC approaches. The target for each jump is 0.6 m.
Figure 9: Continuous jumping on uneven terrain with the A1 robot, shown in Fig. 1: control forces on the front (a) and rear (b) legs, where the forces are 0 for swing legs; and (c) accurate joint tracking of the thigh and calf of the rear legs. Plots (a) and (b) show that the force outputs satisfy $|F_z| \leq F_{max} = 350$ N and the friction cone $|F_x/F_z| \leq \mu = 0.6$ during all phases.

We evaluate the target accuracy and robustness of our method for continuous jumps on flat ground. The results are presented in Fig. 8, showing selected snapshots of the first and the last jump toward the final target of 2.4 m. Our method achieves the highest target accuracy, allowing the robot to traverse 2.47 m with an average distance error of only 1.75 cm per jump. During flight periods, the robot can jump up to $4\times$ its standing height.

Note that computing the residual matrices $\mathbf{G}_{{\cal C},\bm{\theta}}, \mathbf{H}_{{\cal C},\bm{\theta}}, \mathbf{H}_{{\cal F},\bm{\theta}}$ can consume significant time if we feed the neural network with $(\mathbf{x}_0, \mathbf{u}_0), (\mathbf{x}_1^*, \mathbf{u}_1^*), \ldots, (\mathbf{x}_K^*, \mathbf{u}_K^*)$ sequentially. To improve computational efficiency, we combine the current state and the references into a batch, then feed the entire batch forward through the learned neural networks all at once, reducing the neural network query time to less than 1 ms.
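A sketch of this batching trick (shapes and names are illustrative):

    import torch

    def batched_residuals(h, G, q0, r0, q_ref, r_ref):
        """q0: (3,), r0: (4,): current state; q_ref, r_ref: (K-1, 3), (K-1, 4):
        reference samples over the horizon. One forward pass per MPC update."""
        z = torch.cat([torch.cat([q0, r0]).unsqueeze(0),
                       torch.cat([q_ref, r_ref], dim=1)], dim=0)   # (K, 7)
        with torch.no_grad():                     # inference only inside MPC
            return h(z), G(z).reshape(-1, 3, 4)   # (K, 3) drift, (K, 3, 4) gains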

We further evaluate MPC with models learned with other timing choices, $(\Delta t_{\cal C}, \Delta t_{\cal F}) = \{(50, 50), (25, 80)\}$ ms, in addition to the two choices $(\Delta t_{\cal C}, \Delta t_{\cal F}) = \{(25, 25), (25, 100)\}$ ms already discussed above.

TABLE II: Distance errors (cm) when varying the coarseness of the timesteps $(\Delta t_{\cal C}, \Delta t_{\cal F})$ in ms. Red colors indicate jumping failures where the robot falls short of the box (positive values). The distance errors are measured by the motion capture system at the end of the predefined flight period (1200 ms). Supplemental video: https://youtu.be/2QjZkARs1mU.
Task | $(25, 25)$ | $(50, 50)$ | $(25, 80)$ | $(25, 100)$
Box-Jump | $18.3 \pm 3.7$ | $10.8 \pm 2.1$ | $-1.8 \pm 0.9$ | $-2.0 \pm 1.3$

Three jumps were conducted separately for each timing choice on the box-jumping task, and the experiments are summarized in Table II. We also achieved successful jumps using the variable-frequency scheme $(25, 80)$ ms. We observed that using a larger timestep in the contact phase to cover the whole flight phase, even with a fixed-frequency scheme, e.g., $(50, 50)$ ms, does not guarantee successful jumps. This confirms our rationale that, due to the higher complexity of the robot dynamics during the contact phase, a coarse time step during this phase leads to higher accumulated prediction errors, even in the flight phase, causing task failures.

VI-D Continuous Jumping on Uneven Terrain

We tested the robustness and target accuracy of our learning-based MPC for continuous jumping on uneven terrain with a target of 60 cm for each jump, as illustrated in Fig. 1 and Fig. 9. The terrain consisted of multiple blocks with random heights between 2-4 cm, randomly placed on the ground. Since the robot legs can impact the ground early or late on the real robot, there is usually a mismatch between the scheduled and actual contact states. Whenever both contacts happen (i.e., the landing phase $L$ starts), we activate a separate landing MPC to make the transition between two jumps in a short period of about 200 ms [continuous_jump]. The landing MPC is designed with simplified dynamics, $K = 10$, and $\Delta t = 25$ ms. It aims to track the body reference trajectory from TO for continuous jumping [continuous_jump] and connects two jumps seamlessly. Figure 1 shows the actual trajectory of three continuous jumps, leaping around 175 cm in total, which yields only 1.67 cm distance error ($< 3\%$) for each jump on the uneven terrain. Compared to our prior work [continuous_jump], we successfully achieve continuous jumping on hardware in this work. With our framework, we also achieve a target jumping error of less than 2 cm (3.5%) over 34 single and continuous jumps with different targets and uneven terrain.

VII Discussion and Conclusion

In this letter, we developed a learned-model MPC approach that enables both target accuracy and robustness for aggressive planar jumping motions. We learned a residual model from a limited dataset to account for the effect of leg dynamics and model mismatch with real hardware. Given the learned model, we designed a learning-based variable-frequency MPC that considers the jumping transitions, the entire flight phase, and the jumping target during the optimization process. We demonstrated the scalability of our approach to handle new jumping targets with unseen reference trajectories. While allowing real-time computation, our approach uses the reference trajectory to evaluate the neural network residual model and ensure a linear model for MPC, which sacrifices a certain degree of accuracy if the system does not operate around the reference under substantial disturbances. To overcome this limitation, our future work will explore the use of nonlinear MPC and address its associated higher computational cost. We will also generalize the model learning and control framework with varying contact schedules. Further, we will incorporate line-foot, rolling, and soft contact in learning aggressive legged locomotion maneuvers.