
Variable-Frequency Model Learning and Predictive Control for Jumping Maneuvers on Legged Robots

Chuong Nguyen, Abdullah Altawaitan, Thai Duong, Nikolay Atanasov, and Quan Nguyen. Manuscript received: July 18, 2024; Revised: October 20, 2024; Accepted: November 19, 2024. This paper was recommended for publication by Editor Jaydev P. Desai upon evaluation of the Associate Editor and Reviewers' comments. Chuong Nguyen and Quan Nguyen are with the Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA 90007, USA, e-mails: {vanchuong.nguyen,quann}@usc.edu. Abdullah Altawaitan, Thai Duong, and Nikolay Atanasov are with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA, e-mails: {aaltawaitan,tduong,natanasov}@ucsd.edu. A. Altawaitan is also affiliated with Kuwait University as a holder of a scholarship. Digital Object Identifier (DOI): see top of this page.
Abstract

Achieving both target accuracy and robustness in dynamic maneuvers with long flight phases, such as high or long jumps, has been a significant challenge for legged robots. To address this challenge, we propose a novel learning-based control approach consisting of model learning and model predictive control (MPC) utilizing a variable-frequency scheme. Compared to existing MPC techniques, we learn a model directly from experiments, accounting not only for leg dynamics but also for modeling errors and unknown dynamics mismatch in hardware and during contact. Additionally, learning the model with variable frequency allows us to cover the entire flight phase and the final jumping target, enhancing the prediction accuracy of the jumping trajectory. Using the learned model, we also design a variable-frequency MPC to effectively leverage different jumping phases and track the target accurately. In a total of 92 jumps on Unitree A1 robot hardware, we verify that our approach outperforms other MPCs using a fixed-frequency scheme or a nominal model, reducing the jumping distance error 2 to 8 times. We also achieve jumping distance errors of less than 3% during continuous jumping on uneven terrain with randomly placed perturbations of random heights (up to 4 cm, or 27% of the robot's standing height). Our approach obtains distance errors of 1-2 cm on 34 single and continuous jumps with different jumping targets and model uncertainties. Code is available at https://github.com/DRCL-USC/Learning_MPC_Jumping.

Index Terms:
Model Learning for Control, Legged Robots, Whole-Body Motion Planning and Control

I Introduction

Aggressive jumping maneuvers with legged robots have received significant attention recently and have been demonstrated successfully using trajectory optimization [QuannICRA19, chuongjump3D, matthew_mit2021_1, ChignoliICRA2021], model-based control [YanranDingTRO, continuous_jump_bipedal, GabrielICRA2021, park2017high, ZhitaoIROS22, fullbody_MPC], and learning-based control [zhuang2023robot, RL_jump_bipedal, yang2023cajun, vassil_drl, jumpingrl, lokesh_ogmp]. Unlike walking or running, aggressive motions are particularly challenging due to (1) the extreme underactuation in the mid-air phase (the robot relies mainly on force control during contact to regulate its global position and orientation), (2) significant dynamics model error and uncertainty, which are inherently hard to characterize accurately, especially with contact and hardware load during extreme motions, and (3) the trade-off between model accuracy and efficient computation for real-time execution. Achieving both target accuracy and robustness for long-flight maneuvers, such as high or long jumps, therefore remains an open challenge. In this work, we address this challenge by developing a real-time MPC for quadruped jumping using a robot dynamics model learned from experiments.

Many control and optimization techniques have been developed for jumping motions. Trajectory optimization (TO) with full-body nonlinear dynamics is normally utilized to generate long-flight trajectories in an offline fashion (e.g., [QuannICRA19, chuongjump3D]). Many MPC approaches sacrifice model accuracy to achieve robust maneuvers by using simplified models that treat the trunk and legs as a unified body [NMPC_3D_hopping, ZhitaoIROS22, GabrielICRA2021, park2017high, YanranDingTRO]. Our recent iterative learning control (ILC) work [chuong_ilc_jump] handles model uncertainty to realize long-flight jumps via multi-stage optimization that refines the control policy offline after each jump until the target is reached accurately; it also relies on a simplified model for computational efficiency. However, that work focuses on target accuracy rather than robustness, e.g., it requires the same initial condition for all trials. Different from existing works, we learn a robot dynamics model from experiments and develop a real-time MPC using the learned dynamics to achieve both target accuracy and robustness in continuous quadruped jumping.

Figure 1: A Unitree A1 robot performs continuous jumps on unknown uneven terrain, achieving both target accuracy and robustness. The target distance for each jump is 0.6 m. The flight phase covers a vertical height of up to $4\times$ the robot's normal standing height. MPC with a nominal single rigid-body model is used to collect data for training a neural-network residual dynamics model with a variable-frequency scheme. The learned model is then used in a variable-frequency MPC to execute jumping motions. The variable-frequency scheme varies the time step sizes in the contact phase ($\Delta t_{\cal C}$) and flight phase ($\Delta t_{\cal F}$) to dedicate more model capacity and more MPC optimization steps to the contact phase, which improves the jumping robustness and accuracy. The green dashed line is the actual robot trajectory. Supplemental video: https://youtu.be/yUqI_MBOC6Q.

Learning robot models from experiments to capture complex dynamics effects and account for model errors on real hardware has become a popular approach [bauersfeld2021neurobem, hewing2019cautious, saviolo2022physics, duong23porthamiltonian]. Many frameworks learn residual dynamics using neural networks [bauersfeld2021neurobem, salzmann2023real] or Gaussian processes [hewing2019cautious, cao2017gaussian] and use MPC to control the learned systems [salzmann2023real, pohlodek2022hilompc] to improve tracking errors. Most existing works, however, primarily investigate dynamical systems without contact. Recently, Sun et al. [sun2021online] proposed a notable method to learn a linear residual model, addressing unknown dynamics during walking stabilization. Pandala et al. [Pandala_robustMPC] proposed to close the gap between reduced- and full-order models by using deep reinforcement learning to learn the unmodeled dynamics, which are then used in MPC to realize robust walking on various terrains. In contrast, we use supervised learning to learn the unmodeled dynamics from real hardware data. We also learn dynamics for maneuvers with long flight periods, which must tackle (1) the switching between multiple dynamics models due to contact, including a flight phase where control actions have very little effect on the body dynamics, (2) disturbances and uncertainty in dynamics modeling due to hard impacts in jumping, and (3) the effect of intermittent control at contact transitions on state predictions.

In this letter, we propose a residual model learning approach that uses a variable-frequency scheme, i.e., varying the coarseness of the integration time step, to address the aforementioned challenges and enhance long-term trajectory predictions over different jumping phases. MPC has been commonly used for jumping by optimizing the control inputs during the contact phase based on future state predictions during the flight phase (e.g., [GabrielICRA2021, fullbody_MPC, park2017high, YanranDingTRO, continuous_jump]). A major challenge in MPC for long-flight maneuvers is to utilize a limited number of prediction steps yet still effectively cover the entire flight phase and, especially, the final jumping target. Another challenge is to obtain a model that remains accurate under complex dynamic maneuvers, unknown dynamics, and model mismatch with the real hardware, improving the jumping accuracy while still ensuring real-time performance. Some recent works have addressed these challenges partially. Many methods use conventional single rigid-body dynamics (SRBD) models, ignoring the leg dynamics, to achieve real-time execution [GabrielICRA2021, continuous_jump_bipedal, YanranDingTRO]. Using the SRBD model in the contact phase can lead to inaccurate predictions of the robot's state at take-off, which is the initial condition of the projectile motion in the flight phase. Thus, the trajectory prediction error can accumulate significantly over long flight periods. Some methods account for leg inertia [ZiyiZhouRAL, He2024, fullbody_MPC]; however, disturbances and uncertainty in dynamics modeling have not been considered. Recently, planning with multi-fidelity models and multi-resolution time steps has emerged as an effective strategy to enhance target accuracy and robustness [heli_cafempc, Norby_adaptivempc, Heli_hierarchympc]. Li et al. [Heli_hierarchympc] adopt a less accurate model in the far horizon and a more accurate but expensive model in the near future, and also design multi-resolution time steps to cover the whole flight phase. Norby et al. [Norby_adaptivempc] adapt the model complexity based on the task complexity along the prediction horizon. While we vary the coarseness of the time step as in [Heli_hierarchympc], we design a novel residual model learning approach combined with variable-frequency MPC to address the aforementioned challenges.

Contributions: The contributions of this letter are summarized as follows.

  • We learn a residual dynamics model directly from a small real experiment dataset. The model accounts for nonlinear leg dynamics, modeling errors, and unknown dynamics mismatch in hardware and during contact.

  • We propose learning the model in a variable-frequency scheme that leverages different time resolutions to capture the entire flight phase, the jumping target, and the contact transitions over a few-step horizon, thereby significantly improving the accuracy of long-term trajectory prediction.

  • We develop variable-frequency MPC using the learned model to synthesize controls that improve both target accuracy and robustness in dynamic robot maneuvers.

  • Extensive hardware experiments validate the effectiveness and robustness of our approach with single and consecutive jumps on uneven terrain. Comparisons with other MPC techniques using a nominal model or a fixed-frequency scheme are also provided.

Figure 2: System Architecture. The learning procedure and MPC execution are paired with the same integration timesteps $\{\Delta t_{\cal C}, \Delta t_{\cal F}\}$, prediction horizon $K$, and predefined contact schedule ($cs$). The MPC and low-level joint PD controllers are updated at 40 Hz and 1 kHz, respectively.

II Problem Statement

Consider a legged robot modelled by a state $\mathbf{x}_k$, consisting of the pose and velocity of the body's center of mass (CoM), foot positions $\mathbf{r}_k$ relative to the body's CoM, and ground force control input $\mathbf{u}_k$ at the legs, sampled at time $t_k$. We augment the nominal SRBD model (e.g., [GabrielICRA2021, park2017high, YanranDingTRO]) with a learned residual term to account for leg dynamics and model mismatch with the real robot hardware, as well as to capture complex dynamic effects in contact:

\mathbf{x}_{k+1} = \boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k\right)\Delta t_k, \qquad (1)

where $\boldsymbol{f}$ represents the nominal SRBD model, $\delta\boldsymbol{f}_{\bm{\theta}}$ with parameters $\bm{\theta}$ approximates the residual dynamics, and $\Delta t_k$ is the sampling interval. Our first objective is to learn the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ using a dataset of jumping trajectories.

Problem 1

Given a set ${\cal D} = \{t_{0:K}^{(i)}, \mathbf{x}_{0:K}^{(i)}, \mathbf{r}_{0:K}^{(i)}, \mathbf{u}_{0:K}^{(i)}\}_{i=1}^{D}$ of $D$ sequences of states, foot positions, and control inputs, find the parameters $\bm{\theta}$ of the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ in (1) by rolling out the dynamics model and minimizing the discrepancy between the predicted state sequence $\tilde{\mathbf{x}}_{1:K}^{(i)}$ and the true state sequence $\mathbf{x}_{1:K}^{(i)}$ in ${\cal D}$:

\begin{aligned}
\min_{\bm{\theta}} \quad & \sum_{i=1}^{D}\sum_{k=1}^{K} \mathcal{L}(\mathbf{x}_k^{(i)}, \tilde{\mathbf{x}}_k^{(i)}) + {\cal L}_{\text{reg}}(\bm{\theta}) \\
\text{s.t.} \quad & \tilde{\mathbf{x}}_{k+1}^{(i)} = \boldsymbol{f}\left(\tilde{\mathbf{x}}_k^{(i)}, \mathbf{u}_k^{(i)}, \mathbf{r}_k^{(i)}, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\tilde{\mathbf{x}}_k^{(i)}, \mathbf{u}_k^{(i)}, \mathbf{r}_k^{(i)}\right)\Delta t_k, \\
& \tilde{\mathbf{x}}_0^{(i)} = \mathbf{x}_0^{(i)}, \quad \forall i = 1, \ldots, D, \qquad (2)
\end{aligned}

where $\mathcal{L}$ is an error function in the state space, and ${\cal L}_{\text{reg}}(\bm{\theta})$ is a regularization term to avoid overfitting. Note that it is not necessary for $\Delta t_k = t_{k+1} - t_k$ to be fixed.
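For concreteness, the following is a minimal PyTorch-style sketch of the rollout objective in (2); the names f_nom and df_theta (the nominal step and the residual network) and the window layout are our illustrative assumptions, not the released code:

    import torch

    def problem1_loss(windows, f_nom, df_theta):
        """Multi-step loss (2); each window is (dt, x, r, u) with possibly
        non-uniform time steps dt: (K,), states x: (K+1, d), and r, u: (K, .)."""
        total = torch.tensor(0.0)
        for dt, x, r, u in windows:
            x_tilde = x[0]                         # start from the true state
            for k in range(len(dt)):               # roll the learned model out
                x_tilde = (f_nom(x_tilde, u[k], r[k], dt[k])
                           + df_theta(x_tilde, u[k], r[k]) * dt[k])
                total = total + ((x_tilde - x[k + 1]) ** 2).sum()
        return total                               # add L_reg(theta) before SGD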

After learning to improve the accuracy of the model prediction, our second objective is to use the learned model (1) in MPC to track a desired state trajectory $\mathbf{x}_{0:K}^*$.

Problem 2

Given the dynamics model (1), a current robot state $\mathbf{x}_0$ and foot positions $\mathbf{r}_0$, a desired trajectory of states $\mathbf{x}_{0:K}^*$, foot positions $\mathbf{r}_{0:K}^*$, and controls $\mathbf{u}_{0:K}^*$, design a control law $\mathbf{u}_0 = \bm{\pi}(\mathbf{x}_0, \mathbf{r}_0, \mathbf{x}_{0:K}^*, \mathbf{r}_{0:K}^*, \mathbf{u}_{0:K}^*; \bm{\theta})$ to achieve accurate and robust tracking of the desired state trajectory via shifting-horizon optimization:

\begin{aligned}
\min_{\mathbf{u}_{0:K-1}} \quad & \sum_{k=1}^{K} \|\mathbf{x}_k - \mathbf{x}_k^*\|_{\mathbf{Q}_k} + \|\mathbf{u}_{k-1} - \mathbf{u}_{k-1}^*\|_{\mathbf{R}_k} \\
\text{s.t.} \quad & \mathbf{x}_{k+1} = \boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) + \delta\boldsymbol{f}_{\bm{\theta}}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k\right)\Delta t_k, \\
& \mathbf{u}_k \in {\cal U}_k, \quad \forall k = 0, \ldots, K-1, \qquad (3)
\end{aligned}

where $\mathbf{Q}_k$ and $\mathbf{R}_k$ are positive definite weight matrices, and ${\cal U}_{0:K-1}$ represents input constraint sets.

Achieving accurate target jumping with MPC requires not only an accurate model but also coverage of the final state upon landing, which determines the accuracy of the jumping target. Using a fixed frequency has been shown to be efficient for locomotion tasks with limited aerial phases; however, it faces two challenges in long-flight maneuvers. On the one hand, a fine time-step discretization enhances the model prediction accuracy but requires a large number of steps to capture the entire flight phase, thereby increasing the optimization size and computational cost. On the other hand, a coarse time-step discretization captures the entire flight phase efficiently but can sacrifice model prediction accuracy. For jumping tasks, different phases may require different model resolutions, e.g., fine time resolution during the contact phase but coarser time resolution during the flight phase, where the model complexity is reduced because force control and contact are absent. Therefore, we propose to learn a model for dynamic maneuvers (Problem 1) with a variable-frequency scheme that uses a coarse time discretization in the flight phase and a fine time discretization in the contact phase. Importantly, this variable-frequency scheme is also synchronized with the MPC control of the learned model (Problem 2) by utilizing the same time steps for both fitting the learned model and performing the MPC optimization. This synchronization ensures that the same discretization errors are leveraged when using the model for MPC predictions. Thus, in our formulation $t_{0:K}$ does not have equal time steps and is a mixture of fine and coarse discretizations capturing the contact and flight phases. Note that we use a constant time step $\Delta t_k$ within each jumping phase, as in the sketch below. It is possible to use $\Delta t_k$ as an input to the residual model neural networks; however, we would then need to collect a substantial amount of data over a wide range of $\Delta t_k$ to avoid overfitting, which would also increase the size of the neural network and the computational cost.
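As an illustration, a small Python sketch of such a variable time-step sequence is given below; the function name and defaults are ours, with the 25 ms and 100 ms values following Table I:

    def make_timesteps(K, K_contact, dt_contact=0.025, dt_flight=0.100):
        """Return [dt_0, ..., dt_{K-1}]: fine steps in contact, coarse in flight."""
        assert 0 <= K_contact <= K
        return [dt_contact] * K_contact + [dt_flight] * (K - K_contact)

    # Example: K = 10 with 6 contact steps covers 6*25 ms + 4*100 ms = 550 ms,
    # spanning the 400 ms flight phase with only 4 prediction steps.
    dts = make_timesteps(K=10, K_contact=6)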

III System Overview

Fig. 2 presents an overview of our system architecture. Our approach consists of two stages: variable-frequency model learning (Sec. IV) and variable-frequency MPC (Sec. V), which solve Problem 1 and Problem 2, respectively.

We synchronize the variable-frequency scheme for both model learning and MPC execution with the same (1) variable prediction timesteps $\Delta t_{\cal C}$ and $\Delta t_{\cal F}$ for the contact and flight phases, respectively, (2) horizon length $K$ for data collection, training, and MPC, and (3) contact schedule. Full-body trajectory optimization (TO) [QuannICRA19] is utilized to generate jumping references for various targets, including body states $\mathbf{x}^*$, joint states $\{\mathbf{q}_{\mathbf{J}}^*, \dot{\mathbf{q}}_{\mathbf{J}}^*\}$, ground contact forces $\mathbf{u}^*$, and foot positions $\mathbf{r}^*$. For data collection, we combine a baseline MPC using a nominal SRBD model and a joint PD controller, generating diverse motions under disturbances. For training, we design a neural network to learn the discretized residual model $\delta\boldsymbol{f}_{\bm{\theta}}$ with variable sampling time via supervised learning. For control, we design a variable-frequency MPC using the learned dynamics to track a desired reference trajectory obtained from full-body TO. The feedback states from the robot include the global body CoM state $\mathbf{x}$, joint states $\mathbf{q}_{\mathbf{J}}, \dot{\mathbf{q}}_{\mathbf{J}} \in \mathbb{R}^4$, and foot positions $\mathbf{r} \in \mathbb{R}^4$.

IV Variable-Frequency Model Learning

In this section, we describe how to learn the residual dynamics $\delta\boldsymbol{f}_{\bm{\theta}}$ from data with a variable-frequency scheme that can cover the entire flight phase, the final state upon landing, and the contact transitions between jumping phases.

IV-A Learning Dynamics with Variable-Frequency

We consider a 2D jumping motion of a legged robot with point-foot contact, e.g., a quadruped robot, with generalized coordinates $\bm{\mathfrak{q}} = \begin{bmatrix}\mathbf{p}^\top & \phi\end{bmatrix}^\top \in \mathbb{R}^3$, where $\mathbf{p} \in \mathbb{R}^2$ is the CoM position and $\phi \in \mathbb{R}$ is the body pitch angle. We define the generalized robot velocity as $\bm{\zeta} = \begin{bmatrix}\mathbf{v}^\top & \omega\end{bmatrix}^\top \in \mathbb{R}^3$, where $\mathbf{v} \in \mathbb{R}^2$ and $\omega \in \mathbb{R}$ are the linear and angular velocities. Both $\bm{\mathfrak{q}}$ and $\bm{\zeta}$ are expressed in world-frame coordinates. The robot state is $\mathbf{x} = \begin{bmatrix}\bm{\mathfrak{q}}^\top & \bm{\zeta}^\top & g\end{bmatrix}^\top \in \mathbb{R}^7$, where the (constant) gravity acceleration $g$ is added to obtain a convenient state-space form [Carlo2018]. We define $\mathbf{R}(\phi) = \begin{bmatrix}\cos(\phi) & -\sin(\phi) \\ \sin(\phi) & \cos(\phi)\end{bmatrix} \in \mathbb{R}^{2\times 2}$ as the rotation matrix of the main body, which converts $\mathbf{r}_{i,b}$ (the position of foot $i \in \{1,2\}$ relative to the body's CoM, expressed in the body frame) to the world frame via $\mathbf{r}_i = \mathbf{R}\mathbf{r}_{i,b}$. We denote $\mathbf{r} = \begin{bmatrix}\mathbf{r}_1^\top & \mathbf{r}_2^\top\end{bmatrix}^\top \in \mathbb{R}^4$. With the force control input for the front and rear legs $\mathbf{u} = \begin{bmatrix}\mathbf{u}_f^\top & \mathbf{u}_r^\top\end{bmatrix}^\top \in \mathbb{R}^4$, the nominal discrete-time SRBD model can be written as:

\boldsymbol{f}\left(\mathbf{x}_k, \mathbf{u}_k, \mathbf{r}_k, \Delta t_k\right) = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k(\mathbf{r}_k)\mathbf{u}_k, \qquad (4)

where $\mathbf{A}_k = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_k$, $\mathbf{B}_k(\mathbf{r}_k) = \mathbf{B}_{ct}(\mathbf{r}_k)\Delta t_k$, $\Delta t_k$ is the time step (i.e., $\Delta t_{\cal C}$ or $\Delta t_{\cal F}$ for the contact or flight phase, respectively), and $\mathbf{A}_{ct}$ and $\mathbf{B}_{ct}$ are obtained from the continuous-time robot dynamics:

\mathbf{A}_{ct} = \begin{bmatrix} \mathbf{0} & \mathbf{I}_3 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{e}_3 \\ \mathbf{0} & \mathbf{0} & 0 \end{bmatrix}, \quad \mathbf{B}_{ct}(\mathbf{r}) = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{I}_2/m & \mathbf{I}_2/m \\ [\mathbf{r}_1]_\times^\top/J & [\mathbf{r}_2]_\times^\top/J \\ \mathbf{0} & \mathbf{0} \end{bmatrix},

where $\mathbf{A}_{ct} \in \mathbb{R}^{7\times 7}$, $\mathbf{B}_{ct} \in \mathbb{R}^{7\times 4}$, $\mathbf{e}_3 = \begin{bmatrix}0 & -1 & 0\end{bmatrix}^\top$, $m$ and $J$ are the mass and moment of inertia of the body, and $[\mathbf{r}_i]_\times = \begin{bmatrix}r_{iz} & -r_{ix}\end{bmatrix}^\top \in \mathbb{R}^2$. The residual term $\delta\boldsymbol{f}_{\bm{\theta}}(\cdot)$ in (1) is

\delta\boldsymbol{f}_{\bm{\theta}}(\cdot) = \mathbf{h}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_k + \mathbf{G}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_k\,\mathbf{u}_k, \qquad (5)

where $\mathbf{h}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)$ and $\mathbf{G}_{\bm{\theta}}(\mathbf{x}_k, \mathbf{r}_k)$ are represented by neural networks with learning parameters $\bm{\theta}$. Since $\mathbf{u}_k = \mathbf{0}$ during the flight phase, $\mathbf{G}_{\bm{\theta}}\mathbf{u}_k = \mathbf{0}$. We thus have two separate models for the contact phase (${\cal C}$) and flight phase (${\cal F}$):

\begin{aligned}
({\cal C}):\;\; & \mathbf{x}_{k+1} = \mathbf{A}_{\cal C}\mathbf{x}_k + \mathbf{B}_{\cal C}(\mathbf{r}_k)\mathbf{u}_k + \mathbf{h}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C} + \mathbf{G}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C}\,\mathbf{u}_k, & (6a) \\
({\cal F}):\;\; & \mathbf{x}_{k+1} = \mathbf{A}_{\cal F}\mathbf{x}_k + \mathbf{h}_{\bm{\theta}_2}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal F}, & (6b)
\end{aligned}

where $\mathbf{A}_{\cal C} = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_{\cal C}$, $\mathbf{B}_{\cal C}(\mathbf{r}_k) = \mathbf{B}_{ct}(\mathbf{r}_k)\Delta t_{\cal C}$, and $\mathbf{A}_{\cal F} = \mathbf{I}_7 + \mathbf{A}_{ct}\Delta t_{\cal F}$. We roll out the dynamics based on (6), starting from an initial state $\mathbf{x}_0$ with a given control input sequence $\mathbf{u}_{0:K-1}$, to obtain a predicted state sequence $\tilde{\mathbf{x}}_{1:K}$. Using the variable-frequency scheme, the state prediction accounts for contact transitions (feet taking off the ground), a long flight phase, and the final robot state upon landing. We define the loss functions in Problem 1 as follows:

\begin{aligned}
\mathcal{L}(\mathbf{x}, \tilde{\mathbf{x}}) &= \|\bm{\mathfrak{q}} - \tilde{\bm{\mathfrak{q}}}\|_2^2 + \|\bm{\zeta} - \tilde{\bm{\zeta}}\|_2^2, \\
\mathcal{L}_{\text{reg}}(\bm{\theta}) &= \alpha_1\|\mathbf{h}_{\bm{\theta}}\| + \alpha_2\|\mathbf{G}_{\bm{\theta}}\|, \qquad (7)
\end{aligned}

The parameters $\bm{\theta} = [\bm{\theta}_1, \bm{\theta}_2]$ for the two phases are updated by gradient descent to minimize the total loss.
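As a concrete illustration, the following PyTorch sketch builds the discrete SRBD matrices, rolls out the phase-specific model (6), and takes one gradient step on the loss (7). The network objects h1, G1, h2 are the MLPs of Table I (a definition sketch appears in Sec. VI-B); the mass and inertia values, dataset layout, and function names are our assumptions rather than the released implementation. Per Table I, the networks output 3 and 12 values, which we assume act on the velocity rows (indices 3:6) of the 7-dimensional state.

    import torch

    # Discrete-time SRBD matrices of Sec. IV-A; m and J are illustrative
    # placeholder values, not the A1's published parameters.
    def A_disc(dt):
        A = torch.eye(7)                    # x = [p(2), phi, v(2), omega, g]
        A[0:3, 3:6] = dt * torch.eye(3)     # pose integrates velocity
        A[4, 6] = -dt                       # v_z integrates -g (e_3 = [0,-1,0])
        return A

    def B_C_fn(r, m=12.0, J=0.17, dt=0.025):
        B = torch.zeros(7, 4)               # u = [u_front(2), u_rear(2)]
        for i in range(2):
            B[3:5, 2*i:2*i+2] = torch.eye(2) / m                     # linear
            B[5, 2*i:2*i+2] = torch.stack([r[2*i+1], -r[2*i]]) / J   # angular
        return dt * B

    A_C, A_F = A_disc(0.025), A_disc(0.100)

    def rollout(x0, r_seq, u_seq, K_contact=6, dt_C=0.025, dt_F=0.100):
        """Roll out the hybrid model (6) from x0 (7,); r_seq, u_seq: (K, 4)."""
        x, preds = x0, []
        for k in range(u_seq.shape[0]):
            z = torch.cat([x[:3], r_seq[k]])    # NN input (q, r), dim 7
            d = torch.zeros(7)                  # residual on velocity rows
            if k < K_contact:                   # contact phase, eq. (6a)
                d[3:6] = (h1(z) + G1(z).reshape(3, 4) @ u_seq[k]) * dt_C
                x = A_C @ x + B_C_fn(r_seq[k]) @ u_seq[k] + d
            else:                               # flight phase, eq. (6b)
                d[3:6] = h2(z) * dt_F
                x = A_F @ x + d
            preds.append(x)
        return torch.stack(preds)

    opt = torch.optim.Adam(list(h1.parameters()) + list(G1.parameters())
                           + list(h2.parameters()), lr=2e-4)  # gamma, Table I
    a1 = a2 = 1e-3                                            # alpha_1, alpha_2

    def train_step(windows):
        """One gradient step on the rollout loss (7) over K-step windows."""
        loss = torch.tensor(0.0)
        for x_true, r_seq, u_seq in windows:    # x_true: (K+1, 7)
            x_pred = rollout(x_true[0], r_seq, u_seq)
            loss = loss + ((x_pred[:, :6] - x_true[1:, :6]) ** 2).sum()
            z = torch.cat([x_true[:-1, :3], r_seq], dim=1)  # batched NN inputs
            loss = loss + a1 * h1(z).norm() + a2 * G1(z).norm()  # L_reg
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()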

IV-B Data Collection

For model learning, we directly collect state-control trajectories from hardware experiments by implementing an MPC controller with the nominal dynamics model (4) and a reference body trajectory $\mathbf{x}^*$ obtained from full-body TO. The TO assumes jumping from flat and hard ground with point-foot contact. We generate various jumps to different targets under different disturbances (e.g., blocks of random heights placed under the robot's feet) to obtain a diverse dataset.

While the MPC aims to track the body reference trajectory $\mathbf{x}_{0:K}^*$, a joint PD controller is used to track the joint reference trajectory $(\mathbf{q}_{\mathbf{J}}^*, \dot{\mathbf{q}}_{\mathbf{J}}^*)$ from the full-body TO via $\bm{\tau}_{pd,setpoint} = \mathbf{K}_p(\mathbf{q}_{\mathbf{J}}^* - \mathbf{q}_{\mathbf{J}}) + \mathbf{K}_d(\dot{\mathbf{q}}_{\mathbf{J}}^* - \dot{\mathbf{q}}_{\mathbf{J}})$. Thus, the evolution of the robot states is governed by the combination of the MPC and the joint PD controller. We collected the trajectory dataset ${\cal D}$ with inputs $\mathbf{u} = \mathbf{u}_{mpc} + \mathbf{u}_{pd}$, where $\mathbf{u}_{pd} = \left(\mathbf{J}(\mathbf{q}_{\mathbf{J}})^\top \mathbf{R}^\top\right)^{-1} \bm{\tau}_{pd,setpoint}$ and $\mathbf{J}(\mathbf{q}_{\mathbf{J}})$ is the foot Jacobian.

The dataset is collected at different time steps for the contact and flight phases, i.e., $\Delta t_{\cal C}$ and $\Delta t_{\cal F}$, respectively. The data for each jump is then chunked by shifting a sliding window of size $K$ by one time step, as in the sketch below. Let $N$ be the number of collected state-control data points for each jump and $H$ be the number of jumps. We then obtain $D = H \times (N - K + 1)$ state-control trajectories in total.
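A small sketch of this chunking step (function and variable names are ours, assuming per-jump arrays):

    def chunk_jump(x, r, u, K=10):
        """x, r, u: per-jump arrays of N samples each. Returns the overlapping
        windows x_{0:K}, r_{0:K}, u_{0:K} obtained by sliding a window of size
        K by one time step (roughly N - K + 1 windows per jump)."""
        N = len(x)
        return [(x[s:s + K + 1], r[s:s + K + 1], u[s:s + K + 1])
                for s in range(N - K)]

    # jumps: assumed list of per-jump (x, r, u) arrays collected on hardware.
    dataset = [w for jump in jumps for w in chunk_jump(*jump)]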

V Variable-Frequency MPC with Learned Dynamics

In this section, we design a variable-frequency MPC controller for the learned dynamics (6) to track a desired jumping reference trajectory obtained from full-body TO. For a given robot state $\mathbf{x}_0$, we formulate the MPC as:

\begin{aligned}
\min_{\mathbf{u}_{0:K-1}} \quad & \sum_{k=1}^{K} \|\mathbf{x}_k - \mathbf{x}_k^*\|_{\mathbf{Q}_k} + \|\mathbf{u}_{k-1} - \mathbf{u}_{k-1}^*\|_{\mathbf{R}_k}, & (8a) \\
\text{s.t.} \quad & ({\cal C}):\; \mathbf{x}_{k+1} = \mathbf{A}_{\cal C}\mathbf{x}_k + \mathbf{B}_{\cal C}(\mathbf{r}_k)\mathbf{u}_k + \mathbf{h}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C} + \mathbf{G}_{\bm{\theta}_1}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal C}\,\mathbf{u}_k, & (8b) \\
& ({\cal F}):\; \mathbf{x}_{k+1} = \mathbf{A}_{\cal F}\mathbf{x}_k + \mathbf{h}_{\bm{\theta}_2}(\mathbf{x}_k, \mathbf{r}_k)\Delta t_{\cal F}, & (8c) \\
& \underline{\mathbf{c}}_k \leq \mathbf{C}_k\mathbf{u}_k \leq \bar{\mathbf{c}}_k, \quad \forall k = 0, \ldots, K-1, & (8d) \\
& \mathbf{D}_k\mathbf{u}_k = \mathbf{0}, \quad \forall k = 0, \ldots, K-1, & (8e)
\end{aligned}

where (8d) represents input constraints related to the friction cone and force limits, and (8e) nullifies the forces on the swing legs based on the contact schedule.

With the MPC horizon including $K_{\cal C}$ steps in the contact phase and $K_{\cal F} = K - K_{\cal C}$ steps in the flight phase, we define $\tilde{\mathbf{U}}_{\cal C} = [\tilde{\mathbf{u}}_0^\top, \tilde{\mathbf{u}}_1^\top, \ldots, \tilde{\mathbf{u}}_{K_{\cal C}-1}^\top]^\top \in \mathbb{R}^{4K_{\cal C}}$ as the concatenation of the control inputs in the contact phase. With this notation, the predicted trajectory is:

\begin{aligned}
({\cal C}):\;\; \tilde{\mathbf{X}}_{\cal C} &= {\cal A}_{\cal C}\mathbf{x}_0 + ({\cal B}_{\cal C} + {\cal B}_{{\cal C},\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + \mathbf{H}_{{\cal C},\bm{\theta}}, & (9a) \\
({\cal F}):\;\; \tilde{\mathbf{X}}_{\cal F} &= {\cal A}_{\cal F}\tilde{\mathbf{x}}_{K_c} + \mathbf{H}_{{\cal F},\bm{\theta}}, & (9b)
\end{aligned}

where $\tilde{\mathbf{X}}_{\cal C} = [\tilde{\mathbf{x}}_1^\top, \tilde{\mathbf{x}}_2^\top, \ldots, \tilde{\mathbf{x}}_{K_c}^\top]^\top$ and $\tilde{\mathbf{X}}_{\cal F} = [\tilde{\mathbf{x}}_{K_c+1}^\top, \tilde{\mathbf{x}}_{K_c+2}^\top, \ldots, \tilde{\mathbf{x}}_K^\top]^\top$ denote the concatenations of predicted states belonging to the contact and flight phases, respectively. The matrices ${\cal A}_{\cal C}$ and ${\cal A}_{\cal F}$ are computed as

{\cal A}_{\cal C} = \begin{bmatrix} (\mathbf{A}_{\cal C})^\top & (\mathbf{A}_{\cal C}^2)^\top & \ldots & (\mathbf{A}_{\cal C}^{K_c})^\top \end{bmatrix}^\top \in \mathbb{R}^{7K_c \times 7}, \qquad {\cal A}_{\cal F} = \begin{bmatrix} (\mathbf{A}_{\cal F})^\top & (\mathbf{A}_{\cal F}^2)^\top & \ldots & (\mathbf{A}_{\cal F}^{K_{\cal F}})^\top \end{bmatrix}^\top \in \mathbb{R}^{7K_{\cal F} \times 7}.

While we use the current robot state for training the neural networks $\{\mathbf{G}_{\bm{\theta}}, \mathbf{h}_{\bm{\theta}}\}$ as presented in Sec. IV, we utilize the reference trajectory $(\mathbf{x}_k^*, \mathbf{r}_k^*)$ over the future MPC horizon to guide the MPC, since the future robot states are not available at each MPC update ([continuous_jump_bipedal, YanranDingTRO, GabrielICRA2021, ZhitaoIROS22, park2017high]). The reference is used to compute the outputs of the neural networks $\mathbf{G}_{\bm{\theta}}$ and $\mathbf{h}_{\bm{\theta}}$ at future horizon steps. In particular, we still use the current robot state and foot position $(\mathbf{x}_0, \mathbf{r}_0)$ to evaluate the residual term $\mathbf{B}_{\bm{\theta},0} = \mathbf{G}_{\bm{\theta}}(\mathbf{x}_0, \mathbf{r}_0)\Delta t_0$ at each MPC update, and then define $\mathbf{B}_{\bm{\theta},k} = \mathbf{G}_{\bm{\theta}}(\mathbf{x}_k^*, \mathbf{r}_k^*)\Delta t_k$ for $k = 1, \ldots, K_{\cal C}$. In (9a), we obtain $[{\cal B}_{\cal C}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n}\mathbf{B}_{n-1}$ and its residual $[{\cal B}_{{\cal C},\bm{\theta}}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n}\mathbf{B}_{\bm{\theta},n-1}$ if $m \geq n$, and $\mathbf{0}$ otherwise.

For ease of notation, we define $\chi_k = (\mathbf{x}_k, \mathbf{r}_k)$ and $\chi_k^* = (\mathbf{x}_k^*, \mathbf{r}_k^*)$ as the actual and reference robot state and foot trajectories. The residual matrix $\mathbf{H}_{{\cal C},\bm{\theta}}$ in (9) is $\mathbf{H}_{{\cal C},\bm{\theta}} = \Delta t_{\cal C}\,\mathbf{S}_{\cal C}\left[\mathbf{h}_{\bm{\theta}}(\chi_0)^\top \;\; \mathbf{h}_{\bm{\theta}}(\chi_1^*)^\top \;\; \ldots \;\; \mathbf{h}_{\bm{\theta}}(\chi_{K_c-1}^*)^\top\right]^\top$, where $\mathbf{S}_{\cal C}$ is a lower block-triangular matrix with entries $[\mathbf{S}_{\cal C}]_{\{m,n\}} = \mathbf{A}_{\cal C}^{m-n} \in \mathbb{R}^{7\times 7}$, $m \geq n$. The residual matrix $\mathbf{H}_{{\cal F},\bm{\theta}} = \Delta t_{\cal F}\,\mathbf{S}_{\cal F}\left[\mathbf{h}_{\bm{\theta}}(\chi_{K_c}^*)^\top \;\; \ldots \;\; \mathbf{h}_{\bm{\theta}}(\chi_K^*)^\top\right]^\top$, where $\mathbf{S}_{\cal F}$ is a lower block-triangular matrix with entries $[\mathbf{S}_{\cal F}]_{\{m,n\}} = \mathbf{A}_{\cal F}^{m-n} \in \mathbb{R}^{7\times 7}$, $m \geq n$. The state prediction at the end of the $K_{\cal C}$ contact steps can be computed as

\tilde{\mathbf{x}}_{K_c} = \mathbf{A}_{\cal C}^{K_c}\mathbf{x}_0 + (\mathbf{S} + \mathbf{S}_{\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + \bm{\phi}_{\bm{\theta}}, \qquad (10)

where $\mathbf{S} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{B}_0 \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{B}_1 \;\; \ldots \;\; \mathbf{B}_{K_c-1}\right]$, $\mathbf{S}_{\bm{\theta}} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{B}_{\bm{\theta},0} \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{B}_{\bm{\theta},1} \;\; \ldots \;\; \mathbf{B}_{\bm{\theta},K_c-1}\right]$, and $\bm{\phi}_{\bm{\theta}} = \left[\mathbf{A}_{\cal C}^{K_c-1}\mathbf{h}_{\bm{\theta},0} \;\; \mathbf{A}_{\cal C}^{K_c-2}\mathbf{h}_{\bm{\theta},1} \;\; \ldots \;\; \mathbf{h}_{\bm{\theta},K_c-1}\right]$. Substituting (10) into (9) yields the full state prediction:

\tilde{\mathbf{X}}_{\cal F} = {\cal A}_{\cal F}\mathbf{A}_{\cal C}^{K_c}\mathbf{x}_0 + {\cal A}_{\cal F}(\mathbf{S} + \mathbf{S}_{\bm{\theta}})\tilde{\mathbf{U}}_{\cal C} + {\cal A}_{\cal F}\bm{\phi}_{\bm{\theta}} + \mathbf{H}_{{\cal F},\bm{\theta}}, \qquad (11)
\tilde{\mathbf{X}} = \begin{bmatrix} \tilde{\mathbf{X}}_{\cal C} \\ \tilde{\mathbf{X}}_{\cal F} \end{bmatrix} = \mathbf{A}_{\text{qp}}\mathbf{x}_0 + \left(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}}\right)\tilde{\mathbf{U}} + \mathbf{H}_{\bm{\theta}}, \qquad (12)

where $\mathbf{A}_{\text{qp}} = \begin{bmatrix} {\cal A}_{\cal C} \\ {\cal A}_{\cal F}\mathbf{A}_{\cal C}^{K_c} \end{bmatrix}$, $\mathbf{B}_{\text{qp}} = \begin{bmatrix} {\cal B}_{\cal C} & \mathbf{0} \\ {\cal A}_{\cal F}\mathbf{S} & \mathbf{0} \end{bmatrix}$, $\mathbf{B}_{\bm{\theta},\text{qp}} = \begin{bmatrix} {\cal B}_{{\cal C},\bm{\theta}} & \mathbf{0} \\ {\cal A}_{\cal F}\mathbf{S}_{\bm{\theta}} & \mathbf{0} \end{bmatrix}$, $\mathbf{H}_{\bm{\theta}} = \begin{bmatrix} \mathbf{H}_{{\cal C},\bm{\theta}} \\ \mathbf{H}_{{\cal F},\bm{\theta}} + {\cal A}_{\cal F}\bm{\phi}_{\bm{\theta}} \end{bmatrix}$, and $\tilde{\mathbf{U}} = \begin{bmatrix} \tilde{\mathbf{U}}_{\cal C} \\ \tilde{\mathbf{U}}_{\cal F} \end{bmatrix}$.
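To make the stacking explicit, the following numpy sketch assembles ${\cal A}_{\cal C}$, ${\cal A}_{\cal F}$, ${\cal B}_{\cal C}$, and $\mathbf{A}_{\text{qp}}$; the residual blocks ${\cal B}_{{\cal C},\bm{\theta}}$ and $\mathbf{S}_{\bm{\theta}}$ follow the same pattern. This is our reconstruction of the block structure, not the authors' code:

    import numpy as np

    def condensed_matrices(A_C, A_F, B_list, K_c, K_f):
        """B_list holds B_0 .. B_{K_c-1} (7x4 each); returns A_qp, cal_B_C."""
        n = 7
        A_pow = [np.linalg.matrix_power(A_C, j) for j in range(K_c + 1)]
        cal_A_C = np.vstack([A_pow[j] for j in range(1, K_c + 1)])
        cal_A_F = np.vstack([np.linalg.matrix_power(A_F, j)
                             for j in range(1, K_f + 1)])
        # Lower block-triangular input map: [B_C]_{m,n} = A_C^{m-n} B_{n-1}.
        cal_B_C = np.zeros((n * K_c, 4 * K_c))
        for m in range(1, K_c + 1):
            for k in range(1, m + 1):
                cal_B_C[(m-1)*n:m*n, (k-1)*4:k*4] = A_pow[m - k] @ B_list[k - 1]
        A_qp = np.vstack([cal_A_C, cal_A_F @ A_pow[K_c]])
        return A_qp, cal_B_C, cal_A_F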

The objective of the MPC in (8) can be rewritten as:

J(\mathbf{U}) = \|\tilde{\mathbf{X}} - \mathbf{X}^*\|_{\mathbf{P}}^2 + \|\tilde{\mathbf{U}} - \mathbf{U}^*\|_{\mathbf{Q}}^2, \qquad (13)

leading to a quadratic program (QP):

\min_{\mathbf{U}} \;\; \mathbf{U}^\top\bm{\alpha}\mathbf{U} + \bm{\beta}^\top\mathbf{U} \quad \text{s.t.} \;\; \mathbf{U}_k \in {\cal U}_k, \qquad (14)

where

\begin{aligned}
\bm{\alpha} &= (\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}})^\top\mathbf{Q}(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}}) + \mathbf{R}, \\
\bm{\beta} &= 2(\mathbf{B}_{\text{qp}} + \mathbf{B}_{\bm{\theta},\text{qp}})^\top\mathbf{Q}\left(\mathbf{A}_{\text{qp}}\mathbf{x}_k + \mathbf{H}_{\bm{\theta}} - \mathbf{X}^*\right) - \mathbf{R}\mathbf{U}^*, \\
{\cal U}_k &= \{\mathbf{u} \in \mathbb{R}^4 \mid \underline{\mathbf{c}}_k \leq \mathbf{C}_k\mathbf{u} \leq \bar{\mathbf{c}}_k, \; \mathbf{D}_k\mathbf{u} = \mathbf{0}\}.
\end{aligned}
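For illustration, a numpy sketch of the condensed QP assembly is given below; the matrix names mirror (12)-(14), and the unconstrained solve stands in for qpOASES, which additionally enforces the constraints in ${\cal U}_k$ on hardware:

    import numpy as np

    def solve_mpc_qp(A_qp, B_qp, B_th_qp, H_th, Q, R, x0, X_ref, U_ref):
        """Minimize U^T alpha U + beta^T U from (14); returns the stacked U."""
        B = B_qp + B_th_qp
        alpha = B.T @ Q @ B + R
        beta = 2.0 * B.T @ Q @ (A_qp @ x0 + H_th - X_ref) - R @ U_ref
        # Stationarity: 2 alpha U + beta = 0 (input constraints omitted here).
        return np.linalg.solve(2.0 * alpha, -beta)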

VI Evaluation

VI-A Experiment Setup

We evaluated our approach using a Unitree A1 robot. To get data for real-time control, we utilized a motion capture system to estimate the position and orientation of the robot trunk at 1 kHz with position errors of about 1 mm. We obtain velocity data from position data using a forward finite-difference method.

We used PyTorch to implement and train the residual dynamics model. The trained PyTorch model was converted to a TorchScript trace, which, in turn, was loaded into our MPC implementation in C++. The MPC is solved by a QP solver (qpOASES). We utilized the optimization toolbox CasADi [casadi] to set up and solve the full-body TO for various jumping targets within $[0.3, 0.8]$ m on flat ground. We considered jumping motions with a contact phase (C) consisting of an all-leg contact period (e.g., 500 ms) and a rear-leg contact period (e.g., 300 ms). With the flight phase (F) scheduled for 400 ms, the front legs and rear legs act as swing legs (SW) for up to 58% and 33% of the entire jumping period, respectively. The jumping maneuvers feature long flight phases, i.e., the robot can jump up to 4 times its normal standing height during the mid-air period. All jumping experiments were executed with a sufficient battery level (>90%).

VI-B Data Collection and Training

TABLE I: Training Parameters
Parameter | Symbol | Value
Variable time steps | $\Delta t_{\cal C}, \Delta t_{\cal F}$ | $\{25, 100\}$ ms
Prediction steps | $K$ | $10$
NN architecture | $\mathbf{h}_{\bm{\theta}_1}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $400$ Tanh - $400$ Tanh - $3$
NN architecture | $\mathbf{G}_{\bm{\theta}_1}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $1000$ Tanh - $1000$ Tanh - $12$
NN architecture | $\mathbf{h}_{\bm{\theta}_2}(\bm{\mathfrak{q}}, \mathbf{r})$ | $7$ - $400$ Tanh - $400$ Tanh - $3$
Learning rate | $\gamma$ | $2 \cdot 10^{-4}$
Regularization weights | $\alpha_1, \alpha_2$ | $10^{-3}, 10^{-3}$
Total training steps | $N_{train}$ | $2 \cdot 10^4$
Figure 3: Training loss and testing loss (log scale).

We aim to learn a residual dynamics model that is not trajectory-specific and can generalize to a whole family of forward jumps. To collect a sufficient dataset for training, we utilized a baseline MPC controller with a nominal model (Sec. IV-B) to generate diverse trajectories under a variety of unknown disturbances (a box of random height within $[0, 8]$ cm under the front feet) and zero-height jumping targets. We collected data from $H = 20$ jumps, with 80% used for training and 20% for testing. The data points were sampled with variable sampling times $\Delta t_{\cal C} = 25$ ms and $\Delta t_{\cal F} = 100$ ms and a horizon length of $K = 10$.

The training parameters are listed in Table I. We used fully-connected neural networks with the architectures listed in Table I: the first number is the input dimension, the last number is the output dimension, and the numbers in between are the hidden layers' dimensions and activation functions. The output of $\mathbf{G}_{\bm{\theta}_1}$ is then reshaped into a $3 \times 4$ matrix. The training and testing losses are illustrated in Fig. 3.
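For reference, a PyTorch sketch of these networks as plain MLPs, reconstructed from Table I (layer objects and variable names are our assumptions):

    import torch.nn as nn

    def mlp(inp, hidden, out):
        return nn.Sequential(nn.Linear(inp, hidden), nn.Tanh(),
                             nn.Linear(hidden, hidden), nn.Tanh(),
                             nn.Linear(hidden, out))

    h1 = mlp(7, 400, 3)     # h_theta1: contact-phase residual drift
    G1 = mlp(7, 1000, 12)   # G_theta1: contact-phase input gain, reshaped 3x4
    h2 = mlp(7, 400, 3)     # h_theta2: flight-phase residual drift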

Figure 4: State prediction comparison on a random testing dataset. It starts at 650 ms, consisting of $K_c = 6$ steps in the contact phase and $K_f = 4$ steps covering the flight period of 400 ms. The blue and yellow areas represent the contact and flight periods, respectively. We compare the variable-frequency learned model (our method), the fixed-frequency learned model, the variable-frequency nominal model, and the ground-truth trajectories. The ground-truth data is obtained directly from hardware experiments.

Figure 5: MPC solving time ($t_{MPC}^{sol}$) for horizons $K = 10$ and $K = 22$. The red dashed line denotes the MPC update period ($1/f_{MPC}^{upd}$) of 25 ms. Real-time performance is achieved if $t_{MPC}^{sol} \ll 1/f_{MPC}^{upd}$.
Figure 6: Comparison of final jumping target errors (in cm) for four real-time MPC setups with different models and timestep resolutions, over a total of 92 jumps with robot hardware. Positive values indicate that the jumps fall short of the target. The cross ($\times$) denotes failure to jump onto the box. Supplemental video: https://youtu.be/2QjZkARs1mU.
Figure 7: Jumping from uneven terrain and jumping onto a box: comparison among full-body TO, nominal-model MPC with fixed and variable frequency, and learned-model MPC with fixed and variable frequency. All MPCs have a horizon length of $K = 10$. An unknown 5 cm block (30% of the robot's initial height) was introduced under the robot's front feet (upper figures). With the variable-frequency learned-model MPC (our method), the robot is able to compensate for the disturbance and perform successful jumps. Supplemental video: https://youtu.be/yUqI_MBOC6Q.

VI-C Comparative Analysis

VI-C1 Evaluation on testing dataset

We rolled out the learned dynamics with variable frequency to predict future state trajectories and compared them with (i) those of the variable-frequency nominal model, (ii) those of the fixed-frequency learned model, and (iii) the ground-truth states in Fig. 4. The fixed-frequency learned model is trained with $K = 10$ and $\Delta t = 25$ ms. The figure shows that our proposed learned model (red lines) outperforms the variable-frequency nominal model (green lines) and the fixed-frequency learned model (blue lines). We highlight the large deviation of the variable-frequency nominal model's trajectory prediction from the others. This inaccuracy can be explained by the use of the conventional SRBD model for the entire prediction horizon and the coarse Euler integration for the flight phase. Our variable-frequency learned model addresses the inaccuracies introduced by the nominal model and maintains sufficient trajectory prediction accuracy while allowing real-time MPC by keeping the number of decision variables small.

VI-C2 Effect of prediction horizon

Fig. 5 compares the MPC solving time for different horizons. With our variable-frequency scheme, we can use a short prediction horizon, e.g., $K = 10$ (6 steps for contact and 4 for the flight phase), allowing a QP solving time for each MPC update (see (14)) of only 2 ms on average. This efficient computation enables real-time performance for MPC. Using a fixed-frequency scheme with a sample time of 25 ms requires the MPC to use a large number of steps, e.g., $K = 22$ (6 for contact and 16 for the whole flight phase). This long horizon yields a large optimization problem, which takes about 30 ms for each MPC update and prevents real-time performance.

VI-C3 Execution

We verified that our proposed approach enables both robust and accurate jumping to various targets. We compared (a) a nominal model & fixed-frequency MPC, (b) a nominal model & variable-frequency MPC, (c) a fixed-frequency learned model & fixed-frequency MPC, and (d) a variable-frequency learned model & variable-frequency MPC (our approach). The fixed-frequency scheme uses $(\Delta t_{\cal C}, \Delta t_{\cal F}) = (25, 25)$ ms, and the variable-frequency one utilizes $(\Delta t_{\cal C}, \Delta t_{\cal F}) = (25, 100)$ ms. All MPCs use the prediction horizon $K = 10$; thus the fixed-frequency MPC does not cover the entire flight phase.

For each combination, we performed 23 jumps consisting of (i) 3 single jumps for each of five targets: $x = 40$ cm, $x = 50$ cm, $x = 60$ cm, $(x, z) = (60, 20)$ cm for jumping onto a box, and $x = 70$ cm, and (ii) 8 continuous jumps of 60 cm. This yields a total of 92 jumps for comparison. Fig. 6 shows the average final jumping target errors of the four MPC combinations across the different jumping tasks. Our approach outperforms the methods that adopt a fixed frequency or a nominal model, reducing the jumping distance error by up to 8 times. Our method demonstrates successful jumping onto a box with landing angle errors $< 15^\circ$, while the fixed-frequency MPC and the nominal-model MPC fail, as shown in Fig. 7. Note that the box-jumping task is not executed during data collection, verifying the ability of our method to generalize to unseen reference trajectories and unseen tasks. With our method, the robot also successfully jumps under model uncertainty, e.g., an unknown 5 cm block (30% of the robot's standing height) placed under the front feet (Fig. 7). This task uses a reference trajectory designed for flat ground and demonstrates a case where the robot deviates significantly from the reference trajectory.

We also studied the effect of model uncertainty by evaluating the jumping performance when only a joint PD controller is used to track the joint reference from full-body TO [QuannICRA19], via $\bm{\tau}_{pd} = \bm{\tau}_{ff} + \mathbf{K}_p(\mathbf{q}_{\mathbf{J}}^* - \mathbf{q}_{\mathbf{J}}) + \mathbf{K}_d(\dot{\mathbf{q}}_{\mathbf{J}}^* - \dot{\mathbf{q}}_{\mathbf{J}})$. The full-order model can be overly idealized, assuming rigid feet, point-to-surface contact, and hard ground. These assumptions, however, are not valid for the Unitree A1 robot equipped with deformable feet [Zachary_RAL_online_calibration_2022]. Uncertainties due to DC motors working at extreme conditions or motor deficiency are difficult to model and are normally ignored in the full-body model. These factors affect the model accuracy and prevent the robot from reaching the jumping target if we rely solely on the joint PD controller, as can be seen in Fig. 7.

Figure 8: Hardware experiments: we conduct 2 rounds with 4 jumps in each round for each method. Figures (a)-(d) show selected snapshots of the first and the last jumps as the robot executes four consecutive jumps with different MPC approaches. The target for each jump is 0.6 m.
Figure 9: Continuous jumping on uneven terrain with the A1 robot, shown in Fig. 1: control forces on the front (a) and rear (b) legs, where the forces are 0 for swing legs; and (c) accurate joint tracking of the thigh and calf of the rear legs. Plots (a) and (b) show that the force outputs satisfy $|F_z| \leq F_{max} = 350$ N and the friction cone $|F_x/F_z| \leq \mu = 0.6$ during all phases.

We evaluate the target accuracy and robustness of our method for continuous jumps on flat ground. The results are presented in Fig. 8, showing selected snapshots of the first and the last jump toward the final target of 2.4 m. Our method achieves the highest target accuracy, allowing the robot to traverse 2.47 m with an average distance error of only 1.75 cm per jump. During flight periods, the robot can jump up to $4\times$ its standing height.

Note that computing the residual matrices $\mathbf{G}_{{\cal C},\bm{\theta}}, \mathbf{H}_{{\cal C},\bm{\theta}}, \mathbf{H}_{{\cal F},\bm{\theta}}$ can consume significant time if we feed the neural network with $(\mathbf{x}_0, \mathbf{u}_0), (\mathbf{x}_1^*, \mathbf{u}_1^*), \ldots, (\mathbf{x}_K^*, \mathbf{u}_K^*)$ sequentially. To improve computational efficiency, we combine the current state and the references into a batch, then feed the entire batch forward through the learned neural networks all at once, reducing the neural network query time to less than 1 ms.
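A sketch of this batching trick (shapes and names are illustrative):

    import torch

    def batched_residuals(h, G, q0, r0, q_ref, r_ref):
        """q0: (3,), r0: (4,): current state; q_ref, r_ref: (K-1, 3), (K-1, 4):
        reference samples over the horizon. One forward pass per MPC update."""
        z = torch.cat([torch.cat([q0, r0]).unsqueeze(0),
                       torch.cat([q_ref, r_ref], dim=1)], dim=0)   # (K, 7)
        with torch.no_grad():                     # inference only inside MPC
            return h(z), G(z).reshape(-1, 3, 4)   # (K, 3) drift, (K, 3, 4) gains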

We further evaluate MPC with models learned with other timing choices, $(\Delta t_{\cal C}, \Delta t_{\cal F}) = \{(50, 50), (25, 80)\}$ ms, in addition to the two choices $(\Delta t_{\cal C}, \Delta t_{\cal F}) = \{(25, 25), (25, 100)\}$ ms already discussed above.

TABLE II: Distance errors (cm) when varying the coarseness of the timesteps $(\Delta t_{\cal C}, \Delta t_{\cal F})$ in ms. Red colors indicate jumping failures where the robot falls short of the box (positive values). The distance errors are measured by the motion capture system at the end of the predefined flight period (1200 ms). Supplemental video: https://youtu.be/2QjZkARs1mU.
Task | $(25, 25)$ | $(50, 50)$ | $(25, 80)$ | $(25, 100)$
Box-Jump | $18.3 \pm 3.7$ | $10.8 \pm 2.1$ | $-1.8 \pm 0.9$ | $-2.0 \pm 1.3$

Three jumps were conducted separately for each timing choice on the box-jumping task, and the experiments are summarized in Table II. We also achieved successful jumps using the variable-frequency scheme $(25, 80)$ ms. We observed that using a larger timestep in the contact phase to cover the whole flight phase, even with a fixed-frequency scheme, e.g., $(50, 50)$ ms, does not guarantee successful jumps. This confirms our rationale that, due to the higher complexity of the robot dynamics during the contact phase, a coarse time step during this phase leads to higher accumulated prediction errors, even in the flight phase, causing task failures.

VI-D Continuous Jumping on Uneven Terrain

We tested the robustness and target accuracy of our learning-based MPC for continuous jumping on uneven terrain with a target of 60 cm for each jump, as illustrated in Fig. 1 and Fig. 9. The terrain consisted of multiple blocks with random heights between 2-4 cm, randomly placed on the ground. Since the robot legs can impact the ground early or late on the real robot, there is usually a mismatch between the scheduled and actual contact states. Whenever both contacts happen (i.e., the landing phase $L$ starts), we activate a separate landing MPC to make the transition between two jumps in a short period of about 200 ms [continuous_jump]. The landing MPC is designed with simplified dynamics, $K = 10$, and $\Delta t = 25$ ms. It aims to track the body reference trajectory from TO for continuous jumping [continuous_jump] and connects two jumps seamlessly. Figure 1 shows the actual trajectory of three continuous jumps, leaping around 175 cm in total, which yields only 1.67 cm distance error ($< 3\%$) for each jump on the uneven terrain. Compared to our prior work [continuous_jump], we successfully achieve continuous jumping on hardware in this work. With our framework, we also achieve a target jumping error of less than 2 cm (3.5%) over 34 single and continuous jumps with different targets and uneven terrain.

VII Discussion and Conclusion

In this letter, we developed a learned-model MPC approach that enables both target accuracy and robustness for aggressive planar jumping motions. We learned a residual model from a limited dataset to account for the effect of leg dynamics and model mismatch with real hardware. Given the learned model, we designed a learning-based variable-frequency MPC that considers the jumping transitions, the entire flight phase, and the jumping target during the optimization process. We demonstrated the scalability of our approach to handle new jumping targets with unseen reference trajectories. While allowing real-time computation, our approach uses the reference trajectory to evaluate the neural network residual model and ensure a linear model for MPC, which sacrifices a certain degree of accuracy if the system does not operate around the reference under substantial disturbances. To overcome this limitation, our future work will explore the use of nonlinear MPC and address its associated higher computational cost. We will also generalize the model learning and control framework with varying contact schedules. Further, we will incorporate line-foot, rolling, and soft contact in learning aggressive legged locomotion maneuvers.