Learning Rapid Turning, Aerial Reorientation, and Balancing using Manipulator as a Tail

Insung Yang¹ and Jemin Hwangbo¹
¹Robot and Artificial Intelligence Lab, Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea, {ycube13, jhwangbo}@kaist.ac.kr
Abstract

In this research, we investigated the innovative use of a manipulator as a tail in quadruped robots to augment their physical capabilities. Previous studies have primarily enhanced various abilities by attaching dedicated robotic tails that function solely as tails. While such tails improve the performance of the robots, they come with several disadvantages, such as increased overall weight and higher costs. To mitigate these limitations, we propose the use of a 6-DoF manipulator as a tail, allowing it to serve both as a tail and as a manipulator. To control this highly complex robot, we developed a reinforcement learning-based controller for the manipulator-equipped robot. Our experimental results demonstrate that robots equipped with a manipulator outperform those without one in tasks such as rapid turning, aerial reorientation, and balancing. These results indicate that, in addition to its manipulation capabilities, the manipulator can improve the agility and stability of quadruped robots, much like a tail.

I INTRODUCTION

The tails of animals play crucial roles in enhancing their physical abilities, particularly in quadrupeds. For instance, cheetahs use their tails to change direction during high-speed chases, allowing them to catch prey more effectively [1]. Additionally, squirrels and lizards utilize their tails to adjust body orientation in mid-air, ensuring safe landings [2, 3]. Tails are also essential for balancing, swimming, hopping, courtship, and defense [4]. These versatile tails have garnered the attention of robotic engineers and have been established as solutions for addressing various challenges in the field of robotics.

Previous works have attached tails to quadruped robots to improve their stability and speed during turning and running [5, 6, 7, 8]. Other studies suggest that tails can aid the robot's attitude control during aerial phases [9, 10, 11, 12]. There have also been attempts to improve the overall stability of robots by incorporating tails [13, 14]. These studies demonstrate that tails can enhance robotic performance in various tasks. However, adding a tail increases the complexity of the robot, leading to issues such as higher cost and increased weight. Consequently, the use of tails in practical applications may not always be efficient.

Figure 1: A quadruped robot, Mini Cheetah, equipped with a WidowX250S 6-DoF manipulator, is executing a rapid $135^{\circ}$ turn while running at 4.5 m/s. It learns to utilize the manipulator as a tail for sharp turns via reinforcement learning.

To alleviate these drawbacks and create a highly efficient tail capable of executing various tasks, we employ a manipulator as a tail rather than designing a tail solely for that purpose. Huang et al. [15] have also explored the use of a manipulator as a tail. By employing a manipulator as a tail, it can simultaneously perform the roles of a gripper and a tail, thereby achieving high efficiency. In recent years, numerous studies have explored the use of manipulators on quadruped robots [16, 17, 18, 19]. The majority of these studies have focused on manipulators performing manipulation tasks. However, we have identified the potential of manipulators when utilized as tails.

Among various control methods, reinforcement learning (RL) has recently emerged as one of the most popular approaches for legged robots [20, 21, 22]. By using a deep reinforcement learning-based controller, many previous studies have successfully enabled high-complexity robots to execute complex behaviors [23, 24]. Inspired by these studies, we employ deep reinforcement learning (DRL) to develop a controller.

Beyond creating a robust controller, the primary consideration when incorporating a tail on a robot is whether its presence meaningfully enhances the robot's overall performance. Our experiments demonstrate that a robot equipped with a manipulator outperforms one without a manipulator in tasks such as rapid turning, aerial reorientation, and balancing. Specifically, when a robot running at 4.5 m/s executed a $135^{\circ}$ turn, the distance it was pushed outward by centrifugal force was reduced to roughly two-thirds of that of a robot without a manipulator. Additionally, a robot with a manipulator was able to land safely from initial angles of $90^{\circ}$ to $120^{\circ}$ at heights of 1.5 to 2.25 meters, whereas a robot without a manipulator failed to land under the same conditions. Finally, when subjected to external forces, a robot with a manipulator exhibited a higher survival rate than one without.

Overall, this paper demonstrates that a manipulator can be an appropriate choice as a quadruped robot’s tail, enhancing the capabilities of a quadruped robot. Our main contributions are as follows.

  1. We propose the utilization of a manipulator that functions as a tail.

  2. We outline a method for controlling a robot equipped with a tail using reinforcement learning.

  3. We demonstrate that a robot equipped with a manipulator exhibits improved performance in tasks such as rapid turning, aerial reorientation, and balancing.

Figure 2: In the proposed reinforcement learning pipeline, two neural networks, namely the Actor and the Critic, are utilized. The Actor network generates the actions, while the Critic network computes the value function, which is essential for updating the Actor network. Both networks are structured as Multilayer Perceptrons (MLPs) and receive observations as inputs. These observations encompass two types: command, which is specified by the task, and robot state, which provides information about the robot's current state. The actions generated by the Actor are conveyed to the environment through a Proportional-Derivative (PD) controller, and the resulting reward, based on the action's effectiveness, is used to update the Actor network. This updating process employs the Proximal Policy Optimization (PPO) algorithm [25], refining the policy governed by the Actor network.

II METHOD

II-A Overview

Our goal is to develop a controller for a manipulator-mounted quadruped robot that performs well on various tasks, including rapid turning, aerial reorientation for safe landing, and balancing against external forces. We utilize deep reinforcement learning to train a neural network policy. An overview of this deep reinforcement learning process is illustrated in Figure 2. In our study, we employ Proximal Policy Optimization (PPO), a deep reinforcement learning algorithm, to update the policy and utilize two distinct neural network architectures. The first network, termed the "actor," maps observations to actions and is structured as a Multi-Layer Perceptron (MLP) with hidden layers sized [512, 256, 128]. The second network, known as the "critic," evaluates the current state using an MLP with hidden layers sized [512, 256, 128]. Both networks are essential for the algorithm's decision-making process and performance evaluation, facilitating the implementation of PPO in complex environments.
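For concreteness, the sketch below shows one way to instantiate the actor and critic MLPs with the hidden-layer sizes stated above, using PyTorch. The observation and action dimensions, the tanh activations, and the use of the actor output as an action mean are assumptions for illustration, not details reported in the paper.

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden_sizes, out_dim):
    # Fully connected network with tanh activations between layers.
    layers, last = [], in_dim
    for h in hidden_sizes:
        layers += [nn.Linear(last, h), nn.Tanh()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)

obs_dim = 64   # placeholder: robot state + command
act_dim = 18   # placeholder: 12 leg joints + 6 manipulator joints

actor  = mlp(obs_dim, [512, 256, 128], act_dim)  # observation -> action mean
critic = mlp(obs_dim, [512, 256, 128], 1)        # observation -> state value

obs = torch.zeros(1, obs_dim)
action_mean = actor(obs)   # fed to the action-scaling step in Sec. II-B2
value = critic(obs)        # used by PPO to estimate the advantage
```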

In our simulation, we deploy this controller on the Mini Cheetah robot equipped with a WidowX250S manipulator. We use RAISIM [26] as the simulation environment.

II-B Base

For the three different tasks, we employ slightly different learning setups; however, the overarching methodology remains largely consistent. This section outlines the general algorithm common to all tasks, and the subsequent sections provide a detailed description of the specific adaptations made for each task.

II-B1 Observation

As depicted in Figure 2, the observation is divided into two distinct parts: the robot state and the command. The robot state comprises the joint state, base state, and joint history. For the joint state, we provide the joint position ($p_{t}$) and joint velocity ($\dot{p}_{t}$). For the base state, we include the base angular velocity ($w_{x,y,z}$), base linear velocity ($v_{x,y,z}$), and base orientation ($\phi_{x,z}$). Additionally, we provide the joint position history for the two preceding time steps ($p_{t-1}$, $p_{t-2}$). This part of the observation remains constant across all tasks.

Unlike the robot state, the command observation ($o^{t}_{\text{cmd}}$) varies according to the task. For the rapid turning task, we provide the command velocity and command yaw: $o^{t}_{\text{cmd}}=(V_{x},V_{y},\phi_{\text{cmd}})$. For the balancing task, we use the same command structure. In contrast, for the aerial reorientation task, we do not provide any command observations: since the goal is simply to land safely when falling, no particular command is required, and the task is sufficiently specified by the termination conditions.
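As an illustration only, the snippet below assembles the observation vector described above; the dictionary keys are hypothetical placeholders for whatever state interface the simulator exposes.

```python
import numpy as np

def build_observation(state, command):
    """Concatenate the robot state and the task command.

    `state` is a dict of numpy arrays (names are illustrative), and
    `command` is (V_x, V_y, phi_cmd) for rapid turning and balancing,
    or an empty tuple for aerial reorientation.
    """
    robot_state = np.concatenate([
        state["joint_pos"],          # p_t
        state["joint_vel"],          # p_dot_t
        state["base_ang_vel"],       # w_{x,y,z}
        state["base_lin_vel"],       # v_{x,y,z}
        state["base_orientation"],   # phi_{x,z}
        state["joint_pos_prev1"],    # p_{t-1}
        state["joint_pos_prev2"],    # p_{t-2}
    ])
    return np.concatenate([robot_state, np.asarray(command, dtype=np.float64)])
```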

TABLE I: General Constraint Rewards
Reward | Expression | Coefficient
Joint position | $r_{p}=k_{p}\left\|p_{t}-p_{\text{nominal}}\right\|^{2}$ | $k_{p}=-4.0$
Joint velocity | $r_{\dot{p}}=k_{\dot{p}}\left\|\dot{p}_{t}\right\|^{2}$ | $k_{\dot{p}}=-0.005$
Torque | $r_{\tau}=k_{\tau}\|\tau\|^{2}$ | $k_{\tau}=-0.002$
Smoothness | $r_{s}=k_{s}\left\|q_{t}^{des}-q_{t-1}^{des}\right\|^{2}$ | $k_{s}=-4.0$

II-B2 Action

Instead of using the raw output of the actor network, we scale the output using the technique of Ji et al. [27]. The action is scaled by a predefined factor, termed the action factor, and added to the nominal joint position as follows: $q^{des}_{t}=q^{nominal}+\sigma_{a}\cdot a_{t}$, where $q^{des}_{t}$ is the desired joint position transmitted to the PD controller ($K_{p}: 17\,\text{N}\cdot\text{m}\cdot\text{rad}^{-1}$, $K_{d}: 0.4\,\text{N}\cdot\text{m}\cdot\text{s}\cdot\text{rad}^{-1}$), $q^{nominal}$ is the nominal joint configuration, $\sigma_{a}$ is the action factor, and $a_{t}$ is the policy output of the neural network. We use 0.3 for the action factor ($\sigma_{a}$). In the early iterations, this scaling keeps the action close to the nominal configuration, preventing the robot from moving excessively far from its initial state; as training progresses, this dominance diminishes.
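A minimal sketch of this action-scaling and PD step is given below; it assumes joint positions and velocities are available as numpy arrays and omits torque limits.

```python
import numpy as np

SIGMA_A = 0.3      # action factor sigma_a
KP = 17.0          # N*m/rad
KD = 0.4           # N*m*s/rad

def desired_joint_position(a_t, q_nominal):
    # q_t^des = q_nominal + sigma_a * a_t
    return q_nominal + SIGMA_A * np.asarray(a_t)

def pd_torque(q_des, q, q_dot):
    # Joint-level PD law with zero desired joint velocity.
    return KP * (q_des - q) - KD * q_dot
```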

II-B3 Reward

We utilize two types of rewards in our system: the objective reward $r^{\text{pos}}$, which grants a positive reward when the desired goal is achieved, and the constraint reward $r^{\text{neg}}$, which imposes a negative reward for unsafe or infeasible actions. We adopt the total reward formulation of Ji et al. [27], where the total reward is calculated as $r^{\text{total}}=r^{\text{pos}}\cdot\exp(\textit{reward factor}\cdot r^{\text{neg}})$. We use 0.02 for the reward factor.

Given that the three tasks have different objectives, the objective reward $r^{\text{pos}}$ varies for each task. The constraint reward $r^{\text{neg}}$ can be divided into two types: the general constraint reward $r^{\text{neg}}_{gen}$ and the task-specific constraint reward $r^{\text{neg}}_{task}$. The general constraint reward $r^{\text{neg}}_{gen}=r_{p}+r_{\dot{p}}+r_{\tau}+r_{s}$ is detailed in Table I, and the task-specific constraint rewards are described in the following sections. In conclusion, the total reward function can be rewritten as follows.

$r^{\text{total}}=r^{\text{pos}}_{task}\cdot\exp(0.02\cdot(r^{\text{neg}}_{gen}+r^{\text{neg}}_{task}))$   (1)
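The total reward of Eq. (1) can be computed directly, as in the small helper below (a sketch; the individual reward terms are assumed to be computed elsewhere).

```python
import math

REWARD_FACTOR = 0.02

def total_reward(r_pos_task, r_neg_gen, r_neg_task):
    # Since the constraint rewards are negative, the exponential factor lies in
    # (0, 1] and attenuates the objective reward when constraints are violated.
    return r_pos_task * math.exp(REWARD_FACTOR * (r_neg_gen + r_neg_task))
```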

II-C Rapid Turning

Designing a controller capable of executing rapid turns during high-speed running presents significant challenges. Frequently, the robot becomes trapped in a local minimum in which it remains stationary until the turn command is issued and only begins to move afterward. This occurs because the robot encounters substantial difficulty in performing rapid turns while in motion, leading it to abandon the attempt and execute the turn from a stationary position. Although various methods can overcome this problem, such as gradually increasing the turning angle and velocity through curriculum learning, we found that segmenting the training into two distinct stages is the most effective and stable approach for learning rapid turning. In the first stage, the robot learns to turn from a stationary position. In the second stage, it learns to execute rapid turns while running.

II-C1 Stage 1 (Yaw Adjustment in Standstill)

The concept of Stage 1 is illustrated in Figure 2. We train the robot to follow turn commands ranging from $0^{\circ}$ to $135^{\circ}$ from a standstill position. Additionally, the robot learns to follow the velocity command. Stage 1 continues until the iteration count reaches 2000.

To enhance the stability of the learning process, we incrementally increase the command velocity using a curriculum learning approach, which has been shown to improve learning stability [27]:

$V^{cmd}_{x}=1.0+\frac{1.5}{1.0+e^{-0.008\cdot(\text{iteration}-500)}}$   (2)
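For reference, Eq. (2) is a simple logistic schedule over the iteration count; a direct transcription is shown below.

```python
import math

def stage1_command_velocity(iteration):
    # Eq. (2): rises smoothly from about 1.0 m/s toward 2.5 m/s,
    # centred at iteration 500.
    return 1.0 + 1.5 / (1.0 + math.exp(-0.008 * (iteration - 500)))
```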

II-C2 Stage 2 (Rapid Turning during Running)

Stage 2 is the main stage of rapid turning. In this stage, the robot is trained to turn while fully in motion. The concept is shown in Figure 2.

We also use curriculum learning for the command velocity in this stage. However, we found that a straightforward iteration-based curriculum, such as that used in Stage 1, is insufficient. This is mainly because Stage 2 is more unstable to train than Stage 1, owing to the requirement for rapid rotation at high speeds. To improve stability, we implement a reward-based curriculum learning strategy [28]. Instead of updating the command velocity based on the iteration count alone, we advance the curriculum whenever the total reward exceeds a predefined threshold.

$V^{cmd}_{x}=1.77+\frac{2.73}{1.0+e^{-0.01\cdot(\textit{reward step}-100)}}$   (3)

In this experiment, the predefined threshold is set to 4.75, meaning that $\textit{reward step}$ is incremented by 1 whenever the average total reward of an iteration exceeds 4.75.

It is essential to determine the proper timing for executing turn commands during the learning phase. If the commands are issued randomly from the beginning of the learning phase, the likelihood of successful learning is reduced because of the continuous variation in the robot’s foot position and posture during its run. One method to address this issue is to progressively extend the range of commands using a curriculum learning approach. We update the range of commands if the total reward exceeds a predefined threshold value, similar to the Stage 2 command velocity process.

$\textit{Command Range}=1+\min(300,\textit{reward step})$   (4)

In this experiment, the value of $\textit{reward step}$ is the same as that used for the velocity command; however, it is possible to use different values.
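A sketch of this reward-based curriculum is shown below, combining Eqs. (3) and (4); the update is assumed to be applied once per iteration, and the shared reward-step counter follows the choice described in the text.

```python
import math

REWARD_THRESHOLD = 4.75

class RewardCurriculum:
    def __init__(self):
        self.reward_step = 0

    def update(self, mean_iteration_reward):
        # Advance the curriculum only when the average total reward of the
        # iteration exceeds the threshold.
        if mean_iteration_reward > REWARD_THRESHOLD:
            self.reward_step += 1
        v_cmd = 1.77 + 2.73 / (1.0 + math.exp(-0.01 * (self.reward_step - 100)))  # Eq. (3)
        command_range = 1 + min(300, self.reward_step)                            # Eq. (4)
        return v_cmd, command_range
```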

II-C3 Reward

TABLE II: Rapid Turning Task Rewards (Stage 2)
Reward | Expression
Linear velocity | $r_{v}=k_{v}\exp(-(V^{cmd}_{x}-V_{x})^{2}-(V^{cmd}_{y}-V_{y})^{2})$
Yaw | $r_{\phi}=k_{\phi}\exp(-2.5\,\text{angle}(\phi^{cmd}_{x},\phi_{x}))$
Angular velocity | $r_{w}=k_{w}(3-12\exp(-0.5\,w_{z}))$
Foot airtime | $r_{air,i}=\begin{cases}k_{air}\cdot\max(T_{s,i},T_{a,i},0.2)&\text{if }T_{\text{max},i}<0.25\\ 0&\text{otherwise}\end{cases}$
Foot clearance | $r_{cl,i}=k_{cl}\left(foot\,p_{z,i}-foot\,p_{z}^{thres}\right)^{2}$
Base stability | $r_{base}=k_{base}V_{z}^{2}$
Orientation | $r_{ori}=k_{ori}(\text{angle}(\phi^{world}_{z},\phi^{body}_{z}))^{2}$
Arm position | $r_{arm}=k_{arm}\left\|p^{arm}_{t}-p^{arm}_{nominal}\right\|^{2}$

Coefficient | run | turn
$k_{v}$ | 2.0 | 2.0
$k_{\phi}$ | 4.5 | 4.5
$k_{w}$ | 0.0 | 3.0
$k_{air}$ | 0.5 | 0.5
$k_{cl}$ | -50.0 | -50.0
$k_{base}$ | -10.0 | -10.0
$k_{ori}$ | -100.0 | 0.0
$k_{arm}$ | -15.0 | 0.0
TABLE III: Aerial Reorientation and Safe Landing Task Rewards
Reward | Expression
Orientation | $r_{ori}=k_{ori}\exp(-2.5(\text{angle}(\phi^{cmd}_{x},\phi_{x}))^{2})$
Linear velocity | $r_{v}=k_{v}\exp(-5.0(V^{cmd}_{x,y}-V_{x,y})^{2})$
Base height | $r_{h}=k_{h}\exp(-10.0\,|p^{nominal}_{z}-p_{z}|)$
Foot clearance | $r_{cl,i}=k_{cl}\left(foot\,p_{z,i}-foot\,p_{z}^{thres}\right)^{2}$
Arm position | $r_{arm}=k_{arm}\left\|p^{arm}_{t}-p^{arm}_{nominal}\right\|^{2}$

Coefficient | air | ground
$k_{ori}$ | 5.0 | 5.0
$k_{v}$ | 0.0 | 2.5
$k_{h}$ | 0.0 | 5.0
$k_{cl}$ | 0.0 | -100.0
$k_{arm}$ | 0.0 | -150.0

Table II lists the reward details for the rapid turning task in Stage 2. We divide each iteration into two distinct sections: one for running forward and the other for turning. Due to the distinct objectives of these two sections, we employ different sets of reward coefficients. Notably, in the turning section, a reward for angular velocity is added, while rewards for body orientation and arm position are omitted. The key factor enabling the robot to turn rapidly is the angular velocity reward ($r_{w}$), which provides a high incentive when the robot's angular velocity is high during the turning phase. It is possible to increase the rotational speed of the robot without using the angular velocity reward, for example by accelerating the robot's lateral speed or increasing the rate of change of the robot's heading. However, the angular velocity reward provides the best performance in terms of both learning speed and turning performance. An important consideration when setting up the angular velocity reward is to set its coefficient ($k_{w}$) higher than the other objective reward coefficients ($k_{v}$, $k_{\phi}$, $k_{air}$). Without this adjustment, achieving high-speed turning becomes nearly impossible.

Unlike Stage 2, in Stage 1 we solely utilize the reward associated with the running section, without dividing the iteration into two sections.

The objective task reward can be written as $r^{\text{pos}}_{task}=r_{v}+r_{\phi}+r_{w}+r_{air}$, and the constraint task reward as $r^{\text{neg}}_{task}=r_{cl}+r_{base}+r_{ori}+r_{arm}$.
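To make the coefficient switch between the running and turning sections concrete, the sketch below evaluates the objective reward of Table II with the run/turn coefficient sets; the yaw error, angular velocity, and foot-airtime term are assumed to be precomputed, and the foot-airtime case distinction is omitted.

```python
import math

RUN_COEFFS  = dict(k_v=2.0, k_phi=4.5, k_w=0.0, k_air=0.5)
TURN_COEFFS = dict(k_v=2.0, k_phi=4.5, k_w=3.0, k_air=0.5)

def turning_objective_reward(v, v_cmd, yaw_error, w_z, r_air_term, turning):
    # r_task^pos = r_v + r_phi + r_w + r_air (Table II); k_w is nonzero only
    # in the turning section, which is what drives the rapid rotation.
    k = TURN_COEFFS if turning else RUN_COEFFS
    r_v   = k["k_v"]   * math.exp(-(v_cmd[0] - v[0])**2 - (v_cmd[1] - v[1])**2)
    r_phi = k["k_phi"] * math.exp(-2.5 * yaw_error)
    r_w   = k["k_w"]   * (3.0 - 12.0 * math.exp(-0.5 * w_z))
    r_air = k["k_air"] * r_air_term
    return r_v + r_phi + r_w + r_air
```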

II-D Aerial Reorientation and Safe Landing

Unlike the rapid turning task, we found that the aerial reorientation task does not require curriculum learning or stage-wise training. The key factors for this task are termination as a constraint and the inertia of the tail.

II-D1 Inertia of tail

In this task, we utilize not only the WidowX250S, which is used in the other tasks, but also the ViperX300S, which is heavier and longer. We observed that, due to its limited inertia, the WidowX250S does not perform well in aerial reorientation. Consequently, we also employ a robot arm with higher inertia. The WidowX250S has a total length of 1300 mm and a weight of 2.35 kg, whereas the ViperX300S has a total length of 1500 mm and a weight of 3.6 kg.

II-D2 Termination

Reinforcement learning that relies solely on a reward function can have limitations. In this task, the robot receives a substantial reward if it lands safely and incurs a significant penalty if the landing fails. Due to this extreme setup, the robot might prematurely forfeit the reward in the air, as forfeiting the reward is more advantageous than failing to land. To address this issue, precisely adjusting the reward function can be a solution, but implementing a termination constraint could be more effective.

Several studies have attempted to train robots using constrained reinforcement learning [29, 30, 31, 32]. Notably, Chane-Sane et al. [32] propose the CaT (Constraints as Terminations) algorithm, which uses termination as a constraint in the RL framework. In our study, both the reward function and termination constraints are used, and no other complex algorithms are necessary. We simply terminate the iteration with a high negative reward when the robot violates certain rules. This simple mechanism, combined with the reward function, successfully enables the robot to execute feasible motions. The following rules are used for termination in this experiment:

$\begin{cases}\text{Collision with the ground by any body part other than the feet}\\ \text{smoothness: }r_{s}>-2.0\cdot k_{s}\\ \text{torque: }r_{\tau}>-180.0\cdot k_{\tau}\\ \text{joint position: }r_{p}>-5.0\cdot k_{p}\end{cases}$
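A possible implementation of these termination rules is sketched below. Because the penalty coefficients are negative, the listed inequalities are interpreted here as thresholds on the underlying squared norms; this is one plausible reading and an assumption of this sketch.

```python
import numpy as np

def should_terminate(non_foot_ground_contact, q_des, q_des_prev, tau, p, p_nominal):
    # Terminate (and assign a large negative reward) when the robot touches the
    # ground with anything other than its feet, or when the smoothness, torque,
    # or joint-position quantities exceed their limits.
    smoothness_sq = np.sum((q_des - q_des_prev) ** 2)
    torque_sq     = np.sum(np.asarray(tau) ** 2)
    joint_pos_sq  = np.sum((p - p_nominal) ** 2)
    return (non_foot_ground_contact
            or smoothness_sq > 2.0
            or torque_sq > 180.0
            or joint_pos_sq > 5.0)
```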

II-D3 Reward

This task consists of two stages: the aerial reorientation section, where the robot rotates its body to align its base orientation with the z-axis of the world frame, and the self-righting section, where the robot recovers its body to the nominal position after landing. We divide these sections according to body height: the aerial reorientation section corresponds to the air region ($p_{z}>0.4$), and the self-righting section to the ground region ($p_{z}<0.4$). Due to the different objectives of these two stages, the rewards are divided accordingly, as detailed in Table III. In particular, in the air, only the orientation reward and the general constraint rewards are needed; on the ground, where self-righting is required, various additional rewards are included.

The objective task reward can be written as $r^{\text{pos}}_{task}=r_{ori}+r_{v}+r_{w}+r_{h}$, and the constraint task reward as $r^{\text{neg}}_{task}=r_{cl}+r_{arm}$.
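The height-based switch between the two reward regimes can be written compactly, as in the sketch below, using the coefficients of Table III.

```python
AIR_COEFFS    = dict(k_ori=5.0, k_v=0.0, k_h=0.0, k_cl=0.0,    k_arm=0.0)
GROUND_COEFFS = dict(k_ori=5.0, k_v=2.5, k_h=5.0, k_cl=-100.0, k_arm=-150.0)

def landing_reward_coeffs(base_height_z):
    # Aerial reorientation region (p_z > 0.4 m): only the orientation term is
    # active. Ground region (p_z < 0.4 m): self-righting terms are added.
    return AIR_COEFFS if base_height_z > 0.4 else GROUND_COEFFS
```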

II-E Balancing

Figure 3: These graphs depict the comparison of trajectories for a robot with and without a manipulator executing a $135^{\circ}$ turn while running at 4.5 m/s. A dot is plotted every 0.05 seconds. The blue dots indicate the trajectory of the robot equipped with a manipulator following the turn command, while the red dots represent the trajectory of the robot without a manipulator after the turn command. Additionally, the light blue dots denote the ideal trajectory, which assumes an instantaneous turn upon command.

For the balancing part, the robot is trained to endure random external impulses at random timings and in random directions while walking at speeds ranging from 0.5 m/s to 3 m/s. The range of impulses is $50\,\text{N}\cdot\text{s}$ to $100\,\text{N}\cdot\text{s}$. During training, we use the same reward setup as in the running section of the rapid turning task.
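How such impulses might be realized in simulation is sketched below: a random impulse magnitude and direction are sampled and converted to a constant external force applied over a short window of steps. The window length, time step, and direction-sampling scheme are assumptions for illustration, not values from the paper.

```python
import numpy as np

def sample_push(rng, dt=0.002, duration=0.1):
    # Sample an impulse between 50 and 100 N*s in a random direction and
    # convert it to a constant force applied for `duration` seconds.
    magnitude = rng.uniform(50.0, 100.0)          # N*s
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    force = (magnitude / duration) * direction    # N
    n_steps = int(round(duration / dt))           # simulation steps to apply it
    return force, n_steps

rng = np.random.default_rng(0)
force, n_steps = sample_push(rng)
```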

III RESULT

III-A Rapid turning

We compared the turning performance of robots with and without a manipulator in simulations using the RAISIM environment. In the simulation tests, we set the forward velocity of the robots between 3 m/s and 4.5 m/s. We then issued commands for the robots to execute turns of up to $135^{\circ}$ while moving forward. The quality of the turns was assessed based on three criteria: the speed of the turn, the sharpness of the turn, and the displacement during turning.

Our findings indicate that the turning speed was nearly identical regardless of the presence of the manipulator. However, there was a notable difference in the sharpness of the turns. As depicted in Figure 3, the robot equipped with a manipulator made sharper turns than the robot without one. Furthermore, the trajectory of the robot with a manipulator was closer to the ideal trajectory than that of the robot without a manipulator, meaning that the displacement during rotation was shorter when a manipulator was equipped. Specifically, during a $135^{\circ}$ turn, the robot without a manipulator was pushed out by up to 1.74 meters, while the robot with a manipulator was pushed out by a maximum of only 1.2 meters.

Figure 4 shows the difference in trajectory with respect to velocity. In Figure 4(a), the trajectory of the robot without a manipulator varies significantly with speed. As speed increases, the deviation from the ideal trajectory becomes more pronounced. In contrast, the trajectory of the robot with a manipulator, shown in Figure 4(b), remains relatively consistent.

(a) without manipulator
(b) with manipulator
Figure 4: These graphs depict the trajectory results of a robot executing a $135^{\circ}$ turn while running at speeds of 3.0 m/s, 3.5 m/s, 4.0 m/s, and 4.5 m/s.
(a) drop at $90^{\circ}$
(b) drop at $105^{\circ}$
(c) drop at $120^{\circ}$
Figure 5: These graphs show the angle between the body z-axis and the world z-axis over time from 0 s to 0.5 s. The red line shows the result of the robot without a manipulator, the blue line the robot with a WidowX250S, and the green line the robot with a ViperX300S.

III-B Aerial Reorientation and Safe Landing

Unlike the previous task, we used three different robots to evaluate performance: the Mini Cheetah without a manipulator, the Mini Cheetah equipped with a WidowX250S, and the Mini Cheetah equipped with a ViperX300S. Details of these three robots are provided in the Method section. In the simulation, we dropped these robots from heights ranging from 1.5 m to 2.25 m and at initial angles ranging from $90^{\circ}$ to $120^{\circ}$.

We found that the robot without a manipulator failed in all these scenarios. Similarly, the robot equipped with a WidowX250S failed in all scenarios. However, the robot equipped with a ViperX300S succeeded in all situations.

Figure 5 illustrates the orientation shift during aerial reorientation. Regardless of the initial conditions, the robot without a manipulator achieved a maximum rotation of $15^{\circ}$, the robot equipped with the WidowX250S reached up to $50^{\circ}$, and the robot equipped with the ViperX300S attained up to $65^{\circ}$. The graph for the robot equipped with the WidowX250S shows that its rotation speed decreases after 0.2 seconds. In contrast, the robot equipped with the ViperX300S maintains a consistent rotation speed while reorienting and culminates in a constant orientation angle phase after 0.3 seconds, which remains steady while falling.

III-C Balancing

Figure 6: These graphs illustrate the survival outcomes given specific impulse combinations. A red dot indicates that the robot failed to survive at that impulse combination, whereas a blue dot indicates that it survived. The black line delineates the training distribution region: the area within the black line represents the impulse distribution during training.
Figure 7: This figure illustrates the movement of the manipulator as the robot executes a $135^{\circ}$ turn. When the robot moves forward, the manipulator is positioned centrally within the robot. However, during the turn, the robot experiences centrifugal force. To counteract this force, the manipulator moves in the opposite direction. Upon completing the turn, the manipulator returns to its original central position.

In simulation, we randomly applied impulses to the robot while walking at 1 m/s, with forces in the x-, y-, and z-axis directions, totaling up to 100 N·s. The x-axis aligns with the robot's forward direction, the y-axis corresponds to the robot's lateral direction, and the z-axis aligns with gravity. We display only the results for impulses along the y- and z-axes because the data show that the robot can withstand forces in the x-axis direction well, both with and without a manipulator.

The results are illustrated in Figure 6. The survival rate of the robot equipped with a manipulator is 81.5%, while the survival rate of the robot without a manipulator is 71.5%. Notably, in almost all areas within the black line, where the robot has already experienced such impulses during training, the robot with a manipulator survived. In contrast, the robot without a manipulator failed relatively often in these regions.

IV Discussion

IV-A Rapid Turning

In the Results section, we evaluate the turning performance of the quadruped robot equipped with a manipulator based on three criteria: the speed of the turn, the sharpness of the turn, and the displacement during turning. Among these criteria, the quadruped robot with a manipulator demonstrates superior performance in terms of sharpness and displacement during the turn. This improved performance is attributed to the robot with the manipulator being able to withstand greater centrifugal force during the turn. Figure 4 strongly supports this observation. As velocity increases, the deviation between the ideal trajectory and the trajectory of the robot without the manipulator becomes more pronounced. Given that the magnitude of the centrifugal force increases proportionally with the square of the velocity, this result is expected. However, the trajectory of the robot equipped with the manipulator remains almost constant regardless of the velocity, indicating that the quadruped robot can better endure centrifugal forces with the assistance of the manipulator.

The mechanism by which the quadruped robot utilizes its manipulator to withstand centrifugal force is illustrated in Figure 7. It shows the movement of the quadruped robot with the manipulator when executing a $135^{\circ}$ turn while running at a speed of 4.5 m/s. During turning, the manipulator is positioned on the side opposite to the direction of the centrifugal force. By positioning itself in this manner, the manipulator generates a counter-torque that opposes the torque produced by the centrifugal force. Consequently, this counter-torque enables the robot to execute sharper turns.

IV-B Aerial reorientation and safe landing

As illustrated in Figure 5, the rotation speed and the total rotation angle vary depending on the type of robot used. Robots equipped with manipulators rotate more rapidly and achieve a larger total rotation angle than a robot without a manipulator, although the extent of these differences varies depending on the specific manipulator used. The disparity in robot performance according to the presence of a manipulator is attributed to the additional inertia provided by the manipulator. Figure 8 depicts the overall process by which the robot utilizes its manipulator when falling upside-down. During the aerial reorientation phase shown in the figure, the robot rotates its manipulator in the opposite direction of the robot base rotation. This movement generates counter momentum, allowing the robot base to reorient its body in the air.

The rotational performance in the air also varies depending on the type of arm used. This is because the counter momentum generated by the manipulator relies on its inertia. Therefore, the robot equipped with the ViperX300S, which has a higher inertia than the WidowX250S, rotates faster and achieves a larger total rotation angle compared to the one equipped with the WidowX250S.

In Figure 5, after approximately 0.3 seconds, there is a region where the robot’s orientation remains constant for the robot equipped with the ViperX300S. This region represents the phase where the robot adjusts its legs into a posture advantageous for a safe landing. This phase can also be seen in Figure 8, where, in the middle of the figure, the robot finishes reorienting its body and adjusts its legs. We found that securing sufficient time to adjust its legs is crucial for a safe landing. For example, the robot equipped with WidowX250S fails to establish this region due to its inadequate speed of rotation in the air, resulting in unsuccessful landing attempts.

Figure 8: This figure illustrates the motion of the robot as it falls from a height of 1.5 meters with an initial body angle of $105^{\circ}$. The task comprises two phases: the aerial reorientation phase, where the robot reorients its body to align with the world z-axis, and the self-righting phase, where the robot returns its body to its nominal position.

IV-C Balancing

The experimental results show that the quadruped robot equipped with a manipulator has a higher survival rate when external forces are applied. Therefore, engineers can expect an enhancement in the robot's stability if a manipulator is installed on a quadruped robot. However, when the impulse is less than $60\,\text{N}\cdot\text{s}$, no substantial differences are observed. Consequently, installing a manipulator solely for balance purposes would not be efficient, as situations involving an impulse greater than $60\,\text{N}\cdot\text{s}$ are relatively infrequent.

V CONCLUSIONS

In this study, we propose the integration of a 6-DoF manipulator as a multifunctional tail for quadruped robots to address the challenges posed by the complexity of traditional tails. Through simulation, we demonstrated that mounting a manipulator on a quadruped robot enhances its performance compared to a robot without one in tasks such as rapid turning, aerial reorientation, and balancing. The results indicate that the manipulator can improve the agility and stability of the quadruped robot. We believe that our study broadens the roles of manipulators, enabling robotics engineers to utilize them for additional functionalities when integrated with quadruped robots.

However, our research focuses solely on simulations, not real-world applications. Therefore, it remains to be proven that the manipulator can indeed improve the performance of quadruped robots in real-world scenarios. Additionally, we addressed only a few applications in which quadruped robots can be enhanced by using a manipulator as a tail. There are numerous other potential applications that future studies could explore.

References

  • [1] A. M. Wilson, J. Lowe, K. Roskilly, P. E. Hudson, K. Golabek, and J. McNutt, “Locomotion dynamics of hunting in wild cheetahs,” Nature, vol. 498, no. 7453, pp. 185–189, 2013.
  • [2] A. Jusufi, D. T. Kawano, T. Libby, and R. J. Full, “Righting and turning in mid-air using appendage inertia: reptile tails, analytical models and bio-inspired robots,” Bioinspiration & biomimetics, vol. 5, no. 4, p. 045001, 2010.
  • [3] A. Jusufi, D. I. Goldman, S. Revzen, and R. J. Full, “Active tails enhance arboreal acrobatics in geckos,” Proceedings of the National Academy of Sciences, vol. 105, no. 11, pp. 4215–4219, 2008.
  • [4] S. M. O’Connor, T. J. Dawson, R. Kram, and J. M. Donelan, “The kangaroo’s tail propels and powers pentapedal locomotion,” Biology letters, vol. 10, no. 7, p. 20140381, 2014.
  • [5] A. Patel and M. Braae, “Rapid turning at high-speed: Inspirations from the cheetah’s tail,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.   IEEE, 2013, pp. 5506–5511.
  • [6] ——, “Rapid acceleration and braking: Inspirations from the cheetah’s tail,” in 2014 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2014, pp. 793–799.
  • [7] N. J. Kohut, A. O. Pullin, D. W. Haldane, D. Zarrouk, and R. S. Fearing, “Precise dynamic turning of a 10 cm legged robot on a low friction surface using a tail,” in 2013 IEEE International Conference on Robotics and Automation.   IEEE, 2013, pp. 3299–3306.
  • [8] C. S. Casarez and R. S. Fearing, “Steering of an underactuated legged robot through terrain contact with an active tail,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2018, pp. 2739–2746.
  • [9] T. Fukushima, R. Siddall, F. Schwab, S. L. Toussaint, G. Byrnes, J. A. Nyakatura, and A. Jusufi, “Inertial tail effects during righting of squirrels in unexpected falls: from behavior to robotics,” Integrative and Comparative Biology, vol. 61, no. 2, pp. 589–602, 2021.
  • [10] E. Chang-Siu, T. Libby, M. Tomizuka, and R. J. Full, “A lizard-inspired active tail enables rapid maneuvers and dynamic stabilization in a terrestrial robot,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.   IEEE, 2011, pp. 1887–1894.
  • [11] Y. Tang, J. An, X. Chu, S. Wang, C. Y. Wong, and K. S. Au, “Towards safe landing of falling quadruped robots using a 3-dof morphable inertial tail,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 1141–1147.
  • [12] A. M. Johnson, T. Libby, E. Chang-Siu, M. Tomizuka, R. J. Full, and D. E. Koditschek, “Tail assisted dynamic self righting,” in Adaptive mobile robotics.   World Scientific, 2012, pp. 611–620.
  • [13] R. Briggs, J. Lee, M. Haberland, and S. Kim, “Tails in biomimetic design: Analysis, simulation, and experiment,” in 2012 IEEE/RSJ international conference on intelligent robots and systems.   IEEE, 2012, pp. 1473–1480.
  • [14] J. Norby, J. Y. Li, C. Selby, A. Patel, and A. M. Johnson, “Enabling dynamic behaviors with aerodynamic drag in lightweight tails,” IEEE Transactions on Robotics, vol. 37, no. 4, pp. 1144–1153, 2021.
  • [15] H. Huang, A. Loquercio, A. Kumar, N. Thakkar, K. Goldberg, and J. Malik, “More than an arm: Using a manipulator as a tail for enhanced stability in legged locomotion,” arXiv preprint arXiv:2305.01648, 2023.
  • [16] H. Ferrolho, V. Ivan, W. Merkt, I. Havoutis, and S. Vijayakumar, “Roloma: Robust loco-manipulation for quadruped robots with arms,” Autonomous Robots, vol. 47, no. 8, pp. 1463–1481, 2023.
  • [17] Z. Fu, X. Cheng, and D. Pathak, “Deep whole-body control: Learning a unified policy for manipulation and locomotion,” in Conference on Robot Learning.   PMLR, 2023, pp. 138–149.
  • [18] J.-P. Sleiman, F. Farshidian, M. V. Minniti, and M. Hutter, “A unified mpc framework for whole-body dynamic locomotion and manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4688–4695, 2021.
  • [19] S. Zimmermann, R. Poranne, and S. Coros, “Go fetch!-dynamic grasps using boston dynamics spot with external robotic arm,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 4488–4494.
  • [20] J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, p. eaau5872, 2019.
  • [21] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science robotics, vol. 5, no. 47, p. eabc5986, 2020.
  • [22] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science Robotics, vol. 7, no. 62, p. eabk2822, 2022.
  • [23] Y. Ma, F. Farshidian, and M. Hutter, “Learning arm-assisted fall damage reduction and recovery for legged mobile manipulators,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 12 149–12 155.
  • [24] N. Rudin, H. Kolvenbach, V. Tsounis, and M. Hutter, “Cat-like jumping and landing of legged robots in low gravity using deep reinforcement learning,” IEEE Transactions on Robotics, vol. 38, no. 1, pp. 317–328, 2021.
  • [25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [26] J. Hwangbo, J. Lee, and M. Hutter, “Per-contact iteration method for solving contact dynamics,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 895–902, 2018.
  • [27] G. Ji, J. Mun, H. Kim, and J. Hwangbo, “Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4630–4637, 2022.
  • [28] G. B. Margolis, G. Yang, K. Paigwar, T. Chen, and P. Agrawal, “Rapid locomotion via reinforcement learning,” The International Journal of Robotics Research, vol. 43, no. 4, pp. 572–587, 2024.
  • [29] Y. Kim, H. Oh, J. Lee, J. Choi, G. Ji, M. Jung, D. Youm, and J. Hwangbo, “Not only rewards but also constraints: Applications on legged robot locomotion,” IEEE Transactions on Robotics, 2024.
  • [30] S. Gangapurwala, A. Mitchell, and I. Havoutis, “Guided constrained policy optimization for dynamic quadrupedal robot locomotion,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3642–3649, 2020.
  • [31] J. Lee, L. Schroth, V. Klemm, M. Bjelonic, A. Reske, and M. Hutter, “Evaluation of constrained reinforcement learning algorithms for legged locomotion,” arXiv preprint arXiv:2309.15430, 2023.
  • [32] E. Chane-Sane, P.-A. Leziart, T. Flayols, O. Stasse, P. Souères, and N. Mansard, “Cat: Constraints as terminations for legged locomotion reinforcement learning,” arXiv preprint arXiv:2403.18765, 2024.