
Put the Bear on the Chair! Intelligent Robot Interaction
with Previously Unseen Chairs via Robot Imagination

Hongtao Wu, Xin Meng, Sipu Ruan, Gregory S. Chirikjian This work was supported by NUS Startup grants R-265-000-665-133, R-265-000-665-731, Faculty Board account C-265-000-071-001, and the National Research Foundation, Singapore under its Medium Sized Center for Advanced Robotics Technology Innovation (CARTIN) R-261-521-002-592.H. Wu, X. Meng, S. Ruan, and G. S. Chirikjian are with the Department of Mechanical Engineering, National University of Singapore, Singapore. H. Wu is also with the Laboratory for Computational Sensing and Robotics (LCSR), Johns Hopkins University, Baltimore, MD 21218, USA. Address all correspondence to G. S. Chirikjian: [email protected]
Abstract

In this paper, we study the problem of autonomously seating a teddy bear on a previously unseen chair. To achieve this goal, we present a novel method for robots to imagine the sitting pose of the bear by physically simulating a virtual humanoid agent sitting on the chair. We also develop a robotic system which leverages motion planning to plan SE(2) motions for a humanoid robot to walk to the chair and whole-body motions to put the bear on it. Furthermore, to cope with cases where the chair is not in an accessible pose for placing the bear, a human assistance module is introduced in which a human follows language instructions given by the robot to rotate the chair and make it accessible. We implement our method with a robot arm and a humanoid robot. We calibrate the proposed system with 3 chairs and test it extensively on 12 previously unseen chairs in both accessible and inaccessible poses. Results show that our method enables the robot to autonomously seat the teddy bear on the 12 previously unseen chairs with a very high success rate. The human assistance module is also shown to be very effective in changing the accessibility of the chair. Video demos and more details are available at https://chirikjianlab.github.io/putbearonchair/.

I INTRODUCTION

Object affordances describe the potential interactions with an object [1] and are related to the object's functionality. For example, a chair affords the functionality of sitting; thus, it possesses the sitting affordance. For robots operating in unstructured environments and interacting with previously unseen objects, e.g., personal robots, the understanding of object affordances is very desirable [2]: it allows robots to understand how to interact with objects efficiently and intelligently. Reasoning about object affordances can also help robots discover the potential of an object to afford a functionality, despite it not being a typical instance of the object class to which the functionality is related (e.g., the improvised chair assembled from books and boxes in Fig. 5(d)). Independently identifying affordances represents a higher level of robot intelligence.

Figure 1: Overview. (a) The robot imagines the sitting pose for a previously unseen chair and seats the bear on the chair. A robot arm with a mounted depth camera scans the chair and tracks the chair and the humanoid robot. The coordinate frame at the lower left corner indicates the base frame of the robot arm, which is used as the world frame throughout the paper. (b) Result of seating the bear on the chair. (c) The humanoid robot retracts its hands and straightens its body after seating the bear on the chair.

To address the problem of affordance reasoning, we introduced the interaction-based definition (IBD) which defines objects from the perspective of interactions instead of appearances in our previous work [3]. In particular, IBD defines a chair as “an object which can be stably placed on a flat horizontal surface in such a way that a typical human is able to sit (to adopt or rest in a posture in which the body is supported on the buttocks and thighs and the torso is more or less upright) on it stably above the ground”.

Figure 2: Pipeline. The chair is randomly placed in front of the robot arm for 3D scanning, which reconstructs the 3D model of the chair. The three snapshots in the sitting imagination show the results of three sitting trials; only the left one, marked with a check, is classified as a correct sitting. The instruction given by the robot in the human assistance module is "Please rotate the chair about the vertical axis counterclockwise for 90 degrees!"

In this paper, we go beyond our previous work. We propose a novel method for robots to imagine how to sit on a previously unseen chair, i.e., the sitting pose for the chair. We develop a robotic system, with a Franka Emika Panda robot arm and a NAO humanoid robot, to actualize the understanding of the sitting affordance by autonomously seating a teddy bear on the chair (Fig. 1). To "put the bear on the chair", the robot needs to carry the bear to the chair and seat the bear on it. To accomplish this task, the robot first needs to understand the sitting affordance of the chair, i.e., how the chair can be sat on. It also needs to plan a whole-body motion to seat the bear on the chair. Moreover, if the chair is in an inaccessible pose (e.g., when the chair is facing a wall), the robot should be able to reason about its accessibility and understand how to make it accessible. In this paper, we introduce a simple but efficient human assistance module with closed-loop visual feedback which enables the robot to instruct a human to rotate the chair and make it accessible for seating the bear. Fig. 2 shows the pipeline of our method. Results show that our method enables the robot to autonomously seat the bear on 12 previously unseen chairs with diverse shapes and appearances in 72 trials, covering both accessible and inaccessible chair poses, with a success rate of 94.4%. Our human assistance module also proves very effective: it successfully changes the accessibility of the chair for seating the bear with a 100% success rate in 36 trials with inaccessible poses. To our knowledge, this work is the first to physically seat an agent on a previously unseen chair in the real world. The aim of this work is to examine how well robots are able to accomplish this challenging task by leveraging different components of robotics. We envision promising future applications of our method for robots operating in household environments and interacting with humans.

This work differs from our previous work [3] in various ways. Rather than classifying whether a previously unseen object is a chair or not, in this work we assume the given object is known a priori to be a chair, since this can be determined with [3]. Also, unlike [3], which focuses on perceiving the sitting affordance without real-robot experiments, this work goes further by actualizing the understanding of affordances with physical experiments. The main contributions of the paper are as follows:

  • A method for robots to imagine the sitting pose for a previously unseen chair.

  • A human assistance module to enable the robot to reason about the accessibility of a chair and interact with a human to change the accessibility if necessary.

  • A robotic system which is able to autonomously seat a teddy bear on a previously unseen chair.

II RELATED WORK

Object Affordance of Chairs. Chairs are an important object class in human life, and the exploration of the sitting affordance has become popular in recent decades [4, 5, 6, 3, 7]. Hinkle and Olson [4] simulate dropping spheres onto objects and classify objects into containers, chairs, and tables based on the final configuration of the spheres. Grabner et al. [5] fit a humanoid mesh onto an object mesh and exploit the vertex distance and the triangle intersection to evaluate the object's affordance as a chair for classification. Instead of object classification, we use object affordances to understand how to interact with a chair.

Affordance Reasoning. There is growing interest in reasoning about object affordances in the fields of computer vision [8, 9, 10, 11, 12, 13, 14, 2, 15] and robotics [16, 17, 18, 19, 20, 21, 22, 2, 23]. Stoytchev [16] introduces an approach to ground tool affordances by dynamically applying different behaviors from a behavioral repertoire. [17, 18, 9, 10, 14] use convolutional neural networks (CNNs) to detect regions of affordance in an image. Ruiz and Mayol-Cuevas [11] predict affordance candidate locations in environments via the interaction tensor. In contrast to detecting different types of affordances, we focus on understanding the sitting affordance and use it in real-robot experiments.

Physics Reasoning. Physics reasoning has also been introduced to the study of object affordances [3, 19, 24, 25, 26, 27]. Battaglia et al. [24] introduce the concept of an intuitive physics engine which simulates physics to explain human physical perception of the world. Zhu et al. [26] employ physical simulations to infer forces exerted on humans and learn human utilities from videos. The digital twin [28] makes use of physical simulations to optimize operations and predict failures in manufacturing. We do not infer physics or use it to predict outcomes; instead, we exploit physics to imagine interactions with objects and thereby perceive object affordances.

III METHODS

III-A Problem Formulation

Throughout the paper, we assume the given unseen chair is upright and the back of the agent is supported while sitting. The virtual agent in the imagination is an articulated humanoid body (Fig. 3(a)). To seat the agent on the chair, we need to know 1) the sitting pose $g=(R,\mathbf{p})\in SE(3)$, where $R\in SO(3)$ and $\mathbf{p}\in\mathbb{R}^{3}$ specify the rotation and position of the agent's base link in the world frame, and 2) the joint angles of the agent. However, the real agent is a teddy bear whose joints have large damping coefficients. The joints can be considered fixed, and the joint angles are already close to those of a sitting configuration (Fig. 1(b)). Thus, in this paper, we simplify the problem to finding only $g$.

According to the interaction-based definition (IBD) of chairs (Sec. I), the torso is more or less upright when sitting. Thus, we further simplify the problem by restricting $R=R_{z}(\gamma)R_{0}$, where $R_{z}(\gamma)$ denotes the rotation about the z-axis of the world frame and $\gamma$ is the yaw angle of the base link. $R_{0}$ denotes the initial rotation which sets the agent to an upright sitting configuration with its face pointing towards the x-axis of the world frame. That is, given an unseen chair, the problem becomes finding the position $\mathbf{p}$ and the yaw angle $\gamma$ of the base link in the world frame for sitting. We denote the direction indicated by $\gamma$ in the xy-plane as the sitting direction. The base link of the agent is the pelvis link.
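For concreteness, a minimal sketch of assembling the simplified sitting pose from $\mathbf{p}$ and $\gamma$ is shown below; the helper names are illustrative, and only the yaw is free while $R_{0}$ fixes the upright, forward-facing orientation.

```python
import numpy as np

def rot_z(gamma):
    """Rotation about the world z-axis by yaw angle gamma (rad)."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def sitting_pose(p, gamma, R0):
    """Assemble g = (R, p) in SE(3) with R = Rz(gamma) @ R0.

    p     : (3,) position of the agent's base (pelvis) link in the world frame
    gamma : yaw angle defining the sitting direction in the xy-plane
    R0    : initial rotation putting the agent in an upright sitting
            configuration facing the world x-axis
    """
    g = np.eye(4)
    g[:3, :3] = rot_z(gamma) @ R0
    g[:3, 3] = p
    return g
```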

III-B Revisit of Sitting Affordance Model (SAM)

In sitting imagination (Sec. III-C), we simulate a virtual humanoid agent sitting on the chair in physical simulations. The resultant configuration of the agent at the end of the simulation is denoted as $C_{res}$. The sitting affordance model (SAM) [3] evaluates whether $C_{res}$ is a correct sitting by comparing it with a key configuration $C_{key}$. We briefly review SAM here; further details can be found in [3].

The evaluation is based on four criteria. 1) Joint Angle Score. The joint angles of a configuration $C$ can be described with a vector $\boldsymbol{\theta}\in\mathbb{R}^{n}$, where $n$ denotes the total number of the agent's joints. The joint angle score is defined as $J=\sum_{i}w_{J}^{i}|\theta_{res}^{i}-\theta_{key}^{i}|$, where $w_{J}^{i}$ denotes the weight of the $i$-th joint and $\theta_{res}^{i}$ ($\theta_{key}^{i}$) is the $i$-th element of $\boldsymbol{\theta}_{res}$ ($\boldsymbol{\theta}_{key}$). Lower is better. 2) Link Rotation Score. According to IBD, the torso is more or less upright. Thus, SAM considers the link rotations in the world frame when evaluating $C_{res}$. The link rotation score is defined as $L=\sum_{i}w_{L}^{i}(1-\mathbf{z}_{res}^{i}\cdot\mathbf{z}_{key}^{i})$, where $w_{L}^{i}$ is the weight for the $i$-th link and $\mathbf{z}_{res}^{i}$ ($\mathbf{z}_{key}^{i}$) is the z-axis unit vector of the $i$-th link frame in $C_{res}$ ($C_{key}$). Lower is better. 3) Sitting Height. The sitting height $H$ is also an important factor in sitting, and SAM takes it into account in the evaluation of $C_{res}$. 4) Contact. SAM counts the number of contact points on the agent's links, which can be described with a vector $\mathbf{T}\in\mathbb{R}^{m}$, where $m$ denotes the total number of links.

$C_{res}$ is classified as a correct sitting if all of the following are satisfied: $J<J_{\max}$, $L<L_{\max}$, $H\in(H_{\min},H_{\max})$, and $\phi(\mathbf{T})>0$. $J_{\max}$, $L_{\max}$, $H_{\min}$, and $H_{\max}$ are four thresholds. $\phi(\cdot):\mathbb{R}^{m}\rightarrow\{0,1\}$ is a binary function which outputs 1 if 1) the total number of contact points is larger than a threshold and 2) the numbers of contact points on the lower-body and upper-body links are both larger than zero, and outputs 0 otherwise.
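A minimal sketch of the SAM scores and classification rule above, assuming the resultant and key configurations are already available as plain arrays (all names here are illustrative, not the authors' code):

```python
import numpy as np

def joint_angle_score(theta_res, theta_key, w_J):
    """J = sum_i w_J[i] * |theta_res[i] - theta_key[i]| (lower is better)."""
    return float(np.sum(w_J * np.abs(theta_res - theta_key)))

def link_rotation_score(z_res, z_key, w_L):
    """L = sum_i w_L[i] * (1 - z_res[i] . z_key[i]) over link z-axis unit vectors."""
    return float(np.sum(w_L * (1.0 - np.sum(z_res * z_key, axis=1))))

def is_correct_sitting(J, L, H, contacts_lower, contacts_upper,
                       J_max, L_max, H_min, H_max, contact_min):
    """Classify a resultant configuration C_res as a correct sitting."""
    total_contacts = contacts_lower + contacts_upper
    # phi(T) > 0: enough contacts overall, and both body halves touch the chair.
    phi = int(total_contacts > contact_min
              and contacts_lower > 0 and contacts_upper > 0)
    return J < J_max and L < L_max and H_min < H < H_max and phi > 0
```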

Figure 3: Sitting imagination. (a) Imagination setting. (b) OBB transformation $g_{obb}$. (c) and (d) show snapshots at the beginning and the end of the sitting imagination, respectively. The checks and crosses in (d) indicate whether each sitting is correct or not. (e) shows the chair used for imagination in (c) and (d). (f), (g), and (h) show the sitting poses obtained from three rotations $\alpha_{i}$; $N_{correct}=3$ for all of them. The black dot and the red arrow indicate the sitting position $\mathbf{p}$ and the yaw angle $\gamma$, respectively. (f) has the smallest value of $JL$ and is thus selected as the imagined sitting pose for the robot to seat the bear on the chair.

III-C Sitting Imagination

We physically simulate the agent sitting on the chair under different chair rotations $R_{chr}\in SO(3)$ to find $\mathbf{p}$ and $\gamma$ for sitting (Fig. 3). Given the 3D model of the chair, we first compute the minimum-volume oriented bounding box (OBB) [29]. As in [3], we then apply a rigid body transformation $g_{obb}=(R_{obb},\mathbf{p}_{obb})\in SE(3)$ to the model (Fig. 3(b)). $\mathbf{p}_{obb}$ horizontally translates the chair to align the OBB center with the origin of the world frame in the xy-plane. $R_{obb}=R_{z}(\gamma_{obb})$ rotates the chair about the z-axis to align the OBB with the coordinate axes of the world frame. We apply $g_{obb}$ because, heuristically, the back of the chair coincides with one of the OBB faces, which helps find correct sittings in the imagination. After applying $g_{obb}$, we take the world frame as the body frame of the chair.
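A minimal sketch of computing the OBB and an alignment transform $g_{obb}$ with the trimesh library cited above; the yaw extraction and the exact transform convention are simplifying assumptions for an upright chair, not the authors' implementation:

```python
import numpy as np
import trimesh

def compute_obb_alignment(mesh_path):
    """Compute g_obb that aligns the chair's OBB with the world frame.

    The returned 4x4 transform rotates the chair about the world z-axis so the
    OBB edges align with the world x/y axes and translates the OBB center to
    the world origin in the xy-plane.
    """
    mesh = trimesh.load(mesh_path)
    obb = mesh.bounding_box_oriented          # minimum-volume OBB primitive
    T_obb = obb.primitive.transform           # 4x4 pose of the OBB box

    # Yaw of the OBB in the xy-plane (heuristic: the chair is upright).
    yaw_obb = np.arctan2(T_obb[1, 0], T_obb[0, 0])
    center = T_obb[:3, 3]

    c, s = np.cos(-yaw_obb), np.sin(-yaw_obb)
    g_obb = np.eye(4)
    g_obb[:2, :2] = [[c, -s], [s, c]]
    # Horizontal translation only: the OBB center goes to the xy-origin.
    g_obb[:3, 3] = g_obb[:3, :3] @ np.array([-center[0], -center[1], 0.0])
    return g_obb

# The aligned model is obtained with mesh.apply_transform(g_obb).
```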

The rotation of the chair $R_{chr}$ in the simulation is enumerated by setting $R_{chr}=R_{z}(\alpha_{i})R_{obb}$ with $\alpha_{i}=i\pi/4,\ i=0,1,2,\ldots,7$ (Fig. 3(c)). Note that $R_{chr}=R_{z}(\alpha_{i})R_{obb}=R_{z}(\alpha_{i}+\gamma_{obb})$ is a rotation about the z-axis of the world frame. We drop the agent from above the chair to simulate sitting (Fig. 3(d)). Before the drop, the agent is set to a pre-sitting configuration facing the x-axis (Fig. 3(a)). The base link of the agent is placed on a plane 15 cm above the chair OBB. For each rotation, we first sample three drop positions on the plane: the origin and two positions translated by $\pm L_{sit}$ along the x-axis from the origin. $L_{sit}$ scales linearly with the size of the OBB. If no more than one correct sitting is found across all rotations, four extra drop positions are sampled: positions translated by $\pm 2L_{sit}$ and $\pm 3L_{sit}$ along the x-axis from the origin. We start sampling drops close to the origin of the plane, which is horizontally aligned with the OBB center, because most chairs have their seats positioned close to the OBB center. However, the seats of some chairs are close to the OBB periphery; thus, if not enough correct sittings can be found, our search expands towards the periphery. In total, we simulate 24 drops if no extra drops are needed and 56 drops otherwise. A sketch of this enumeration is given below.
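A minimal sketch of enumerating the chair yaws and drop positions described above; the offsets live in the drop plane 15 cm above the OBB, and the function names are illustrative:

```python
import numpy as np

def drop_candidates(L_sit, expand=False):
    """Enumerate (alpha_i, drop_offset) pairs for the sitting imagination.

    Eight chair yaws alpha_i = i*pi/4 are tried. Per yaw, drops start at the
    plane origin and +/- L_sit along x; when too few correct sittings are
    found, the search expands to +/- 2 L_sit and +/- 3 L_sit, giving
    8 x 3 = 24 or 8 x 7 = 56 drops in total.
    """
    alphas = [i * np.pi / 4 for i in range(8)]
    offsets = [0.0, L_sit, -L_sit]
    if expand:
        offsets += [2 * L_sit, -2 * L_sit, 3 * L_sit, -3 * L_sit]
    return [(alpha, np.array([dx, 0.0, 0.0]))   # offset in the drop plane
            for alpha in alphas for dx in offsets]
```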

The rotation with the largest number of correct sittings $N$ is selected as the best rotation $\alpha^{*}$ for sitting. If more than one rotation has the largest $N$ (Fig. 3(f)(g)(h)), we select the one with the smallest averaged value of $JL$ (lower is better). The sitting pose $g=(R,\mathbf{p})$ in the world frame is:

$$\mathbf{p}=g_{obb}^{-1}\circ\overline{\mathbf{p}'} \qquad (1)$$
$$R=R_{z}(\gamma)R_{0},\quad\gamma=\overline{\gamma'}-\gamma_{obb} \qquad (2)$$

where $g\circ\mathbf{p}'=R\mathbf{p}'+\mathbf{p}$ and the inverse is $g^{-1}\circ\mathbf{p}'=R^{-1}\mathbf{p}'-R^{-1}\mathbf{p}$; $\overline{\mathbf{p}'}$ and $\overline{\gamma'}$ are the weighted averages of the agent's position and yaw angle relative to the chair frame over all the correct sittings with the chair rotated by $\alpha^{*}$. The weight of each sitting is the reciprocal of its $JL$ value.
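A minimal sketch of Eqs. (1)-(2), assuming each correct sitting for the best rotation $\alpha^{*}$ is stored as a (position, yaw, JL) triple in the chair frame (names are illustrative):

```python
import numpy as np

def fuse_sitting_pose(correct_sittings, g_obb_inv_R, g_obb_inv_t, gamma_obb):
    """Fuse the correct sittings into one sitting pose (Eqs. (1)-(2)).

    correct_sittings : list of (p_chair, yaw_chair, JL) for the best rotation,
                       with p_chair / yaw_chair relative to the chair frame
    g_obb_inv_R/t    : rotation and translation of g_obb^{-1}
    gamma_obb        : yaw angle of R_obb
    """
    # Weights are reciprocals of JL, normalized to form a weighted average.
    weights = np.array([1.0 / jl for _, _, jl in correct_sittings])
    weights /= weights.sum()
    p_bar = sum(w * p for w, (p, _, _) in zip(weights, correct_sittings))
    gamma_bar = sum(w * yaw for w, (_, yaw, _) in zip(weights, correct_sittings))

    p = g_obb_inv_R @ p_bar + g_obb_inv_t   # Eq. (1): p = g_obb^{-1} o p_bar
    gamma = gamma_bar - gamma_obb           # Eq. (2)
    return p, gamma
```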

Figure 4: Motion Planning. (a) The motion of seating the bear on the chair. (b) After seating the bear, the robot releases the bear by opening its hands, retracts its hands, and straightens its body. (c) The setting of the $SE(2)$ planning in the real and rviz environments. The red dashed lines in the top figure indicate the planning arena in the real world. The yellow ellipsoid in the bottom figure shows the $SE(2)$ goal.

III-D Motion Planning

We assume the motion of seating the bear is quasi-static. Due to the complexity of the motion constraints, we formulate the planning of this motion as a trajectory optimization problem [30]:

$$\begin{aligned}
\underset{\mathbf{q},\dot{\mathbf{q}}}{\text{minimize}}\quad & J=\sum_{k=1}^{N}L(\mathbf{q}(k)) && (3)\\
\text{subject to}\quad & \text{Kinematics constraints} && (4)\\
& \text{Stability constraints} && (5)\\
& \text{Collision constraints} && (6)
\end{aligned}$$

where $L(\mathbf{q}(k))=\frac{1}{2}\lVert\mathbf{q}(k)-\mathbf{q}_{goal}\rVert_{Q}^{2}$. Kinematics constraints describe the state transition and limit the magnitudes of the joint values and control inputs. Stability constraints confine the horizontal projection of the robot's center of mass (COM) to the supporting polygon $\mathbf{S}$ formed by the feet. Collision constraints ensure that the trajectory is collision-free.

The initial configuration of the robot is a pre-defined standing posture (Fig. 4(a)). The goal configuration $\mathbf{q}_{goal}$ is generated via a constrained optimization [30]:

$$\begin{aligned}
\underset{\mathbf{q}}{\text{minimize}}\quad & w_{1}d_{\mathrm{COM}}(\mathbf{q})+w_{2}\mathrm{Torso}(\mathbf{q}) && (7)\\
\text{subject to}\quad & \text{Kinematics constraints} && (8)\\
& \text{Stability constraints} && (9)\\
& \text{Collision constraints} && (10)
\end{aligned}$$

The cost (Eq. (7)) aims to minimize 1) $d_{\mathrm{COM}}(\mathbf{q})$, the distance between the horizontal projection of the COM and the center of the supporting polygon $\mathbf{S}$, and 2) $\mathrm{Torso}(\mathbf{q})$, the bending angle of the torso. A less bent torso exerts less torque on the motors, making it easier to straighten the body after seating the bear. The kinematics constraints ensure that the bear sits in the imagined sitting pose $g$ when the robot is in the goal configuration $\mathbf{q}_{goal}$.
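A minimal sketch of the goal-configuration optimization (Eqs. (7)-(10)) using a generic SLSQP solver; the forward-kinematics, COM, support-polygon, and collision helpers are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def solve_goal_configuration(q0, w1, w2, d_com, torso_bend, bear_pose_error,
                             com_support_margin, min_clearance,
                             joint_lower, joint_upper):
    """Find q_goal minimizing w1*d_COM(q) + w2*Torso(q) (Eq. (7)).

    d_com(q)              : distance of the COM projection to the support-polygon center
    torso_bend(q)         : bending angle of the torso
    bear_pose_error(q)    : residual between the bear's pose and the imagined pose g
    com_support_margin(q) : signed margin of the COM projection inside the support polygon
    min_clearance(q)      : smallest distance between the robot and obstacles
    """
    cost = lambda q: w1 * d_com(q) + w2 * torso_bend(q)
    constraints = [
        # Kinematics: the bear must end up in the imagined sitting pose g (Eq. (8)).
        {"type": "eq", "fun": bear_pose_error},
        # Stability: COM projection stays inside the supporting polygon (Eq. (9)).
        {"type": "ineq", "fun": com_support_margin},
        # Collision: keep a non-negative clearance along the body (Eq. (10)).
        {"type": "ineq", "fun": min_clearance},
    ]
    bounds = list(zip(joint_lower, joint_upper))  # joint limits
    res = minimize(cost, q0, method="SLSQP", bounds=bounds,
                   constraints=constraints)
    return res.x
```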

We use RRTConnect [31] in the OMPL library [32] to plan an $SE(2)$ trajectory for the robot to carry the bear to the chair. The robot is encapsulated with an ellipsoid for collision checking with the FCL library [33]. The setting is shown in Fig. 4(c).
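For illustration, a minimal sketch of $SE(2)$ planning with RRTConnect through OMPL's Python bindings; the arena bounds, time budget, and the validity check (e.g., wrapping an FCL ellipsoid-vs-chair query) are placeholders:

```python
from ompl import base as ob, geometric as og

def plan_se2(start_xyyaw, goal_xyyaw, is_state_valid, arena=1.0):
    """Plan an SE(2) path with RRTConnect; is_state_valid wraps collision checking."""
    space = ob.SE2StateSpace()
    bounds = ob.RealVectorBounds(2)
    bounds.setLow(-arena)
    bounds.setHigh(arena)              # square planning arena (placeholder size)
    space.setBounds(bounds)

    ss = og.SimpleSetup(space)
    ss.setStateValidityChecker(ob.StateValidityCheckerFn(is_state_valid))

    start, goal = ob.State(space), ob.State(space)
    start().setX(start_xyyaw[0]); start().setY(start_xyyaw[1]); start().setYaw(start_xyyaw[2])
    goal().setX(goal_xyyaw[0]);   goal().setY(goal_xyyaw[1]);   goal().setYaw(goal_xyyaw[2])
    ss.setStartAndGoalStates(start, goal)

    ss.setPlanner(og.RRTConnect(ss.getSpaceInformation()))
    if ss.solve(5.0):                  # planning time budget in seconds (placeholder)
        ss.simplifySolution()
        return ss.getSolutionPath()    # waypoints to follow with a PID controller
    return None
```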

IV SYSTEM PIPELINE

Fig. 2 shows the pipeline of our method. The robot arm first scans and reconstructs the 3D model of the chair (Sec. V-A). The sitting imagination is then conducted to find the imagined sitting pose $g$ (Sec. III-C). With $g$, we first determine the $SE(2)$ goal pose $s_{goal}$ for the NAO to walk to and place the bear. The rotation of $s_{goal}$ is set such that the NAO faces the opposite of the sitting direction indicated by $\gamma$. The position of $s_{goal}$ lies on a horizontal ray which originates from the projection of $\mathbf{p}$ onto the xy-plane and points along the sitting direction. It is initially set such that the NAO is $l_{init}$ away from $\mathbf{p}$. If the NAO is in collision with the chair, we move it along the ray until it is collision-free. After that, if the distance between the NAO and $\mathbf{p}$ is too large, we move $\mathbf{p}$ horizontally along the ray closer to the robot to satisfy the robot's workspace constraint. We then pre-plan the motion to seat the bear and check the validity of $s_{goal}$. To reduce the planning time, the motion is simplified to a bilaterally symmetric motion: the motion of the left half of the body mirrors that of the right half. The bear is held in the hands facing the sitting direction at the beginning of the motion (Fig. 4(a)). We restrict the motion to be an $SE(2)$ motion, in which only the pitch joints are activated, to maintain the bear's facing direction throughout the motion.
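A minimal sketch of computing the initial $SE(2)$ goal from the imagined sitting pose; the subsequent collision and workspace adjustments along the ray are omitted, and the helper name is illustrative:

```python
import numpy as np

def initial_se2_goal(p, gamma, l_init):
    """Initial SE(2) goal s_goal = (x, y, yaw) for the NAO.

    The goal lies on the horizontal ray from the projection of p along the
    sitting direction gamma, at distance l_init, with the NAO facing the
    opposite of the sitting direction.
    """
    direction = np.array([np.cos(gamma), np.sin(gamma)])
    xy = p[:2] + l_init * direction
    yaw = gamma + np.pi                      # face back towards the chair
    yaw = np.arctan2(np.sin(yaw), np.cos(yaw))  # wrap to (-pi, pi]
    return np.array([xy[0], xy[1], yaw])
```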

The $SE(2)$ trajectory to walk to $s_{goal}$ is then planned. If $s_{goal}$ is outside the planning arena or blocked by obstacles, no plan can be made. In this case, the NAO gives a language instruction asking a human to assist by rotating the chair about the vertical axis such that the sitting direction points towards the NAO with no obstacles in between (Fig. 2). The instruction is generated from a template: "Please rotate the chair about the vertical axis ${\tt direction}$ for ${\tt rotation\_angle}$ degrees". ${\tt direction}\in\{{\tt clockwise},{\tt counterclockwise}\}$. ${\tt rotation\_angle}\in\{\beta\,|\,\beta=i\cdot 30,\ i=0,1,2,\ldots,6\}$ is the multiple of 30 degrees closest to the precise rotation angle that would point the sitting direction towards the NAO. We use multiples of 30 degrees instead of the precise angle because they are easier for humans to understand. The pose of the chair is tracked by the iterative closest point (ICP) algorithm [34]. The transformation of the chair resulting from the human rotation is denoted as $g_{rot}$. The imagined sitting pose $g$ and the $SE(2)$ goal $s_{goal}$ are updated by transforming them with $g_{rot}$ after the rotation. The robot then tries to plan a trajectory to the updated $s_{goal}$. This process is repeated until a valid $SE(2)$ trajectory is found. In practice, we regard the trial as a failure if no $SE(2)$ trajectory can be found after three rotations. When a valid $SE(2)$ trajectory is found, the NAO is controlled to walk and follow the waypoints along the trajectory via a PID controller. The walking motion is controlled by the NAOqi SDK (https://developer.softbankrobotics.com/nao6/naoqi-developer-guide/naoqi-apis). The bear is passed to the NAO manually before it starts walking. When the NAO arrives at $s_{goal}$, a whole-body motion is planned to place the bear based on the robot's current pose. Finally, the robot executes the motion, releases the bear, and straightens its body (Fig. 4(b)).
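A minimal sketch of filling the instruction template, assuming the precise rotation angle (in degrees, signed, counterclockwise-positive) that would point the sitting direction at the NAO has already been computed from the tracked chair pose; the function name is illustrative:

```python
def make_rotation_instruction(precise_angle_deg):
    """Round the required chair rotation to the nearest multiple of 30 degrees
    and fill the language-instruction template."""
    direction = "counterclockwise" if precise_angle_deg >= 0 else "clockwise"
    # Nearest multiple of 30 degrees in {0, 30, ..., 180}.
    rotation_angle = min(round(abs(precise_angle_deg) / 30.0) * 30, 180)
    return ("Please rotate the chair about the vertical axis "
            f"{direction} for {rotation_angle} degrees!")

# Example: make_rotation_instruction(95.0)
# -> "Please rotate the chair about the vertical axis counterclockwise for 90 degrees!"
```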

V EXPERIMENTS

Fig. 1(a) shows the experiment setup. A PrimeSense Carmine 1.09 RGB-D camera is mounted on the robot arm. Besides scanning the chair, the camera is also used for tracking the chair and the NAO. The NAO is tracked with an ArUco tag attached on top of its head.

Figure 5: Data. (b) shows a captured depth image when the robot is at the capturing pose shown in (a). (c) shows three capturing poses of the 3D scanning. (d) shows all the chairs in the dataset. Among the chairs in the test set, two (indicated by the red box) are not typical chairs but still afford sitting: one is a step stool; the other is improvised by assembling books and boxes.

V-A 3D Scanning

In the experiment, the chair is placed randomly, in its upright pose, within a 50 cm × 50 cm square area in front of the robot arm. The robot arm moves to 9 pre-defined collision-free configurations to capture depth images of the scene (Fig. 5(a)(b)(c)). The pose of the camera at each view is obtained from the forward kinematics of the robot arm. This allows us to use TSDF Fusion [35] to densely reconstruct the scene and obtain the point cloud. The chair is segmented from the scene by plane segmentation with the PCL library [36].
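As an illustration of the scan-and-segment step (not the authors' implementation, which uses TSDF Fusion [35] and PCL [36]), a sketch using Open3D as a stand-in, assuming depth frames, camera intrinsics, and camera poses from forward kinematics are available:

```python
import open3d as o3d

def reconstruct_and_segment(rgbd_frames, extrinsics, intrinsic):
    """Fuse the captured frames into a TSDF volume and remove the support plane.

    rgbd_frames : list of o3d.geometry.RGBDImage captured at the 9 views
    extrinsics  : list of 4x4 world-to-camera transforms from forward kinematics
    intrinsic   : o3d.camera.PinholeCameraIntrinsic of the depth camera
    """
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.005, sdf_trunc=0.02,   # placeholder resolution
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for rgbd, T in zip(rgbd_frames, extrinsics):
        volume.integrate(rgbd, intrinsic, T)
    pcd = volume.extract_point_cloud()

    # Remove the dominant plane (table/floor) and keep the chair points.
    _, inliers = pcd.segment_plane(distance_threshold=0.01,
                                   ransac_n=3, num_iterations=1000)
    chair = pcd.select_by_index(inliers, invert=True)
    return chair
```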

V-B Data

Our dataset contains 15 chairs with diverse shapes and appearances (Fig. 5(d)). They are designed for children aged 0 to 3. We choose small chairs because the size of the chair is restricted by the workspaces of the NAO and the robot arm: if the chair is too tall, the NAO will be too short to put the bear on it; if the chair is too large, the robot arm will not be able to scan it. We use 3 chairs (calibration set) to calibrate the simulation and the motion planning and control. The remaining 12 chairs (test set), which are unseen by the robot, are used for testing.

V-C Physical Simulation

PyBullet [37] is used as the physics engine for the sitting imagination. The chair and the virtual humanoid agent are imported from URDF files which specify the mass, COM, inertia matrix, friction coefficient, and joint properties. Collisions in the simulation are modelled as inelastic. The physical attributes of the chair are computed with Meshlab [38]. As the chairs in the dataset are designed for children, we set the height and weight of the agent in the simulation accordingly [39].
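A minimal sketch of one imagination drop in PyBullet, assuming chair and agent URDF files are available; the agent's pre-sitting configuration, SAM scoring, and the exact simulation parameters are omitted:

```python
import pybullet as p

def simulate_one_drop(chair_yaw, drop_pos, steps=480):
    """Drop the humanoid agent above the chair and return its final state."""
    p.connect(p.DIRECT)                      # headless physics simulation
    p.setGravity(0, 0, -9.81)

    chair_quat = p.getQuaternionFromEuler([0, 0, chair_yaw])
    chair = p.loadURDF("chair.urdf", [0, 0, 0], chair_quat, useFixedBase=True)
    agent = p.loadURDF("agent.urdf", drop_pos)  # drop plane 15 cm above the OBB

    for _ in range(steps):                   # let the agent settle on the chair
        p.stepSimulation()

    base_pos, base_orn = p.getBasePositionAndOrientation(agent)
    contacts = p.getContactPoints(bodyA=agent, bodyB=chair)
    p.disconnect()
    return base_pos, base_orn, len(contacts)  # fed to SAM for classification
```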

VI RESULTS & DISCUSSIONS

Figure 6: Result: Accessible. (a) Snapshot of the chair. (b) Imagined sitting pose. (c) The robot walks to the chair. (d) and (e) show the beginning and the end of the motion of seating the bear, respectively. (f) and (g) show the motion of retracting the hands and straightening the body after seating the bear. (h) Results.
Figure 7: Result: Inaccessible. (a) Snapshot of the chair. (b) Imagined sitting pose. (c) Before human assistance. (d) The human obeys/disobeys the instruction given by the robot to rotate the chair. (e) After the first rotation. In the experiments of Human Obeys, the chair is already accessible after the first rotation. (f) After the last rotation. (g) and (h) show the beginning and the end of seating the bear on the chair, respectively. (i) Results.

We implement our method with the Robot Operating System (ROS) on a computer with an Intel Core i9-10920X CPU @ 3.5 GHz. Our unoptimized implementation takes about 70 s, 4 s, and 29 s for 3D scanning and reconstruction, sitting imagination, and motion planning, respectively. The walking and seating motions take about 67 s and 24 s, respectively.

Accessible. In the first set of experiments, the chair is placed such that it is accessible for seating the bear: the sitting direction points towards the NAO with a deviation within $\pm 30$ degrees. No human assistance is needed in this case. For each chair in the test set, we place it in 3 different poses for testing (36 trials in total) (Fig. 6).

Inaccessible + Human Obeys. In the second set of experiments, the chair is placed such that it is inaccessible: the sitting direction points either 1) towards the robot arm or 2) towards the edges of the planning arena (Fig. 7(a)). In both cases, no valid $SE(2)$ trajectory can be found in the initial configuration, and human assistance is needed to rotate the chair and make it accessible. We recruit 6 volunteers to participate in the experiments. For each chair in the test set, we place it in 2 different inaccessible poses (24 trials in total). We ask the human to obey the instructions given by the NAO throughout the whole trial (Fig. 7).

Inaccessible + Human Disobeys. There exist many uncertainties in human assistance (e.g., the human is distracted or misunderstands the instruction), which may leave the chair inaccessible even after a rotation. In this set of experiments, we test the robustness of our method against these uncertainties. For each chair in the test set, we place it in an inaccessible pose as in the Inaccessible + Human Obeys setting (12 trials in total). We ask the human to deliberately disobey the first instruction given by the NAO and obey the following ones (Fig. 7).

We recruit 15 annotators to annotate the experiment results; each trial is annotated by five different annotators. For each trial, we show them the experiment video and images of the bear at the end of the trial. The annotator is then asked 1) "Do you think the robot has been successful in seating the bear on the chair?" For trials with human assistance, we also ask 2) "Is the chair accessible after the human assistance?" The results on the test set are shown in Tab. I.

(1) Seating Bear Success (number of trials with the given number of positive annotations out of 5)

| Experiment | Trial Num. | = 5 | ≥ 4 | ≥ 3 |
| Accessible | 36 | 26 | 34 | 34 |
| Inaccessible + Human Obeys | 24 | 18 | 22 | 23 |
| Inaccessible + Human Disobeys | 12 | 10 | 10 | 11 |
| Total | 72 | 54 | 66 | 68 |

(2) Chair Accessible After Human Assistance (number of trials with the given number of positive annotations out of 5)

| Experiment | Assist. Num. | = 5 | ≥ 4 | ≥ 3 |
| Inaccessible + Human Obeys | 23 | 23 | 23 | 23 |
| Inaccessible + Human Disobeys | 11 | 10 | 11 | 11 |
| Total | 34 | 33 | 34 | 34 |

TABLE I: Experiment Results on the Test Set.

In part (1) of Tab. I, we show the number of trials with at least 5, 4, and 3 positive annotations to the first question. We count a trial as successful if the sitting imagination found a sitting pose and more than half of the 5 annotations (i.e., at least 3) are positive. In general, we achieve a very high success rate of 94.4% on the 12 unseen chairs in the test set (68 out of 72). The step stool accounts for all 4 failure cases in the 72 trials: the sitting imagination fails to find a sitting pose because the depth of its seat is too small. A successful trial with the step stool is shown in Fig. 6. Remarkably, the success rate over all 6 trials with the improvised chair is 100%. This shows the promising potential of our imagination method to discover that an object affords sitting despite not being a typical chair.

In part (2), we show the total number of trials with human assistance and the number of trials with at least 5, 4, and 3 positive annotations to the second question. The goal of human assistance is to make the chair accessible for placing the bear, so we evaluate the accessibility of the chair at the end of human assistance. We only count trials in which human assistance is involved, i.e., we disregard the failure trials in which the sitting imagination fails to find a sitting pose and no instructions are given. We consider the chair accessible if more than half of the annotations are positive. The results show that our human assistance module is very effective: it achieves a 100% success rate in making the chair accessible in all the trials with successful sitting imagination.

VII CONCLUSIONS & FUTURE WORK

We propose a novel method for robots to imagine the sitting pose for a previously unseen chair. We develop a robotic system which is able to seat a teddy bear on the chair autonomously via robot imagination. Moreover, we introduce a module for robots to ask a human to assist with changing the accessibility of the chair when it is in an inaccessible pose. Results show that our method enables the robot to seat the bear on 12 previously unseen chairs in 72 trials with a very high success rate. The human assistance module is also shown to be very effective in making the chair accessible from inaccessible poses. Future work can focus on exploring more versatile whole-body motion planning methods for seating the bear on chairs.

ACKNOWLEDGMENT

The authors thank Yuanfeng Han for discussions, all the volunteers for the human assistance in the experiments, and all the annotators for the human annotations.

References

  • [1] J. J. Gibson, The Ecological Approach to Visual Perception.   Boston, MA, USA: Houghton Mifflin, 1979.
  • [2] L. Bogoni and R. Bajcsy, “Interactive recognition and representation of functionality,” Computer Vision and Image Understanding, vol. 62, no. 2, pp. 194–214, 1995.
  • [3] H. Wu, D. Misra, and G. S. Chirikjian, “Is That a Chair? Imagining Affordances Using Simulations of an Articulated Human Body,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 7240–7246.
  • [4] L. Hinkle and E. Olson, “Predicting object functionality using physical simulations,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 2784–2790.
  • [5] H. Grabner, J. Gall, and L. Van Gool, “What makes a chair a chair?” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1529–1536.
  • [6] V. Seib, M. Knauf, and D. Paulus, “Detecting Fine-grained Sitting Affordances with Fuzzy Sets.” in Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016), 2016, pp. 289–298.
  • [7] E. Bar-Aviv and E. Rivlin, “Functional 3D Object Classification Using Simulation of Embodied Agent,” in Proceedings of the British Machine Vision Conference (BMVC), 2006, pp. 307–316.
  • [8] A. Myers, C. L. Teo, C. Fermüller, and Y. Aloimonos, “Affordance detection of tool parts from geometric features,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.
  • [9] J. Sawatzky, A. Srikantha, and J. Gall, “Weakly Supervised Affordance Detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2795–2804.
  • [10] A. Roy and S. Todorovic, “A Multi-scale CNN for Affordance Segmentation in RGB Images,” in European Conference on Computer Vision (ECCV).   Springer, 2016, pp. 186–201.
  • [11] E. Ruiz and W. Mayol-Cuevas, “Geometric Affordance Perception: Leveraging Deep 3d Saliency With the Interaction Tensor,” Frontiers in Neurorobotics, vol. 14, p. 45, 2020.
  • [12] Y. Zhu, A. Fathi, and L. Fei-Fei, “Reasoning about Object Affordances in a Knowledge Base Representation,” in European Conference on Computer Vision (ECCV), 2014, pp. 408–424.
  • [13] S.-B. Ho, “Representing and using functional definitions for visual recognition,” Ph.D. dissertation, The University of Wisconsin-Madison, 1987.
  • [14] C. Desai and D. Ramanan, “Predicting Functional Regions on Objects,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 968–975.
  • [15] A. Bansal, S. S. Rambhatla, A. Shrivastava, and R. Chellappa, “Detecting human-object interactions via functional generalization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 10 460–10 469.
  • [16] A. Stoytchev, “Behavior-Grounded Representation of Tool Affordances,” in 2005 IEEE International Conference on Robotics and Automation (ICRA), 2005, pp. 3060–3065.
  • [17] T.-T. Do, A. Nguyen, and I. Reid, “AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 5882–5889.
  • [18] F.-J. Chu, R. Xu, and P. A. Vela, “Learning Affordance Segmentation for real-World Robotic Manipulation via Synthetic Images,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1140–1147, 2019.
  • [19] H. Wu and G. S. Chirikjian, “Can I pour into it? Robot Imagining Open Containability Affordance of Previously Unseen Objects via Physical Simulations,” IEEE Robotics and Automation Letters, vol. 6, no. 1, pp. 271–278, 2020.
  • [20] P. Abelha and F. Guerin, “Learning how a tool affords by simulating 3D models from the web,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 4923–4929.
  • [21] L. Piyathilaka and S. Kodagoda, “Affordance-map: Mapping human context in 3D scenes using cost-sensitive SVM and virtual human models,” in 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2015, pp. 2035–2040.
  • [22] R. Bajcsy, “Active perception,” Proceedings of the IEEE, vol. 76, no. 8, pp. 966–1005, 1988.
  • [23] H. Kjellström, J. Romero, and D. Kragić, “Visual object-action recognition: Inferring object affordances from human demonstration,” Computer Vision and Image Understanding, vol. 115, no. 1, pp. 81–90, 2011.
  • [24] P. W. Battaglia, J. B. Hamrick, and J. B. Tenenbaum, “Simulation as an engine of physical scene understanding,” Proceedings of the National Academy of Sciences, vol. 110, no. 45, pp. 18 327–18 332, 2013.
  • [25] Y. Zhu, Y. Zhao, and S.-C. Zhu, “Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [26] Y. Zhu, C. Jiang, Y. Zhao, D. Terzopoulos, and S.-C. Zhu, “Inferring Forces and Learning Human Utilities from Videos,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3823–3833.
  • [27] L. Kunze and M. Beetz, “Envisioning the qualitative effects of robot manipulation actions using simulation-based projections,” Artificial Intelligence, vol. 247, pp. 352–380, 2017.
  • [28] S. Boschert and R. Rosen, “Digital Twin—The Simulation Aspect,” in Mechatronic Futures, 2016, pp. 59–74.
  • [29] M. Dawson-Haggerty et al., “Trimesh,” https://trimsh.org/, 2019–2021.
  • [30] Y. Han, R. Li, and G. S. Chirikjian, “Can I lift it? Humanoid robot reasoning about the feasibility of lifting a heavy box with unknown physical properties,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 3877–3883.
  • [31] J. J. Kuffner and S. M. LaValle, “RRT-Connect: An efficient approach to single-query path planning,” in 2000 IEEE International Conference on Robotics and Automation (ICRA), vol. 2, 2000, pp. 995–1001.
  • [32] I. A. Şucan, M. Moll, and L. E. Kavraki, “The Open Motion Planning Library,” IEEE Robotics & Automation Magazine, vol. 19, no. 4, pp. 72–82, 2012.
  • [33] J. Pan, S. Chitta, and D. Manocha, “FCL: A general purpose library for collision and proximity queries,” in 2012 IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 3859–3866.
  • [34] P. J. Besl and N. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
  • [35] B. Curless and M. Levoy, “A Volumetric Method for Building Complex Models from Range Images,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 1996, pp. 303–312.
  • [36] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),” in 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 1–4.
  • [37] E. Coumans and Y. Bai, “Pybullet, a python module for physics simulation for games, robotics and machine learning,” http://pybullet.org, 2016–2021.
  • [38] P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, and G. Ranzuglia, “Meshlab: an Open-Source Mesh Processing Tool.” in Eurographics Italian Chapter Conference, vol. 2008, 2008, pp. 129–136.
  • [39] M. D. Onis, “The WHO child growth standards.” Pediatric nutrition in practice, pp. 254–269, 2008.