Let’s be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot
Abstract.
Partial, subtle mirroring of nonverbal behaviors during conversations (also known as mimicking or parallel empathy) is essential for rapport building, which in turn is essential for optimal human-human communication outcomes. Mirroring has been studied in interactions between robots and humans, and in interactions between Embodied Conversational Agents (ECAs) and humans. However, very few studies examine interactions between humans and ECAs that are integrated with robots, and none of them examine the effect of mirroring nonverbal behaviors in such interactions. Our research question is whether integrating an ECA able to mirror its interlocutor’s facial expressions and head movements (continuously or intermittently) with a human-service robot will improve the user’s experience with a support robot that is able to perform useful mobile manipulative tasks (e.g. at home). Our contribution is the complex integration of an expressive ECA, able to track its interlocutor’s face and to mirror their facial expressions and head movements in real time, with a human support robot, such that the robot and the agent are fully aware of each other’s, and of the user’s, nonverbal cues. We also describe a pilot study we conducted towards answering our research question, which shows promising results for our forthcoming larger user study.
1. Introduction
Forthcoming human-support robots are anticipated to assist people in a variety of contexts (Dahl and Boulos, 2014; Šabanović, 2010) involving socio-emotional personal information (e.g. helping an elderly person live safely independently) best communicated to humans via their innate communication modalities (e.g. speech, facial expressions, gestures), and ideally with established rapport between interlocutors. We aim to continue investigating the introduction of ECAs on social and service robots as a natural user interface metaphor for HRI (Peña et al., 2018; Goodrich and Schultz, 2008).
Establishing and maintaining rapport between humans is a proven determinant of positive communication outcomes, and is the result of a combination of highly socio-cultural-emotional complex processes, some of which are unconscious: mutual attentiveness (e.g., mutual gaze, mutual interest, and focus during interaction), positivity (e.g., head nods, smiles, friendliness, and warmth) and unconscious coordination (e.g., postural mirroring, synchronized movements, balance, and harmony) (Grahe, 1999; Tickle-Degnen and Rosenthal, 1990).
In this article, we focus on one of these processes, coordination, specifically the partial, subtle mirroring and synchronization of facial expressions and head movements (Fischer and Hess, 2017; Hess and Fischer, 2014; Chartrand et al., 2005). Mirroring has been studied extensively in interactions between robots and humans, as well as in interactions between ECAs and humans. However, there are only a few studies (Duque-Domingo et al., 2020; Cavedon et al., 2015) that examine interactions between humans and ECAs that are integrated with robots, and none of them examine the effect of mirroring nonverbal behaviors in such interactions.
Our aim is to answer the research question as to whether integrating an ECA capable of mirroring its interlocutor’s facial expressions and head movements (continuously or intermittently) with a human-service robot will improve the user’s experience with a support robot that is capable of performing useful mobile manipulative tasks (e.g. at home). Our current contribution reviews the latest research on our topic and discusses our approach to modeling rapport for human-robot interaction. We discuss the integration of our speaking, expressive, and realistic ECA with a robotic platform, the Toyota Human Support Robot (shown in Figure 1), and how we enabled the ECA to subtly track its interlocutor’s face and to mirror their facial expressions and head movements in real time. Lastly, we describe a pilot study we conducted towards answering our research question, which shows promising results for our forthcoming larger user study.
2. Related work
In human-robot interaction (HRI), research on robots establishing rapport is under way, with some research groups investigating verbal (Dieter et al., 2019; Seo et al., 2018; Grigore et al., 2016), and non-verbal (Riek et al., 2010; Ritschel et al., 2019; Hasumoto et al., 2020) cues. Previous work has examined how mimicry affects human-ECA interaction and human-robot interaction, but not human interaction with an ECA running on a robot. Hasumoto et al. (Hasumoto et al., 2020) studied the effects of body movement mimicry in human-robot interaction by designing the Reactive Chameleon, a method of generating robot body movements that subtly mimics human body swaying during interactions. They found that subtle mimicry can positively impact the establishment of rapport, while noticeable mimicry can negatively impact it. However, this method was limited to mimicking movements of the torso and did not consider movement of other parts of the robot such as the head.
In the experiment by Riek et al. (Riek et al., 2010), participants interacted with a robotic chimpanzee named Virgil that exhibited three different types of behavior: full mimicry of head gestures, partial mimicry of nodding gestures only, and no mimicry accompanied by periodic blinks. Afterwards, participants filled out a survey that measured the social attraction toward and emotional credibility of conversation partners. No significant differences were found between participant ratings of the different mimic conditions. However, this might have been due to technical issues, as a few participants “said that the head movements were too erratic or jerky.” Other participants wished for the robot to make “non-speech sounds” (backchannel cues) to indicate understanding in conjunction with head gestures. This suggests that robot mimicry of nonverbal behavior by itself might not be enough to create rapport during an interaction with a human.
Niewiadomski et al. (Niewiadomski et al., 2010) studied how mimicry of smiles influenced interactions between ECAs and humans by testing three different types of ECA smiling behavior when providing backchannel cues: mimicking the smiles of a participant (MS), randomly smiling (RS), and no smiling (NS). They found that participants felt less engaged and more frustrated in condition NS than in condition MS, and “felt more at ease and more listened to” while telling a story to the ECA in condition MS than RS. These results suggest that mimicry in the smiling behavior of an ECA influences “the quality, ease, and warmth, of the user-agent interaction.”

In the first experiment of the case study by Stevens et al. (Stevens et al., 2016), participants read some sentences and then listened to an ECA speak some of those sentences incorrectly. They were then asked to say the correct version of the sentence to the ECA. Afterwards, for the experimental group, the ECA repeated what the subject said while mirroring the eyebrow raises and head nods observed during the subject’s reading and correcting of potentially erroneous sentences, whereas in the control group no mimicry occurred when the sentence was repeated. The results showed that more prominent cues led to higher ratings of ECA lifelikeness in the mimic condition, which supports the use of mimicry for building rapport in human-ECA interaction.
Although there exists previous work that integrates an avatar on a robot, to the best of our knowledge no study examines how the mimicry of nonverbal behavior by an ECA running on a robot influences interactions with humans. For example, Duque-Domingo et al. (Duque-Domingo et al., 2020) projected an avatar on a robotic head and designed a gaze control system that enabled the robotic head to reorient its position based on the location of the people it interacted with. However, the study did not investigate mimicry of nonverbal behavior by the avatar or by the robot.
3. Approach
We propose to integrate the Toyota Human Support Robot (HSR) (Yamamoto et al., 2019) with a fully autonomous ECA. HSR is a social robot designed to assist people with disabilities and the elderly with household tasks such as cleaning or bringing objects. HSR has a wide array of sensors that provide rich data on its surrounding environment. By using the Robot Operating System (ROS), which provides services such as access to HSR’s sensor data via ROS topics, we can create modules that utilize the sensor data to drive robot behavior during interaction with humans. For the ECA, we use the modular framework eEVA (Polceanu and Lisetti, 2019), which enables the creation of ECA dialogs suitable for a wide range of scenarios.
3.1. Face Detection and Posture Mimicking
To detect the face of a participant interacting with the HSR, we use an adapted version of DLib (King, 2009) that gets images from HSR’s Asus Xtion Pro RGB-D camera and runs at 30 fps. When DLib detects a face in the image, it draws a bounding box around the face and marks it with 68 landmarks, as seen in Fig. 2.
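For illustration, the following is a minimal sketch of this kind of landmark extraction using DLib’s standard Python bindings; the model file name and the static image source are placeholders (our actual node reads frames from the HSR camera, as described next):

```python
import cv2
import dlib

# Standard DLib frontal face detector and 68-point landmark predictor
# (the .dat model file is assumed to be available locally).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.png")  # placeholder for a camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for rect in detector(gray, 0):   # one rectangle per detected face
    shape = predictor(gray, rect)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    cv2.rectangle(frame, (rect.left(), rect.top()),
                  (rect.right(), rect.bottom()), (0, 255, 0), 2)
```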
Next, we create a ROS node that extracts the central position of the participant’s face and publishes the extracted data to a ROS topic. This data can then be accessed by both the ECA and HSR for posture mimicry. As the ECA runs as a stand-alone Unity application, we use the Rosbridge library (http://wiki.ros.org/rosbridge_suite) to facilitate communication between the ROS node and the ECA.
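A minimal sketch of such a node follows; the topic names are illustrative assumptions, and the exact message layout of our implementation may differ:

```python
#!/usr/bin/env python
import cv2
import dlib
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from geometry_msgs.msg import Point

bridge = CvBridge()
detector = dlib.get_frontal_face_detector()

# Hypothetical topic names, for illustration only.
CAMERA_TOPIC = "/hsrb/head_rgbd_sensor/rgb/image_raw"
FACE_TOPIC = "/eca/face_center"

def on_image(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if faces:
        rect = faces[0]
        center = Point()
        center.x = (rect.left() + rect.right()) / 2.0  # pixel coordinates
        center.y = (rect.top() + rect.bottom()) / 2.0
        center.z = 0.0
        pub.publish(center)

rospy.init_node("face_center_publisher")
pub = rospy.Publisher(FACE_TOPIC, Point, queue_size=1)
rospy.Subscriber(CAMERA_TOPIC, Image, on_image, queue_size=1)
rospy.spin()
```

On the Unity side, the ECA subscribes to the same topic through the Rosbridge websocket server (which listens on port 9090 by default) and converts the pixel coordinates into head and gaze targets.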

3.2. Mirroring of Facial Expressions of Emotions
In addition to mimicking posture, the ability to mirror facial expressions, or at least the most universal of such emotive expressions, is important in human-human communication. According to Ekman (Ekman, 1992), there are seven basic emotions that can be expressed by the human face. Our eEVA agent (Polceanu and Lisetti, 2019) (cf. Fig. 3) is able to portray any of these emotions in real time through movements of all the individual facial action units identified by Ekman. To enable our system to recognize emotions, we create a ROS node that utilizes EmoPy (Angelica, [n.d.]), a deep learning toolkit that classifies the seven basic emotions from facial expressions (see Fig. 4). We publish the classified emotion from EmoPy to a ROS topic, which is then accessed by eEVA and mapped to facial expressions on the avatar. In that way, the avatar mimics the participant’s facial expression during the interaction.
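The emotion-recognition node follows the same pattern as the face-tracking node. The sketch below is illustrative only: the EmoPy inference is wrapped in a hypothetical classify_emotion helper, and the topic names are assumptions:

```python
#!/usr/bin/env python
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import String

bridge = CvBridge()

# Hypothetical topic names, for illustration only.
CAMERA_TOPIC = "/hsrb/head_rgbd_sensor/rgb/image_raw"
EMOTION_TOPIC = "/eca/detected_emotion"

def classify_emotion(frame):
    """Placeholder for the EmoPy inference call, which maps a face image
    to one of the basic emotion labels."""
    return "neutral"

def on_image(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    label = classify_emotion(frame)
    pub.publish(String(data=label))

rospy.init_node("emotion_publisher")
pub = rospy.Publisher(EMOTION_TOPIC, String, queue_size=1)
rospy.Subscriber(CAMERA_TOPIC, Image, on_image, queue_size=1)
rospy.spin()
```

eEVA subscribes to this topic over Rosbridge and maps each incoming label to the corresponding set of facial action units on the avatar.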
4. Pilot study experiment and Discussion
Overview. To answer our research question – whether integrating an ECA able to mirror its interlocutor’s facial expressions and head movements (continuously or intermittently) with a human-service robot will improve the user’s experience with the support robot that is able to perform useful mobile manipulative tasks (e.g. at home) – we designed three within-subjects experiments described in the next sections to assess the impact of various skills on the user’s sense of comfort and naturalness with the robot: experiment 1 assesses the impact of posture-mimicking skills; experiment 2 assesses the impact of facial-expression-mirroring skills; and experiment 3 assesses the impact of combining posture-mimicking with facial-expression-mirroring. In preparation for running our planned large user study (with diverse participants, and validated measures of engagement, rapport, and presence), we conducted a pilot study of these experiments and discuss its results.
Material. All three pilot experiments were performed under the following setup: EmoPy and DLib run on an HP Spectre laptop with 16 GB RAM, four CPUs (Intel Core i7-6500U @ 2.50GHz), and an integrated graphics unit (Intel HD Graphics 520, Skylake GT2). The ECA was created using the Unity game engine and runs on HSR, which has a CPU board (Intel Core i7-4700EQ @ 2.4GHz) and an NVIDIA Jetson GPU. Communication between the laptop and HSR, which are in close proximity (within 5 meters), uses a 5 GHz WiFi network with an average latency of 5 ms.
Participants. All three pilot experiments were conducted with three participants recruited from our lab, ages 24 to 56 years. Participants were 2 (66%) males and 1 (33%) female, with mean age of 35 years (SD = 18.2 years). One participant reported his race as Asian (33%), and two reported their race as White (66%). Their education level was Graduate Degree (100%).
Procedure (in common). All three pilot experiments were conducted with the ECA running on the HSR as shown in Fig. 1. The HSR was set up in one spot, and lighting conditions remained unchanged. Depending upon the experiment, participants were asked to take various positions while interacting with the ECA-HSR.
After interacting with the ECA-robot, participants were asked to answer questions, and an informal discussion followed. We enabled and disabled modules for each corresponding behavior to control our independent variables.
EXPERIMENT 1: Assessing skills in posture-mimicking. In this experiment, we tested the effect on the user of the posture-mimicking skills of the ECA alone, of the robot alone, and of both in synchrony. The module running DLib was used to control the posture-mimicking (face-following) behavior (Fig. 2). Our independent variable was posture-mimicking, with three possible conditions:
• the ECA looks at the person and its face moves according to the user’s movements, while the robot stays immobile;
• only the robot head moves, following the direction in which the user is moving, while the ECA’s face stays immobile in the center of the robot screen; the robot head can move left, right, up, or down (all four directions and combinations thereof), but it does not tilt up or down by more than 23 degrees and does not rotate left or right by more than 35 degrees (a sketch of how such head commands can be issued is shown after this list);
• both the ECA’s face and the robot head move according to the user’s movements.
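As an illustration of the robot-head condition, the following sketch clamps the commanded pan and tilt angles to the limits above and sends them to the HSR head controller; the controller topic and joint names are assumptions based on the standard HSR ROS interface:

```python
#!/usr/bin/env python
import math
import rospy
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

# Assumed HSR head controller topic and joint names.
HEAD_TOPIC = "/hsrb/head_trajectory_controller/command"
PAN_LIMIT = math.radians(35.0)   # left/right rotation limit
TILT_LIMIT = math.radians(23.0)  # up/down tilt limit

def clamp(value, limit):
    return max(-limit, min(limit, value))

def command_head(pub, pan, tilt, duration=0.5):
    """Send a single clamped pan/tilt target to the head controller."""
    traj = JointTrajectory()
    traj.joint_names = ["head_pan_joint", "head_tilt_joint"]
    point = JointTrajectoryPoint()
    point.positions = [clamp(pan, PAN_LIMIT), clamp(tilt, TILT_LIMIT)]
    point.time_from_start = rospy.Duration(duration)
    traj.points = [point]
    pub.publish(traj)

if __name__ == "__main__":
    rospy.init_node("head_follower_demo")
    pub = rospy.Publisher(HEAD_TOPIC, JointTrajectory, queue_size=1)
    rospy.sleep(1.0)  # allow the publisher to connect
    command_head(pub, math.radians(20), math.radians(-10))
```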
Procedure (cont.) for Exp. 1. Participants were asked to stand in front of the robot (approximately 60 cm away), to turn their head to the sides, and to move their body, with the restriction that their face should not be turned more than 90 degrees from the camera. This is because DLib can detect faces from frontal and angled profiles, but not from a complete side profile. Each mode of interaction lasted 15 seconds. At the end of the session, we asked, “Which of the three presented versions seemed most comfortable or natural during the interaction? Why?”

Results. All participants identified the interaction with the ECA on HSR to be the most natural in the case where only the robot head was moving. They also noted that, in the case where only the ECA moved, it would feel unrealistic if they were walking around the room. The combination of the ECA and the robot head moving in synchrony put the line of sight of the robot away from the participant, which resulted in an impression that the interaction was unnatural.

EXPERIMENT 2: Assessing skills in mirroring facial expressions of emotions. In this experiment, only the response to the ECA’s facial mirroring skills was evaluated. The module running EmoPy was used to control the emotion-mirroring behavior (Fig. 4). Our independent variable was emotive facial mimicry, with two possible conditions:
• emotive facial expression mirroring disabled;
• emotive facial expression mirroring active on the ECA.
Procedure (cont.) for Exp. 2. Each participant was asked to stand at the same distance from the HSR as before (60 cm), but this time they were asked not to walk around the lab during the experiment. While looking at the robot, each person was asked to randomly portray three expressions for 10 seconds (to give the system enough time to accomplish the emotion mirroring): happiness, anger, and neutral. Each mode of interaction lasted 20 seconds. Participants stood in front of the HSR’s head camera so that their faces would be close enough for expression detection. After the interaction was complete, participants were asked, “Which of the two presented versions did you find more engaging? Why?”
Results. All participants expressed that, although it was exaggerated and funny at times, the interaction with the ECA on HSR was more engaging when emotive facial expression mirroring was enabled. They also noted that sometimes the emotion mirrored by the ECA was not the same emotion they were portraying.
EXPERIMENT 3: Assessing skills combining facial mirroring with posture mimicking. In this experiment, we tested whether posture mimicking, in conjunction with emotion mirroring, improves the user’s comfort level during human-robot interaction. Our independent variable was the mimicking combination, with two conditions:
• both posture mimicking and emotion mirroring disabled;
• both posture mimicking and emotion mirroring enabled (the ECA was mirroring emotion, and the robot head and ECA were turning towards the location of each participant’s face).
Procedure (cont.) for Exp. 3. Participants were asked to walk to the side of the robot while maintaining close proximity, and to express three different emotions on their face (happiness, anger, neutral) every five to ten seconds, while facing the robot camera. They were given no time restriction on how long they needed to interact with either setup, and asked to give a verbal cue when they wished to end the interaction. When they stopped, they were asked, “Which of the two presented versions (posture mimicking and emotion mirroring both enabled, or neither) did you prefer? Why?”
Results. All participants preferred the condition where both posture mimicking and emotive mirroring were enabled. They explained that the two behaviors made the ECA and the robot more engaging, and helped to establish a connection with the ECA and the robot.
Joint discussion. During the informal discussion with all participants, they all agreed that the interaction with the ECA on HSR was more natural with posture mimicry present, while emotive mirroring engaged them more. We also noticed that during the third experiment, participants chose to interact longer when both behaviors were enabled.
Technical considerations. In all three experiments, we found that the ECA on HSR performed well under three conditions: (1) the participant’s face was not obstructed; (2) the participant was within the field of view of the robot’s camera; and (3) the participant was not facing away from the robot. DLib excels at detecting faces in a frontal profile, but when a participant turns away, it cannot properly detect their face. The same applies to using EmoPy for detecting emotions from facial expressions. Furthermore, we discovered (1) some latency issues when the posture-mimicking behavior was enabled (likely due to latency in the WiFi connection); and (2) some inaccuracies in the emotions detected by EmoPy (likely due to the small size of the training set used to create its classification models).
5. Conclusions
To our knowledge, our proposed approach is one of the earliest, if not the first, to study how mimicry of nonverbal behavior by a fully autonomous ECA integrated with a fully autonomous physical service robot influences the agents’ interactions with humans. The preliminary experiments conducted on rapport-building – specifically on posture mimicking and emotive mirroring – show promising results.
In the first experiment on posture mimicking, participants preferred the robot head moving alone without extra movement from the ECA. In the second experiment on emotive mirroring, participants felt more engaged when the ECA mirrored their facial expression, and reported that the interaction felt more natural. Finally, in the third experiment that investigated the combined behaviors of posture mimicking and emotive mirroring, participants chose to interact longer when both behaviors were enabled in comparison to when neither were enabled.
For future work, we will consider alternatives to EmoPy to improve the emotive mirroring behavior of our ECA. More importantly, we plan to conduct these experiments beyond this pilot with a large enough N and validated measures to produce statistically significant results that can be generalized.
6. Appendices
The video demonstration of the experiments can be found at https://www.cs.miami.edu/home/visser/hsr-videos/nHRI21.mp4.
References
- Angelica ([n.d.]) Angelica Perez. [n.d.]. EmoPy: A Machine Learning Toolkit for Emotional Expression. 2018.
- Cavedon et al. (2015) Lawrence Cavedon, Christian Kroos, Damith Herath, Denis Burnham, Laura Bishop, Yvonne Leung, and Catherine J. Stevens. 2015. “C’Mon dude!”: Users adapt their behaviour to a robotic agent with an attention model. International Journal of Human-Computer Studies 80 (2015), 14–23. https://doi.org/10.1016/j.ijhcs.2015.02.012
- Chartrand et al. (2005) Tanya L Chartrand, William W Maddux, and Jessica L Lakin. 2005. Beyond the perception-behavior link: The ubiquitous utility and motivational moderators of nonconscious mimicry. The new unconscious (2005), 334–361.
- Dahl and Boulos (2014) Torbjørn S Dahl and Maged N Kamel Boulos. 2014. Robots in health and social care: A complementary technology to home care and telehealthcare? Robotics 3, 1 (2014), 1–21.
- Dieter et al. (2019) Justin Dieter, Tian Wang, Arun Tejasvi Chaganty, Gabor Angeli, and Angel Chang. 2019. Mimic and Rephrase: Reflective listening in open-ended dialogue. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 393–403.
- Duque-Domingo et al. (2020) Jaime Duque-Domingo, Jaime Gómez-García-Bermejo, and Eduardo Zalama. 2020. Gaze Control of a Robotic Head for Realistic Interaction With Humans. Frontiers in Neurorobotics 14 (2020), 34. https://doi.org/10.3389/fnbot.2020.00034
- Ekman (1992) Paul Ekman. 1992. An argument for basic emotions. Cognition & Emotion 6, 3/4 (1992), 169–200. http://www.tandfonline.com/doi/abs/10.1080/02699939208411068
- Fischer and Hess (2017) Agneta Fischer and Ursula Hess. 2017. Mimicking emotions. Current opinion in psychology 17 (2017), 151–155.
- Goodrich and Schultz (2008) Michael A Goodrich and Alan C Schultz. 2008. Human-robot interaction: a survey. Now Publishers Inc.
- Grahe (1999) JE Grahe. 1999. The importance of nonverbal cues in judging rapport. Journal of Nonverbal behavior 23, 4 (1999), 253–269. http://www.springerlink.com/index/V8U30855W38M4673.pdf
- Grigore et al. (2016) Elena Corina Grigore, Andre Pereira, Ian Zhou, David Wang, and Brian Scassellati. 2016. Talk to me: Verbal communication improves perceptions of friendship and social presence in human-robot interaction. In International conference on intelligent virtual agents. Springer, 51–63.
- Hasumoto et al. (2020) Ryosuke Hasumoto, Kazuhiro Nakadai, and Michita Imai. 2020. Reactive Chameleon: A Method to Mimic Conversation Partner’s Body Sway for a Robot. International Journal of Social Robotics 12, 1 (2020), 239–258.
- Hess and Fischer (2014) Ursula Hess and Agneta Fischer. 2014. Emotional mimicry: Why and when we mimic emotions. Social and personality psychology compass 8, 2 (2014), 45–57.
- King (2009) Davis E King. 2009. Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research 10 (2009), 1755–1758.
- Niewiadomski et al. (2010) Radoslaw Niewiadomski, Ken Prepin, Elisabetta Bevacqua, Magalie Ochs, and Catherine Pelachaud. 2010. Towards a Smiling ECA: Studies on Mimicry, Timing and Types of Smiles. In Proceedings of the 2nd International Workshop on Social Signal Processing (Firenze, Italy) (SSPW ’10). Association for Computing Machinery, New York, NY, USA, 65–70. https://doi.org/10.1145/1878116.1878134
- Peña et al. (2018) Pedro Peña, Mihai Polceanu, C Lisetti, and Ubbo Visser. 2018. eEVA as a real-time multimodal agent human-robot interface. In Robot World Cup. Springer, 262–274.
- Polceanu and Lisetti (2019) Mihai Polceanu and Christine Lisetti. 2019. Time to Go ONLINE! A Modular Framework for Building Internet-Based Socially Interactive Agents. In Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (Paris, France) (IVA ’19). Association for Computing Machinery, New York, NY, USA, 227–229. https://doi.org/10.1145/3308532.3329452
- Riek et al. (2010) Laurel D Riek, Philip C Paul, and Peter Robinson. 2010. When my robot smiles at me: Enabling human-robot rapport via real-time head gesture mimicry. Journal on Multimodal User Interfaces 3, 1 (2010), 99–108.
- Ritschel et al. (2019) Hannes Ritschel, Ilhan Aslan, Silvan Mertes, Andreas Seiderer, and Elisabeth André. 2019. Personalized synthesis of intentional and emotional non-verbal sounds for social robots. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 1–7.
- Šabanović (2010) Selma Šabanović. 2010. Robots in society, society in robots. International Journal of Social Robotics 2, 4 (2010), 439–450.
- Seo et al. (2018) Stela H Seo, Keelin Griffin, James E Young, Andrea Bunt, Susan Prentice, and Verónica Loureiro-Rodríguez. 2018. Investigating people’s rapport building and hindering behaviors when working with a collaborative robot. International Journal of Social Robotics 10, 1 (2018), 147–161.
- Stevens et al. (2016) Catherine J Stevens, Bronwyn Pinchbeck, Trent Lewis, Martin Luerssen, Darius Pfitzner, David MW Powers, Arman Abrahamyan, Yvonne Leung, and Guillaume Gibert. 2016. Mimicry and expressiveness of an ECA in human-agent interaction: familiarity breeds content! Computational cognitive science 2, 1 (2016), 1–14.
- Tickle-Degnen and Rosenthal (1990) L. Tickle-Degnen and Robert Rosenthal. 1990. The nature of rapport and its nonverbal correlates. Psychological Inquiry 1, 4 (1990), 285–293. http://www.tandfonline.com/doi/abs/10.1207/s15327965pli0104_1
- Yamamoto et al. (2019) Takashi Yamamoto, Koji Terada, Akiyoshi Ochiai, Fuminori Saito, Yoshiaki Asahara, and Kazuto Murase. 2019. Development of Human Support Robot as the research platform of a domestic mobile manipulator. ROBOMECH Journal 6, 1 (18 Apr 2019), 4. https://doi.org/10.1186/s40648-019-0132-3