Group Dynamics: Survey of Existing Multimodal Models and Considerations for Social Mediation
1Honda Research Institute USA, Inc. {hifza_javed, njamali}@honda-ri.com
Abstract
Social mediator robots facilitate human-human interactions by producing behavior strategies that positively influence how humans interact with each other in social settings. As robots for social mediation gain traction in the field of human-human-robot interaction, their ability to “understand” the humans in their environments becomes crucial. This objective requires models of human understanding that consider multiple humans in an interaction as a collective entity and represent the group dynamics that exist among its members. Group dynamics are defined as the influential actions, processes, and changes that occur within and between group interactants. Since an individual’s behavior may be deeply influenced by their interactions with other group members, the social dynamics existing within a group can influence the behaviors, attitudes, and opinions of each individual and the group as a whole. Therefore, models of group dynamics are critical for a social mediator robot to be effective in its role. In this paper, we survey existing models of group dynamics and categorize them into models of social dominance, affect, social cohesion, conflict resolution, and engagement. We highlight the multimodal features these models utilize, and emphasize the importance of capturing the interpersonal aspects of a social interaction. Finally, we make a case for models of relational affect as an approach that may be able to capture a representation of human-human interactions that can be useful for social mediation.
Index Terms:
group dynamics, social mediation, social robot, social dominance, social cohesion, engagement, relational affect
I Introduction
Socially assistive robots have been leveraged in a number of application domains, including companion robots [1], robots for learning [2], and robots as assistants for older adults [3]. In all such applications, a key function of the robot is to act as a social mediator. In general, a social mediator facilitates human-human interactions with the goal of strengthening existing relationships between individuals and helping them form new connections. This requires that the robot produce behaviors that are able to impact how humans in its environment interact with one another. However, to produce behaviors that are relevant to the context of the social situation, the robot must first be able to “understand” the humans and the interaction dynamics among them in its environment.
Human-human interactions involve multiple individuals who interact with one another through complex verbal and nonverbal signals that are contextual and change over time. To successfully mediate such interactions, a robot must recognize these signals in real time and understand what they mean in a given context. This understanding can then be leveraged to generate robot behaviors that help build relationships, improve connectedness, and achieve group-specific goals.
Therefore, a prerequisite for social mediation is the understanding of group dynamics—the influential actions, processes, and changes that occur within and between group interactants [4]. Healthy group dynamics may manifest in various forms, including cooperation, creativity, cohesion, likeability, conflict resolution, open communication, and strong team performance. The interactions among group members have the potential to significantly impact an individual’s behavior [5], and as such, the social dynamics present within a group can shape the attitudes, opinions, and behaviors of both individuals and the group as a whole [6]. Thus, the development of models of group dynamics is critical to ensure that social mediator robots can perform their roles effectively.
In this paper, we survey existing approaches to model group dynamics that are used in human-human-robot interaction (HHRI) studies and evaluate their ability to represent the interpersonal aspects of a social interaction. Specifically, we do this by collecting the features these models use to represent human-human interactions and analyzing their ability to capture the interplay of behavioral cues like gaze, gestures, tone of voice, etc. We highlight some considerations for the next steps in this research field and make a case for models of relational affect that can be useful in building group-level understanding of human-human interactions.
II Existing modeling approaches
Prior research has explored several perspectives on modeling group-level interaction phenomena. In this section, we describe the most frequently researched themes within group dynamics and their applications within the field of HHRI.
II-1 Social dominance
Social dominance is a relational, behavioral, and interactional state in which an individual achieves control or influence over others through communicative actions [7]. Studies have shown that highly dominant children receive more social attention than their peers, which can affect their performance in group learning environments [7]. It is also associated with harsh power tactics in workplace environments [8]. Since social dominance is communicative in nature, it manifests in ways that make its detection, measurement, and classification easier compared to other phenomena. As a result, it has been investigated most frequently in existing research [7, 9, 10].
In HHRI literature, dominance is viewed from the perspective of imbalance in participation from the interactants. Therefore, dominance-related studies typically attempt to identify the least or most dominant individual in a group, in settings such as group meetings at the office [9], group project discussions among students [11], or interactive storytelling for child-robot interaction scenarios [7].
II-2 Affect
This category encompasses models based on a combination of affect, emotions, and mood. Affect is an umbrella term in psychology that refers to the experience of feelings, emotions, or moods [12]. Emotions are short-term and intense, whereas moods are long-term and diffuse affective states. Moods emphasize a stable affective context, while emotions emphasize affective responses to specific events [13]. Since affect is an internal feeling state, often without clear, homogeneous expressions, its evaluation can be a significant challenge.
Research on affective models typically represents emotions on the valence-arousal circumplex [14], often utilizing learning methods to classify emotion states from facial expressions. Bottom-up approaches combine individual-level affective states to form a measure of group affect [15], whereas top-down approaches use global, scene-level features [16]. For a bottom-up approach, how the individual-level states are combined to accurately capture group states is an important consideration.
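To make the bottom-up aggregation step concrete, the following minimal sketch combines per-individual valence-arousal estimates into a group-level estimate. It is our own illustration, not drawn from any surveyed model; the mean-pooling rule, the dispersion measure, and all names are illustrative assumptions.

```python
import numpy as np

def group_affect(individual_va, weights=None):
    """Aggregate per-person (valence, arousal) estimates into a group state.

    individual_va: array-like of shape (n_people, 2), values in [-1, 1].
    weights: optional per-person weights (e.g., speaking-time shares).
    Mean pooling is only one of many plausible fusion rules.
    """
    va = np.asarray(individual_va, dtype=float)
    if weights is None:
        weights = np.full(len(va), 1.0 / len(va))
    weights = np.asarray(weights, dtype=float)
    pooled = weights @ va          # weighted mean on the circumplex
    dispersion = va.std(axis=0)    # disagreement among group members
    return pooled, dispersion

# Three interactants, one strongly negative and aroused
pooled, spread = group_affect([[0.6, 0.2], [0.4, 0.1], [-0.7, 0.8]])
print(pooled, spread)  # pooled ≈ [0.1, 0.37]
```

Reporting the dispersion alongside the pooled state is one simple way to flag cases where a group mean hides strong disagreement among members.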
II-3 Social cohesion
Social cohesion is the bonding that affects the membership of an individual in a group and is the most important attribute of a successful group [17]. It may manifest in the form of interpersonal attraction between group members, commitment to the task of the group, susceptibility to interpersonal influence, and loyalty to the group [18]. Groups with high levels of social cohesion among their members tend to be highly productive and successful, and their individual members experience high levels of personal satisfaction [19]. Manifestations of social cohesion also tend to be less observable than those of social dominance, which necessitates a reliance on post-interaction surveys to extract cohesion-related information [20].
Research related to computational modeling of social cohesion in groups may address specific social aspects of cohesion, such as interest level within a group [21], rapport [22, 23], attraction [24], or synchrony [25], or may model cohesion as a whole, comprising both task cohesion and social cohesion [26].
II-4 Conflict resolution
Conflict resolution is the process that two or more parties use to find a peaceful solution to their dispute. It is particularly important in the workplace, where successful resolution of conflicts can lead to greater efficiency and goal achievement and maintain a positive, comfortable environment for all employees. Since emotion regulation plays a key role in conflict resolution, some models of conflict resolution that target emotion regulation may resemble models of affect [27, 28]. Emotion regulation is defined as the process by which individuals influence which emotions they have, when they have them, and how they experience and express them [29].
In HHRI literature, conflict resolution is often studied within the context of child-play scenarios, where play behaviors are analyzed as constructive vs. non-constructive and social vs. non-social when conflicts related to object possession occur between the playmates [30]. Prior studies have also established the effectiveness of a robot acting as an emotion regulator during group interactions, where mediative actions from the robot can influence conflict resolution skills [28]. These actions include, first, drawing attention to interpersonal conflicts arising from personal violations; second, discouraging interpersonal conflicts by identifying them as inappropriate; and finally, using humor to alleviate tension.
II-5 Engagement
According to Sidner et al. [31], engagement is defined as “the process by which individuals involved in an interaction start, maintain and end their perceived connection to one another.” The definition of engagement and its specific observations may vary slightly with the interaction task and the experimental settings used in a study. For example, in a robot-mediated activity for older adults, engagement may be observed through an individual’s attendance to the activity, the degree of attentiveness in the activity, the degree of active participation, the person’s attitude, and whether the person appears bored [32]. On the other hand, for children with autism spectrum disorder, engagement may be determined as a combination of behaviors such as gaze focus, imitation, verbalizations, self-initiated interactions, triadic interactions, and smiling [33]. In general, however, engagement is understood to be related to interactivity, attention, and the general user experience [34].
III Multimodal features for modeling group dynamics
Existing research has explored the use of multiple modalities of features to measure various aspects of group dynamics in HHRIs. These modalities include audio, visual, and physiological measurements, alongside subjective measures derived from questionnaires and surveys. This section summarizes the commonly used features in models of group dynamics.
III-1 Audio features
The use of audio cues to model group dynamics is an active area of research. Audio cues can provide valuable insights into how groups function and communicate, and can inform the design of more effective group communication strategies. Audio features consist of both verbal cues containing spoken content and vocal cues containing sounds, such as laughing or backchanneling [35, 36]. Verbal and vocal cues are widely used in modeling group dynamics since verbal activity can be a reliable indicator of interactivity and communication between the interactants. While these cues may provide information about the states of the individuals, they can also capture the relational aspects of a human-human interaction through measures related to turn-taking behaviors and speaking times [37, 7]. Table I summarizes the key audio cues used to model group dynamics; a sketch of how several of the turn-based features can be computed follows the table.
Feature | Description | Model |
---|---|---|
Total speaking energy | Speaker energy accumulated over the interaction | Dominance [9], Cohesion [26] |
Total speaking length | Total time that an individual speaks | Dominance [9, 38, 39, 7], Cohesion [26], Engagement [32] |
Total speaking turns | Total number of time intervals for which an individual’s speaking status is active | Dominance [9, 11, 39], Cohesion [26] |
Backchannel length | Total number of short turns consisting of backchanneling behaviors | Cohesion [26] |
Total number of interruptions | Cumulative number of times that an individual starts talking while another speaker is active | Dominance [9, 38, 7], Cohesion [26] |
Total number of failed interruptions | Number of times an individual starts talking but is unable to take over the floor from the active speaker | Cohesion [26] |
Total speaker floor grabs | Number of times an individual starts talking while there are other people speaking and all others stop talking before the individual does | Dominance [10] |
Silence time | Amount of time spent in silence between exchanges in floor grabs | Cohesion [26] |
Time between floor exchanges | Amount of time between all floor exchanges | Cohesion [26] |
Speaking rate | Represents the pace of the conversation and is computed using the mrate estimator [40] | Cohesion [26] |
Utterance addressee | Total time spent speaking to a certain addressee | Dominance [7] |
Utterance type | Category or subject of each utterance from an individual | Dominance [7] |
Participation unevenness | Difference between each individual’s speech time and the average speech time for the group | Dominance [39] |
Low-level descriptors of voice | Spectral and cepstral coefficients, voice quality, energy, logarithmic harmonic-to-noise ratio, spectral harmonicity, psychoacoustic spectral sharpness, etc. | Affect [16], Cohesion [16] |
Prosody | Prosodic features like pitch, loudness, and pace | Affect [16], Conflict (through emotion regulation) [29] |
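To illustrate how several of the turn-based features in Table I can be derived, the sketch below computes speaking length, speaking turns, and interruptions from binary per-frame voice-activity annotations, and flags the most-talkative participant as a crude dominance proxy. This is a minimal illustration under our own assumptions (the frame-based representation, the interruption rule, and all names are ours), not the exact pipelines of the cited models.

```python
import numpy as np

def turn_features(vad, frame_s=0.1):
    """Compute simple turn-taking features from voice-activity annotations.

    vad: binary array of shape (n_people, n_frames); 1 = speaking.
    Returns per-person speaking time (s), turn counts, interruption counts
    (starting to speak while someone else already holds the floor), and the
    index of the most-talkative participant as a crude dominance proxy.
    """
    vad = np.asarray(vad, dtype=int)
    speaking_time = vad.sum(axis=1) * frame_s
    onsets = np.diff(vad, axis=1, prepend=0) == 1     # 0 -> 1 transitions
    turns = onsets.sum(axis=1)
    others_active = (vad.sum(axis=0, keepdims=True) - vad) > 0
    interruptions = (onsets & others_active).sum(axis=1)
    most_dominant = int(np.argmax(speaking_time))
    return speaking_time, turns, interruptions, most_dominant

# Two people; person 0 starts talking while person 1 still holds the floor
vad = [[0, 0, 1, 1, 1, 0],
       [1, 1, 1, 0, 0, 0]]
print(turn_features(vad))
```

Features such as participation unevenness follow directly by subtracting the group-mean speaking time from each person's speaking time.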
III-2 Visual features
Visual cues provide a valuable means of capturing interpersonal behaviors. They refer to any nonverbal signals produced by the members of the group, such as facial expressions, body language, and gestures. While it is generally easier to capture relational signals from audio cues, as they are often produced in the presence of other interactants, visual observations from a social interaction can also provide valuable information on the interplay of behaviors, such as gaze [9], changes in proximity between members [16], and the level of emotional expression during an interaction [41]. Table II summarizes the key visual features used to model group dynamics; a short gaze-feature sketch follows the table.
Feature | Description | Model |
---|---|---|
Visual activity | A binary variable that indicates if a participant is visually active or inactive at each time step | Dominance [9] |
Level of subtle visual activity | Strength of visual activity extracted from close view cameras using residual coding bit rate | Cohesion [26] |
Total visual activity length | The accumulated motion activity for an individual | Dominance [9] |
Total visual activity turns | Number of times an individual is continuously moving without breaks | Dominance [9] |
Turn duration | Total number of time intervals for which an individual’s visual activity status is active | Dominance [10] |
Physical coercion | Total time spent engaging in physical coercion (shoving, grabbing, etc.) | Dominance [7] |
Gaze focus | Total time spent looking at specified targets in the interaction such as the robot or other interactants | Dominance [7], Cohesion [22], Engagement [31, 32] |
Head nods | Number of times an individual’s head moves up and down in a single continuous movement on a vertical axis | Cohesion [22] |
Head pose | Head pose yaw angle | Engagement [32, 34] |
Sizes of face and body | Relative sizes of faces and bodies from image data | Affect [42, 15] |
Relative location | Individual’s distance from the group | Affect [42, 15] |
Interaction time | Amount of time spent in the interaction | Engagement [31] |
Shared looking | Amount of time spent in coordinated gaze on targets of mutual interest | Engagement [31] |
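As a minimal sketch of two of the gaze features in Table II, the code below computes per-person gaze focus time and group-level shared looking from frame-level gaze-target labels. The label scheme, the exclusion of "away" frames from shared looking, and all names are our illustrative assumptions, not taken from the cited models.

```python
def gaze_features(gaze_labels, frame_s=0.1, target="robot"):
    """gaze_labels: per-person sequences of gaze-target strings of equal
    length, e.g. ["robot", "partner", "away", ...].

    Returns each person's time on `target` (gaze focus) and the time all
    members fixate the same target simultaneously (shared looking); frames
    where everyone looks "away" are not counted as mutual interest."""
    focus = [sum(g == target for g in person) * frame_s
             for person in gaze_labels]
    shared = sum(
        len(set(frame)) == 1 and frame[0] != "away"
        for frame in zip(*gaze_labels)
    ) * frame_s
    return focus, shared

focus, shared = gaze_features(
    [["robot", "robot", "partner", "away"],
     ["robot", "partner", "partner", "away"]])
print(focus, shared)  # [0.2, 0.1] 0.2 (shared looking at frames 0 and 2)
```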
III-3 Physiological features
Physiological signals are generated by the body during the functioning of various physiological systems and refer to the bodily functions that can be measured and analyzed. These include signals that reflect changes in skeletal muscles (electromyogram or EMG), changes in heartbeat or rhythm (electrocardiogram or ECG), changes in brain activity measured through the scalp (electroencephalogram or EEG), and changes in corneo-retinal potential measured from the human eye (electrooculogram or EOG) [43]. Physiological signals can provide insights into the internal states of individuals, including their emotions [44], stress levels [45], and cognitive processes [46]. By measuring the physiological signals of individuals within a group, researchers can analyze how these signals correlate with the events of the interaction and influence the group dynamics. Table III summarizes the key physiological features used to model group dynamics; a minimal feature-extraction sketch follows the table.
Feature | Description | Model |
---|---|---|
Facial EMG | Facial electromyography (EMG) from the corrugator supercilii and zygomaticus major muscles | Affect [47] |
Electrodermal activity | Measurement of the electrical conductivity of the skin | Affect [47, 48] |
Heart rate | Mean and median heart rate as measures of centrality, and standard deviation of heart rate as a measure of variability | Affect [48], Conflict [29] |
EEG | EEG signals collected from 14 different electrode channels | Engagement [32] |
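The heart-rate features in Table III reduce to simple statistics over a beat-rate signal. The sketch below is our own minimal illustration, assuming R-R intervals (milliseconds between successive beats) as input; the function and field names are illustrative.

```python
import statistics

def heart_rate_features(rr_intervals_ms):
    """Derive simple heart-rate features from R-R intervals (ms between
    beats). Instantaneous rate is 60000 / RR; mean and median summarize
    central tendency, and standard deviation reflects variability."""
    rates = [60000.0 / rr for rr in rr_intervals_ms]
    return {
        "mean_hr": statistics.mean(rates),
        "median_hr": statistics.median(rates),
        "std_hr": statistics.stdev(rates),
    }

print(heart_rate_features([820, 800, 790, 845, 810]))  # resting-range example
```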
III-4 Subjective features
In addition to the objective measures summarized above, some subjective measures have also been used for the modeling of group dynamics. These are typically obtained using pre- and post-experimental interviews and surveys to record the participants’ self-ratings and feedback.
Interview questions can be posed to the interactants to extract cohesion-related information, such as whether an interactant’s partner listened to them, whether the partner annoyed them, and whether they would participate in the activity again with their partner [49]. Moreover, additional information such as the level of friendship and familiarity may also be obtained from interviews to contextualize the assessment of cohesion between interactants [49]. Measures of specific aspects of cohesion, such as relationship satisfaction and trust, can also be obtained by using a sub-scale of the Subjective Value Inventory [50] to gauge an interactant’s satisfaction with their relationship with their partner.
Subjective ratings of valence and arousal are also obtained from participants as ground-truth labels of their affective states [47, 51]. Another common method is to elicit retrospective ratings of affect from the interactants themselves; these ratings can be combined into a measure called group affective balance, which has been used as a predictor of conflict resolution skills [51].
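As a hedged illustration of how such retrospective ratings might be turned into a balance score, the sketch below takes the difference between the group-averaged positive and negative ratings. This difference-of-means operationalization is our simplifying assumption for illustration, not the exact formulation used in [51].

```python
def group_affective_balance(positive_ratings, negative_ratings):
    """positive_ratings / negative_ratings: per-member retrospective
    self-ratings of positive and negative affect (e.g., on a 1-9 scale).
    A positive score indicates the group's recalled experience skewed
    positive. This difference-of-means form is an illustrative assumption,
    not the formulation in [51]."""
    pos = sum(positive_ratings) / len(positive_ratings)
    neg = sum(negative_ratings) / len(negative_ratings)
    return pos - neg

print(group_affective_balance([6, 7, 5], [3, 2, 4]))  # 3.0: positive balance
```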
IV Discussion: considerations for social mediation
While there is a wide body of qualitative evidence reporting the study of group dynamics, thorough quantitative analyses of these phenomena remain a significant challenge. This may be attributed to difficulties with formal knowledge representation and the lack of data that captures the nuances of human-human interactions in sufficient detail. Evaluating even individual human states is difficult and remains an active area of research; the study of multiple interactants and their dynamics inevitably presents additional complexities, which will have to be addressed by future work.
Various aspects of auditory signals have been investigated in the study of social dynamics within human-human interactions, as shown in Table I. The use of audio cues is not surprising, given that intentional vocal activity is typically expected to occur in the presence of other interactants. However, a deeper analysis of verbal behaviors, beyond the existence of vocal activity and turn-taking behavior, is not always available. Some studies classify verbal behaviors by intent or meaning into categories such as acknowledgments, answers, influence, information requests, rephrases, and signals of non-understanding [22]. Such an approach provides valuable context about the nature of the interactions taking place and enables a thorough analysis of the group dynamics.
Compared to audio features, automated extraction of visual features is less common; studies frequently rely instead on manual coding by human annotators [30, 49]. Moreover, deep learning-based approaches that work directly with image inputs are often used to process visual activity in group interactions [16, 52, 53], which also avoids the explicit extraction of visual features.
The use of physiological signals for internal state estimation in group interactions is limited in comparison to the audio and visual modalities. Firstly, although many features have been tested, there is still no clear support for feature combinations that are strongly correlated with affective state [44]. Additionally, individual differences in the experience of various affective states further complicate the use of models utilizing physiological features [44]. However, unlike behavioral responses, some physiological variables are not under complete voluntary control [54], making the combination of physiological signals with other modalities valuable for internal human state estimation.
Social dynamics prevailing in group interactions are represented not only by the emergence of certain behaviors in individuals but also by the interplay of behaviors such as gaze, gestures, tone and volume of voice, etc. Measures capturing this interplay of behaviors are able to represent the interpersonal aspects of the interaction. These can then be used to demonstrate not only the correlation between different modalities of behaviors but also the impact of this correlation on the evolving group dynamics. Some models surveyed in Section III use measures that represent such correlations between the audio and the visual behaviors. Strohkorb et al. [7] used combined behaviors like “looking while listening” and “looking while talking” to represent social dominance in groups of children as they interact with a robot, while Hung et al. [26] used features such as “motion during overlapping speech” and “motion when not speaking” to capture the correlations between body movements and speech.
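A minimal sketch of such a cross-modal interplay feature, in the spirit of “looking while talking” [7] and “motion during overlapping speech” [26], is shown below. The exact computation, a co-occurrence of two binary behavioral streams, is our own illustrative assumption rather than the cited authors’ implementations.

```python
import numpy as np

def looking_while_talking(speaking, gazed_at_partner, frame_s=0.1):
    """Duration and fraction of a person's speech spent gazing at a partner.

    speaking, gazed_at_partner: equal-length binary arrays for one person.
    The co-occurrence of the two streams is what makes this an interpersonal
    (interplay) feature rather than a single-modality one."""
    speaking = np.asarray(speaking, dtype=bool)
    gazing = np.asarray(gazed_at_partner, dtype=bool)
    overlap = (speaking & gazing).sum() * frame_s
    total_speech = speaking.sum() * frame_s
    return overlap, overlap / total_speech if total_speech else 0.0

print(looking_while_talking([1, 1, 1, 0], [1, 0, 1, 1]))  # (0.2, ~0.67)
```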
This emphasis on capturing the interplay of behaviors that can represent interpersonal aspects of a social interaction gives rise to the concept of relational affect, which is a dyadic construct that represents affective states that an individual experiences from their interactions with others. According to Slaby [55], relational affect “does not refer to individual feeling states but to affective interactions in relational scenes, either between two or more interactants, or between an agent and aspects of [her or his] environment”. Relational affect is a consequence of the interaction itself in the form of an interplay of gaze, gesture, tone of voice etc. [55], and it focuses on the observable expressions of affect between the interactants rather than the individual internal experiences of emotion.
Therefore, rather than viewing emotional expressions as indicators of internal state, these can instead be viewed as actions that can shape people’s relational orientation towards each other [56]. For example, expressions of anger or hostility can push people away, whereas expressions of joy or sadness can bring people together [57]. These findings reiterate the need to add a third dimension of interpersonal orientation to the conventional valence-arousal model of affect, and support the proposal made by Jung [57] to represent interpersonal behaviors along an axis ranging from affiliative behaviors (behaviors that turn people towards each other) to distancing behaviors (behaviors that turn people away from each other).
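A toy data structure for this extended representation, adding the proposed affiliative-distancing axis [57] to the conventional valence and arousal dimensions, might look as follows; the class and field names are our illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RelationalAffectState:
    """Affect extended with an interpersonal dimension, following the
    proposal in [57].

    valence, arousal: the conventional circumplex dimensions in [-1, 1].
    orientation: +1.0 for fully affiliative behavior (turning people
    toward each other), -1.0 for fully distancing behavior (turning
    people away from each other)."""
    valence: float
    arousal: float
    orientation: float

# Hostile outburst: negative, activated, and distancing
outburst = RelationalAffectState(valence=-0.8, arousal=0.9, orientation=-0.7)
# Shared laughter: positive, activated, and affiliative
laughter = RelationalAffectState(valence=0.7, arousal=0.6, orientation=0.8)
```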
Group-level models of human understanding that incorporate relational affect may be well-positioned for HHRI research, where they can be utilized by a social mediator robot to act in accordance with the social dynamics that exist within the group, thereby producing interaction strategies that are more effective not only in achieving the group’s specific goals but also in building stronger relationships and improving overall connectedness between the group members.
V Conclusion
A variety of multimodal features have been used in prior research to represent various aspects of social dynamics in human-human interactions. While most models of group dynamics capture the emergence of certain behaviors within individuals, some also capture the correlation between different modalities of behavioral indicators to represent certain nuances of the interactions occurring between individual interactants. We emphasize the need to capture the interplay of behaviors between interactants that represents the interpersonal aspects of a social interaction. We also make a case for models of relational affect, which can represent the affective states that an individual experiences through their interactions with others. Such models can inform the interaction strategies produced by a social mediator robot that aim to positively influence the social dynamics prevailing within a group.
References
- [1] G. Odekerken-Schröder, C. Mele, T. Russo-Spena, D. Mahr, and A. Ruggiero, “Mitigating loneliness with companion robots in the covid-19 pandemic and beyond: an integrative framework and research agenda,” Journal of Service Management, vol. 31, no. 6, pp. 1149–1162, 2020.
- [2] T. Belpaeme, J. Kennedy, A. Ramachandran, B. Scassellati, and F. Tanaka, “Social robots for education: A review,” Science robotics, vol. 3, no. 21, p. eaat5954, 2018.
- [3] M. Niemelä and H. Melkas, “Robots as social and physical assistants in elderly care,” Human-centered digitalization and services, pp. 177–197, 2019.
- [4] D. R. Forsyth, Group dynamics. Cengage Learning, 2018.
- [5] K. Lewin, Field theory in social science: Selected theoretical papers, D. Cartwright, Ed., 1951.
- [6] J.-E. Lee and M. Shin, “How group dynamics affect team achievements in virtual environments,” International Journal of Contents, vol. 10, no. 3, pp. 64–72, 2014.
- [7] S. Strohkorb, I. Leite, N. Warren, and B. Scassellati, “Classification of children’s social dominance in group interactions with robots,” in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015, pp. 227–234.
- [8] A. Aiello, A. Tesi, F. Pratto, and A. Pierro, “Social dominance and interpersonal power: Asymmetrical relationships within hierarchy-enhancing and hierarchy-attenuating work environments,” Journal of Applied Social Psychology, vol. 48, no. 1, pp. 35–45, 2018.
- [9] D. B. Jayagopi, H. Hung, C. Yeo, and D. Gatica-Perez, “Modeling dominance in group conversations using nonverbal activity cues,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 3, pp. 501–513, 2009.
- [10] O. Aran and D. Gatica-Perez, “Fusing audio-visual nonverbal cues to detect dominant people in group conversations,” in 2010 20th International Conference on Pattern Recognition. IEEE, 2010, pp. 3687–3690.
- [11] H. Tennent, S. Shen, and M. Jung, “Micbot: A peripheral robotic object to shape conversational dynamics and team performance,” in 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2019, pp. 133–142.
- [12] N. H. Frijda, The emotions. Cambridge University Press, 1986.
- [13] J. Xu, J. Broekens, K. Hindriks, and M. A. Neerincx, “Mood contagion of robot body language in human robot interaction,” Autonomous Agents and Multi-Agent Systems, vol. 29, no. 6, pp. 1216–1248, 2015.
- [14] J. A. Russell, “A circumplex model of affect.” Journal of personality and social psychology, vol. 39, no. 6, p. 1161, 1980.
- [15] V. Vonikakis, Y. Yazici, V. D. Nguyen, and S. Winkler, “Group happiness assessment using geometric features and dataset balancing,” in Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016, pp. 479–486.
- [16] G. Sharma, S. Ghosh, and A. Dhall, “Automatic group level affect and cohesion prediction in videos,” in 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE, 2019, pp. 161–167.
- [17] A. V. Carron, W. N. Widmeyer, and L. R. Brawley, “The development of an instrument to assess cohesion in sport teams: The group environment questionnaire,” Journal of Sport and Exercise psychology, vol. 7, no. 3, pp. 244–266, 1985.
- [18] N. E. Friedkin, “Social cohesion,” Annu. Rev. Sociol., vol. 30, pp. 409–425, 2004.
- [19] T. M. Loughead, K. Fransen, S. Van Puyenbroeck, M. D. Hoffmann, B. De Cuyper, N. Vanbeselaere, and F. Boen, “An examination of the relationship between athlete leadership and cohesion using social network analysis,” Journal of sports sciences, vol. 34, no. 21, pp. 2063–2073, 2016.
- [20] L. Brawley, A. Carron, and W. Widmeyer, “The development of an instrument to assess cohesion in sport teams: The group environment questionnaire,” Journal of Sport Psychology, vol. 7, no. 3, pp. 244–266, 1985.
- [21] D. Gatica-Perez, L. McCowan, D. Zhang, and S. Bengio, “Detecting group interest-level in meetings,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’05), vol. 1. IEEE, 2005, pp. I–489.
- [22] J. Cassell, A. Gill, and P. Tepper, “Coordination in conversation and rapport,” in Proceedings of the workshop on Embodied Language Processing, 2007, pp. 41–50.
- [23] J. Gratch, A. Okhmatovskaia, F. Lamothe, S. Marsella, M. Morales, R. J. van der Werf, and L.-P. Morency, “Virtual rapport,” in Intelligent Virtual Agents: 6th International Conference, IVA 2006, Marina Del Rey, CA, USA, August 21-23, 2006. Proceedings 6. Springer, 2006, pp. 14–27.
- [24] A. Madan, R. Caneel, and A. Pentland, “Voices of attraction,” Augmented Cognition, HCI, 2005.
- [25] N. Campbell, “Multimodal processing of discourse information; the effect of synchrony,” in 2008 Second International Symposium on Universal Communication. IEEE, 2008, pp. 12–15.
- [26] H. Hung and D. Gatica-Perez, “Estimating cohesion in small groups using audio-visual nonverbal behavior,” IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 563–575, 2010.
- [27] S. G. Barsade, “The ripple effect: Emotional contagion and its influence on group behavior,” Administrative science quarterly, vol. 47, no. 4, pp. 644–675, 2002.
- [28] M. F. Jung, N. Martelaro, and P. J. Hinds, “Using robots to moderate team conflict: the case of repairing violations,” in Proceedings of the tenth annual ACM/IEEE international conference on human-robot interaction, 2015, pp. 229–236.
- [29] J. Costa, M. F. Jung, M. Czerwinski, F. Guimbretière, T. Le, and T. Choudhury, “Regulating feelings during interpersonal conflicts by changing voice self-perception,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–13.
- [30] S. Shen, P. Slovak, and M. F. Jung, “‘Stop. I see a conflict happening.’ A robot mediator for young children’s interpersonal conflict resolution,” in Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, 2018, pp. 69–77.
- [31] C. L. Sidner, C. Lee, C. D. Kidd, N. Lesh, and C. Rich, “Explorations in engagement for humans and robots,” Artificial Intelligence, vol. 166, no. 1-2, pp. 140–164, 2005.
- [32] J. Fan, L. C. Mion, L. Beuscher, A. Ullal, P. A. Newhouse, and N. Sarkar, “Sar-connect: a socially assistive robotic system to support activity and social engagement of older adults,” IEEE Transactions on Robotics, vol. 38, no. 2, pp. 1250–1269, 2021.
- [33] H. Javed, R. Burns, M. Jeon, A. M. Howard, and C. H. Park, “A robotic framework to facilitate sensory experiences for children with autism spectrum disorder: A preliminary study,” ACM Transactions on Human-Robot Interaction (THRI), vol. 9, no. 1, pp. 1–26, 2019.
- [34] S. M. Anzalone, S. Boucenna, S. Ivaldi, and M. Chetouani, “Evaluating the engagement with social robots,” International Journal of Social Robotics, vol. 7, pp. 465–478, 2015.
- [35] M. F. Jung, J. J. Lee, N. DePalma, S. O. Adalgeirsson, P. J. Hinds, and C. Breazeal, “Engaging robots: easing complex human-robot teamwork using backchanneling,” in Proceedings of the 2013 conference on Computer supported cooperative work, 2013, pp. 1555–1566.
- [36] H. W. Park, M. Gelsomini, J. J. Lee, and C. Breazeal, “Telling stories to robots: The effect of backchanneling on a child’s storytelling,” in Proceedings of the 2017 ACM/IEEE international conference on human-robot interaction, 2017, pp. 100–108.
- [37] J. S. Smith, C. Chao, and A. L. Thomaz, “Real-time changes to social dynamics in human-robot turn-taking,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 3024–3029.
- [38] G. Skantze, “Predicting and regulating participation equality in human-robot conversations: Effects of age and gender,” in 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2017, pp. 196–204.
- [39] S. Gillet, R. Cumbal, A. Pereira, J. Lopes, O. Engwall, and I. Leite, “Robot gaze can mediate participation imbalance in groups with different skill levels,” in Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, 2021, pp. 303–311.
- [40] N. Morgan, E. Fosler-Lussier, and N. Mirghafori, “Speech recognition using on-line estimation of speaking rate.” in Eurospeech, vol. 97. Citeseer, 1997, pp. 2079–2082.
- [41] P. Tarnowski, M. Kołodziej, A. Majkowski, and R. J. Rak, “Emotion recognition using facial expressions,” Procedia Computer Science, vol. 108, pp. 1175–1184, 2017.
- [42] W. Mou, O. Celiktutan, and H. Gunes, “Group-level arousal and valence recognition in static images: Face, body and context,” in 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 5. IEEE, 2015, pp. 1–6.
- [43] B. Rim, N.-J. Sung, S. Min, and M. Hong, “Deep learning in physiological signal data: A survey,” Sensors, vol. 20, no. 4, p. 969, 2020.
- [44] L. Shu, J. Xie, M. Yang, Z. Li, Z. Li, D. Liao, X. Xu, and X. Yang, “A review of emotion recognition using physiological signals,” Sensors, vol. 18, no. 7, p. 2074, 2018.
- [45] P. Karthikeyan, M. Murugappan, and S. Yaacob, “A review on stress inducement stimuli for assessing human stress using physiological signals,” in 2011 IEEE 7th International Colloquium on Signal Processing and its Applications. IEEE, 2011, pp. 420–425.
- [46] E. Haapalainen, S. Kim, J. F. Forlizzi, and A. K. Dey, “Psycho-physiological measures for assessing cognitive load,” in Proceedings of the 12th ACM international conference on Ubiquitous computing, 2010, pp. 301–310.
- [47] T. Sawabe, S. Honda, W. Sato, T. Ishikura, M. Kanbara, S. Yoshikawa, Y. Fujimoto, and H. Kato, “Robot touch with speech boosts positive emotions,” Scientific Reports, vol. 12, no. 1, pp. 1–8, 2022.
- [48] M. Swangnetr and D. B. Kaber, “Emotional state classification in patient–robot interaction using wavelet analysis and statistics-based feature selection,” IEEE Transactions on Human-Machine Systems, vol. 43, no. 1, pp. 63–75, 2012.
- [49] S. Strohkorb, E. Fukuto, N. Warren, C. Taylor, B. Berry, and B. Scassellati, “Improving human-human collaboration between children with a social robot,” in 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 2016, pp. 551–556.
- [50] J. R. Curhan, H. A. Elfenbein, and H. Xu, “What do people value when they negotiate? mapping the domain of subjective value in negotiation.” Journal of personality and social psychology, vol. 91, no. 3, p. 493, 2006.
- [51] M. F. Jung, “Coupling interactions and performance: Predicting team performance from thin slices of conflict,” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 23, no. 3, pp. 1–32, 2016.
- [52] L. Tan, K. Zhang, K. Wang, X. Zeng, X. Peng, and Y. Qiao, “Group emotion recognition with individual facial emotion cnns and global image based cnns,” in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 549–552.
- [53] X. Guo, L. F. Polanía, and K. E. Barner, “Group-level emotion recognition using deep models on image scene, faces, and skeletons,” in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 603–608.
- [54] A. A. Rasia-Filho, “Is there anything “autonomous” in the nervous system?” Advances in physiology education, vol. 30, no. 1, pp. 9–12, 2006.
- [55] J. Slaby, “Relational affect: perspectives from philosophy and cultural studies,” in How to do things with affects. Brill, 2019, pp. 59–81.
- [56] G. A. Van Kleef, C. K. De Dreu, and A. S. Manstead, “An interpersonal approach to emotion in social decision making: The emotions as social information model,” in Advances in experimental social psychology. Elsevier, 2010, vol. 42, pp. 45–96.
- [57] M. F. Jung, “Affective grounding in human-robot interaction,” in 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2017, pp. 263–273.