Accessible Nonverbal Cues to Support Conversations in VR for Blind and Low Vision People

Crescentia Jung ([email protected]), Cornell University, 2 W Loop Rd, New York, NY 10044, USA; Jazmin Collins ([email protected]), Cornell University, 2 W Loop Rd, New York, NY 10044, USA; Ricardo E. Gonzalez Penuela ([email protected]), Cornell University, 2 W Loop Rd, New York, NY 10044, USA; Jonathan Isaac Segal ([email protected]), Cornell University, Ithaca, NY 14850, USA; Andrea Stevenson Won ([email protected]), Cornell University, Ithaca, NY 14850, USA; and Shiri Azenkot ([email protected]), Cornell Tech, 2 W Loop Rd, New York, NY 10044, USA

(2024)

Abstract.

Social VR has increased in popularity due to its affordances for rich, embodied, and nonverbal communication. However, nonverbal communication remains inaccessible for blind and low vision people in social VR. We designed accessible cues with audio and haptics to represent three nonverbal behaviors: eye contact, head shaking, and head nodding. We evaluated these cues in real-time conversation tasks where 16 blind and low vision participants conversed with two other users in VR. We found that the cues were effective in supporting conversations in VR. Participants had statistically significantly higher scores for accuracy and confidence in detecting attention during conversations with the cues than without. We also found that participants had a range of preferences and uses for the cues, such as learning social norms. We present design implications for handling additional cues in the future, such as the challenges of incorporating AI. Through this work, we take a step towards making interpersonal embodied interactions in VR fully accessible for blind and low vision people.

blind, low vision, VR, accessibility

Journal year: 2024. Copyright: ACM licensed. Conference: The 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’24), October 27–30, 2024, St. John’s, NL, Canada. DOI: 10.1145/3663548.3675663. ISBN: 979-8-4007-0677-6/24/10. CCS: Human-centered computing, Accessibility.

1. Introduction

Social virtual reality (VR) is growing in popularity but remains inaccessible to blind and low vision (BLV) people. Social VR has thousands of daily users who participate in a variety of social activities, from informal parties to remote business meetings. In addition, the embodied nature of social VR allows people to communicate with nonverbal behaviors such as gestures and facial expressions. However, these behaviors are generally rendered visually, which limits BLV people’s access to social information and excludes them from full participation.

Researchers have explored how to make social VR more accessible for BLV people (Zhang et al., 2022; Collins et al., 2023b; Gualano et al., 2023; Zhao et al., 2018; Ji et al., 2022). For example, Zhang et al. explored avatar diversity and the importance of self-presentation of people with disabilities in social VR (Zhang et al., 2022). Additionally, Collins and Jung et al. studied the use of a sighted guide as an accessibility tool in social VR (Collins et al., 2023b).

However, there has been little exploration of making nonverbal behaviors accessible in social VR. Some have investigated what kinds of behaviors BLV people wish to be made accessible in VR, as well as initial guidelines for the best ways of making certain behaviors accessible. For example, Wieland et al. (Wieland et al., 2022) conducted a user study to identify which nonverbal behaviors BLV people used in conversations with sighted partners and should be carried over into social VR, such as facial expressions, head movements, and gaze. Similarly, Collins and Jung et al. (Collins et al., 2023a) worked with a BLV person to identify best practices for representing gaze in VR. Ji et al.’s (Ji et al., 2022) “VRBubble” used audio beacons to indicate proxemic information about avatars to improve BLV people’s awareness of others.

Prior works such as those by Collins and Jung et al. (Collins et al., 2023a) and Ji et al. (Ji et al., 2022) have designed accessibility features for one type of nonverbal behavior at a time. The participants evaluating these systems also experienced only limited, pre-recorded conversations with agent-avatars, instead of actual people. This is understandable, considering the complexity and the immense range of the nonverbal behaviors real people use in conversation. Trying to design representations of multiple behaviors at a time is challenging, and may involve various problems such as determining which behaviors to represent, the best ways to represent them, and ensuring all representations can be used at once without becoming overwhelming. However, only having one accessible behavior during pre-recorded interactions does not represent real conversations, or provide enough nonverbal information to fully support conversation. BLV users need access to multiple nonverbal behaviors to support conversations with others in VR.

To address this need, we sought to design multiple accessible nonverbal cues that could be used simultaneously in social VR for BLV people. Specifically, we aim to address the following research question: What set of accessible nonverbal cues can support BLV users in conversations with others in VR? We use the word “cue” here and in the remainder of our paper to refer to a representation of a nonverbal behavior, while the word “behavior” refers to the nonverbal behavior itself.

We conducted a user-centered design process to design accessible audio and haptic cue representations of nonverbal behaviors. We iterated on five cue designs with six BLV participants. At the end of the process, we found that five cues were overwhelming for participants, and narrowed our focus to three final cues: eye contact, head nodding, and head shaking.

We then evaluated this set of three cues with 16 participants in social VR conversation tasks. Each participant completed two conversation tasks: one with cues and one without. We found that our nonverbal cues significantly improved participants’ accuracy in detecting certain social behaviors about their conversation partners without distracting them from the conversation. Participants also reported feeling more confident about “reading the room”, especially knowing how much attention they were receiving. Our discussions with participants yielded design implications for how nonverbal behaviors might be represented accessibly in the future, including more behaviors participants wanted to access, and potential ways of managing larger sets of accessible cues for these behaviors.

Our contributions include the design and evaluation of three accessible nonverbal cues in VR and novel design implications for sets of accessible nonverbal cues.

2. Related Work

2.1. Accessible VR for BLV People

VR experiences tend to be over-reliant on visual information, making them inaccessible for BLV people. To address this, researchers have explored how to make virtual environments and the content within them perceivable to BLV users. Much of this work has focused on providing environmental information via haptics or audio to enhance BLV users’ individual experiences in VR, but far less has focused on providing social information to enhance social experiences with other people.

Many researchers have investigated how to provide information about a VR environment’s layout to support BLV people’s navigation (Guerreiro et al., 2023; Andrade et al., 2018; Hao et al., 2022; Nair et al., 2021; Siu et al., 2020; Zhang et al., 2020; Zhao et al., 2018; Gonçalves et al., 2023). For instance, Andrade et al. (Andrade et al., 2018) designed EchoHouse, a virtual environment where users navigate with “echolocation”, using real-time audio cues emitted from objects in the space. More recently, Collins and Jung et al. (Collins et al., 2023b) proposed supporting navigation through sighted guides. In their work, they implemented a system that allowed a BLV user to pair up with a sighted human guide to visually interpret and navigate any virtual space together.

Besides navigation, researchers have explored ways of conveying details about objects in virtual environments through audio and haptics (Kornbrot et al., 2007; Nikolakis et al., 2004; Sinclair et al., 2019; Zhao et al., 2019; Gonzalez Penuela et al., 2022). Zhao et al. (Zhao et al., 2019) created a developer toolkit called SeeingVR, which offers 14 different visibility enhancements to make VR experiences accessible for users with low vision (e.g., magnification, highlighting salient objects, font size adjustment, and others). Researchers have also developed approaches using haptics and gestures for blind users, such as Penuela et al.’s (Gonzalez Penuela et al., 2022) haptic gloves, which allow BLV users to perceive the shape of objects through force feedback and elicit descriptions of objects in the environment through gestures.

Some researchers have also created accessible VR games that use nonvisual modalities to help BLV users complete complex objectives in virtual environments (Bailenson et al., 2004; Gluck et al., 2021; Gluck and Brinkley, 2020; Lumbreras and Sánchez, 1999; Morelli et al., 2010; Wedoff et al., 2019; Nair et al., 2022). For example, Wedoff et al. (Wedoff et al., 2019) designed Virtual Showdown, a VR audio game that provides spatialized audio cues to help players find, move towards, and hit a ball back to an opponent.

While the above efforts have taken an important step towards making individual VR experiences accessible for BLV users, they do not account for the social aspects of environments where multiple users are present, like social VR applications. These applications are often designed so users interact with each other to complete shared objectives or bond with each other in conversations. In such cases, communicating social information about other people is more important to support BLV users’ abilities to participate in the virtual space. Without access to social information about other people, BLV users may remain isolated in social VR spaces, even with accessibility enhancements for individual experiences. Our work seeks to address this gap by supporting BLV people’s access to a key aspect of communication in social VR: nonverbal behaviors.

2.2. Nonverbal Behaviors in Social VR

Nonverbal behaviors are an important aspect of social VR and have been explored extensively in prior work. However, this work has largely focused on understanding how they are used, rather than how to make them accessible.

Researchers have explored which nonverbal behaviors are most commonly used in social VR (Maloney et al., 2020; Tanenbaum et al., 2020). Maloney et al. (Maloney et al., 2020) observed people’s use of nonverbal behaviors on social VR applications and conducted an interview study of participants’ perceptions of nonverbal communication in social VR. The authors uncovered several types of nonverbal communication methods used in social VR, including applauding, dancing, and even flying or using emojis. Among the most common across platforms were hand gestures and head movements, like waving or nodding. Similarly, Tanenbaum et al. (Tanenbaum et al., 2020) explored ten popular social VR platforms and took inventory of their existing designs for expressing nonverbal behaviors. While the quality of each platform’s support of these behaviors varied, they found that most social VR platforms included the ability to express behaviors such as proxemics, facial expressions, and gestures.

Other researchers have explored particular types of nonverbal behaviors, including gestures, nodding, and eye gaze, and how they affect perceptions of conversation partners in VR (Ide et al., 2020; Aburumman et al., 2022; Kurzweg et al., 2021; Kevin et al., 2018; Llobera et al., 2010; Yee et al., 2007). For instance, Aburumman et al. (Aburumman et al., 2022) conducted VR studies with 21 participants, having participants experience two types of nodding behavior in a conversation with agent-avatars. They found that an agent-avatar that nodded while participants spoke resulted in higher levels of trust felt towards the agent-avatar. Ide et al. (Ide et al., 2020) examined how using symbols to trigger gestures on virtual avatars (e.g., a thumbs-up emoji to do a thumbs-up, a surprised emoji to make a surprised face) affected brainstorming tasks. They found that symbolic gestures helped participants express their emotions and supported social communication. Similarly, Kurzweg et al. (Kurzweg et al., 2021) explored the effect of body language (e.g., crossing arms, drinking) on communication in VR and found that body language can indicate willingness to communicate and attentiveness towards others in a virtual environment.

These investigations demonstrate the importance of nonverbal behaviors for communication in VR. Despite this, there has been little exploration of making these behaviors accessible to support BLV communication needs (Segal et al., 2024; Wieland et al., 2022; Collins et al., 2023a; Ji et al., 2022; Wieland et al., 2023). One of the most notable efforts is Wieland et al.’s (Wieland et al., 2022) work on identifying important nonverbal behaviors in social VR. Wieland et al. conducted interview studies with eleven participants, seven BLV and four sighted companions of BLV people, to identify which nonverbal behaviors they used most often in conversation, and which should be carried over into VR for effective communication. They found that gaze, head direction, head movements, and facial expressions were all important to identify, though they could be difficult to notice for BLV participants.

Other researchers have developed design guidelines for making certain nonverbal behaviors accessible. Collins and Jung et al. (Collins et al., 2023a) probed the design space of accessible gaze feedback in VR with a blind co-designer. The co-designer experienced and adjusted feedback for two kinds of gaze (mutual and resting gaze) through 5 parameters (e.g., Modality, Strength, Duration, etc.) to create their preferred gaze feedback in a VR prototype. Their study concluded with recommendations to send haptic vibrations to a blind user via handheld controllers when someone was looking at them in VR. Ji et al. (Ji et al., 2022) explored avatar proxemics in VR. They designed a system where audio beacons indicated proxemic information and tested it in user studies with 12 BLV participants. The system improved BLV people’s awareness of others, providing concrete design recommendations for representing proxemics in social VR.

Both of these works have introduced initial approaches to making nonverbal behaviors accessible to BLV people and are an important step in supporting communication. However, they each only focus on one type of nonverbal behavior at a time, namely eye gaze and avatar proxemics. People use multiple nonverbal behaviors (e.g., eye gaze, head movements, and gestures) when engaging in a social space. As stated by Wieland et al. (Wieland et al., 2022, 2023), BLV people want access to multiple nonverbal behaviors that are used in conversation. In order to effectively support BLV people’s conversation needs, multiple nonverbal behaviors should be made accessible at once. In addition, these works conducted their user evaluations with pre-recorded agent-avatars in scripted conversations, rather than supporting real conversations with unpredictable human conversation partners. These limitations represent a significant gap in current research efforts to support communication in VR for BLV people. Our work seeks to address this gap by implementing multiple nonverbal cues in conversations in VR to evaluate how these accessible representations of nonverbal behaviors support real conversations with others.

3. Designing Accessible Nonverbal Cues

Our goal was to design a set of robust accessible nonverbal cues that could support conversations in VR. To achieve this, we conducted an iterative user-centered design process to convey a set of nonverbal behaviors via accessible nonverbal cues. We worked in a mixed-ability team with professionals experienced in designing accessible technology. We also conducted a series of early informal design sessions with BLV co-designers. Throughout the design process, we met weekly to discuss and test prototypes.

To guide our design process, we developed four design considerations based on: (1) discussions among our mixed-ability team, (2) informal design sessions with BLV co-designers, and (3) past work in designing accessible representations of nonverbal cues in VR (Collins et al., 2023a; Ji et al., 2022).

• Cues should leverage nonvisual modalities supported by VR technology. We should consider the nonvisual capabilities of commercial VR technologies such as haptic vibrations, audio patterns, and spatialized audio.

• Audio cues should represent the emotions behind each nonverbal behavior. Leveraging the versatility of audio, audio cues should evoke emotions that match the behaviors they are representing, like positive emotions for a smile.

• Cues should be unobtrusive during a conversation. Cues should not be too loud, long, or distracting to the point where they interrupt or make conversation difficult for users.

• Cues should be understandable when multiple cues are being used simultaneously. Since people use multiple nonverbal behaviors simultaneously in conversation, the cues should be playable simultaneously, without overwhelming the user.

After finalizing these design considerations, we needed to establish which nonverbal behaviors to make accessible. We first considered the broad scope of possible nonverbal behaviors identified by prior work and narrowed them down to a set that is useful in conversations for BLV people. Wieland et al. (Wieland et al., 2022, 2023) found that eye gaze, head direction, head movements, and facial expressions were important for BLV people during conversations. When considering different forms of eye gaze, Collins and Jung et al. (Collins et al., 2023a) and Wieland et al. (Wieland et al., 2022, 2023) found that eye contact information was the most preferred and important for BLV people. Maloney et al. (Maloney et al., 2020) noted that some of the most commonly used nonverbal behaviors in VR included nodding and head shaking. Finally, existing research on facial expressions lists smiling and frowning as two of the most common expressions people make (Hung et al., 1996). Considering these works, we narrowed our focus to five nonverbal behaviors to design as accessible cues: eye contact, head nodding, head shaking, smiling, and frowning.

3.1. Design Process

To create our initial designs, we examined existing sound libraries used to indicate information. After sorting through a variety of libraries (e.g., Facebook’s emoji sound effects (Crisan, 2021), vocal bursts such as laughs and sighs (Parsons et al., 2014)) we created our initial designs of cues for the five nonverbal behaviors using Paquette et al.’s database of musical emotional bursts. Each burst in this dataset was a brief music clip that Paquette et al.’s participants associated with specific emotions (Paquette et al., 2013).

We sought to iterate on the initial designs of our cues with BLV users to ensure they would be usable and understandable. To do this, we conducted studies with six BLV participants. Each participant took part in a single in-person 90-minute session where they experienced the most recent iteration of our cue designs. We wanted to demonstrate our cue designs to participants in various scenarios, both in the physical world and in VR, to give them a good idea of what having conversations with these cues would be like. Thus, each study session contained four parts: (1) an introduction to the nonverbal cues in the physical world, (2) a one-on-one conversation augmented by the cues in the physical world, (3) a one-on-one conversation augmented by the cues in VR, and (4) a three-person group conversation in VR.

We represented the cues for participants in different ways in the non-VR and VR settings. In our non-VR introduction to the cues, we played each of the cues (eye contact, head nodding, head shaking, smiling, or frowning) one by one from a laptop to share the cue audio while participants held VR controllers to receive haptic feedback. In the non-VR conversation, participants again held controllers to receive haptic feedback, but the researcher who was speaking to them played the audio cues from a phone in their hand, so that participants would hear the audio coming from the direction of the person they were speaking with. Finally, for the VR settings, we developed a VR prototype where researchers and the participant would enter a multi-user virtual environment together. Within this environment, the researchers could manually trigger nonverbal cues when they performed certain nonverbal behaviors to augment conversations. Participants used a VR headset and controllers to enter the scene and heard audio cues spatialized to the researchers’ avatars while feeling haptic feedback from cues in their controllers.

We wanted to improve our cue designs with participants and allow them to try any suggestions they had for the cues during their study session. To do this, we asked participants for their perceptions of each cue after the conversations, including emotion invoked, level of distraction, and suggestions for improvements. After discussing the initial cue designs, we worked with participants to develop new iterations of the cues based on their critiques. Participants could request changes such as the length of the haptic cue, the volume of the audio, and the type of audio file being played (e.g., a musical note, a TV show sound effect, a recorded laugh, etc.). Participants could also create their own audio cues from scratch by directing the research team to create new sound effects. After making changes, we would demonstrate the new cues for participants, prompt them for feedback, and continue iterating. All of the changes we made to the cues were implemented via a laptop running Unity and the audio mixing software Garageband. Participants experienced the new cue iterations by listening to audio played from the laptop and holding VR controllers to receive haptic feedback. At the end of the session, we had participants comment on which iterations of the cues they preferred the most. We reached consensus for final designs when three participants in a row preferred the current versions of the cue designs without requesting modifications.
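The stopping rule described above (three participants in a row accepting the current designs without requesting modifications) can be sketched as a small function. This is a minimal illustration, not the authors' tooling: the function name `consensus_session`, its parameters, and the feedback log in the usage lines are all hypothetical and are not data from the study.

```python
from typing import Optional, Sequence


def consensus_session(requested_changes: Sequence[bool],
                      streak_needed: int = 3) -> Optional[int]:
    """Return the 1-based session at which consensus is reached, or None.

    requested_changes[i] is True if the participant in session i + 1 asked
    for any modification to the current cue designs (hypothetical encoding).
    """
    streak = 0
    for session_number, wanted_changes in enumerate(requested_changes, start=1):
        # A modification request resets the run of consecutive acceptances.
        streak = 0 if wanted_changes else streak + 1
        if streak >= streak_needed:
            return session_number
    return None


# Made-up log for illustration only (NOT the study's data).
illustrative_log = [True, False, True, False, False, False]
print(consensus_session(illustrative_log))
```

With this rule, a single modification request restarts the count, so consensus reflects a stable design rather than a majority vote.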

Following these sessions, we developed three additional design considerations from participants’ feedback:

• Accessible audio cues should be short for usability and accuracy. Shorter lengths for the accessible nonverbal cues were preferred. If cues are too long, they become disruptive to the conversation (e.g., a two-second clip representing head shaking drowned out conversation and tended to last over a second after head movements stopped).

• Familiar sound effects are preferable for evoking emotions. Sound effects that imitated sounds from media sources, like television game shows, were easier to associate with specific emotions than musical patterns.

• Haptics should be used to represent frequent nonverbal cues. Since audio repeating continuously would disrupt conversation, cues such as eye contact that occur frequently should be represented by less-disruptive haptic vibrations.

3.2. Final Designs

After receiving consistent feedback from participants that cues were easy to understand and unobtrusive to conversation, we stopped iterating on our designs. The final designs were as follows (all tracks for audio cues are provided here: https://soundcloud.com/shadowdios/sets/nvc-study-sound-effects):

• Eye Contact. A continuous haptic buzz that lasted for the duration of the eye contact.

• Head Nod. A succession of one high-pitched note followed by a slightly lower note from a xylophone, repeating twice and lasting 1 second.

• Head Shake. A succession of one low-pitched note followed by a slightly lower note from a flute, repeating twice and lasting under 1 second.

• Smile. A high-tone chime-like sound effect, formed of a series of cheerful ringing notes lasting 2 seconds.

• Frown. A low-tone trumpet sound effect lasting for 1 second.

An overarching takeaway from these sessions was that participants found a set of five cues overwhelming to learn and immediately use in conversation. Thus, we decided to explore a smaller set of nonverbal cues, selected from these five. We selected eye contact for this subset since our participants responded the most positively to it and its non-intrusive, haptic form. We also chose the two head movement cues, head nodding and head shaking, since they are more commonly used in mainstream VR than facial expressions, which remain limited by a lack of integration with face-tracking technology.
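To make the relationship between the five designed cues and the final three-cue subset concrete, the designs can be written down as a small lookup table. The following is a hedged sketch rather than the prototype's actual code: the names `Cue`, `ALL_CUES`, `FINAL_CUE_SET`, and `cues_for` are hypothetical, and the descriptions are paraphrased from the list of final designs.

```python
from dataclasses import dataclass
from typing import Collection, Iterable, List


@dataclass(frozen=True)
class Cue:
    behavior: str     # nonverbal behavior the cue represents
    modality: str     # "haptic" or "audio"
    description: str  # paraphrase of the final design
    duration: str     # duration as reported in the text


# The five cue designs, keyed by hypothetical identifiers.
ALL_CUES = {
    "eye_contact": Cue("eye contact", "haptic", "continuous buzz",
                       "for the duration of the eye contact"),
    "head_nod": Cue("head nod", "audio",
                    "xylophone: high note then slightly lower note, twice",
                    "1 second"),
    "head_shake": Cue("head shake", "audio",
                      "flute: low note then slightly lower note, twice",
                      "under 1 second"),
    "smile": Cue("smile", "audio", "cheerful high-tone chime", "2 seconds"),
    "frown": Cue("frown", "audio", "low-tone trumpet", "1 second"),
}

# The subset carried forward after five cues proved overwhelming.
FINAL_CUE_SET = ("eye_contact", "head_nod", "head_shake")


def cues_for(active_behaviors: Iterable[str],
             enabled: Collection[str] = FINAL_CUE_SET) -> List[Cue]:
    """Return cues for behaviors that are happening and are enabled."""
    return [ALL_CUES[name] for name in active_behaviors if name in enabled]


# A partner makes eye contact, smiles, and nods at the same time;
# only the two cues in the final set are presented.
for cue in cues_for(["eye_contact", "smile", "head_nod"]):
    print(cue.behavior, "->", cue.modality)
```

Keeping the frequent cue (eye contact) on the haptic channel and the occasional cues on the audio channel means simultaneous behaviors do not compete for the same modality, which follows the design considerations listed earlier.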

3.3. Discussion and Implications

Our goal for the design process was to iteratively design accessible versions of nonverbal cues. We had not yet tested how effective these designs were at conveying social information about a conversation partner. One type of social information that many of our participants were particularly interested in was attention, specifically, in “reading the room” to know if their partner was paying attention to them. However, as we had focused primarily on determining how disruptive or intuitive the cue designs were, we had not examined the cues’ abilities to convey information like this. Further work was required to specifically evaluate how well cues can convey a conversation partner’s level of attention.

We had also only tested the cues with a small set of co-designers who were used to experimenting with and designing new accessible technologies. While their feedback was useful for designing effective nonverbal cues, it did not represent how easy these cues would be to learn and utilize for other BLV users. To truly understand how useful our cues would be, we needed to evaluate them with a larger and more diverse group of BLV participants.

4. Evaluating the Cues in Conversations

We aimed to evaluate the cues’ effectiveness in real-time conversations in VR. To do this, we needed to consider what it meant for nonverbal behaviors to support conversations. Nonverbal behaviors convey a wide range of social information, from emotional reactions to level of agreement. For example, someone may look at you to indicate that they are listening or nod their head in agreement with what you are saying. Our previous design process had established our participants’ interest in determining the attention levels of their partners. Since our set of three nonverbal cues, eye contact, head nodding, and head shaking, also commonly represent attention (Lawson, 2015; Sprabary, 2022; Minds, nd; Centre, nd), we selected attention as our focus for the evaluation. This would allow us to see if our cues could support our prior participants’ desired conversational needs and whether our cues could convey important social information to support conversations.

We designed our evaluation with certain key questions in mind:

• How accurate are participants at detecting attention with and without accessible nonverbal cues?

• How confident do participants feel about detecting attention with and without accessible nonverbal cues?

Table 1. Participant demographics. F=Female; M=Male. All vision measures (acuity, visual field, light perception) were self-reported by participants.

P#	Age	Gender	Vision	Visual Acuity	Visual Field	Light Perception
P1	37	F	Low vision; Blind	R: 20/320; L: 20/160	R: no peripheral; L: wider visual field	Yes
P2	51	F	Low vision	R: 20/700; L: 20/1200	R: limited peripheral; L: ok	Yes
P3	67	M	Low vision	R: Unsure; L: 20/200	R: Unsure; L: full	Yes, left eye
P4	27	M	Blind	Unsure	Unsure	Yes
P5	21	M	Low vision	Unsure	None	Yes, right eye
P6	18	F	Blind	R: blurry, 2-3 inches; L: No peripheral, 23/200	Unsure	Yes
P7	50	M	Blind	None	None	None
P8	36	M	Low vision; Blind	Unsure	Lacks central, left eye is slightly better	Limited on the peripheral
P9	45	M	Blind	None	None	None
P10	74	F	Blind	None	None	None
P11	26	M	Low vision; Blind	R: 20/300; L: 20/400	Slight central in left eye, 180 degree visual fields	Yes
P12	66	M	R: Low vision; L: Blind	Unsure	Unsure	Yes
P13	34	M	Low vision	Unsure	Unsure	Yes
P14	45	F	Blind	None	None	None
P15	55	M	Low vision	R: Unsure; L: Unsure	Unsure	Yes, left eye
P16	43	M	Blind	None	None	None

Set	Debate Topic	Segment 1 Attention Condition	Segment 2 Attention Condition
A	if a hotdog is a sandwich	Nobody pays attention	Only Researcher A pays attention
B	morning or evening showers are better	Everyone pays attention	Only Researcher B pays attention
C	if pineapple belongs on pizza	Only Researcher B pays attention	Only Researcher A pays attention
D	if sandwiches should have crust or no crust	Everyone pays attention	Nobody pays attention
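The attention conditions in the table above can be encoded as ground truth against which a participant's judgments are compared. The sketch below is one hypothetical scoring rule, not the scoring procedure used in the study: it assumes that "Everyone" means both researchers and "Nobody" means neither, and the names `ATTENTION` and `segment_accuracy` are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

A, B = "Researcher A", "Researcher B"

# Ground truth per debate set: (segment 1, segment 2), transcribed from the
# table. Assumption: "Everyone" = both researchers, "Nobody" = neither.
ATTENTION: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "A": (frozenset(), frozenset({A})),
    "B": (frozenset({A, B}), frozenset({B})),
    "C": (frozenset({B}), frozenset({A})),
    "D": (frozenset({A, B}), frozenset()),
}


def segment_accuracy(set_id: str, segment: int,
                     reported: Iterable[str]) -> float:
    """Fraction of researchers whose attention state was judged correctly.

    `reported` is the collection of researchers the participant believed
    were paying attention. This per-researcher rule is a hypothetical choice.
    """
    truth = ATTENTION[set_id][segment - 1]
    reported_set = set(reported)
    correct = sum((r in truth) == (r in reported_set) for r in (A, B))
    return correct / 2


# A participant in set C, segment 1, who reports that only Researcher B
# was paying attention matches the condition exactly.
print(segment_accuracy("C", 1, [B]))
```

Scoring each researcher separately gives partial credit when a participant identifies one partner's attention correctly but not the other's; an all-or-nothing rule would be an equally plausible alternative.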
