Evaluating Trust in the Context of Conversational Information Systems for new users of the Internet

Anurag Aribandi [email protected] BITS Pilani, Hyderabad Campus , Divyanshu Agrawal [email protected] BITS Pilani, Hyderabad Campus and Dipanjan Chakraborty [email protected] BITS Pilani, Hyderabad Campus

Abstract.

Most online information sources are text-based and in Western Languages like English. However, many new and first time users of the Internet are in contexts with low English proficiency and are unable to access vital information online. Several researchers have focused on building conversational information systems over voice for this demographic, and also highlighted the importance of building trust towards the information source. In this work we develop four versions of a voice based chat-bot on the Google Assistant platform in which we vary the gender, friendliness and personalisation of the bot. We find that the users rank the female version of the bot with more personalisations over the others; however when rating the bots individually, the ratings depend on the ability of the bot to understand the users’ spoken query and respond accurately.

^†^†copyright: none

1. Introduction

While there are currently several online methods for users to access information on different topics like agriculture, health and employment, most information access methods tend to be text-based and in Western Languages like English. These interfaces, therefore, remain out of reach for users from contexts of low English proficiency. Researchers have worked on developing voice-based and text-free interfaces for this user demography over systems like Interactive Voice Response (IVR) systems (Koradia et al., 2013; Medhi et al., 2006; Chakraborty and Seth, 2015). With increasing penetration of smartphones, recent research has focused on building voice-based conversational interfaces. Consequently, there has been an influx of research on how to design and evaluate such interfaces. Researchers have highlighted the importance of trust in the information source when building interaction systems for vital information like agriculture (Dhaygude and Chakraborty, 2020).

This study is motivated by the goal to determine the factors most important to trusting and easily using conversational agents for information retrieval in contexts of limited English proficiency. Our aim is that this can hopefully serve as a reference or a blueprint for those aiming to bring text-free information retrieval tools to those semi-proficient with English and novice digital users, especially in fields where trust is an integral component. We present a chat-bot using Google Actions on the Google Assistant platform. The users can interact with the bot using voice. At the backend is a knowledge-based of agricultural facts. We developed 4 versions of the bot in which we vary the gender, the friendliness, and the personalisation to find the most accessible and user-friendly combination to users in such contexts. Through semi-structured interviews and observations we evaluate the different versions on the trust, ease of use and enjoyability. When comparing the bots, the users ranked the female version with more personalisations higher, however we observed that the ratings for the individual bots depended on the bots ability to accurately understand the users’ speech and ability to respond accurately to the query.

In the rest of the paper, Section 2 details the relevant literature studied regarding what contributes to trust in technology and the design and evaluation of conversational agents. Section 3 describes the chat-bot development and the user study we designed and conducted. Section 4 describes the findings and in Section 5 we discuss the implications of the research and potential future work.

2. Related Work

Several researchers have focused on different aspects of conversational interfaces. In this section, we briefly summarise some of the previous work in this domain, organized around three themes.

2.1. Text Free and Hybrid Interfaces

There are several studies(Medhi et al., 2006; Medhi et al., 2011) published on the use of text free and hybrid interfaces by differently-abled users and semi-literate or illiterate users. However, there are very few on the use of such interfaces in low literacy and low economic non-English settings. Medhi et al. (Medhi et al., 2011) posits that while graphical interfaces have the highest task completion for low-literate or novice users, those comfortable with speech find spoken interfaces the easiest to use. Hence they recommend adding both voice and graphical cues in interfaces for such users to leverage the best of both. In another work, Medhi et al. (Medhi et al., 2006) establishes a lack of faith and trust in technology as a reason for difficulty in adapting such interfaces in low-literacy environments. In this work we build a chatbot on a hybrid Google Assistant platform which supports both text and voice, although our users are not familiar with text based interactions.

2.2. Trust and Artificial Intelligence

There have been numerous studies regarding what influences trust in conversational agents and in artificial intelligence. Richiello (Richiello, 2018) shows that it is possible that chat-bots can be utilised for sensitive matters that require a minimum degree of trust. Medhi et al. (Medhi Thies et al., 2017) illustrated how the personality of the agent influences the interaction. It was found that when agents are friendly and witty, the interactions were rated more positively by subjects. Körber (Körber, 2018) and Følstad et al. (Følstad et al., 2018) touched upon quantitative factors such as correct grammar and consistent predictable responses as well as qualitative factors like human-likeness and honesty. The trust placed in these interactions have also found to be dependant on elements independent of the agent, like the environment of the agent (Følstad et al., 2018) and the user’s themselves (Müller et al., 2019). For example, the “brand” associated with the service provider of the agent may influence the trust placed in the agent based on the trust associated with the provider by the user which can then influence the level of risk to their privacy or security that the user associates with the agent. Trust is a significant factor for the acceptance and use of any conversational agent. This is especially true for semi-literate or illiterate users like farmers (Dhaygude and Chakraborty, 2020; Przegalinska et al., 2019; Medhi et al., 2006). If there is insufficient trust in the agent, it has been found to hinder its acceptance (Müller et al., 2019),(Przegalinska et al., 2019). In this work we evaluate 4 versions of a voice based chat-bot to evaluate the factors which influence trust, ease of use and enjoyability.

2.3. Conversational Agent Evaluation

There are also several studies which focus on the evaluation of such conversational agents. Kuligowska (Kuligowska, 2015) focused on evaluating chat-bots from a commercial and usability standpoint. For commercial use, the chat-bots ability to handle unexpected situations and its knowledge base were particularly important. Langevin et al. (Langevin et al., 2021) proposed a set of heuristics adapted from Nelsen’s usability heuristics. Radziwill and Benton (Radziwill and Benton, 2017) and Maniou and Veglis (Maniou and Veglis, 2020) detailed the attributes contributing to the general qualities a chat-bot has like functionality and humanity (or personalization). All the studies varied in how the agents were evaluated according to the purpose of the agent itself. For example, in (Prasad et al., 2019), the personification of the bot was imperative to its functioning as its primary function was expanding networks and opportunities. Each agent should be evaluated in a context sensitive way albeit with some general design principles kept in mind.

3. Methodology

In this section we describe the methodology in the design of the chat-bot and the design of the user study.

3.1. The Chat-Bot

We built a Google Actions based voice service running on the Google Assistant platform. Google Actions are voice-based services that run on the Google Assistant platform. Users talk with the virtual assistant in a conversational manner, and Google’s Natural Language Understanding (NLU) engine processes the speech to extract variables called intents from the speech. Intents are concrete categories of information that the user is looking for. The Google Assistant platform passes the users’ queries along with the intents to the backend through webhooks. At the backend we have a knowledge-base of facts on farming in Hindi hosted in a MongoDB database. We use a rudimentary Information Retrieval model on the knowledge-base to retrieve an appropriate response to the users’ queries, which is then passed back to the Google Assistant platform. The Google Assistant uses Google’s Text to Speech (TTS) engine to speak out the response to the user. Users can ask simple queries like “Tell me when rice should be sown” and the answer is narrated back to the user. A chat log is saved for later reference. From the perspective of the user, the entire interaction is conducted through voice in Hindi. Figure 1 depicts the end to end architecture of the app.

Refer to caption — Figure 1. Architecture diagram of the application

3.2. User Study

We developed four versions of the chat-bot, varying the genders, varying levels of personalisation, and presence or absence of a cartoon avatar to give a face to the bot. Medhi et al. (Medhi Thies et al., 2017) show that being ”chatty” and the ”energy” of the bot can influence how users perceive it. We designed a study in which each user interacted with each of the versions. The order of the versions presented to the users was randomised, but balanced across the study. The users were given two sets of questionnaires: one before the interaction (pre-testing), and one after (post-testing). We named the chat-bot versions to help the users identify them in the post interview. It must be noted that the versions varied only in the interactions, while the responses returned by the backend is the same for the same query in all the versions. The versions are described below and also summarised in Table 1.

•

Arun is the most bare-bones version of the design. This version is designed to be as to the point as possible in greetings and responses. The bot does not have any characteristics other than the voice. Arun has a male voice.
•

Suraj: mentions his name and greets the user at the beginning when it is prompted. Through Suraj we add a warm and friendly character to the bot. Suraj has a male voice.
•

Raju and Rani: have cartoon avatars lending faces to the bot. They also elaborately greet the users and address the users by their preferred names, adding personalisation to the interaction. Raju has a male voice and Rani has a female voice.

Table 1. Chat-bot variations

Factor	Arun	Suraj	Raju	Rani
Cartoon Avatar	N	N	Y	Y
Introduces itself by name	N	Y	Y	Y
Greeting	Minimal	Friendly	Friendly and Personalised	Friendly and Personalised
Voice	Male	Male	Male	Female

A screen-grab of a typical interaction with a version of the bot is shown in Figure 2

We designed two questionnaires for the users. One questionnaire was given before the users interacted with the bot. This was a general questionnaire for the purpose of capturing demographic information and the users’ history of using technology to access information. The second questionnaire was given after the users had interacted with the bot. After each interaction, the users were asked to rate the interaction on a 5 point likert scale on the trust the users placed on the bot version. After all the interactions, the users were asked to rank the versions in terms of trust and the ease and enjoyability of the interactions. The questions are primarily closed or short answer questions, with a few open ended and descriptive questions in the end regarding their general thoughts and opinions. The responses of the users to the questionnaires were recorded by the interviewer in a Google Form. The questionnaires, summarised in English, are described in Table 2. Along with the structured interviews, the interviewers also undertook unstructured observation of the interactions for analysis.

Table 2. General questionnaire

Compartment	Questions
Pre-Testing	What gender do you identify as?
	What is your highest academic degree obtained?(attained or pursuing)?
	Have you ever used or owned a smartphone?
	Have you ever used a search engine before? (if yes, what for?)
Post-Testing	[After interacting with each bot] Rate the trust you place on the chatbot between 1-5?
	Have you ever used a Voice assistant or chatbot before?
	Rank the bots in terms of how trustworthy they felt (with reasons)
	Rank the bots in terms of how easy and enjoyable they were to use/interact with (with reasons)
	Which bot were you more comfortable with, Raju or Rani?
	What features would you want to be added in such a bot?
	Did you feel the addition of an automated assistant improved the information retrieval process?
	What other domains do you think will benefit from the addition of such an assistant?
	What sources of information help you gather data in this domain?

We used a purposive sampling approach to build the userbase for the testing. The study was designed to be a between-subjects design with respect to the order in which the users interacted with the variations. Before the evaluations were conducted, each user was informed that there was no right or wrong way to interact with the bot and no right or wrong answer to the questions posed. The questionnaires were conducted in the form of a structured interview in the users’ mother tongue while the interviewer recorded their responses in a Google Form. The interviews and the interactions were conducted by two male undergraduate engineering students in Hindi in Chhattisgarh and Hyderabad. Each participant was compensated for their time and effort with \rupee50 paid in cash.

4. Findings

We conducted five user interviews for this experiment. The users were from different occupational backgrounds, as summarised in Table 3. All of the users interviewed had low text-literacy skills and limited to no experience with technology. None of them had used voice interfaces to interact with devices before, and none of them were familiar with the English language.

Table 3. Demographic description of the users.

User	Gender	Education	Experience with phones	Occupation
U1	Female	Class 8th	Using a touch-screen phone	Maid
U2	Female	Class 5th	Making calls, playing games, using Google	Maid
U3	Female	None	No prior experience	Maid
U4	Male	Class 8th	Making calls on a feature phone	Gardener
U5	Female	None	Making calls on a feature phone	Washerwoman

The objective of the interviews was to observe the users as they interacted with the chat bot as they tried to find information. Before asking them to use the chat bot, we showed them a demonstration of how to use Google Assistant and also used the voice bot to complete a few interactions. If the user had never used a smart-phone before, we also explained how to use a touch screen and microphone to them. Further, we gave them some background knowledge of the subject, agriculture, and the type of questions the bot is able to answer.

We presented the four versions of the bot to each user in a randomised but balanced order, and allowed them several tries to interact with the chat bot to find information they were looking for. All users took 2-3 tries before they were able to successfully complete an interaction, i.e. , using voice to search for crop information. The voice bot has a linear flow - after an initial introduction, the bot asks a user what he/she wants to know about, then, and then tries to narrow down the query to a specific crop and a property of that crop ( for example ’fertilizer needs’ ). We considered an interaction successful when a user was able to use the voice bot without being stuck or misunderstood and also find a satisfactory answer to the questions they asked. For example, getting stuck in a screen, getting a wrong or irrelevant answer to a question, information not existing in the database, incorrect voice recognition are all considered unsuccessful interactions. The most common causes of failures were the user not understanding how to proceed because of lack of familiarity with mobile phones and the information not existing in our database. However, after a few tries, all users became familiar with the interface, enough to be able to complete the interactions without getting stuck and needing assistance.

When asked to rank the different versions, of all the four versions presented to the user, as described in Section 3.2, Raju and Rani were liked the most by the users. The users noted that personalised greetings and the avatars made the chat bot feel more human and trustworthy. When asked between Raju and Rani, which version do the users prefer, all users preferred Rani, citing reasons that the female voice is more pleasant and is easier to understand.

When the users rated the interactions individually on a 5-point likert scale, we noted that the users valued the most the bot’s ability to accurately understand the users’ spoken input and provide accurate responses. When the bot was not able to understand the users’ queries, users always rated it low in trust, regardless of the version. We also observed that the users were visibly satisfied only when the bot understood the query correctly and provided an accurate response.

In general, we observed that users were able to interact with the chat-bot well, despite not having much experience with smartphones. U3, who had never used a cell phone before, was able to speak to the Google Assistant and get answers to her questions, although it took her a few tries before being able to complete an interaction. This indicates that voice is a natural interaction medium for humans, and that conversational interaction systems probably have a smooth learning curve for first time users of digital interfaces.

5. Discussion and Future Work

The study we presented is a pilot study for a bigger project and researchers should repeat the study with a larger sample size for more robust findings. In our observations, users ranked higher the versions of the bots which had more personalisation and character. The users also ranked the female version of the bot higher. However, we also observed during the interviews that one important factor that made a user trust the chat-bot was its ability to understand the users’ questions and provide the correct response. Further study needs to be conducted to understand how does a usability metric, like the bot’s ability to understand natural language, affect metrics like trust. As users may ask the same question in several ways, researchers need to focus on enhancing the natural language understanding of the bot, along with improving the retrieval models. For example, to ask about “how to use fertilizers”, users may ask “how to take care of the soil”.

6. Conclusion

We present experiments with 4 versions of a voice based chat-bot built on the Google Assistant platform and connected to a backend knowledge-base of agricultural facts. We conducted semi-structured interactions for the bot with five users from low English text-literacy contexts. The interactions were prefaced and followed up with interviews to evaluate the users’ perception of trust, ease of use and enjoyability among the different versions. We discovered that people with little to no experience with using technology were able to use the search application with their voice. People ranked the versions which had a female avatar and was more personalised higher than the others. We also observed that the users rated the versions higher when the NLU engine was able to accurately understand the users’ query and provide a prompt response. This highlights that research must focus on improving NLU and Information Retrieval models in Indian languages, along with improving the quality of interaction of the bot.

References

(1)
Chakraborty and Seth (2015) Dipanjan Chakraborty and Aaditeshwar Seth. 2015. Building Citizen Engagement into the Implementation of Welfare Schemes in Rural India. In Proceedings of the Seventh International Conference on Information and Communication Technologies and Development (Singapore, Singapore) (ICTD ’15). ACM, New York, NY, USA, Article 22, 10 pages. https://doi.org/10.1145/2737856.2738027
Dhaygude and Chakraborty (2020) Mrunal Dhaygude and Dipanjan Chakraborty. 2020. Rethinking Design of Digital Platforms for Emergent Users: Findings from a Study with Rural Indian Farmers. In IndiaHCI ’20: Proceedings of the 11th Indian Conference on Human-Computer Interaction (Online, India) (IndiaHCI 2020). Association for Computing Machinery, New York, NY, USA, 62–69. https://doi.org/10.1145/3429290.3429297
Følstad et al. (2018) Asbjørn Følstad, Cecilie Bertinussen Nordheim, and Cato Alexander Bjørkli. 2018. What Makes Users Trust a Chatbot for Customer Service? An Exploratory Interview Study. In Internet Science, Svetlana S. Bodrunova (Ed.). Springer International Publishing, Cham, 194–208. https://link.springer.com/chapter/10.1007/978-3-030-01437-7_16
Koradia et al. (2013) Zahir Koradia, Piyush Aggarwal, Aaditeshwar Seth, and Gaurav Luthra. 2013. Gurgaon Idol: A Singing Competition over Community Radio and IVRS. In Proceedings of the 3rd ACM Symposium on Computing for Development (Bangalore, India) (ACM DEV ’13). ACM, New York, NY, USA, Article 6, 10 pages. https://doi.org/10.1145/2442882.2442890
Körber (2018) Moritz Körber. 2018. Theoretical considerations and development of a questionnaire to measure trust in automation. In Congress of the International Ergonomics Association. Springer, 13–30. https://link.springer.com/chapter/10.1007/978-3-319-96074-6_2
Kuligowska (2015) Karolina Kuligowska. 2015. Commercial chatbot: performance evaluation, usability metrics and quality standards of embodied conversational agents. Professionals Center for Business Research 2 (2015). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2569637
Langevin et al. (2021) Raina Langevin, Ross J Lordon, Thi Avrahami, Benjamin R. Cowan, Tad Hirsch, and Gary Hsieh. 2021. Heuristic Evaluation of Conversational Agents. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 632. https://doi.org/10.1145/3411764.3445312
Maniou and Veglis (2020) Theodora A. Maniou and Andreas Veglis. 2020. Employing a Chatbot for News Dissemination during Crisis: Design, Implementation and Evaluation. Future Internet 12, 7 (2020). https://doi.org/10.3390/fi12070109
Medhi et al. (2011) Indrani Medhi, Somani Patnaik, Emma Brunskill, S.N. Nagasena Gautama, William Thies, and Kentaro Toyama. 2011. Designing Mobile Interfaces for Novice and Low-Literacy Users. 18, 1, Article 2 (May 2011). https://doi.org/10.1145/1959022.1959024
Medhi et al. (2006) Indrani Medhi, Aman Sagar, and Kentaro Toyama. 2006. Text-Free User Interfaces for Illiterate and Semi-Literate Users. In 2006 International Conference on Information and Communication Technologies and Development. 72–82. https://doi.org/10.1109/ICTD.2006.301841
Medhi Thies et al. (2017) Indrani Medhi Thies, Nandita Menon, Sneha Magapu, Manisha Subramony, and Jacki O’Neill. 2017. How Do You Want Your Chatbot? An Exploratory Wizard-of-Oz Study with Young, Urban Indians. In Human-Computer Interaction - INTERACT 2017, Regina Bernhaupt, Girish Dalvi, Anirudha Joshi, Devanuj K. Balkrishan, Jacki O’Neill, and Marco Winckler (Eds.). Springer International Publishing, Cham, 441–459. https://link.springer.com/chapter/10.1007/978-3-319-67744-6_28
Müller et al. (2019) Lea Müller, Jens Mattke, Christian Maier, Tim Weitzel, and Heinrich Graser. 2019. Chatbot Acceptance: A Latent Profile Analysis on Individuals’ Trust in Conversational Agents (SIGMIS-CPR ’19). Association for Computing Machinery, New York, NY, USA, 35–42. https://doi.org/10.1145/3322385.3322392
Prasad et al. (2019) Archana Prasad, Sean Blagsvedt, Tej Pochiraju, and Indrani Medhi Thies. 2019. Dara: A Chatbot to Help Indian Artists and Designers Discover International Opportunities. In Proceedings of the 2019 on Creativity and Cognition (San Diego, CA, USA) (C&C ’19). Association for Computing Machinery, New York, NY, USA, 626–632. https://doi.org/10.1145/3325480.3326577
Przegalinska et al. (2019) Aleksandra Przegalinska, Leon Ciechanowski, Anna Stroz, Peter Gloor, and Grzegorz Mazurek. 2019. In bot we trust: A new methodology of chatbot performance measures. Business Horizons 62, 6 (2019), 785–797. https://doi.org/10.1016/j.bushor.2019.08.005 Digital Transformation & Disruption.
Radziwill and Benton (2017) Nicole M. Radziwill and Morgan C. Benton. 2017. Evaluating Quality of Chatbots and Intelligent Conversational Agents. arXiv:1704.04579 [cs.CY] https://arxiv.org/abs/1704.04579
Richiello (2018) Isabella Richiello. 2018. Women’s experience of a sexual and reproductive health chatbot. Master’s thesis. KTH, Media Technology and Interaction Design, MID. https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1230334&dswid=-5369