Alexa Depression and Anxiety Self-tests: A Preliminary Analysis of User Experience and Trust
Abstract.
Mental health resources available via websites and mobile apps provide support such as advice, journaling, and elements from cognitive behavioral therapy. The proliferation of spoken conversational agents, such as Alexa, Siri, and Google Home, has led to increasing interest in developing mental health apps for these devices. We present the pilot study outcomes of an Alexa Skill that allows users to conduct depression and anxiety self-tests. Ten participants were given access to the Alexa Skill for two weeks, followed by an online evaluation of the Skill’s usability and trust. Our preliminary evaluation suggests that participants trusted the Skill and scored the usability and user experience as average. Usage of the Skill was low, with most participants using the Skill only once. As this is work in progress, we also present a discussion of implementation and study design challenges to inform current work on designing spoken conversational agents for mental health applications.
1. Introduction
Mental health problems are a growing global challenge affecting people of all backgrounds, ages, and socioeconomic statuses (Woodward et al., 2019). To tackle this challenge, mental health resources are increasingly available online and via mobile apps (Ly et al., 2017; Morris et al., 2015; Kamita et al., 2019). The proliferation of conversational agents has made them attractive for health applications (Laranjo et al., 2018) and mental health (Vaidyam et al., 2019). Notable chatbots that monitor mood and use elements of cognitive behavioral therapy (CBT) to help users deal with anxiety and depression include Woebot (Stiles-Shields, [n.d.]) and TESS (Fulmer et al., 2018).
The advancements and rapid adoption of spoken conversational agents, such as Siri and Alexa, make them attractive as a channel for providing mental health resources to users (Maharjan et al., 2019). Spoken conversational agents interact with users via spoken natural language. For mental health applications, this requires users to vocalize responses about their mental health status, which differs from typing responses to a chatbot or on a website. One study reported that some people are more likely to have truthful interactions about their mental health with technology than with mental health professionals (Ravichander and Black, [n.d.]).
This paper presents work-in-progress findings of an Alexa Skill we developed that performs depression and anxiety self-tests. Current Alexa Skills focus on guiding, educating, and helping users manage mental health issues. Examples include anxiety and stress management through advice sessions (Anti Anxiety, Anxiety Stress), helping people with depression by providing tasks to boost their mood (Mental Health Day Manager), management advice and education for children and teenagers dealing with anger, stress, anxiety, and depression (Mental Health Spies), and targeted exercises depending on the situation (work, studies, life) causing the user stress (Mindscape). One study used Alexa to monitor a user’s mental health behaviors and symptoms, requiring users to self-report data on sleep, mood, and activity levels (Maharjan et al., 2019). However, further studies are required to provide evidence regarding the efficacy of delivering mental health resources via spoken conversational agents.
Our contributions are: (1) We developed an Alexa Skill that allows users to conduct depression and anxiety self-tests and makes exercise recommendations to alleviate anxiety and depression symptoms; (2) We conducted a pilot study with 10 participants to assess the usability of our Alexa Skill.
2. Methodology
2.1. Study Design
In this preliminary study, we recruited 10 participants who owned an Alexa device or had a smartphone on which the Alexa app could be installed. Participants first completed an online questionnaire that collected demographics and conversational agent usage habits. The questionnaire also included depression (PHQ-9 (Kroenke et al., 2001)) and anxiety (GAD-7 (Jordan et al., 2017)) self-tests. This helped familiarize the participants with the depression and anxiety self-tests that they would later complete with our Alexa Skill. In our results, we compare the online self-test scores with the self-test scores completed using our Alexa Skill.
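For reference, both instruments are scored by summing their items (each rated 0 to 3), giving a 0 to 27 range for the PHQ-9 and a 0 to 21 range for the GAD-7. The sketch below shows the standard severity bands for these totals; it is illustrative only and does not reproduce how our Skill phrased feedback to users.

```python
def phq9_severity(total: int) -> str:
    """Standard PHQ-9 severity bands (nine items, each scored 0-3, maximum 27)."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"


def gad7_severity(total: int) -> str:
    """Commonly used GAD-7 severity bands (seven items, each scored 0-3, maximum 21)."""
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"
```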
Participants were given access to the Alexa Skill for two weeks and were encouraged to use the Skill regularly, with reminders sent every three days. After two weeks, we asked participants to complete another questionnaire, which included questions to assess usability, user experience, and trust. Ethics approval was granted by Macquarie University’s Human Research Ethics Committee for Medical Sciences (ethics reference number 52020662417083).
2.2. The Alexa Skill
The Alexa Skill allowed users to express their emotions and conduct self-tests for depression and anxiety, and it offered suggestions to improve the user’s current state of mind. Each session began with Alexa asking the user how they were feeling. After expressing their emotions, the user was prompted to either complete self-tests for depression and anxiety or to hear self-help exercises.
If the user expressed a symptom related to anxiety or depression, Alexa prompted them to complete a self-test depending on their listed emotions. After the user completed the depression and anxiety self-tests, Alexa stated their depression and anxiety scores. Afterward, Alexa recommended that the user practice one of five actions (randomly selected): a breathing exercise; a muscle relaxation exercise; lifestyle recommendations such as proper sleep, exercise, and a healthy diet; journaling; or practicing gratitude.
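The following sketch illustrates this session logic in framework-agnostic Python. The keyword lists, question wording, and recommendation phrasing are placeholders for illustration and do not reproduce our Skill’s actual prompts or intent model.

```python
import random

# Illustrative keyword lists and wording only; not the Skill's actual vocabulary.
ANXIETY_WORDS = {"anxious", "nervous", "worried"}
DEPRESSION_WORDS = {"sad", "down", "hopeless"}
RECOMMENDATIONS = [
    "a breathing exercise",
    "a muscle relaxation exercise",
    "lifestyle advice on sleep, exercise, and diet",
    "journaling",
    "practicing gratitude",
]


def run_self_test(num_items: int, ask_item) -> int:
    """Read each item and sum the 0-3 responses (9 items for PHQ-9, 7 for GAD-7)."""
    return sum(ask_item(f"Question {i + 1} of {num_items}") for i in range(num_items))


def session(feeling: str, ask_item) -> str:
    """One session: emotion check, optional self-tests, then a random recommendation."""
    words = set(feeling.lower().split())
    summary = ""
    if words & (ANXIETY_WORDS | DEPRESSION_WORDS):
        depression = run_self_test(9, ask_item)
        anxiety = run_self_test(7, ask_item)
        summary = f"Your depression score is {depression} and your anxiety score is {anxiety}. "
    return summary + f"You could try {random.choice(RECOMMENDATIONS)}."


# Example: answer every item with 1 ("several days").
print(session("I feel anxious today", lambda prompt: 1))
```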
2.3. Evaluation
The questionnaire after the two-week period collected the participants’ final self-test scores for depression and anxiety, together with their ratings of the Skill’s usability, user experience, and trust, and open-ended feedback. We used the following questionnaires to assess the Alexa Skill: the System Usability Scale (SUS) for system usability (Borsci et al., 2009); the User Experience Questionnaire (UEQ) for pragmatic and hedonic qualities (Hinderks et al., 2019); and the Technology Trust Questionnaire for trust between the system and the user (Jian et al., 2000).
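As a reminder of how the SUS is scored (standard scoring, not specific to our study): each of the ten items is rated 1 to 5, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to yield a 0 to 100 score. A minimal sketch:

```python
def sus_score(responses: list[int]) -> float:
    """Compute the 0-100 System Usability Scale score from ten 1-5 item responses."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # odd-numbered items are positively worded
        for i, r in enumerate(responses)
    ]
    return 2.5 * sum(contributions)


# Example: sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]) -> 85.0
```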
3. Results
The average age of the participants was 21.6 years, and 8/10 were male. 9/10 of the participants used conversational agents. 7/10 of the participants also indicated that they had not previously used an app for mental well-being. After the two-week period, 7/10 participants indicated that they rarely used our Alexa Skill.
Figure 1 shows the anxiety and depression scores completed online (via a webpage) versus using Alexa. Given that the majority of the participants rarely used the Alexa Skill, we cannot attribute the change in anxiety or depression scores to the use of the Alexa Skill or to the delivery of the self-tests via Alexa.
Figure 2 shows the user experience scores of the Alexa Skill. The participants found the Skill’s attractiveness, dependability, and stimulation to be above average, but perspicuity, efficiency, and novelty were scored below average. Figure 3 shows the trust scores for the Alexa Skill. We conclude that participants mostly trusted the Skill.



4. Discussion and Challenges
This pilot showed a willingness from participants to trust Alexa with personal information such as depression and anxiety scores. The user experience scores of the Alexa Skill, however, showed that participants considered it lacking in efficiency and novelty.
Designing this study, implementing the Alexa Skill, and conducting the pilot highlighted a number of challenges in developing an Alexa Skill for the mental health space.
Cold Start Problem: First interactions with an Alexa Skill that has multi-step or branching dialogues can be challenging for new users. New users do not know the possible dialogue flows, the intents that the Skill has been programmed to recognize, or which responses are valid (i.e., how to answer a question from the Skill). While Alexa Skills can be deployed with a user manual or by walking the user through a tutorial, complex Skills may require regular use and several attempts before the user is comfortable interacting with them. In our pre-pilot tests, users struggled when interacting with our Alexa Skill because they gave responses that were not captured by the intent recognition rules we had programmed. Careful design is needed to ensure intuitive interactions between the user and the Alexa Skill, as any difficulties are bound to discourage users from future use. This is especially important when designing spoken conversational agent apps for mental health, where users may be experiencing distress when they decide to use the Skill.
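One possible mitigation, sketched below with the Alexa Skills Kit SDK for Python (our choice for illustration; this paper does not prescribe an implementation stack), is to catch utterances that match no custom intent via AMAZON.FallbackIntent and restate the valid options rather than silently re-prompting.

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class FallbackIntentHandler(AbstractRequestHandler):
    """Catch utterances that match no custom intent and restate the valid options."""

    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        # Example reprompt wording only; the options should mirror the Skill's real intents.
        reprompt = ("Sorry, I didn't catch that. You can say how you are feeling, "
                    "say 'start the self-test', or say 'give me an exercise'.")
        return handler_input.response_builder.speak(reprompt).ask(reprompt).response
```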
Robust Dialogues: Alexa Skills with deep dialogues need robust handling of user responses. In our Alexa Skill, the depression and anxiety self-tests involved Alexa reading multiple questions and waiting for a user response to each question. During testing, some users were frustrated at having to repeat the depression or anxiety self-test because Alexa’s speech recognition did not understand their response, or because the Skill closed due to an Alexa error or an error in our Skill. If a user is experiencing distress, this type of experience may worsen the user’s mental state or fail to provide the support intended by the app/Skill.
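One way to reduce this frustration, again sketched with the ASK SDK for Python under the same assumption, is to keep the current self-test progress in session attributes so that a misrecognized turn repeats only a single question rather than the whole test; surviving a Skill crash would additionally require the SDK’s persistent attributes backed by a data store.

```python
from ask_sdk_core.handler_input import HandlerInput

PHQ9_ITEMS = 9  # number of questions; actual item wording omitted


def record_answer(handler_input: HandlerInput, score: int) -> str:
    """Store progress in session attributes so a failed turn repeats only one question."""
    attrs = handler_input.attributes_manager.session_attributes
    attrs.setdefault("phq9_answers", []).append(score)
    answered = len(attrs["phq9_answers"])
    if answered < PHQ9_ITEMS:
        return f"Thanks. Question {answered + 1} of {PHQ9_ITEMS}."
    return f"Your depression score is {sum(attrs['phq9_answers'])}."
```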
Handling expression of emotions: Our Alexa Skill allowed users to state how they were feeling using single words. This aspect of our Alexa Skill was brittle, since expressing a word not included in our list of emotions meant that Alexa would ask the user to repeat themselves. Ideally, our long-term goal is to support all expressions of emotions, with Alexa being able to acknowledge what the user is experiencing. We believe this acknowledgment can help users who are isolated or experiencing distress, but it poses technical challenges in natural language understanding. Some of the users who tested our Skill also expressed their emotions by referring to physical symptoms associated with an emotion they were feeling, e.g., “sweaty palms” or “heart racing”, which poses the additional challenge of knowing the association between a bodily response and an emotion.
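The toy lexicon below illustrates the problem; the phrases and mappings are invented for illustration and are not our Skill’s vocabulary. A hand-written lookup covers a few fixed phrases but does not generalize to free-form descriptions of how a user feels.

```python
from typing import Optional

# Hypothetical symptom-to-emotion lexicon; real coverage would need a richer NLU model.
SYMPTOM_TO_EMOTION = {
    "sweaty palms": "anxious",
    "heart racing": "anxious",
    "can't sleep": "stressed",
    "no energy": "down",
}


def infer_emotion(utterance: str) -> Optional[str]:
    """Return the first emotion whose symptom phrase appears in the utterance, if any."""
    text = utterance.lower()
    for symptom, emotion in SYMPTOM_TO_EMOTION.items():
        if symptom in text:
            return emotion
    return None


# Example: infer_emotion("my heart racing won't stop") -> "anxious"
```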
Engagement with the self-test Alexa Skill: Sustaining engagement with health support technologies is a known concern. Related literature on mobile health apps for mental health reports challenges with user uptake and engagement (Ng et al., [n.d.]). In our pilot, we observed similar trends, with participants rarely using the Alexa Skill. Enhancing the engagement of conversational agents through storytelling (Battaglino and Bickmore, [n.d.]), personalization (Kocaballi et al., 2019), and affect (Callejas et al., 2011) has been proposed in the past. However, further research is needed to understand the design implications of such features for spoken conversational agents in the mental health context.
References
- Battaglino and Bickmore ([n.d.]) Cristina Battaglino and Timothy W. Bickmore. [n.d.]. Increasing the engagement of conversational agents through co-constructed storytelling. In INT/SBG@AIIDE (2015).
- Borsci et al. (2009) Simone Borsci, Stefano Federici, and Marco Lauriola. 2009. On the dimensionality of the System Usability Scale: a test of alternative measurement models. Cognitive Processing 10, 3 (Aug. 2009), 193–197. https://doi.org/10.1007/s10339-009-0268-9
- Callejas et al. (2011) Zoraida Callejas, Ramón López-Cózar, Nieves Ábalos, and David Griol. 2011. Affective conversational agents: the role of personality and emotion in spoken interactions. In Conversational agents and natural language interaction: Techniques and effective practices. IGI Global, 203–222.
- Fulmer et al. (2018) Russell Fulmer, Angela Joerin, Breanna Gentile, Lysanne Lakerink, and Michiel Rauws. 2018. Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: Randomized Controlled Trial. JMIR mental health 5, 4 (Dec. 2018), e64. https://doi.org/10.2196/mental.9782
- Hinderks et al. (2019) Andreas Hinderks, Martin Schrepp, Francisco José Domínguez Mayo, María José Escalona, and Jörg Thomaschewski. 2019. Developing a UX KPI based on the user experience questionnaire. Computer Standards & Interfaces 65 (July 2019), 38–44. https://doi.org/10.1016/j.csi.2019.01.007
- Jian et al. (2000) Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for an Empirically Determined Scale of Trust in Automated Systems. International Journal of Cognitive Ergonomics 4, 1 (March 2000), 53–71. https://doi.org/10.1207/S15327566IJCE0401_04
- Jordan et al. (2017) Pascal Jordan, Meike C. Shedden-Mora, and Bernd Löwe. 2017. Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory. PLoS ONE 12, 8 (Aug. 2017). https://doi.org/10.1371/journal.pone.0182162
- Kamita et al. (2019) Takeshi Kamita, Tatsuya Ito, Atsuko Matsumoto, Tsunetsugu Munakata, and Tomoo Inoue. 2019. A Chatbot System for Mental Healthcare Based on SAT Counseling Method. Mobile Information Systems 2019 (March 2019), 9517321. https://doi.org/10.1155/2019/9517321
- Kocaballi et al. (2019) Ahmet Baki Kocaballi, Shlomo Berkovsky, Juan C Quiroz, Liliana Laranjo, Huong Ly Tong, Dana Rezazadegan, Agustina Briatore, and Enrico Coiera. 2019. The Personalization of Conversational Agents in Health Care: Systematic Review. J Med Internet Res 21, 11 (7 Nov 2019), e15360. https://doi.org/10.2196/15360
- Kroenke et al. (2001) K. Kroenke, R. L. Spitzer, and J. B. Williams. 2001. The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine 16, 9 (Sept. 2001), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
- Laranjo et al. (2018) Liliana Laranjo, Adam G Dunn, Huong Ly Tong, Ahmet Baki Kocaballi, Jessica Chen, Rabia Bashir, Didi Surian, Blanca Gallego, Farah Magrabi, Annie Y S Lau, and Enrico Coiera. 2018. Conversational agents in healthcare: a systematic review. Journal of the American Medical Informatics Association 25, 9 (07 2018), 1248–1258. https://doi.org/10.1093/jamia/ocy072 arXiv:https://academic.oup.com/jamia/article-pdf/25/9/1248/25643433/ocy072.pdf
- Ly et al. (2017) Kien Hoa Ly, Ann-Marie Ly, and Gerhard Andersson. 2017. A fully automated conversational agent for promoting mental well-being: A pilot RCT using mixed methods. Internet Interventions 10 (Dec. 2017), 39–46. https://doi.org/10.1016/j.invent.2017.10.002
- Maharjan et al. (2019) Raju Maharjan, Per Bækgaard, and Jakob E. Bardram. 2019. ”Hear me out”: smart speaker based conversational agent to monitor symptoms in mental health. In Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers (UbiComp/ISWC ’19 Adjunct). Association for Computing Machinery, London, United Kingdom, 929–933. https://doi.org/10.1145/3341162.3346270
- Morris et al. (2015) Robert R. Morris, Stephen M. Schueller, and Rosalind W. Picard. 2015. Efficacy of a Web-based, crowdsourced peer-to-peer cognitive reappraisal platform for depression: randomized controlled trial. Journal of Medical Internet Research 17, 3 (March 2015), e72. https://doi.org/10.2196/jmir.4167
- Ng et al. ([n.d.]) Michelle M. Ng, Joseph Firth, Mia Minen, and John Torous. [n.d.]. User Engagement in Mental Health Apps: A Review of Measurement, Reporting, and Validity. Psychiatric Services 70, 7 ([n. d.]), 538–544. https://doi.org/10/gg39pj
- Ravichander and Black ([n.d.]) A. Ravichander and A. Black. [n.d.]. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue.
- Stiles-Shields ([n.d.]) Colleen Stiles-Shields. [n.d.]. Woebot: A Professional Review. ([n. d.]). https://onemindpsyberguide.org/expert-review/woebot-an-expert-review/
- Vaidyam et al. (2019) Aditya Nrusimha Vaidyam, Hannah Wisniewski, John David Halamka, Matcheri S. Kashavan, and John Blake Torous. 2019. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Canadian Journal of Psychiatry. Revue Canadienne De Psychiatrie 64, 7 (2019), 456–464. https://doi.org/10.1177/0706743719828977
- Woodward et al. (2019) K. Woodward, E. Kanjo, D. Brown, T. M. McGinnity, B. Inkster, D. J. Macintyre, and A. Tsanas. 2019. Beyond mobile apps: a survey of technologies for mental well-being. IEEE Transactions on Affective Computing (2019). http://irep.ntu.ac.uk/id/eprint/36543/