
Gabriela Ferraro: Commonwealth Scientific and Industrial Research Organization & Australian National University, GPO Box 1700, Canberra ACT 2601. Email: [email protected]
Brendan Loo Gee: Australasian Institute of Digital Health & Research School of Population Health, Centre for Mental Health Research, Australian National University. Email: [email protected]
Shenjia Ji: College of Engineering and Computer Science, Australian National University. Email: s[email protected]
Luis Salvador-Carulla: Research School of Population Health, Centre for Mental Health Research, Australian National University. Email: [email protected]

Lightme: Analysing Language in Internet Support Groups for Mental Health

Gabriela Ferraro    Brendan Loo Gee    Shenjia Ji    Luis Salvador-Carulla
(Received: date / Accepted: date)
Abstract

Background: Assisting moderators to triage harmful posts in Internet Support Groups is important to ensure their safe use. Automated text classification methods that analyse the language expressed in posts of online forums are a promising solution.
Methods: Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout.com mental health forum for young people.
Results: When compared with the state of the art, a solution based mainly on features from lexical resources achieved the best classification performance for crisis posts (52%), which is the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: 1) posts expressing hopelessness, 2) short posts expressing concise negative emotional responses, 3) long posts expressing variations of emotions, 4) posts expressing dissatisfaction with available health services, 5) posts utilising storytelling, and 6) posts expressing users seeking advice from peers during a crisis.
Conclusion: It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.

1 Introduction

Internet Support Groups (ISGs) have become important and popular technologies for individuals with mental ill-health to receive support from peers with similar lived experiences Islam2018 and to anonymously share their stories with others to support their recovery mikal-hurst-conway:2017:CLPsych . They are also referred to as online peer-support forums or networks. ISGs have supported groups of people with specific chronic health conditions, such as diabetes or mental ill-health Islam2018 ; Naslund2018 . Current evidence suggests ISGs may have a positive impact on individuals with mental ill-health; however, they may also exacerbate a person’s distress levels Kaplan:2011 . The safe use of ISGs therefore requires more attention, especially the design of mechanisms that can help mitigate possible adverse effects and harm to ISG users Griffiths:2017 .

Assessment and monitoring in ISGs are challenging and costly because they rely on the manual review of posts in an online forum by trained moderators. This raises particular concerns about the scalability of ISGs as a potential digital health intervention. To overcome these limitations, Natural Language Processing (NLP) and Machine Learning (ML) technologies can be used to build systems that assist trained moderators in detecting and responding to hazardous posts that may cause further distress or self-harm to ISG users. Prior research has shown text classification methods to be a promising solution for reducing the workload of trained moderators Huh:2013 .

Moderators play an important role in managing communication between users in an ISG. They offer a range of informal support and advice to users, including providing personal experiences of recovery, motivating users to participate in the discussion, and enhancing the adoption of digital mental health services Kornfield:2018 . However, moderators may lack the necessary skills and expertise to guide appropriate decision making on issues relating to clinical safety Hartzler:2011 . Triaging ISG posts to assist moderators in reviewing new content uploaded daily is an automated text classification task designed to efficiently detect individuals’ thoughts, feelings, emotions, and possible behaviors represented in messages Conway:2016 ; Tausczik:2010 .

Previous research has often focused on evaluating the performance of different ML classification models using mental health ISG data, such as Logistic Regression Cohan:2016 ; Pink:2016 ; Zirikly:2016 , Stochastic Gradient Descent (SGD) kim-EtAl:2016:CLPsych , and Linear Discriminant Analysis (LDA) Shickel:2016 . The study by Islam2018 used different ML techniques to detect depression from Facebook data. They evaluated the performance of several classification models, including Support Vector Machines (SVM), Decision Tree (DT), ensemble methods, and K-Nearest Neighbour (KNN). The results demonstrated the relative performance of specific classifiers. However, the authors did not evaluate the performance of different language features from lexical resources or deep learning models.

Lexicon-based resources are central to modelling the linguistic characteristics of ISGs. Over the years, comparable systems have used different lexicon-based features to classify hazardous posts in an ISG for mental health CLPsych:2017 ; milne-EtAl:2016:CLPsych , including posts from Twitter data Odea:2017 ; Odea:2015 ; coppersmith-EtAl:2016:CLPsych ; jamil-EtAl:2017:CLPsych . While modelling linguistic characteristics is important for classification accuracy, other features such as interactions between ISG users, forum structure, meta-data, and other external features may further improve prediction performance Carron-Arthur:2015 ; Smithson:2011 . However, some authors have argued that relying on features extracted from external sources (e.g., forum structure and meta-data) may introduce biases, thereby decreasing the predictive capability of the classifier on previously unseen messages published on online forums Altszyler:2018 .

Our study focuses on developing an automated classifier for triaging posts using only features derived from the textual content of the post via lexicon-based resources. We want to investigate the language of ISGs without relying on the forum structure or post threads. By excluding forum structure and meta-data features from the model, the study focuses primarily on the linguistic aspects of triaging forum posts and avoids biases on unseen messages. Furthermore, given the extent of previous research on combinations of different ML classification models, we experiment with a broad combination of features using a relatively small number of linear and nonlinear ML techniques, including two deep learning models.

1.1 Research rationale

This study used state-of-the-art methods to develop hand-crafted features derived from the Reachout online support forum CLPsych:2017 . The model aims to achieve the best classification performance for crisis posts and competitive results for the other classification labels, described in Section 2. We also conducted a qualitative analysis of the posts that require the immediate attention of moderators. The study has two aims:

  • Shed light on the linguistic characteristics of urgent posts.

  • Examine the feasibility of lexical resources in an ML classification system for triaging posts using the Reachout dataset.

2 Materials and Methods

2.1 Dataset

This study used a collection of posts from the Australian Reachout mental health online forum released by the Computational Linguistics and Clinical Psychology Shared Task (CLPsych) CLPsych:2017 . Participants range from 18 to 25 years old. All of the posts are written in English. Each post in the dataset is labelled with a semaphore pattern to indicate the urgency of the post, and the required attention of the moderator, as shown in Table 1. Label distribution across the training and testing dataset of the Reachout online forum is given in Table 2.

Green: No input from a moderator is needed; the post can be safely left for the wider community of peers to respond. Example: “I’m proud that I was able to call and keep up a phone conversation with my mum.”
Amber: A moderator should address the post at some point, but they do not need to do so immediately. Example: “There are so many stuff I’m thinking about, but my medications are slowing my thoughts down and making it more manageable.”
Red: A moderator should respond to the post as soon as possible. Example: “I feel helpless and things seem pointless. I hate feeling so down.”
Crisis: The author, or someone they know, is at imminent risk of being harmed, or of harming themselves or others. These posts should be prioritized above all others. Example: “Im having some strong thoughts about ending my life, nothing helps.”
Table 1: Severity label descriptions and examples in the Reachout dataset.
Label Train % Test %
Crisis 40 3.36 42 10.5
Red 137 11.53 48 12
Amber 296 24.91 94 23.5
Green 715 60.18 216 54
Total 1188 - 400 -
Table 2: Label distribution across the training and testing sets of the Reachout 2017 dataset.

Precision, recall and F-measure were used to examine the performance of the classifier. Precision is the proportion of posts assigned a particular label by the ML model that truly belong to that label. Recall is the proportion of posts with a given label that are successfully identified by the model. The F-measure is the harmonic mean of precision and recall. The macro F-score is preferred since it gives more weight to infrequent yet more critical labels, such as red and crisis. Similar to Altszyler:2018 , the F-score for crisis versus non-crisis was reported. This metric measures the classifier’s capability to detect the most severe cases. Details of the official evaluation metrics are described below, followed by a short sketch of how they can be computed:

  • Macro-averaged F-score: The macro-averaged F-score is calculated over the crisis, red and amber classes, excluding the green class.

  • F-score for flagged vs. non-flagged: This metric separates the posts that moderators need to action (i.e., crisis, red, amber) from posts that can be safely ignored (i.e., green). This is the most important metric in CLPsych since it measures the classifier’s capability to identify the posts that require moderator attention.

  • F-score for urgent vs. non-urgent: This metric is the average F1-score among urgent (crisis + red) and non-urgent (amber + green) labels.
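
The sketch below is a minimal illustration, assuming scikit-learn and the coarse four-label scheme, of one plausible way to compute these metrics; the groupings follow the definitions above, and the official CLPsych scorer may differ in detail.

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted labels, for illustration only.
y_true = ["green", "amber", "crisis", "red", "green", "amber"]
y_pred = ["green", "amber", "red", "red", "amber", "amber"]

# Macro-averaged F-score over crisis, red and amber (green excluded).
macro_f1 = f1_score(y_true, y_pred, labels=["crisis", "red", "amber"], average="macro")

# Flagged vs. non-flagged: collapse labels into two groups, then average their F1-scores.
flag = lambda ys: ["flagged" if y != "green" else "green" for y in ys]
flagged_f1 = f1_score(flag(y_true), flag(y_pred), average="macro")

# Urgent vs. non-urgent: crisis + red against amber + green.
urgent = lambda ys: ["urgent" if y in {"crisis", "red"} else "non-urgent" for y in ys]
urgent_f1 = f1_score(urgent(y_true), urgent(y_pred), average="macro")

print(macro_f1, flagged_f1, urgent_f1)
```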

A search of key computing and health databases (IEEE, ACM, PubMed and PsycINFO) was conducted to identify the key components of previous text classifiers for ISGs. Table 3 shows the features and methods used by the best performing classifiers on the Reachout dataset; more details can be found in milne-EtAl:2016:CLPsych and CLPsych:2017 .

Lexicon Features Used by
LIWC lexicon Pennebaker:2015 Cohan:2016 ; Malmasi:2016
MPQA lexicon PHQ-9:Kroenke2001 Cohan:2016 ; Altszyler:2018
PERMA lexicon Perma:2016 Altszyler:2018
Emolex lexicon Mohammad2013CrowdsourcingAW Altszyler:2018
DepecheMood lexicon staiano2014depeche Cohan:2016 ; Altszyler:2018
Other Features
Lexical diversity Altszyler:2018
Topic modeling Cohan:2016
TF-IDF weighted kim-EtAl:2016:CLPsych ; Brew:2016
Character embeddings Malmasi:2016
Word embeddings kim-EtAl:2016:CLPsych ; Brew:2016 ; Malmasi:2016 ; Altszyler:2018
Sentence embeddings Le:2014
POS-tags Malmasi:2016
Pronouns Altszyler:2018
Sentiment analysis Shickel:2016 ; Zirikly:2016
Post author Altszyler:2018
Post history Malmasi:2016 ; Altszyler:2018
Post reply chain Pink:2016
Time of the post Altszyler:2018
Time between post Altszyler:2018
Week day of the post Altszyler:2018
References to advisors Altszyler:2018
References to self-harm Altszyler:2018
References to Telephone helplines Altszyler:2018
Algorithm
LDA: unsupervised topic modeling Shickel:2016
SGD: supervised classification kim-EtAl:2016:CLPsych
Support Vector Machine (SVM): supervised classification Malmasi:2016 ; Brew:2016 ; Zirikly:2016 ; Altszyler:2018
Logistic regression: supervised classification Cohan:2016 ; Pink:2016 ; Zirikly:2016
Table 3: Examples of features used for triage classification using the Reachout dataset. “Used by” refers to the previous research studies in which a feature was used.

2.2 Predicting Alerts Approach

The Reachout dataset $D=\{x_{i},y_{i}\}^{n}_{i=1}$ consists of $n$ training instances, where the $i$th instance is a feature vector $x_{i}$ with label $y_{i}$. The classification task is to predict the label $y_{i}$ given the feature vector $x_{i}$ for each forum post, such that:

$\hat{y}_{i}=\mathop{\arg\max}_{y_{i}}P_{\theta}(y_{i}|x_{i})$ (1)

We trained a Support Vector Machine (SVM) multi-class classifier with linear kernels Vapnik:1963 . SVM is a supervised ML method widely used in text classification, and it has been used in state-of-the-art triage classification systems for the Reachout dataset. Hyper-parameters (in machine learning, a hyper-parameter is a parameter whose value is set before the learning process, while the values of the other parameters are derived via learning) were selected with a grid search (exhaustively searching through a subset of the hyper-parameter space of a learning algorithm to choose the best hyper-parameters) scheme with 5-fold cross-validation over the training set. The C hyper-parameter (the regularization value, which expresses the degree of importance given to misclassification: the larger the value, the fewer wrongly classified examples are allowed) is 1 with $\ell_{1}$ regularization, the loss function (a loss or cost function measures how well a prediction model predicts the expected outcome) is hinge, and the maximum number of iterations is 2000.
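
A minimal sketch of this training setup follows, assuming scikit-learn and an already computed feature matrix; the exact penalty and loss configuration reported above, and the full feature pipeline, are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Hypothetical pre-computed feature matrix and labels, for illustration only.
X_train = np.random.rand(1188, 300)  # one row of features per training post
y_train = np.random.choice(["green", "amber", "red", "crisis"], size=1188)

# Linear-kernel SVM; C is chosen by grid search with 5-fold cross-validation.
grid = GridSearchCV(
    estimator=LinearSVC(loss="hinge", max_iter=2000),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    scoring="f1_macro",
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)  # the paper reports C = 1
```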

To compare Lightme (which uses the SVM) against other ML classifiers, we also trained K-Nearest Neighbour (KNN) and Naïve Bayes classifiers. Since deep learning models are the state of the art in many natural language processing applications, we also trained two neural network classifiers: a Multi-Layer Perceptron (MLP) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM).

The feature set is shown in Table 4. All the features were derived from the posts themselves; no features derived from the forum structure or interactions between posts were used. We included additional language features such as the MPQA, offensive language, and mental health lexicons.

During feature extraction, negation was modelled as in cimino2014linguistically . Thus, when a term $term_{i}$ from a post is found in a lexicon, its negation is checked by inspecting the preceding term $term_{i-1}$. As in cimino2014linguistically , we used the following list of negation terms:

no, nobody, nothing, none, never, neither, nor, nowhere, hardly, scarcely, barely, don’t, isn’t, wasn’t, doesn’t, ain’t, can’t, won’t, wouldn’t, shouldn’t, couldn’t, hasn’t, haven’t, didn’t

If a negation term was found, the polarity of the term was shifted when the lexicon differentiates between positive and negative terms (e.g., the PERMA lexicon); otherwise, the term was skipped and not included as a feature. A minimal sketch of this rule is given below.
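
The following sketch illustrates the negation rule with a toy polarity lexicon; the lexicon entries and function names are hypothetical, and real runs would use the resources listed in Table 4.

```python
NEGATION_TERMS = {
    "no", "nobody", "nothing", "none", "never", "neither", "nor", "nowhere",
    "hardly", "scarcely", "barely", "don't", "isn't", "wasn't", "doesn't",
    "ain't", "can't", "won't", "wouldn't", "shouldn't", "couldn't",
    "hasn't", "haven't", "didn't",
}

# Toy polarity lexicon for illustration only (real runs would use MPQA, PERMA, etc.).
POLARITY = {"happy": "positive", "hopeless": "negative", "calm": "positive"}

def lexicon_counts(tokens):
    """Count positive/negative lexicon hits, shifting polarity after a negation term."""
    counts = {"positive": 0, "negative": 0}
    for i, term in enumerate(tokens):
        polarity = POLARITY.get(term)
        if polarity is None:
            continue
        if i > 0 and tokens[i - 1] in NEGATION_TERMS:
            # Shift polarity when the lexicon distinguishes positive and negative terms.
            polarity = "negative" if polarity == "positive" else "positive"
        counts[polarity] += 1
    return counts

print(lexicon_counts("i am never happy and i feel hopeless".split()))
# {'positive': 0, 'negative': 2}
```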

Lexicon Features Feature Description
MPQA lexicon* The number of words with MPQA polarity in each post
DepecheMood lexicon* The number of overlapping words between each category in DepecheMood and a post
Emolex lexicon* The number of overlapping words between each category in the NRC-Emotion-Lexicon-v0.92 lexicon and a post
Mental Disorder lexicon (http://mental-health-matters.com/psychological-disorders/alphabetical-list-of-disorders/) The number of overlapping words between the Mental Disorder lexicon and a post
PHQ_9 lexicon The number of overlapping words between the PHQ_9 lexicon and a post
PERMA lexicon (1)* The number of overlapping bi-grams and tri-grams between PERMA and a post
PERMA lexicon (2) The number of overlapping bi-grams and tri-grams between the PERMA negative categories and a post
PERMA lexicon (3) The weighted sum of the bi-grams and tri-grams overlapping between PERMA and a post
Offensive word lexicon (https://www.cs.cmu.edu/~biglou/resources/) The number of overlapping words between the offensive word list and a post
Other Features
TF-IDF weighted N-grams TF-IDF representation of each post with top max features chosen by Scikit-learn based on term frequency
Pronouns The number of pronouns used in each post, including I, me, you, he, him, she, her, it, we, us, they, them
Mean word length The average length of words in a post
Sentence embeddings Sentence representation computed by averaging pre-trained FastText word embeddings fine-tuned with the Reachout dataset
Last sentence embeddings Sentence representation of the last sentence in each post computed by averaging FastText word embeddings trained with the Reachout dataset
Sentiment analysis feature The sentiment of each post, predicted by a sentiment classifier we trained with GloVe pennington2014glove word embeddings and emoticon embeddings
User rank The forum title of the poster for each post
Number of web links Total number of web links in a post
Number of reference to a help line services mental health, australia, general practitioner, doctor, psychologist, counsellor, gp (general practitioner), emergency, 000, lifeline, 131114, 13 11 14, kids help line, 1800 55 1800, 1800551800, salvation army care line, 1300 36 36 22, 1300363622, e-couch, moodgym, bluepages, black dog institute, reachout, beyondblue, www.moodgym.anu.edu.au, www.ecouch.anu.edu.au, www.bluepages.anu.edu.au, www.researchout.org.au, www.blackdoginstitute.org.au
Number of references to self-harm expressions suicide, kill myself, kill my self, cut myself, cut my self, hurt myself, hurt my self, harm myself, harm my self, I want to die, don’t want to live, end my life, kill, hurt, cut, want to die, I don’t want to live
Number of references to advisors supervisor, supervisors, mentor, manager, tutor, case-manager, managers, manager, psych, psychiatrist, gp (general practitioner), gps, counsellor, counselor
Table 4: Feature set used for triage classification with the Reachout dataset. ’*’ indicates a lexicon that has been tested in previous studies (see Table 3).
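
As one illustration of how such features can be assembled for a classifier, the sketch below combines TF-IDF weighted n-grams with two simple per-post features from Table 4 (pronoun count and mean word length); it is a minimal example under assumed helper names, not the authors’ exact pipeline.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "I feel helpless and things seem pointless.",
    "I'm proud that I kept up a phone conversation with my mum.",
]

# TF-IDF weighted n-grams over the post text.
tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X_tfidf = tfidf.fit_transform(posts)

PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "it", "we", "us", "they", "them"}

def handcrafted(post):
    """Pronoun count and mean word length for a single post (illustrative subset of Table 4)."""
    tokens = post.lower().split()
    pronoun_count = sum(t.strip(".,!?'\"") in PRONOUNS for t in tokens)
    mean_word_length = np.mean([len(t) for t in tokens]) if tokens else 0.0
    return [pronoun_count, mean_word_length]

X_extra = csr_matrix([handcrafted(p) for p in posts])

# Final feature matrix fed to the classifier (e.g., the SVM above).
X = hstack([X_tfidf, X_extra])
print(X.shape)
```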

3 Results

3.1 Triage Classification Experimental Results

The results of the triage classification experiment using the different feature sets, including lexical resources and negation handling, are presented in Table 5. The best results are highlighted in boldface. Exclusive use of lexicon features resulted in lower performance for all classes (flagged, urgent and crisis) and lower overall performance (macro F1-score). Treating negation when using only lexicons did not boost the classification performance. However, adding Term Frequency-Inverse Document Frequency (TF-IDF) features, where the number of times a word appears in a post is weighted by how many posts across the collection contain that word, improved the classification performance for all classes. The best results were achieved with the “TF-IDF + lexicons with negation” feature set, with a macro F1-score of 0.44. The most complex feature set (all features in Table 4) showed competitive results with the state-of-the-art triage classification system by Altszyler:2018 and the baseline classification system by milne-EtAl:2016:CLPsych .

Feature set Macro F1-score Flagged Urgent Crisis
Only lexicons 0.24 0.38 0.38 0.20
Lexicons with negation 0.19 0.43 0.37 0.04
TF-IDF + lexicons 0.38 0.71 0.53 0.44
TF-IDF + lexicons with negation 0.44 0.74 0.63 0.52
Lightme (features from Table 4) 0.43 0.77 0.59 0.51
Table 5: Triage classification with different features sets.

We also experimented with linear and nonlinear classification methods. Naïve Bayes was trained with the Lightme feature set since it is a simple and fast linear classification method suitable for classifying large amounts of data. Similarly, KNN was trained with the same feature set due to its practicality and simplicity. Hyper-parameters such as the number of neighbours were selected with a grid search scheme over the range 1 to 25. Table 6 shows the results of Naïve Bayes and KNN, which underperformed the SVM and other state-of-the-art systems. Compared to Naïve Bayes and KNN, SVM is known to perform better on rich feature sets such as the one presented in this study.

The MLP was trained using the same set of features as Lightme, with hidden layer sizes that varied between 100 and 300 nodes depending on the development set. The RNN+LSTM model was trained with pre-trained word embeddings and without hand-crafted features, since one of the advantages of this type of model is its ability to learn feature representations automatically. Important hyper-parameters such as the number of epochs, size of the hidden layer and batch size were tuned using a portion of the training set as the development set. As shown in Table 6, the deep learning models underperformed the other models. This is not surprising, as deep learning models are data-hungry and the Reachout dataset is small; in particular, some of the classes (e.g., red and crisis) have only a few instances.

System Macro F1-score Flagged Urgent Crisis
Baseline 0.3 0.61 0.44 -
Naïve Bayes 0.28 0.67 0.42 0.39
KNN 0.14 0.39 0.08 0.0
MLP 0.38 0.71 0.58 0.39
RNN+LSTM 0.28 0.44 0.008 0.0
Altszyler Altszyler:2018 0.44 0.90 0.68 0.48
TF-IDF + lexicons with negation 0.44 0.74 0.63 0.52
Lightme (features from Table 4) 0.43 0.77 0.59 0.51
Table 6: Comparison results on the test set in terms of F-score
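
For reference, the following is a minimal PyTorch sketch of an RNN+LSTM post classifier over pre-trained word embeddings; the overall shape follows the description above, but the authors’ exact framework, layer sizes and training loop are not reproduced here.

```python
import torch
import torch.nn as nn

class LSTMTriageClassifier(nn.Module):
    """Minimal RNN+LSTM classifier over (pre-trained) word embeddings."""

    def __init__(self, embeddings, hidden_size=128, num_classes=4):
        super().__init__()
        # `embeddings` is a float tensor of shape (vocab_size, embedding_dim),
        # e.g. loaded from pre-trained FastText vectors.
        self.embedding = nn.Embedding.from_pretrained(embeddings, freeze=False)
        self.lstm = nn.LSTM(embeddings.size(1), hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices of each post.
        embedded = self.embedding(token_ids)
        _, (last_hidden, _) = self.lstm(embedded)
        return self.classifier(last_hidden[-1])  # logits over the four severity labels

# Toy usage with random "pre-trained" embeddings, for illustration only.
vocab_size, dim = 5000, 100
model = LSTMTriageClassifier(torch.randn(vocab_size, dim))
logits = model(torch.randint(0, vocab_size, (2, 30)))  # two posts, 30 tokens each
print(logits.shape)  # torch.Size([2, 4])
```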

3.2 Qualitative formative analysis of crisis posts

We randomly selected 40 crisis posts from the training dataset to analyse. We then used open coding to identify their linguistic characteristics. Through the qualitative analysis of the selected crisis posts, we identified six linguistic characteristics. For each characteristic, we extracted phrases from crisis posts (including the post ID) that matched the given linguistic profile and suggested features that could be added to the model.

3.2.1 Expressing hopelessness in crisis

Many of the posts used language or words that described a person’s feeling of immediate hopelessness. Extreme hopelessness or helplessness may be associated with an increased risk of suicide Cash2013 . Learned helplessness comes from a repeated belief that uncomfortable situations are inescapable; an example statement is “I tried doing this for my anxiety, but I ended up faced with these challenges” Liu:2015 . Hopelessness is a combination of helplessness and experiences of depression resulting from a person’s response to a negative event Cash2013 ; Liu:2015 . An example statement is “I am fed up with my friend anger! I can’t bother trying anymore because I am frustrated”.

Extracted phrases of crisis posts describing the hopelessness of a forum user:

  • I can feel pretty hopeless at times too. I start questioning if I can ever get better. It’s hard enough to live.” (Post ID: 136600)

  • I’m feeling so tired, and I want to give up on life. I need to keep holding on. There’s still hope for me. I just need to make sure I reach out when I feel like things are getting way too intense.” (Post ID: 136601)

  • No but I am pretty friggin sick of my entire life at this point and my existence…” (Post ID: 135818)

  • I’m still finding it hard not to do anything stupid. I’ve screwed up. Now I don’t know where this is headed.” (Post ID: 138188)

Recommended features to model: Categorical features can be modelled with the following keywords: feel tired, fed up, better dead, give up life, the end is near, sick of life, sick of existence, holding on, hopeless times, hope, trying help or talking, and hard to try or do. Other features can include checking for spelling mistakes. A small sketch of such a keyword feature follows.
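
As a minimal sketch of such a categorical keyword feature (the cue list is an illustrative subset, not a validated hopelessness lexicon):

```python
# Illustrative keyword cues for a "hopelessness" feature (not a validated lexicon).
HOPELESSNESS_CUES = [
    "feel tired", "fed up", "better dead", "give up life", "the end is near",
    "sick of life", "sick of existence", "holding on", "hopeless",
]

def hopelessness_features(post: str) -> dict:
    """Return a count and a binary indicator of hopelessness cues in a post."""
    text = post.lower()
    hits = sum(cue in text for cue in HOPELESSNESS_CUES)
    return {"hopelessness_count": hits, "hopelessness_flag": int(hits > 0)}

print(hopelessness_features("I can feel pretty hopeless at times too."))
# {'hopelessness_count': 1, 'hopelessness_flag': 1}
```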

3.2.2 Short crisis posts and emotional response

Short posts contained concise descriptions of a person’s negative emotions. In contrast, longer posts contained more variation in expressing positive and negative emotions. As noted by Odea:2017 , lexicons may be limited in detecting certain expressions such as irony, sarcasm, and metaphors. Therefore, any text under 50 words should be interpreted with caution. A further limitation of interpreting short posts is the use of negation gkotsis-EtAl:2016:CLPsych2 .

Examples of short crisis posts describing a concise negative emotion:

  • I’m suffocating. I don’t if I can do this anymore.” (Post ID: 138064)

  • @redhead I don’t know how long I can even keep myself together before I’m screwed.” (Post ID: 138067)

  • @chessca_h no. I don’t want to be safe anymore. I’m ai over it right now.” (Post ID: 137786)

Recommended features to model: Features can define short posts as messages under 50 words, or posts that contain no more than two sentences. Additionally, features can include detecting only one negative emotion for short posts, treating all negations for short posts, and checking for spelling mistakes.

3.2.3 Long crisis posts and emotional coping

Long length posts were found to start with a user expressing some negative emotions, followed by positive emotions related to their abilities to cope. This may be a positive sign as it may indicate a person attempting to reconcile negative emotions Naslund2014 . However, they may express negative emotions after showing signs of positivity.

Extracted phrases of long crisis posts that express alternating negative and positive emotions:

  • (Negative) Feeling extremeley tired each morning. It’s getting to the point I’m contemplating ringing in sick to aviod getting up. (Positive) Despite the tiredness, I’ve bee getting up and going to work, because I know I need to face the world. (Negative) Keep having thoughts to end it all…” (Post ID: 135898)

  • (Negative) Didn’t sleep til way after 2 last night. It was super hard to sleep, then wake this morning. I just wanted to ignore the world today. (Positive) I eventually fell asleep. I eventually got up and went to work. I faced the world and smiled a little. (Negative) Really struggled through my shift today… ” (Post ID: 137919)

Recommended features to model: Features can define long posts as messages that contain 50 or more words with varying levels of positive and negative emotions. Other features can include detecting negative emotions at the beginning of a post followed by subsequent positive emotions. Checking for spelling mistakes can also be a feature. A feature can detect positive emotions relating to coping through keywords such as getting there, I faced the world, or getting up and working.

3.2.4 Health service dissatisfaction

It was found that people who seek support services (e.g., health services, counsellors, or treatment) in the forum would sometimes express hopelessness, avoidance, or frustration. This pattern may signal a person using the forum to vent their dissatisfaction or frustration with local mental health support services, or with failures in their care.

Extracted phrases of crisis posts that express patterns of health service dissatisfaction:

  • (Service) My gp was running late today, which heightened my anxiety. At first she didn’t realise it was my 3month follow up apt. She also had no idea that the psych was meant to write a letter, so the psych either ran out of time or forgot. Meh. I just wanted to run away and hide. It was SO VERY hard not to close off and run. I didn’t even hear her call my name the first time. Blergh…” (Post ID: 137919)

  • im having bad thought about ending my life, nothing helps not even (Service) my counceller” (Post ID: 136895)

  • I know looking back at therapy experiences that didn’t work out will only discourage me. I’m highly impatient and annoyed. I’m trying to find the right (Service) professionals for me, its a very frustrating process. I can feel pretty hopeless at times too. I start questioning if I can ever get better. It’s hard enough to live.” (Post ID: 136600)

Recommended features to model: Categorical features can be used to model any mention of support services. Features can also consider detecting the negative expression of support services in crisis posts, factoring the length of the posts, and checking for spelling mistakes.

3.2.5 Utilising storytelling to express crisis

According to Smithson:2011 , there are two types of behaviours when ISG users seek help. The first behaviour involves a person wanting to communicate their story, or ’trouble telling’, and the other involves a person wanting advice. Appropriate timing for offering advice is crucial: if advice is suggested too soon, it is likely to be rejected. It was found that some people would join the forum to seek advice about an issue and then open up to talk about their problems.

Extracted phrases of storytelling in crisis posts:

  • (Event) So today I went to the doctors and they told me that the chemotherapy that I am on is not working, my body isnt reacting to it the way it should be, which means that I now need to start this new treatment that is going to knock me around a lot more then the ast chemotheraphy…” (Post ID: 136116)

  • (Event) I moved out of home into a defacto relationship about a year ago now, and despite having troubles with my mum, who I used to live with (single parent), I have the feeling that she is very lonely and she often gets teary about that. (Event) She mentioned today that she may as well just kill herself because she feels like she’s not really worth it anymore.” (Post ID: 137384)

Recommended features to model: Features can detect a sequence of personal events. Personal events may contain temporal features, such as today, yesterday, or tomorrow. Additionally, features can detect negative and positive emotional responses relating to different events identified in the post. Other features can include checking spelling mistakes.

3.2.6 Seeking advice of peers during crisis

Crisis posts were found to contain more advice-seeking content than content providing support to other peers. Gaining the support of peers online is a common behaviour among people with severe mental ill-health Naslund2014 . Carron-Arthur:2015 differentiate between posts that provide support to peers and posts that attempt to seek advice from other peers in an ISG. Supportive posts were characterized by the user providing emotional support and informational support, such as offering website links to seek help. In contrast, advice-seeking posts were characterized by users seeking informational support, emotional support, and companionship from other peers.

Extracted phrases of advice seeking in crisis posts:

  • …Suffering from anxiety and deppression myself, this kind of relationship is setting me back quite significantly. (Advice Seeking) Has anyone else ever had a depressed parent that they are worried about when they move out of home? I have been going to her place often and not sure what else I can do to really help her…” (Post ID: 137384)

  • She’s very depressed and always wnats to die. I’m pretty scared and I try to help but deep down I’m pretty useless for helping.(Advice Seeking) Any good tips? Because she likes to talk to me because i’m nice to her and doesn’t judge her.” (Post ID: 135748)

Recommended features to model: Features can detect questions relating to an emotional response. Another feature could detect advice-seeking content as a text embedding feature that identifies information relating to emotional support.

4 Discussion

4.1 Principal Findings

This study demonstrates a solution that utilises a variety of lexicon-based resources and supervised ML techniques to assist trained moderators in efficiently moderating ISGs. In contrast to other similar research, this study extracted lexicon-based features only from the textual content of posts, which may avoid possible biases during classification. The classification experiment found that one of our classifiers (Lightme) achieved the best results for crisis posts (0.52 F1-score) and competitive results for the other classes (i.e., non-green, flagged, and urgent posts). These results may indicate that it is possible to build a strong classifier that processes only textual features extracted from individual messages. However, the experimental results also demonstrated that using only lexicons was not enough to classify posts into all relevant classes. The solution relied exclusively on the vocabulary of the Reachout dataset, which may have introduced some noise that impacted the classification performance for flagged and urgent posts. Furthermore, this study demonstrated the limitations of lexicons, in particular that they only capture information at the word level, which may prevent them from capturing contextual meaning at the sentence level.

Furthermore, the findings suggest that using mental health lexicons can have an impact on the classification of posts requiring an immediate response by trained moderators. This is unsurprising given the distinct domain-specific properties of lexicons, especially their association with certain mental and behavioural health theoretical constructs Kornfield:2018 . Lastly, six linguistic characteristics were identified in the qualitative analysis of crisis posts. Interestingly, we found that a person in crisis will use words or language associated with hopelessness, publish short posts containing concise negative emotional responses, publish long posts containing variations of emotions, express dissatisfaction with locally available health services, use storytelling to express crisis, and seek the advice of peers during a crisis.

4.2 Comparison to Previous Research

Our best classifier showed comparable results with the state-of-the-art systems for triage classification on the Reachout dataset and with the baseline classifiers. The baseline system by milne-EtAl:2016:CLPsych used uni-grams and bi-grams as features and a default scikit-learn logistic regression classifier scikit-learn . The best system from the CLPsych 2017 Shared Task, by Altszyler:2018 , also utilized an SVM classifier. That classifier used a richer set of features, including features from the forum structure and interactions between posts, and outperformed our system for flagged and urgent posts. Interestingly, our approach showed better results in identifying crisis posts. This is an important category for this problem, given moderators’ need to respond to these posts immediately. Adding features derived from the forum structure may help to improve the classification performance; however, the trade-off is the risk of not properly classifying posts from new users.

Similar to other systems, our triage text classifier found that using TF-IDF with lexicons improved classification performance. kim-EtAl:2016:CLPsych achieved the best results in the CLPsych 2016 Shared Task using TF-IDF weighted n-grams and Sent2vec post embeddings in an SGD classifier, together with a set of twelve fine-grained labels instead of the four coarse-grained labels. The system by Brew:2016 also weights n-grams with TF-IDF, producing similar results. Similarly, the use of TF-IDF showed comparable results in triage text classifiers for Twitter Odea:2015 and for an online support forum for substance abuse Kornfield:2018 .

The qualitative findings appear to support prior research that found similar patterns of online interaction among people with mental ill-health using social media. As highlighted in previous research, online peer-to-peer interactions can improve health and psychosocial outcomes by facilitating a range of positive behaviours that can empower people, such as seeking information and emotional support Naslund2014 . However, these online networks can also become harmful when social media content begins to promote self-harm, suicide, or pro-eating-disorder behaviours Gerrard2018 ; Dyson2016 . In particular, social media posts that promote “problematic” content may be difficult to identify by moderating hashtags alone in online communities Gerrard2018 .

4.3 Implication on Future Research

Most of the qualitative findings may be translated into features that could improve classification performance. As noted, the qualitative findings on the crisis posts could be used to distinguish salient linguistic characteristics of the language used in messages that are urgent for moderators. For example, specific features for detecting hopelessness may improve the detection of crisis messages. Suggestions for future work also include differentiating posts that provide support from posts that seek advice from other peers, and identifying participant roles, such as leaders, influencers, and opinion users Carron-Arthur:2015 . Furthermore, various types of help-seeking behaviours can be identified, such as users wanting to share their personal stories of struggle Smithson:2011 . The analysis of satisfaction with available services can play a role in developing enhanced mixed reality care approaches combining eHealth and on-site services vanGenderen2018 .

4.4 Limitation

This study has several limitations. First, our classifier is restricted to one dataset; more data are needed to generalize the model and avoid overfitting. Second, the training set was relatively small, which may have had implications for our approach and the subsequent results. Third, an error analysis was not conducted; such an analysis could have examined why certain posts were misclassified or classified correctly.

4.5 Conclusion

The current study examined a triage classifier using features derived only from the textual content of the post. Various lexicons were used to analyse the value of lexical resources in a text classifier for triaging posts. Lexical resources alone were not enough to build a well-performing classifier; however, a solution that combines lexicons with other features derived from the content of the posts performed well in identifying crisis posts. Qualitative investigation of the crisis posts found six salient linguistic characteristics. These qualitative findings are still formative, and more work is needed to translate them into features that can improve overall performance.

5 Disclosure Statement

No competing financial interests exist.

References

  • [1] Altszyler, E., Berenstein, A.J., Milne, D.N., Calvo, R.A., Slezak, D.F.: Using contextual information for automatic triage of posts in a peer-support forum. In: K. Loveys, K. Niederhoffer, E. Prud’hommeaux, R. Resnik, P. Resnik (eds.) Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, CLPsych@NAACL-HTL, New Orleans, LA, USA, June 2018, pp. 57–68. Association for Computational Linguistics (2018). URL https://aclanthology.info/papers/W18-0606/w18-0606
  • [2] Carron-Arthur, B., Ali, K., Cunningham, J.A., Griffiths, K.M.: From help-seekers to influential users: A systematic review of participation styles in online health communities. Journal of Medical Internet Research (2015)
  • [3] Brew, C.: Classifying reachout posts with a radial basis function svm. In: Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology, pp. 138–142. Association for Computational Linguistics (2016). DOI 10.18653/v1/W16-0315. URL http://www.aclweb.org/anthology/W16-0315
  • [4] Cash, S.J., Thelwall, M., Peck, S.N., Ferrell, J.Z., Bridge, J.A.: Adolescent suicide statements on myspace. Cyberpsychology, Behavior, and Social Networking 16(3), 166–174 (2013). DOI 10.1089/cyber.2012.0098. URL https://doi.org/10.1089/cyber.2012.0098. PMID: 23374167
  • [5] Cimino, A., Cresci, S., Dell’Orletta, F., Tesconi, M.: Linguistically-motivated and lexicon features for sentiment analysis of italian tweets. 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2014) pp. 81–86 (2014)
  • [6] Cohan, A., Young, S., Goharian, N.: Triaging mental health forum posts. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 143–147. Association for Computational Linguistics, San Diego, CA, USA (2016). URL http://www.aclweb.org/anthology/W16-0316
  • [7] Conway, M., O’Connor, D.: Social media, big data, and mental health: current advances and ethical implications. Current Opinion in Psychology 9, 77–82 (2016). DOI https://doi.org/10.1016/j.copsyc.2016.01.004. URL http://www.sciencedirect.com/science/article/pii/S2352250X16000063. Social media and applications to health behavior
  • [8] Coppersmith, G., Ngo, K., Leary, R., Wood, A.: Exploratory analysis of social media prior to a suicide attempt. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 106–117. Association for Computational Linguistics, San Diego, CA, USA (2016). URL http://www.aclweb.org/anthology/W16-0311
  • [9] Dyson, M.P., Hartling, L., Shulhan, J., Chisholm, A., Milne, A., Sundar, P., Scott, S.D., Newton, A.S.: A systematic review of social media use to discuss and view deliberate self-harm acts. PLOS ONE 11(5), 1–15 (2016). DOI 10.1371/journal.pone.0155813. URL https://doi.org/10.1371/journal.pone.0155813
  • [10] van Genderen, M., Vlake, J.: Virtual healthcare; use of virtual, augmented and mixed reality. Nederlands tijdschrift voor geneeskunde 162, D3229 (2018)
  • [11] Gerrard, Y.: Beyond the hashtag: Circumventing content moderation on social media. New Media & Society 20(12), 4492–4511 (2018). DOI 10.1177/1461444818776611. URL https://doi.org/10.1177/1461444818776611
  • [12] Gkotsis, G., Velupillai, S., Oellrich, A., Dean, H., Liakata, M., Dutta, R.: Don’t let notes be misunderstood: A negation detection method for assessing risk of suicide in mental health records. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 95–105. Association for Computational Linguistics, San Diego, CA, USA (2016). URL http://www.aclweb.org/anthology/W16-0310
  • [13] Griffiths, K.M.: Mental health internet support groups: just a lot of talk or a valuable intervention? World Psychiatry 16(3), 247–248 (2017). DOI 10.1002/wps.20444. URL https://doi.org/10.1002/wps.20444
  • [14] Hartzler, A., Pratt, W.: Managing the personal side of health: How patient expertise differs from the expertise of clinicians. J Med Internet Res 13(3), e62 (2011). DOI 10.2196/jmir.1728
  • [15] Hollingshead, K., Ireland, M.E., Loveys, K. (eds.): Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality. Association for Computational Linguistics, Vancouver, BC (2017). URL http://www.aclweb.org/anthology/W17-31
  • [16] Huh, J., Yetisgen-Yildiz, M., Pratt, W.: Text classification for assisting moderators in online health communities. Journal of Biomedical Informatics 46(6), 998–1005 (2013). DOI https://doi.org/10.1016/j.jbi.2013.08.011. URL http://www.sciencedirect.com/science/article/pii/S1532046413001391. Special Section: Social Media Environments
  • [17] Islam, M.R., Kabir, M.A., Ahmed, A., Kamal, A.R.M., Wang, H., Ulhaq, A.: Depression detection from social network data using machine learning techniques. Health Information Science and Systems 6(1), 8 (2018). DOI 10.1007/s13755-018-0046-0. URL https://doi.org/10.1007/s13755-018-0046-0
  • [18] Jamil, Z., Inkpen, D., Buddhitha, P., White, K.: Monitoring tweets for depression to detect at-risk users. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality, pp. 32–40. Association for Computational Linguistics, Vancouver, BC (2017). URL http://www.aclweb.org/anthology/W17-3104
  • [19] Kaplan, K., Salzer, M., Solomon, P., Brusilovskiy, E., Cousounis, P.: Internet peer support for individuals with psychiatric disabilities: A randomized controlled trial 72, 54–62 (2011)
  • [20] Kim, S.M., Wang, Y., Wan, S., Paris, C.: Data61-csiro systems at the clpsych 2016 shared task. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 128–132. Association for Computational Linguistics, San Diego, CA, USA (2016). URL http://www.aclweb.org/anthology/W16-0313
  • [21] Kornfield, R., Sarma, P.K., Shah, D.V., McTavish, F., Landucci, G., Pe-Romashko, K., Gustafson, D.H.: Detecting recovery problems just in time: Application of automated linguistic analysis and supervised machine learning to an online substance abuse forum. J Med Internet Res 20(6), e10136 (2018). DOI 10.2196/10136. URL http://www.jmir.org/2018/6/e10136/
  • [22] Kroenke, K., Spitzer, R.L., Williams, J.B.W.: The phq-9. Journal of General Internal Medicine 16(9), 606–613 (2001). DOI 10.1046/j.1525-1497.2001.016009606.x. URL https://doi.org/10.1046/j.1525-1497.2001.016009606.x
  • [23] Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pp. II–1188–II–1196. JMLR.org (2014). URL http://dl.acm.org/citation.cfm?id=3044805.3045025
  • [24] Liu, R.T., Kleiman, E.M., Nestor, B.A., Cheek, S.M.: The hopelessness theory of depression: A quarter-century in review. Clinical Psychology: Science and Practice 22(4), 345–365 (2015). DOI 10.1111/cpsp.12125
  • [25] Malmasi, S., Zampieri, M., Dras, M.: Predicting post severity in mental health forums. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 133–137. The Association for Computational Linguistics (2016)
  • [26] Mikal, J., Hurst, S., Conway, M.: Investigating patient attitudes towards the use of social media data to augment depression diagnosis and treatment: a qualitative study. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality, pp. 41–47. Association for Computational Linguistics, Vancouver, BC (2017). URL http://www.aclweb.org/anthology/W17-3105
  • [27] Milne, D.N., Pink, G., Hachey, B., Calvo, R.A.: Clpsych 2016 shared task: Triaging content in online peer-support forums. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 118–127. Association for Computational Linguistics, San Diego, CA, USA (2016). URL http://www.aclweb.org/anthology/W16-0312
  • [28] Mohammad, S., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29, 436–465 (2013)
  • [29] Naslund, J.A., Aschbrenner, K.A., Marsch, L.A., McHugo, G.J., Bartels, S.J.: Facebook for supporting a lifestyle intervention for people with major depressive disorder, bipolar disorder, and schizophrenia: an exploratory study. Psychiatric Quarterly 89(1), 81–94 (2018). DOI 10.1007/s11126-017-9512-0. URL https://doi.org/10.1007/s11126-017-9512-0
  • [30] Naslund, J.A., Grande, S.W., Aschbrenner, K.A., Elwyn, G.: Naturally occurring peer support through social media: The experiences of individuals with severe mental illness using youtube. PLoS One 9(10) (2014)
  • [31] O’Dea, B., Larsen, M.E., Batterham, P.J., Calear, A.L., Christensen, H.: A linguistic analysis of suicide-related twitter posts. Crisis 38(5), 319–329 (2017). DOI 10.1027/0227-5910/a000443. URL https://doi.org/10.1027/0227-5910/a000443. PMID: 28228065
  • [32] O’Dea, B., Wan, S., Batterham, P.J., Calear, A.L., Paris, C., Christensen, H.: Detecting suicidality on twitter. Internet Interventions 2(2), 183–188 (2015). DOI https://doi.org/10.1016/j.invent.2015.03.005. URL http://www.sciencedirect.com/science/article/pii/S2214782915000160
  • [33] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
  • [34] Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
  • [35] Pink, G., Radford, W., Hachey, B.: Classification of mental health forum posts. In: Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych@NAACL-HLT 2016, June 16, 2016, San Diego, California, USA, pp. 180–182 (2016). URL http://aclweb.org/anthology/W/W16/W16-0324.pdf
  • [36] Schwartz, H.A., Sap, M., Kern, M.L., Eichstaedt, J.C., Kapelner, A., Agrawal, M., Blanco, E., Dziurzynski, L., Park, G., Stillwell, D., Kosinski, M., Seligman, M.E., Ungar, L.H.: Predicting individual well-being through the language of social media. pp. 516–527 (2016)
  • [37] Shickel, B., Heesacker, M., Benton, S., Ebadi, A., Nickerson, P., Rashidi, P.: Self-reflective sentiment analysis. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 23–32. Association for Computational Linguistics (2016). DOI 10.18653/v1/W16-0303. URL http://www.aclweb.org/anthology/W16-0303
  • [38] Smithson, J., Sharkey, S., Hewis, E., Jones, R., Emmens, T., Ford, T., Owens, C.: Problem presentation and responses on an online forum for young people who self-harm. Discourse Studies 13(4), 487–501 (2011). DOI 10.1177/1461445611403356. URL https://doi.org/10.1177/1461445611403356
  • [39] Staiano, J., Guerini, M.: Depeche mood: a lexicon for emotion analysis from crowd annotated news. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 427–433. Association for Computational Linguistics, Baltimore, Maryland (2014). URL http://www.aclweb.org/anthology/P14-2070
  • [40] Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: Liwc and computerized text analysis methods. Journal of Language and Social Psychology 29(1), 24–54 (2010). DOI 10.1177/0261927X09351676. URL https://doi.org/10.1177/0261927X09351676
  • [41] Vapnik, V.N., Lerner, A.Y.: Recognition of patterns with help of generalized portraits, pp. 774–780 (1963)
  • [42] Pennebaker, J.W., Boyd, R.L., Jordan, K., Blackburn, K.: The development and psychometric properties of LIWC2015 (2015)
  • [43] Zirikly, A., Kumar, V., Resnik, P.: The gw/umd clpsych 2016 shared task system. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, pp. 166–170. Association for Computational Linguistics (2016). DOI 10.18653/v1/W16-0321. URL http://www.aclweb.org/anthology/W16-0321