\setcode

utf8

\lefttitle

LaTeX Supplement \righttitleNatural Language Engineering

\papertitle

Article

\jnlPage

100 \jnlDoiYr2019 \doival10.1017/xxxxx

{authgrp}

\history

(Received xx xxx xxx; revised xx xxx xxx; accepted xx xxx xxx)

Emojis as Anchors to Detect
Arabic Offensive Language and Hate Speech

Hamdy Mubarak^1,* Sabit Hassan² and Shammur Absar Chowdhury¹ [email protected] ¹Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
²School of Computing and Information,University of Pittsburgh, PA, USA
^*Corresponding Author. Email:

Abstract

We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets – analysing key cultural differences. We observed a constant usage of these emojis to represent offensiveness throughout different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar and violence content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. We evaluate our models on external datasets – a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook, for assessing generalization capability. Competitive results on these datasets suggest that the data collected using our method captures universal characteristics of offensive language. Our findings also highlight the common words used in offensive communications, common targets for hate speech, specific patterns in violence tweets; and pinpoint common classification errors that can be attributed to limitations of NLP models. We observe that even state-of-the-art transformer models may fail to take into account culture, background and context or understand nuances present in real-world data such as sarcasm.

1 Introduction

Disclaimer: Due to the nature of this work, some examples contain offensiveness, hate speech and profanity. This doesn’t reflect authors’ opinions by any mean. We hope this work can help in detecting and preventing spread of such harmful content.

Social media platforms provide a medium for individuals or a group to connect with the world and share their opinion (Intapong et al., (2017)), often with limited inhibitions. Taking the advantage of such liberation, social media users may use vulgar, pornographic, or hateful language. Such behaviour can result in the spread of verbal hostility and can impact users’ psychological well-being (Gülaçtı, (2010); Waldron, (2012)).

In recent years, Twitter has become highly popular in the Arab region (Abdelali et al., 2021b ), with more than 27 million tweets per day (Alshehri et al., (2018)). Many look to Twitter to express their views, ideas and share their stories. While the importance of sharing these views and ideas is immense, the aforementioned societal problem of sharing malicious content also arise.

With the sheer volume of the content on social media, manually filtering out malicious content while maintaining users’ right to the freedom of expression, is virtually impossible for the platform providers. The increased risk and effect of such hostility presence in social media has attracted many multidisciplinary researchers and motivated the need to automatically detect offensiveness of posts/comments and utilize the system to: (i) filter adult content (Cheng et al., (2014); Mubarak et al., (2021)); (ii) quantify the intensity of polarization (Belcastro et al., (2020); Conover et al., (2011)); (iii) classify trolls and propaganda accounts (Alhazbi, (2020); Darwish et al., (2017); Dimitrov et al., 2021a ; Dimitrov et al., 2021b ); and (iv) identify hate speech and conflicts (Kiela et al., (2020); Davidson et al., (2017); Ousidhoum et al., (2019); Chung et al., (2019)).

Machine learning based automation approaches require labeled datasets to distinguish malicious content from others. Constructing datasets, however, is not a straightforward process. In a randomly sampled collection of tweets, the percentage of malicious tweets is very small. Mubarak et al., (2017) report that only 1-2% of Arabic tweets are abusive. Percentage of hate speech is even smaller. This implies that to obtain a sizable dataset of malicious content, a massive number of tweets need to be labeled. To avoid this, existing approaches utilize heuristics to increase the percentage of malicious tweets before manually labeling them. Such heuristics include searching for specific language-dependent keywords or patterns.

In this paper, we present an automated emoji-based approach of collecting tweets that have much higher percentage of malicious content, without having any language dependency. Emojis have quickly become an important part of our daily communication. They are used worldwide for conveying messages without any language or online platform barriers ; and has been referred to as an universal language (Mei, (2019); Dürscheid and Siever, (2017)). Hence, the extralinguistic information carried by the emojis can be instrumental in capturing offensive content, without being impacted by the lack of language-dependent knowledge, or any preferred linguistic patterns. Using emojis also resolves challenges posed by non-standard spellings of offensive words.

To this end, we use emojis as anchors to collect tweets and manually annotate them for fine-grained abusive/offensive language categories – including hate speech, vulgar, and violence tweet contents. We exploit the collected dataset for studying: (i) emoji usage in a different period of time; (ii) extraction of offensive words with different dialectal spellings and morphological variations; (iii) hate speech targets; (iv) linguistic content in vulgar and profanity words; and (v) exploring common patterns present in violent tweets.

Our comparative analysis shows that by using emojis as anchors, we end up with 35% offensive (OFF) tweets and 11% hate speech (HS) tweets, which is almost as double as the approach used by Mubarak et al., (2017). Our experiments with various machine learning and deep learning classifiers show that the classifiers are able to learn effectively from the data. Further, we show that these classifiers are capable of generalizing well on two external datasets: i) the OffensEval 2020 (SemEval, task 12) dataset (Zampieri et al., (2020)) for Arabic offensive language identification, and ii) the MPOLD dataset (Chowdhury et al., 2020b ) – a multi-platform dataset containing news comments annotated for offensiveness from Twitter, YouTube and Facebook. This suggests our method captures universal characteristics of offensive language on social media and can be used to collect larger datasets with less effort and linguistic-knowledge support.

Therefore, the main contributions of the paper are:

•

We present an emoji-based method to collect offensive and hate speech tweets. We show that using the method, we can collect higher percentage of offensive and hate speech content compared to the existing methods. Moreover, we demonstrate that using this method we can collect large number of offensive contents in other languages such as Bengali.
•

We manually label the largest dataset for offensiveness, along with fine-grained hate speech types, vulgarity (profanity) and violence. This labeled data will be publicly released for the community.¹¹1Data can be downloaded from this URL: to be added
•

We perform in-depth analysis of different properties of the dataset including common offensive words and hate speech targets. We show also that there are some culture differences in using emojis in Arabic offensive tweets compared to English.
•

We build effective machine learning and deep learning based classifiers to automatically tag offensive and hate speech with high accuracy. We show that our model perform well on external datasets collected from Twitter and also demonstrate the potential generalisation capability of the designed model in different social media platforms – such as YouTube and Facebook.
•

We analyze common classification errors and provide some recommendations to enhance model explainability.

While our main focus, in this paper, is to create Arabic offensive dataset, this method can easily be adapted for other languages and potentially other text classification tasks such as sentiment analysis or emotion detection.

The paper is structured as follows. In section 2, we discuss related work, focusing on the methods used by other researchers to collect datasets for offensive language. In section 3, we describe our data collection and annotation jobs. Section 4 contains extensive analysis of the dataset. The analysis includes study of offensive emojis, how emoji usage changes in a different time period. We further analyze offensive words and hate speech targets found in our dataset. In Section 5, we train a set of machine learning and deep learning models for classification of offensive language and hate speech. We manually analyze errors made by our models and also use LIME explainability tool to interpret decisions made by our models. In this section we also show that models trained on our data achieves good results on other datasets. Section 6 discusses ethics and social impact of our work. Finally, in Section 7, we present summary of our findings.

2 Related Work

There has been a large amount of research in recent years to address and detect the growing use of ‘offensive languages’ and hate speech in different social platforms (see Nakov et al., (2021) and Salminen et al., (2020) for more details.). Detecting offensive language has been the focus of many shared tasks such as OffensEval 2020 (Zampieri et al., (2020)) for five languages and OSACT 2020 (Mubarak et al., 2020b ) for Arabic. The best systems at the shared tasks (Alami et al., (2020), Hassan et al., (2020)) utilized Support Vector Machines and fine-tuned transformer models. Hate speech has been less explored compared to offensive language. A dataset consisting of 5% hate speech was presented at OSACT 2020 shared task. The best system performed extensive preprocessing including normalizing emojis (translate their English description to Arabic) and dialectal Arabic (DA) to Modern Standard Arabic (MSA) conversion among others (Husain, (2020)). ASAD (Hassan et al., 2021a is an online tool that utilizes the shared task datasets for offensiveness and hate speech detection in tweets along with other social media analysis components such as emotion (Hassan et al., 2021b ) and spam detection (Mubarak et al., 2020a ).

Table 1: Available Arabic datasets along with the annotation labels, data source and collection method and percentage of offensive content. Sources: TW (Twitter), Aljazeera (AJ), FB (Facebook), YT (YouTube).

Dataset	Source	Method	Size	Language	OFF%	Annotations (Values (1: Yes, 0: No))
Mubarak et al., (2017)	TW	Controversial users	1,100	Egyptian	59%	OFF (1/0), Vulgar (1/0)
	AJ	Deleted comments	31,692	MSA	82%	OFF (1/0), Vulgar (1/0)
Albadi et al., (2018)	TW	Keywords	6,030	MSA/DA	45%	Religious HS (1/0)
Alshaalan and Al-Khalifa, (2020)	TW	Keywords	9,316	Saudi	28%	HS (1/0)
Mubarak et al., 2020b	TW	Pattern	10,000	MSA/DA	19%	OFF (1/0), HS (1/0)
Chowdhury et al., 2020b	TW, FB, YT	User replies to AJ	4,000	MSA/DA	17%	OFF (1/0), HS (1/0), Vulgar (1/0)
Our Method	TW	Emojis	12,698	MSA/DA	35%	OFF (1/0), Vulgar (1/0), Violence (1/0)
						HS (gender, race, ideology, social class,
						religion, disability)

To collect potentially offensive tweets, some studies use a list of seed offensive words and hashtags (e.g. Mubarak et al., (2017)). This approach has several drawbacks: (i) it is hard to maintain this list as offensive words are ever-evolving; (ii) Arabic dialects are widely used on social media and building such a list for different dialects is a challenging task that requires deep knowledge about cultures in many countries; (iii) Arabic dialects have no standard writings. So, it is extremely difficult to list all possible surface forms and creative spellings for words in general including offensive words. Moreover, (iv) Arabic has a rich morphology (both derivational and inflectional), and a large number of affixes can be attached to words - e.g., \<وسيفعلها> (“wsyfElhA” – “and he will do it”)²²2We provide Arabic examples and their Buckwalter transliteration and English translation throughout this paper. - which makes string match for offensive words less optimal. Finally, (v) the offensiveness of many words is highly dependent on context, e.g., the word \<كلب> (“klb” – “dog”) can be used in offensive contexts such as \<هو كلب> (“hw klb” – “He is a dog”), and in clean contexts such as \<عندي كلب> (“Endy klb” – “I have a dog”) .

Authors in Mubarak et al., 2020c showed that offensive language exists in less than 2% of any random collection of tweets, and by considering a common pattern used in offensive communications, this ratio increases to 19% (5% are hate speech). This pattern is \<يا .. يا> (“yA .. yA” – “O .. O ..”) which is used mainly to direct the speech to a person or a group. This pattern is used across all dialects without any preference to topics or genres, however, it cannot be generalized to other languages.

In the same research, authors observed that the most frequent personal attack on Arabic Twitter is to call a person an animal name (i.e. direct name calling), and the most used animals are \<كلب> (“klb” – “dog”), \<حمار> (“HmAr” – “donkey”), and \<بهيم> (“bhym” – “beast”) among others. In Chowdhury et al., 2020b , authors analyzed Arabic offensive language on Facebook, YouTube and Twitter, and they showed that some emojis are widely used in offensive communications including some animals (dog, pig, monkey, cow, etc.), some face emojis (anger, disgust, etc.) and others (shoe, etc.)

Unlike the aforementioned studies, we propose a generic method to collect offensive tweets regardless of their topics, genres or dialects. We applied it to Arabic Twitter and obtained the largest dataset of tweets labeled for offensiveness (approx. 13K tweets) with 35% of the tweets labeled as offensive and 11% as hate speech. In this method, we simply use a list of emojis that commonly appear in offensive communications including emojis that express anger, some animals and inanimate things extracted from existing datasets of offensive tweets. We believe with small modifications to this list, considering cultural differences, the approach can be used for other languages as well to collect a large number of offensive tweets. Comparison with available Arabic datasets for offensive language and hate speech is shown in Table 1.

Refer to caption — Figure 1: Emojis used in Wiegand and Ruppenhofer, (2021)

It is worth noting that authors in Wiegand and Ruppenhofer, (2021) used emojis to detect English abusive tweets. They used 8 emojis based on correlations between concepts and abusive language as reported in the literature. These emojis (Figure 1) indicate violence and death, anger and disgust, dehumanization, and disrespect. Authors mentioned that middle finger is the strongest emoji as it is universally regarded as a deeply offensive gesture, and they used it to collect distantly-labeled training data. We will later highlight some differences between usage of offensive emojis in Arabic and English.

3 Data

3.1 Data Collection

We extracted common emojis that appear mostly in offensive communications from shared datasets in Zampieri et al., (2020) and Chowdhury et al., 2020b , and obtained their emoji variations from https://emojipedia.org/. These emojis include some animals and symbols used for dehumanization and expressing disrespect, anger or disgust. The complete list of emojis used for the data collection and their categories is given in Figure 2.

From a collection of 4.4M Arabic tweets between June 2016 and November 2017, we extracted all tweets having any of the aforementioned emojis. After removing duplicates, near duplicates and very short tweets, we ended up with 12,698 tweets. We show later that tweets collected in another time period (in March 2021) still have high percentages of offensiveness and hate speech.

3.2 Annotation and Quality Control

We created an annotation job on Appen crowdsourcing platform to judge whether a tweet is offensive or not (Job1) and we invited annotators from all Arab countries.³³3We described offensive language to annotators as any rude and socially unaccepted language used by some users to offend a person or a group. This includes also vulgar and swear words. Quality was assured using 200 questions that we manually obtained gold labels for (hidden test questions). Annotators should pass 80% of them to continue.⁴⁴4We paid $15 per hour of work to conform to the minimum wage rate in the US. Each tweet was judged by 3 annotators and more than 190 annotators contributed to this job. Such a large number is needed for a subjective task like judging tweet offensiveness. Inter-Annotator Agreement (IAA) using Cohen’s kappa ( $\kappa$ ) value was 0.82, which indicates high quality annotations (Landis and Koch, (1977)).

Annotators fully agree (3 out of 3) in 65% of the tweets. For better quality, we hired and trained an expert linguist who is familiar with different dialects to carefully check all tweets having different opinions. This resulted in changing 18% of the labels (w.r.t to the previously assigned majority label) for the tweets.

3.3 Annotation of Hate Speech, Profanity and Violence

We created another job on Appen with all the annotated offensive tweets from Job1 and asked annotators to detect the presence of hate speech. We defined hate speech as any content that contains offensive language targeting group of people based on common characteristics such as: race/ethnicity/nationality, religion/belief, ideology, disability/disease, social class, and gender. We used job settings similar to Job1. Additionally, we asked annotators to judge whether the offensive tweets have profanity or vulgar words, and whether they promote violence. Basic statistics and examples from different annotations are listed in Table 2.

Table 2: Statistics and examples from the annotated corpus (Total of 12,698 tweets)

Class/Subclass	#	%	Example
Clean	8,235	65	\<لن تحــصــل علــى غــدٍ افــضل مادمــت تفــكر بالامــس>
(CLN)			You won’t have a better tomorrow as long as you think about yesterday.
Offensive	4,463	35	\<يلعن ابوك على هالسؤال. عساه ينقرض الكريه >
(OFF)			May God curse your father for this question! I hope this fool will die out!
* Hate Speech	1,339	11	(Note: 30% of Offensive tweets are labeled as Hate Speech)
- Gender	641	48	\<بنات اليوم قليلات أدب. والله ما نوصل لعهر بعض الرجال>
			Girls today are impolite. I swear to God, we don’t reach for some men immorality.
- Race	366	27	\<شعبكم متخلف. الله ياخذك إنتي والفلبين>
			People of your country are musty. May God take (kill) you and the Philippines.
- Ideology	190	14	\<ناديك وضيع لا شك في ذلك. حزبك لا يقدر إلا على النباح>
			You club is vile, no doubt about that. Your party cannot do anything except barking.
- Social Class	101	8	\<دامك مقيم انكتم وخل اهل الأرض الاصليين يتكلمون. ابلع يا سباك!>
			As you are a resident, shut up and let original citizens speak. Swallow, plumber!
- Religion	38	3	\<إنتوا بتعملوا ف ديك أبونا كده ليه هو إحنا كفرة ولايهود>
			Why are you doing this to us? Are we disbelievers or Jews?
- Disability	3	0	\<ذا القزم طلعت جايزتن له بس ماعرف يعبر>
			This dwarf got two prizes, but he does not know how to express.
* Vulgar	189	1.5	\<كاتبة ع حسابك قحبة بنت قحبة و بتتعجبي انهم بيبعثو صور فاضحة>
			\<* قحبة: امرأة فاجرة فاسِدة تمارس البِغاء (معجم اللغة العربية المعاصر)>
			You describe yourself as a prostitute, and you wonder why they’re sending porn photos!
* Violence	85	0.7	\<صفعه بهذا الحذاء حجم راسك على قرعتك. ييييععع ودي اطعنها>
			Slap by this shoe on your bald head. Yucky! I’d like to stab her.

4 Analysis

We obtained a total of 582 different emojis in the final annotated data. Out of those, 122 emojis appear more than 20 times. In this section, we first report some observations seen in our data, followed by in-depth analysis of emoji in a different time period, linguistic characteristics associated with emojis along with targets of hate speech in these tweets.

4.1 Common Offensive Emojis

We report our observations about common emojis used to represent offensiveness in below:

•

There are additional emojis, unique to our initial seed list, frequently used in the annotated offensive tweets. This class of emojis (see “New emojis” row in Figure 3) includes instances like spit/drops (65% of tweets having it was labeled as offensive) to hammer (18% offensive). This suggests a potential for enriching the initial emoji seed list and expanding the dataset in an iterative way.
•

Top offensive emojis are shown in Figure 3 (mostly animals). If a tweet has any of them, most likely it is labeled as offensive (pig (84% offensive) to middle finger (62%)). Similar emojis appear in hate speech tweets with percentages range from 36% to 18% respectively.
•

Top vulgar emojis are mostly used in tweets having adult content (e.g. adult ads).
•

For violence category, the most common emojis are: knife, punch and shoe in order.

4.2 Emojis Usage In A Different Time Period

We collected tweets from a different time period (in March 2021) to study the usage pattern of offensive emojis over time. For this, we selected six emojis from Figure 3 and their corresponding words to extract their tweets.⁵⁵5We ignored ”high heel shoe” emoji as it is used dominantly in adult ads, and ”donkey/horse” emoji as it is ambiguous in its usage (as donkey in some cases and horse in others). “Donkey” emoji is not found in https://emojipedia.org For each emoji and its corresponding word, we then randomly selected 200 tweets and asked an Arabic native speaker to judge the content for offensiveness.⁶⁶6Some words appear less than 200 times in this collection.

Our study suggests that some emojis (see Figure 4) and their corresponding words are widely used in offensive tweets during different periods of time. While middle finger was reported as the strongest offensive emoji in English tweets (Wiegand and Ruppenhofer, (2021)), we found that there are more common offensive emojis appear in our dataset of Arabic tweets such as pig (84%), shoe (77%) and dog (68%) (first in “Top offensive emojis” row in Figure 4). Tweets having middle finger emoji were tagged as offensive in 62% of the cases. We found annotation errors due to short context in some cases, but some users use middle finger mistakenly instead of index finger⁷⁷7We anticipate middle finger emoji is not widely understood as offensive by all Arab users. as shown in Figure 5.

4.3 Emojis and Linguistic Usage

As observed in Donato and Paggio, (2017), emojis are often redundant and convey something already expressed verbally in tweets. Thus, offensive emojis may co-occur with many offensive words. As dialectal Arabic (DA) is widely used on Twitter, it’s very hard to list all variations of offensive dialectal words. Figure 6 shows an example where offensive emojis co-occurred with offensive dialectal words which confirms the aforementioned observation. It also shows how emojis are used to cover morphological variations such as number and gender.

4.4 Offensive Words

To partially solve spelling mistakes in the collected tweets, we normalized some letters which are commonly used interchangeably by mistake, namely letters \<أإآ،ة،ى> to letters \<ا،ه،ي> in order. We calculated valence score for normalized words in offensive and clean tweets as described in Conover et al., (2011). The score helps determine the distinctiveness of a given word in a specific class in reference to other classes. Given $N(t,off)$ and $N(t,cln)$ , which are the frequency of the term $t$ in offensive tweets ( $off$ ) and clean tweets ( $cln$ ) in order, valence score is computed as:

\large V(t)=2*\frac{\frac{N(t,off)}{N(off)}}{\frac{N(t,off)}{N(off)}+\frac{N(t,cln)}{N(cln)}}-1

(1)

where $N(off)$ and $N(cln)$ are the total number of occurrences of all words in offensive ( $off$ ) and clean ( $cln$ ) tweets respectively.

Figure 7 shows top words with the highest valance score in offensive tweets.⁸⁸8Condition: Valence score ¿= 0.8 and frequency ¿= 5. In addition to some animal words (e.g. dog, sheep), the list contains many dialectal words that are widely used in offensive communications such as \<زق، خرا، زباله> (“zq, xrA, zbAlh” - shit, garbage) and some adult ads words such as \<سكس، ديوث> (“sks, dywv” - sex, cuckold). This proves the efficiency of starting with offensive emojis to collect related offensive words without any preference to dialect or genre. All lists of extracted words with their valence scores will be made publicly available.

For intrinsic evaluation, we manually revised these top offensive words, and we found that 71% of them appear dominantly in offensive communications and can be considered as correct offensive words. However, the remaining 29% are not necessarily offensive and can appear in many clean contexts. This includes named-entities such as \<إيران، بوتين، الحوثي، بشار> (“AyrAn, bwtyn, AlHwvy, b$Ar” - Iran, Putin, Houthi, Bashar) in addition to some words that were concentrated by chance in our offensive tweets. Examples of such words are: \<ذل، بلدك، ذكور> (“*l, bldk, *kwr” - humiliation, your country, males).

4.5 Hate Speech Targets

We used the aforementioned valence score formula to extract common words appear in all hate speech tweets. Then, we manually filtered and grouped them to extract common targets for hate speech. These targets include women, countries, groups and political parties as listed in Table 3.

Table 3: Common targets in hate speech tweets

Target	Words	Example
Women	\<الحريم، البنات>	\<الله يسامح اللي سمح للحريم يدخلون الملعب. منظر مقرف!>
	women, girls	May God forgive those who allowed women to enter the stadium. Disgust!
Iran	\<إيران، الفرس، المجوس>	\<كل زق أنت وإيران المجوسية>
	Iran, Persians, Magi	Eat shit, you and the Magian Iran.
Israel	\<إسرائيل، الصهاينة>	\<اللعنة عليك وعلى إسرائيل>
	Israel, Zionists	May God curse you and Israel!
Jews	\<اليهود>	\<أنتم تقتلون الأبرياء وتتركون اليهود يا خونة>
	Jews	You kill innocent people and leave Jews. O traitors!
Houthi	\<الحوثي، الحوثيين>	\<دمرتم كل جميل في وطني. الحوثي عدو الانسانية>
(group)	Houthi, Houthis	You destroyed all the beauties in my country. Houthi is the enemy of humanity.
Saudi Arabia	\<السعودية، السعوديين>	\<السعودية من الاحتضار الديني الى حضارة العري>
	KSA, Saudis	Saudi Arabia from religious agony to a civilization of nudity
Muslim Brotherhood	\<الإخوان>	\<هادو المرتزقة من أتباع حماس والإخوان>
(group, party)	Brotherhood	These mercenaries are followers of Hamas and the Brotherhood
Qatar	\<قطر، القطريين>	\<اعلام دويلة قطر تنبح ليل نهار>
	Qatar, Qataris	Media in the tiny country of Qatar barks day and night
USA	\<أمريكا، الأمريكان>	\<خرا عليك وعلى روسيا وأمريكا وخود الخليج بمعيتك>
	America, Americans	Shit on you and on Russia and the US in addition to the Gulf.
Turkey	\<تركيا، الأتراك، الترك، العثمانيين>	\<أنت يا تركي كذاب مثل أجدادك>
	Turkey, Turks, Ottomans	You, Turkish, are a liar like your ancestors.

4.5.1 Religious Hate Speech Targets

Detecting religious hate speech in Arabic was the main focus of Albadi et al., (2018). Authors mentioned that Arabic is the official language in six of the eleven countries with the highest Social Hostilities Index.⁹⁹9SHI is measure of crimes partially motivated by religion. They found that 33% of all hateful tweets targeted against Jews followed by Shia and Christians. We analyzed hateful tweets in our dataset and we found similar results where Jews are the main target of religious prejudice with 39% of all hateful tweets followed by Muslims, Christians and Shia. Figure 8 shows distribution of hate speech tweets for different religious groups.

4.6 Vulgar and Profanity Words

We observe that vulgar words include genitals, sexual actions and references in addition to words are used in adult ads. They are mainly written in DA and the most common words are shown in Figure 7.

4.7 Violence Tweets

As violence tweets are rare (0.7% in our collection), we manually analyzed them all to extract common patterns. These patterns are summarized in Table 4. First, we define <head> for head and neck, and <body> for body parts including <head> that can be used as objects for many violence verbs, ex: I will stab [him/his neck]. For future work, we plan to extend this list using synonyms and related words from dictionaries/BERT models. Then, we extract tweets having any of these verbs and targets after applying some morphological expansions.

Table 4: Common patterns in violence tweets

Pattern. S:Sbj, V:Verb, O:Obj	Description/Verbs
<head>	\<رأس، دماغ، وجه، رقبة>
	head, brain, face, neck
<body>	\<رأس، دماغ، وجه، رقبة، عين، خشم، أسنان، بطن>
	head, brain, face, neck, eye, nose, teeth, stomach
S V O, O: <human>	\<يقتل، يذبح، يدبح>
(ex: I will kill you)	kill, slaughter
S V O, O: <human>/<body>	\<يدوس، يدعس، يجلد، يطعن، يضرب، يكسر>
(ex: I will hit your head)	trample, whip, stab, hit, break
S V O, O: <head>	\<يقطع، يفتح، يطير>
(ex: I will cut your neck)	cut
<hit> on <body>	\<كف، جزمة، صفعة: على>
(ex: A slap on your face)	shoe, slap

4.8 Is Emojis A Universal Anchor?

Typical approach to crawl data from social media, specially from Twitter, is heavily dependent upon task-specific keywords. Preparing such keyword-specific anchors needs a significant knowledge on the language usage. However, using emojis to collect the data removes the initial language dependency and inherited bias from the keyword-search space.

In this Section, we study the efficacy of emojis to collect offensive tweets for other languages. We utilize the proposed pipelines for Bengali (a low-resource language)¹⁰¹⁰10Bengali is the lingua franca of Bengalidesh and parts of India. There are 228 million native speakers, making it the fifth most-spoken native language in the world. Source: https://en.wikipedia.org/wiki/ $Bengali_{l}anguage$ and analysed the percentage of offensive tweets obtained using the method. Moreover, we also include analysis on English tweets to indicate how offensive communication and emoji usage are still culture-dependent, if not language.

For Bengali, we collected 200 tweets in August 2021 for five emojis (a total of 1K tweets) from the different groups listed in Figure 2. From these 200, we randomly selected 50 tweets per emoji (a total of 250 tweets) and a Bengali native speaker annotated them for offensiveness. We noticed that $\approx 34\%$ of all the annotated tweets are offensive. We observed that tweets with ‘middle finger’ had most frequent ( $50\%$ of tweets) number of offensive content, followed by ‘pile of poo’ emoji (32% of tweets). The detailed distribution is in Figure 10 with examples in Figure 11. We plan to use this method to collect and annotate offensive tweets for other low-resource languages such as Hindi,¹¹¹¹11Pilot annotation of Hindi tweets shows similar trends (52% offensive for ‘middle finger’, and 40% offensive for ‘angry face’.) in addition to Bengali. We keep this as a potential future work.

As usage of emojis for offensive content in English is well-studied, we only collected English tweets having the top Arabic offensive emojis (pig and shoe) in April 2021. We collected more than 8,000 tweets per each, and we randomly selected and annotated 200 tweets for each emoji. We noticed that less than 5% of those tweets are offensive. Often, pig emoji appears in contexts about food while shoe emoji appears in tweets talking about fashion and clothes. Examples are shown in Figure 9.

This suggests that while some emojis are widely used in offensive communications in a particular culture, they might be used completely in different ways in other languages or cultures. Thus, it’s important to keep these differences in mind when collecting data using emojis from different cultures.

5 Experiments and Results

In this Section, we describe the details of machine learning algorithms - SVM and transformers, used to test the performance of the dataset followed by the experimental setup. We then present out classification results for in-domain model testing along in-detailed classification error analysis and model prediction explanability. Furthermore, we evaluated the model’s generalization capability across different datasets and various social media platforms.

Table 5: Distribution of offensive and hate speech data

	Train	Dev	Test	Total
OFF	3,172	404	887	4,463
NOT_OFF	5,716	865	1,654	8,235
HS	959	109	271	1,339
NOT_HS	7,929	1,160	2,270	11,359
Total	8,888	1,269	2,541	12,698

5.1 Classification Models

In order to design the automated classification system, we fine-tuned the state-of-the-art monolingual (Arabic) pretrained transformer architectures – AraBERT (Antoun et al., (2020)) and QARiB (Abdelali et al., 2021a ). Furthermore, we compared the performance of the monolingual transformers with two multilingual BERT: mBERT (Devlin et al., (2019)) and XLM-RoBERTa (Conneau et al., (2019)). As a baseline, we use a SVM (Platt, (1998)) classifier, due to its reputation as an universal learner and for its ability to learn independently of the dimensionality of the feature space and the class distribution of the training dataset.

Support Vector Machines (SVM)

We used word n-gram and character n-gram vectors weighted by term frequency-inverse document frequency (tf-idf) as features to SVM. We experimented with character n-grams ( $n=[2,5]$ ) and word n-gram ( $n=[1,3]$ ) individually and also combination of them using the linear kernal with default parameters of SVM classifier ( $C=1.0$ ).

Transformer models

We fine-tuned the mono- and multilingual transformer models on our training data. For the monolingual task, both the transformers – AraBERT and QARiB – are of identical architecture with variation in the pretraining data. AraBERT is trained on Arabic Wikipedia, whereas QARiB is trained on Twitter data and Arabic Gigaword, thus encapsulating formal and informal writing styles in each transformers.

As for the multilingual models, we fine-tuned mBERT, trained on Wikipedia articles for 104 languages including Arabic using the case sensitive base model. In addition, we also fine-tuned XLM-RoBERTa – trained on a large dataset with 100 languages using cleaned CommonCrawl data.

5.2 Experimental Data and Setup

Data Split

We trained the classifiers using 70% of the data, and validated with 10%. We used the rest (20% of the data) for testing the system performance. Detailed data distribution of offensive and hate speech tweets are given in Table 5.

Evaluation Measures

We evaluated the dataset using macro-averaged precision, recall and F1-measure, in addition to the accuracy.

Offensive/Hate Speech Classifiers

For the respective downstream tasks, we fine-tuned the aforementioned BERT models by adding a dense layer that outputs class probabilities. We use a learning rate of $8e-5$ with a batch size of $64$ and $3$ epochs. For the fine-tuning, we restricted the maximum input length to 47 tokens ( $99^{th}$ percentile in training dataset), with no extra preprocessing of the data.

5.3 Results

From the reported results, presented in Table 6 and Table 7, we observe the monolingual models significantly outperforms the multilingual models. This observation is in-aligned with previous studies (Polignano et al., (2019); Chowdhury et al., 2020a ).

Table 6: Macro-averaged (P)recision, (R)ecall and F1 for Offensive language classification. * represent results obtained with different learning rate (2e-5) instead of 8e-5. C: Character n-grams W: Words n-grams.

Classifier	Acc%	P	R	F1
SVM (C)	78.32	76.75	74.06	74.99
SVM (W)	72.14	69.82	70.81	70.15
SVM (C+W)	78.16	76.18	74.75	75.33
mBERT	76.43	74.09	73.32	73.66
XLM-RoBERTa^∗	75.00	72.50	72.47	72.48
AraBERT	82.09	80.50	79.63	80.02
QARiB	84.02	82.53	82.11	82.31

Table 7: Macro-averaged (P)recision, (R)ecall and F1 for hate speech classification. * represent results obtained with different learning rate (2e-5) instead of 8e-5. C: Character n-grams W: Words n-grams.

Classifier	Acc%	P	R	F1
SVM (C)	92.52	84.71	71.12	75.76
SVM (W)	90.08	73.90	70.73	72.15
SVM (C+W)	92.37	82.94	72.17	76.16
mBERT	91.26	77.55	73.34	75.20
XLM-RoBERTa^∗	92.29	79.96	78.79	79.36
AraBERT	92.64	81.04	79.31	80.14
QARiB	92.99	82.99	77.72	80.04

Comparing the monolingual models, for offensive classification, we observe that mixed trained model – QARiB, outperforms AraBERT by a margin of 2.3%. Contrary to that, in hate classification, we noticed AraBERT - trained with formal text – performs the best. However, the performance difference is very small ( $0.1\%$ ) and can be attributed to the randomness mentioned in Devlin et al., (2019).

Error Analysis

We analyzed classification errors of our best classifier for offensive language detection. Common cases are listed in Table 8. We found similar errors in hate speech detection, and the main difference is that the target of the offense is a group of people as opposed to individuals.

Table 8: Common error types in offense detection. Classes: FP (False Positives) and FN (False Negatives)

CL	Error Type	Example
FP	Annotation error	\<الله يفشلهن. ودك تتفل بوجهه>
		May God fail them. You wish to spit on his face
	Non-human target	\<هاشتاغ غبي. وع مره اللون مقرف>
		Stupid hashtag. Yucky! This color is very disgusting
	Unclear context	\<احييك عالفكرة بس اذا كان وجهك>
		I salute you for the idea, but if your face
	Non-targeted offense	\<لا بارك الله للعدو. اللهم لا تؤاخذنا بما فعل السفهاء منا>
	(e.g., proverbs/idioms)	Oh God, do not bless the enemy (prayer)
	Animal not offensive	\<اتوقع رسم قصة العنز وعياله. الكلاب كلوا القطوة>
		Expect to draw the story of the goat.. Dogs ate the cat
	Sarcasm	\<انا لكم الخت والام اي حركه ا اذبحكم ههه>
		I’m your sister and mother. I will kill you haha
FN	Culture/Background	\<هذا مقدارك. اليوم زي وجهك>
		“It’s your destiny”. Today looks like your face (ugly)
	OOV/Unseen words	\<اتكتمي يابت منك ليها. خرا فيج زين>
		Shut up you and she. Shit on you!
	Need understanding	\<الكدب ليس سياسة تحريرية ولكن إسلوب حياة>
		Lying is not editorial policy but a way of life
	Implicit insult	\<حضرة الدكتور فلاح. شوف مين الي بتتكلم>
		The doctor is a villager (uncivilized). Look who talks!
	Informal writings	\<ما كف بعطيككك بوكسس. إنت مرت زقة>
		I will givve youu a blowww. You’re a merce nary
	Offensive emoji only	\<مش عاوز كلام كتير بس طرش له ده (الإصبع الوسطى) وبيسكت>
		Just send (middle finger) to him and he will be silent
	Negation	\<معد فيه حياء في البنت. قلة أدب>
		The girl no longer has shyness. Incivility

From the Table 8, we noticed most of the confusion occurs due to lack of context in the input, implicit offensive instances, misunderstanding of grammatical construct like negation, bias towards some animals and the presence of sarcasm. Moreover, we also noticed some classification errors due to human annotation errors – which is common given the complexity of the annotation task itself and its dependence of individual perspective.

Model Prediction Explanability

We use Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro et al., (2016)) to interpret our best classifier. LIME is a model-agnostic method that perturbs the input text and gives weight to words based on how the model’s predictions change.

We analyzed a sample of false-positive and false-negative examples from output of LIME and our best classifier. We found many confusion in model decision while considering the words as offensive or clean. We summarize common errors in the examples shown in Figure 12 and recommend the followings for better explainability results: (i) Ignore Twitter-specific symbols and symbols used for formatting (e.g. RT (Retweet), user mentions, links, <LF> (newline)), e.g., replace mentions with @USER, links with URL, <LF> with space. (ii) As emojis are not recognized by LIME, replace them with their meanings, e.g., middle_finger. (iii) Remove repeated letters that are commonly found in tweets and used for emphasis, e.g., convert \<خلللاص> (“xlllAS” - “enooough”) to \<خلاص> (“xlAS” - “enough”). (iv) We noticed confusion in interpreting words’ contribution towards classifying (non-)offensiveness. However, such confusion is understandable given a limited resource that is used to learn discriminating linguistic information. In the future, we plan to explore use of more balanced large data, collected from diverse sources. It’s worth mentioning that those errors are not necessarily LIME errors as we have errors due to model predictions (F1 is 82.31).

5.4 Generalization Capability of Models

Our proposed method of collecting an offensive language dataset relies on emojis. While this method yields higher percentage of offensive content compared to existing methods, it’s crucial that models trained on our dataset perform well not only on our test set but also on external datasets. Achieving such generalization can help us to assert with confidence that our method captures characteristics of offensive language that are universal, not specific to our dataset only.

In addition, assessing the generalization of the model to other social media platforms such as Facebook and YouTube, can indicate the universal usage of emoji and also effectiveness of harnessing the extralinguistic information present in emojis for offensive task.

5.4.1 Generalization to Other Twitter Data

To ascertain the generalization capability to other Twitter data, we evaluate our best performing model (QARiB, fine-tuned on our offensive training data) on the official test set of SemEval 2020 task 12 (OffensEval 2020, Arabic data) Zampieri et al., (2020) data. For comparison, we also fine-tune QARiB on SemEval training data and evaluate on our dataset. Further, we fine-tune QARiB on combination of the two datasets and evaluate on the test sets separately.

The SemEval data contains 8,000 tweets for training and 2,000 tweets for testing. The tweets are manually annotated for offensiveness. To have higher percentage of offensive tweets, only tweets that contained "yA" in text were considered. "yA" is commonly used in Arabic offensive language (Zampieri et al., (2020)). 20% of the data are tagged as offensive.

From Table 9, we can see that QARiB, fine-tuned on our data, achieves F1 score of 85.21 on the SemEval test set. However, when fine-tuned on SemEval data but tested on our data, QARiB achieves F1 score of 72.26. This suggests that models trained on our dataset capture characteristics of SemEval data reasonably well while also having more variety compared to SemEval data. It’s worth mentioning that combining the two datasets yield F1 score of 91.57 on SemEval dataset, which is an increase of 1.40 from the highest ranked system (Alami et al., (2020)) at SemEval that achieved F1 score of 90.17 on the same test set.

Table 9: Performance comparison of QARiB, fine-tuned on our dataset (EMOJI-OFF), SemEval dataset and combination of the two datasets

Train Data	Test Data	Acc%	P	R	F1
EMOJI-OFF	EMOJI-OFF	84.02	82.53	82.11	82.31
EMOJI-OFF	SemEval	89.30	82.40	90.05	85.21
SemEval	EMOJI-OFF	78.55	82.48	70.61	72.26
SemEval	SemEval	94.65	92.61	90.42	91.45
EMOJI-OFF + SemEval	EMOJI-OFF	83.55	81.88	81.98	81.93
EMOJI-OFF + SemEval	SemEval	94.55	91.31	91.84	91.57

5.4.2 Generalization to Multi-Platform Data

Having established generalization capability of our model to external Twitter data, we are interested in assessing how much our model can generalize to other social media platforms such as Facebook and YouTube. This is a challenging task as content on these platforms can differ greatly from each other. For example, Twitter imposes a tweet length limit of 280 characters. There is no such limitation on Facebook or YouTube comments. The audience is also different on the different platforms. Because of these differences, user write in different style across the different platforms. Moreover, as Chowdhury et al., 2020b point out, the percentage of offensive content is much higher on Twitter compared to YouTube or Facebook.

Table 10: Performance on multi-platform data. MPOLD-TW/YT/FB refer to Twitter, YouTube and Facebook portions of MPOLD dataset respectively. The MPOLD-TW/YT/FB columns contain numbers reported in Chowdhury et al., 2020b . The EMOJI-OFF column represents QARIB fine-tuned on our data. All numbers are macro averaged F1.

Test Data	EMOJI-OFF	MPOLD-TW	MPOLD-YT	MPOLD-FB
MPOLD-TW	72.62	-	54	51
MPOLD-YT	68.2	60	-	53
MPOLD-FB	66.95	62	62	-

To assess our model performance on other platform content, we evaluate the MPOLD dataset (Chowdhury et al., 2020b ) using our best model (EMOJI-OFF). The MPOLD dataset is a multi-platform dataset that contains 4,000 Arabic news comments from Twitter (1,624), YouTube (1,592), and Facebook (784). These comments are manually tagged for offensiveness. Out of the 4,000 comments, 675 (16.88%) were tagged as offensive.

We report our cross-domain model performance in Table 10. For better comparison, we present the cross-domain results reported by Chowdhury et al., 2020b . From Table 10, we can see that QARiB, fine-tuned on our data, achieves F1 score of 68.2 and 66.95 on YouTube and Facebook portions of MPOLD dataset respectively. While these numbers are lower than that we saw earlier in Table 9 (as expected), they are still considerably higher than the cross-domain numbers reported in Chowdhury et al., 2020b , which are 0.60 and 0.62 for YouTube and Facebook testsets respectively. This suggests that models trained on our data have the potential of generalizing to content on other platforms.

5.4.3 Generalization to Datasets without Emoji

In order to assess how well our model performs on datasets without emoji, first, we removed all emojis from our dataset and fine-tuned our best model, QARiB. We evaluated this model on our test data and SemEval data (both with emojis removed). We did not notice any significant difference in performance as the model obtained F1 score of 85.03 on SemEval test set and F1 score of 83.05 on our test set, both of which are comparable to the numbers reported in Table 9. In a second experiment, we retained the emojis in our training set, but excluded all tweets from the SemEval test set that contain emojis. Our model achieved F1 score of 83.94 in this setting. This again suggests, models trained on our data are not reliant on emojis and can generalize to datasets without emojis.

6 Ethics and Social Impact

After showcasing the effectiveness of using emojis for data collection and design, here we discuss the social impacts and ethical concern surrounding the newly released data.

User Privacy

The dataset was collected using the Twitter API. For privacy, we anonymised the dataset by removing usernames and other sensitive user identifier. However, in the future, if any concern is raised for a particular content, we will comply to legitimate concerns by removing the affected tweets from the corpus. Furthermore, to keep privacy and datasets integrity, we are providing tweets by text not the corresponding tweet ids.

Biases

Any biases found in the dataset are unintentional, and we do not intend to do harm to any group or individual. The bias in our data, for example towards a particular group is unintentional and is a true representation of the Twitter space during the period of our collection and according to our generic proposed method. Emojis used as seeds to create the collection are extracted automatically from publicly available datasets without any bias in our selection. Collection was obtained over a span of 18 months to have a good diversity of tweets written by many authors.

As for the assigned annotation labels, we follow a well-defined schema and available information to perceive final labels for offensiveness, hate speech and vulgarity. The labels are not a reflection of our opinion. Test questions used for quality control were selected to cover different annotation classes proportionally to their distributions in the collected tweets.

Although more than 190 workers have participated in the annotation process (with at least 80% success ratio for 200 test questions), this doesn’t guarantee perfect quality of annotation. As noted in Mubarak and Darwish, (2014), Arabic Twitter is dominated by tweets from Gulf countries (Saudi Arabia, Kuwait, etc.) while Appen (previously CrowdFlower/Figure8) crowdsourcing platform has $\approx 30\%$ of annotators from Egypt (the most Arabic populated country) (Mubarak and Darwish, (2016)). This may lead to misunderstanding (and incorrect labeling) of portion of tweets especially when they are written in dialects that are not understood by annotators. Moreover, almost three quarters of Arab annotators on Appen are males which may lead to hidden biases in data annotation.

Potential Misuse

We urge the research community to be aware that our dataset can be used to misuse as like any other social media data. If such misuse is noticed, human moderation is encouraged in order to ensure this does not occur.

7 Conclusion

The automation of detecting offensiveness (including hate speech) are limited by the availability of large balanced manually annotated dataset. To overcome such lacking, we propose a generic language-, topic- and genre-independent approach to collect a large percentage of Arabic offensive and hate speech data. We utilize the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We manually annotated the Arabic dataset for different layers of information, including offensiveness, followed by fine-grained hate speech, vulgar and violence content annotation. Furthermore, we benchmark the dataset for detecting offensive content and hate speech using different transformer architectures and performed in-depth analysis for temporal changes, linguistic usage and target audience. We also studied specific patterns in violence tweets. We demonstrate that the transformer model trained on our data achieves strong results on an external Twitter dataset. Further, the model outperforms previous models in literature when tested on multi-platform data that contains offensive comments from Twitter, YouTube and Facebook. This is the largest Arabic dataset for multilevel offensive language detection annotated with fine-grained hate speech, vulgar and violence labels. We publicly release the dataset for the research community.

Competing interests:

The author(s) declare none

References

(1) Abdelali, A., Hassan, S., Mubarak, H., Darwish, K., and Samih, Y. 2021a. Pre-training BERT on arabic tweets: Practical considerations. CoRR, abs/2102.10684.
(2) Abdelali, A., Mubarak, H., Samih, Y., Hassan, S., and Darwish, K. 2021b. QADI: Arabic dialect identification in the wild. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp. 1–10, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Alami et al., (2020) Alami, H., Ouatik El Alaoui, S., Benlahbib, A., and En-nahnahi, N. 2020. LISAC FSDM-USMBA team at SemEval-2020 task 12: Overcoming AraBERT’s pretrain-finetune discrepancy for Arabic offensive language identification. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 2080–2085, Barcelona (online). International Committee for Computational Linguistics.
Albadi et al., (2018) Albadi, N., Kurdi, M., and Mishra, S. 2018. Are they our brothers? analysis and detection of religious hate speech in the arabic twittersphere. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 69–76. IEEE.
Alhazbi, (2020) Alhazbi, S. 2020. Behavior-based machine learning approaches to identify state-sponsored trolls on twitter. IEEE Access, 8:195132–195141.
Alshaalan and Al-Khalifa, (2020) Alshaalan, R. and Al-Khalifa, H. 2020. Hate speech detection in saudi twittersphere: A deep learning approach. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pp. 12–23, Barcelona, Spain (Online). Association for Computational Linguistics.
Alshehri et al., (2018) Alshehri, A., El Moatez Billah Nagoudi, H. A., and Abdul-Mageed, M. 2018. Think before your click: Data and models for adult content in arabic twitter. In TA-COS 2018: 2nd Workshop on Text Analytics for Cybersecurity and Online Safety, 15.
Antoun et al., (2020) Antoun, W., Baly, F., and Hajj, H. 2020. Arabert: Transformer-based model for arabic language understanding. In Proceedings of The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, pp. 9–15.
Belcastro et al., (2020) Belcastro, L., Cantini, R., Marozzo, F., Talia, D., and Trunfio, P. 2020. Learning political polarization on social media using neural networks. IEEE Access, 8:47177–47187.
Cheng et al., (2014) Cheng, H., Xing, X., Liu, X., and Lv, Q. 2014. Isc: An iterative social based classifier for adult account detection on twitter. IEEE Transactions on Knowledge and Data Engineering, 27(4):1045–1056.
(11) Chowdhury, S. A., Abdelali, A., Darwish, K., Soon-Gyo, J., Salminen, J., and Jansen, B. J. 2020a. Improving arabic text categorization using transformer training diversification. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pp. 226–236.
(12) Chowdhury, S. A., Mubarak, H., Abdelali, A., Jung, S.-g., Jansen, B. J., and Salminen, J. 2020b. A multi-platform arabic news comment dataset for offensive language detection. In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 6203–6212.
Chung et al., (2019) Chung, Y.-L., Kuzmenko, E., Tekiroglu, S. S., and Guerini, M. 2019. Conan–counter narratives through nichesourcing: a multilingual dataset of responses to fight online hate speech. arXiv preprint arXiv:1910.03270.
Conneau et al., (2019) Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
Conover et al., (2011) Conover, M., Ratkiewicz, J., Francisco, M. R., Gonçalves, B., Menczer, F., and Flammini, A. 2011. Political polarization on twitter. ICWSM, 133:89–96.
Darwish et al., (2017) Darwish, K., Alexandrov, D., Nakov, P., and Mejova, Y. 2017. Seminar users in the arabic twitter sphere. In Ciampaglia, G. L., Mashhadi, A., and Yasseri, T., editors, Social Informatics, pp. 91–108, Cham. Springer International Publishing.
Davidson et al., (2017) Davidson, T., Warmsley, D., Macy, M., and Weber, I. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, volume 11.
Devlin et al., (2019) Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
(19) Dimitrov, D., Ali, B. B., Shaar, S., Alam, F., Silvestri, F., Firooz, H., Nakov, P., and Da San Martino, G. 2021a. Detecting propaganda techniques in memes. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP’21).
(20) Dimitrov, D., Ali, B. B., Shaar, S., Alam, F., Silvestri, F., Firooz, H., Nakov, P., and Da San Martino, G. 2021b. Task 6 at semeval-2021: Detection of persuasion techniques in texts and images. In Proceedings of the 15th International Workshop on Semantic Evaluation, SemEval, volume 21.
Donato and Paggio, (2017) Donato, G. and Paggio, P. 2017. Investigating redundancy in emoji use: Study on a twitter based corpus. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 118–126.
Dürscheid and Siever, (2017) Dürscheid, C. and Siever, C. M. 2017. Beyond the alphabet–communcataion of emojis. Kurzfassung eines (auf Deutsch) zur Publikation eingereichten Manuskripts.
Gülaçtı, (2010) Gülaçtı, F. 2010. The effect of perceived social support on subjective well-being. Procedia-Social and Behavioral Sciences, 2(2):3844–3849.
(24) Hassan, S., Mubarak, H., Abdelali, A., and Darwish, K. 2021a. ASAD: Arabic social media analytics and unDerstanding. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 113–118, Online. Association for Computational Linguistics.
Hassan et al., (2020) Hassan, S., Samih, Y., Mubarak, H., and Abdelali, A. 2020. ALT at SemEval-2020 task 12: Arabic and English offensive language identification in social media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 1891–1897, Barcelona (online). International Committee for Computational Linguistics.
(26) Hassan, S., Shaar, S., and Darwish, K. 2021b. Cross-lingual emotion detection. CoRR, abs/2106.06017.
Husain, (2020) Husain, F. 2020. Osact4 shared task on offensive language detection: Intensive preprocessing based approach. OSACT, 4.
Intapong et al., (2017) Intapong, P., Charoenpit, S., Achalakul, T., and Ohkura, M. 2017. Assessing symptoms of excessive sns usage based on user behavior and emotion: analysis of data obtained by sns apis. In 9th International Conference on Social Computing and Social Media, SCSM 2017 held as part of the 19th International Conference on Human-Computer Interaction, HCI International 2017, pp. 71–83. Springer Verlag.
Kiela et al., (2020) Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., and Testuggine, D. 2020. The hateful memes challenge: Detecting hate speech in multimodal memes. arXiv preprint arXiv:2005.04790.
Landis and Koch, (1977) Landis, J. R. and Koch, G. G. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
Mei, (2019) Mei, Q. 2019. Decoding the new world language: Analyzing the popularity, roles, and utility of emojis. In Companion Proceedings of The 2019 World Wide Web Conference, pp. 417–418.
(32) Mubarak, H., Abdelali, A., Hassan, S., and Darwish, K. 2020a. Spam detection on arabic twitter. In Social Informatics, pp. 237–251, Cham. Springer International Publishing.
Mubarak and Darwish, (2014) Mubarak, H. and Darwish, K. 2014. Using twitter to collect a multi-dialectal corpus of arabic. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pp. 1–7.
Mubarak and Darwish, (2016) Mubarak, H. and Darwish, K. 2016. Demographic surveys of arab annotators on crowdflower. In Weaving Relations of Trust in Crowd Work: Transparency and Reputation across Platforms.
Mubarak et al., (2017) Mubarak, H., Darwish, K., and Magdy, W. 2017. Abusive language detection on arabic social media. In Proceedings of the first workshop on abusive language online, pp. 52–56.
(36) Mubarak, H., Darwish, K., Magdy, W., Elsayed, T., and Al-Khalifa, H. 2020b. Overview of osact4 arabic offensive language detection shared task. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection.
Mubarak et al., (2021) Mubarak, H., Hassan, S., and Abdelali, A. 2021. Adult content detection on Arabic Twitter: Analysis and experiments. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp. 136–144, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
(38) Mubarak, H., Rashed, A., Darwish, K., Samih, Y., and Abdelali, A. 2020c. Arabic offensive language on twitter: Analysis and experiments. arXiv preprint arXiv:2004.02192.
Nakov et al., (2021) Nakov, P., Nayak, V., Dent, K., Bhatawdekar, A., Sarwar, S. M., Hardalov, M., Dinkov, Y., Zlatkova, D., Bouchard, G., and Augenstein, I. 2021. Detecting abusive language on online platforms: A critical analysis. arXiv preprint arXiv:2103.00153.
Ousidhoum et al., (2019) Ousidhoum, N., Lin, Z., Zhang, H., Song, Y., and Yeung, D.-Y. 2019. Multilingual and multi-aspect hate speech analysis. arXiv preprint arXiv:1908.11049.
Platt, (1998) Platt, J. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines.
Polignano et al., (2019) Polignano, M., Basile, P., de Gemmis, M., Semeraro, G., and Basile, V. 2019. Alberto: Italian bert language understanding model for nlp challenging tasks based on tweets. In CLiC-it.
Ribeiro et al., (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. 2016. "why should i trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, 1135–1144, New York, NY, USA. Association for Computing Machinery.
Salminen et al., (2020) Salminen, J., Hopf, M., Chowdhury, S. A., Jung, S.-g., Almerekhi, H., and Jansen, B. J. 2020. Developing an online hate classifier for multiple social media platforms. Human-centric Computing and Information Sciences, 10(1):1–34.
Waldron, (2012) Waldron, J. 2012. The harm in hate speech. Harvard University Press.
Wiegand and Ruppenhofer, (2021) Wiegand, M. and Ruppenhofer, J. 2021. Exploiting emojis for abusive language detection. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 369–380.
Zampieri et al., (2020) Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z., and Çöltekin, c. 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). In Proceedings of SemEval.

Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech