
GLARE: Detecting Harmful Memes
by Combining Global and Local Perspectives
(Appendix)

First Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain
Second Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain

Appendix A Implementation Details and Hyperparameters

We train all the models using PyTorch (paszke2019pytorch) on an NVIDIA Tesla T4 GPU with 16 GB of dedicated memory, with CUDA-10 and cuDNN-11 installed. For the unimodal models, we import all the pre-trained weights from the TORCHVISION.MODELS subpackage of the PyTorch framework (http://pytorch.org/docs/stable/torchvision/models.html). The non-pre-trained weights are randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.02. From the dataset statistics table (Table 1 in the main manuscript), we observe a label imbalance problem for both the harmfulness-intensity ([Very Harmful, Partially Harmful] vs. Harmless) and the target ([Individual, Organization, Community] vs. Entire Society) classification tasks. To deal with this imbalance, we use focal loss (FL) (lin2017focal), which down-weights easy examples and focuses training on hard ones. We train GLARE in a multi-task learning setup, where the loss due to target identification is considered only if the meme is partially harmful or very harmful. We train our models using the Adam optimizer (kingma2014adam) with negative log-likelihood (NLL) loss as the objective function. In Table 1, we list the hyperparameters used for training.
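
For concreteness, the following is a minimal PyTorch sketch of the two loss ingredients described above: a multi-class focal loss for the imbalanced harmfulness task, and masking of the target-identification loss for harmless memes in the multi-task setup. The tensor names, the gamma/alpha values, and the assumption that label 0 denotes "harmless" are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss (lin2017focal): down-weights easy examples.
    gamma and alpha are illustrative defaults, not the paper's settings."""
    log_probs = F.log_softmax(logits, dim=-1)              # (batch, n_classes)
    nll = F.nll_loss(log_probs, labels, weight=alpha, reduction="none")
    pt = log_probs.exp().gather(1, labels.unsqueeze(1)).squeeze(1)  # p of true class
    return ((1.0 - pt) ** gamma * nll).mean()

def multitask_loss(harm_logits, target_logits, harm_labels, target_labels):
    """Joint loss: the target-identification term is counted only for memes
    labelled (partially or very) harmful; label 0 is assumed to mean 'harmless'."""
    loss_harm = focal_loss(harm_logits, harm_labels)
    harmful_mask = harm_labels != 0                         # assumed harmless index
    loss_target = torch.tensor(0.0, device=harm_logits.device)
    if harmful_mask.any():
        loss_target = focal_loss(target_logits[harmful_mask],
                                 target_labels[harmful_mask])
    return loss_harm + loss_target
```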

Model | Batch size | Epochs | Learning rate | Image encoder | Text encoder | #Parameters
Unimodal: TextBERT | 16 | 20 | 0.001 | - | Bert-base-uncased | 110,683,414
Unimodal: VGG19 | 64 | 200 | 0.01 | VGG19 | - | 138,357,544
Unimodal: DenseNet-161 | 32 | 200 | 0.01 | DenseNet-161 | - | 28,681,538
Unimodal: ResNet-152 | 32 | 300 | 0.01 | ResNet-152 | - | 60,192,808
Unimodal: ResNeXt-101 | 32 | 300 | 0.01 | ResNeXt-101 | - | 83,455,272
Multimodal: Late Fusion | 16 | 20 | 0.0001 | ResNet-152 | Bert-base-uncased | 170,983,752
Multimodal: Concat BERT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 170,982,214
Multimodal: MMBT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 169,808,726
Multimodal: ViLBERT CC | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 112,044,290
Multimodal: V-BERT COCO | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 247,782,404
GLARE | 64 | 50 | 0.001 | VGG19 | DistilBERT-base-uncased | 7,608,323
Table 1: Hyperparameters of different baseline models and GLARE.

Appendix B Filtering

To ensure satisfactory quality of the Harm-C and Harm-P datasets, we impose four well-defined, fine-grained filtering criteria during the data collection and annotation process. The criteria are as follows:

  1. The meme text must not be code-mixed or in a non-English language.

  2. The meme text must be readable (e.g., blurry or incomplete text is not allowed).

  3. The meme must not be unimodal, i.e., it must not contain only textual or only visual content.

  4. The meme must not contain cartoons (we add this criterion because cartoons are often very hard for AI systems to interpret).

Figure 1 shows some example memes that were rejected during the filtering process due to the four criteria above.
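
The filtering itself is performed manually by the annotators (see Appendix D). Purely as an illustration of criterion 1, the sketch below shows how one might pre-screen meme text with the off-the-shelf langdetect library; this is not part of the dataset-construction pipeline, and the 0.90 confidence threshold is an arbitrary assumption.

```python
# Illustrative pre-screen for criterion 1 only; the actual filtering is manual.
from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make language detection deterministic

def looks_english(meme_text: str, min_prob: float = 0.90) -> bool:
    """Return True if the meme text is confidently detected as English.
    Code-mixed text tends to yield low-confidence or non-'en' predictions."""
    if not meme_text.strip():
        return False              # empty text would also fail criterion 2/3
    try:
        top = detect_langs(meme_text)[0]
    except Exception:
        return False
    return top.lang == "en" and top.prob >= min_prob
```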

Figure 1: Example memes filtered out during annotation: (a) non-English, (b) blurry text, (c) cartoons, (d) text-only, (e) image-only.

Appendix C Annotation Guidelines

C.1 What do we mean by harmful memes?

The intended ‘harm’ can be expressed in an obvious manner, such as by abusing, offending, disrespecting, insulting, demeaning, or disregarding the entity or any socio-cultural or political ideology, belief, principle, or doctrine associated with it. Likewise, the ‘harm’ can take the form of a more subtle attack, such as mocking or ridiculing a person or an idea.

In its entrenched sense, a harmful meme targets a social entity (e.g., an individual, an organization, a community) and is likely to cause calumny, vilification, or defamation, depending on their background (bias, social background, educational background, etc.). The ‘harm’ caused by a meme can take the form of mental abuse, psycho-physiological injury, proprietary damage, emotional disturbance, or damage to public image. A harmful meme typically attacks celebrities or well-known organizations with the intent of exposing their professional demeanor.

C.2 What are the four target categories?

The four target entities are as follows:

  1. Individual: A person, usually a celebrity (e.g., a well-known politician, actor, artist, scientist, or environmentalist, such as Donald Trump, Joe Biden, Vladimir Putin, Hillary Clinton, Barack Obama, Chuck Norris, Greta Thunberg, or Michelle Obama).

  2. Organization: A group of more than one person with a particular purpose, such as a business, government department, company, institution, or association, e.g., research organizations (WTO, Google) and political organizations (the Democratic Party).

  3. Community: A social unit with commonalities based on personal, professional, social, cultural, or political attributes such as religious views, country of origin, or gender identity. Communities may share a sense of place situated in a given geographical area (e.g., a country, village, town, or neighborhood) or in virtual space through communication platforms (e.g., online forums based on religion, country of origin, or gender).

  4. Society: When a meme promotes conspiracies or hate crimes, it becomes harmful to the general public, i.e., the entire society.

Dataset | Very harmful | Partially harmful | Harmless | Individual | Organization | Community | Society
Harm-C | mask (0.0512) | trump (0.0642) | you (0.0264) | trump (0.0541) | deadline (0.0709) | china (0.0665) | mask (0.0441)
 | trump (0.0404) | president (0.0273) | home (0.0263) | president (0.0263) | associated (0.0709) | chinese (0.0417) | vaccine (0.0430)
 | wear (0.0385) | obama (0.0262) | corona (0.0251) | donald (0.0231) | extra (0.0645) | virus (0.0361) | alcohol (0.0309)
 | thinks (0.0308) | donald (0.0241) | work (0.0222) | obama (0.0217) | ensure (0.0645) | wuhan (0.0359) | temperatures (0.0309)
 | killed (0.0269) | virus (0.0213) | day (0.0188) | covid (0.0203) | qanon (0.0600) | cases (0.0319) | killed (0.0271)
Harm-P | photoshopped (0.0589) | democratic (0.0164) | party (0.02514) | biden (0.0331) | libertarian (0.0358) | liberals (0.0328) | crime (0.0201)
 | married (0.0343) | obama (0.0158) | debate (0.0151) | joe (0.0323) | republican (0.0319) | radical (0.0325) | rights (0.0195)
 | joe (0.0309) | libertarian (0.0156) | president (0.0139) | obama (0.0316) | democratic (0.0293) | islam (0.0323) | gun (0.0181)
 | trump (0.0249) | republican (0.0140) | democratic (0.0111) | trump (0.0286) | green (0.0146) | black (0.0237) | taxes (0.0138)
 | nazis (0.0241) | vote (0.0096) | green (0.0086) | putin (0.0080) | government (0.0097) | mexicans (0.0168) | law (0.0135)
Table 2: Top-5 most frequent words per class for the Harm-C and Harm-P datasets. The Very harmful, Partially harmful, and Harmless columns correspond to the harmfulness classes, and the remaining columns to the target classes. The tf-idf score of each word is given in parentheses.

C.3 Characteristics of harmful memes:

  • Harmful memes may or may not be offensive, hateful or biased in nature.

  • Harmful memes point out vices, allegations, and other negative aspects of an entity based on verified or unfounded claims or mockery.

  • Harmful memes leave an open-ended connotation to the word ‘community’, including ‘antisocial’ communities such as terrorist groups.

  • The harmful content in harmful memes is often implicit and might require critical judgment to establish the potency of the harm it can cause.

  • Harmful memes can be classified on multiple levels, based on the intensity of the harm caused, e.g., very harmful, partially harmful.

  • One harmful meme can target multiple individuals, organizations, or communities at the same time. In such cases, we asked the annotators to use their best personal judgment.

  • Harm can be expressed in the form of sarcasm and/or political satire. Sarcasm is praise which is really an insult; it generally involves malice, the desire to put someone down. Satire, on the other hand, is the ironical exposure of the vices or follies of an individual, a group, an institution, an idea, a society, etc., usually with a view to correcting them.

Figure 2: Snapshot of the PyBossa GUI used for (a) annotation and (b) consolidation.

Appendix D Annotation Process

We use the crowd-sourcing platform PyBossa (https://pybossa.com/; cf. Figure 2) to build an annotation interface that shows each meme and requests annotations for harmfulness level and target. Before beginning the annotation process, we asked every annotator to go through the annotation guidelines thoroughly, and we conducted several discussion sessions to make sure that all of them understood exactly what harmful content is and how to differentiate it from humorous, satirical, hateful, and non-harmful content. The average inter-annotator agreement scores (Cohen's κ) (bobicev-sokolova-2017-inter) for Harm-C and Harm-P are 0.683 and 0.790, respectively.
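
For reference, the pairwise agreement reported here can be computed with scikit-learn's implementation of Cohen's κ, averaged over all annotator pairs; the annotator labels in the toy example below are placeholders, not actual annotations.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations):
    """annotations: list of per-annotator label lists over the same memes.
    Returns the mean Cohen's kappa over all annotator pairs."""
    scores = [cohen_kappa_score(a, b) for a, b in combinations(annotations, 2)]
    return sum(scores) / len(scores)

# toy example with three annotators and five memes (placeholder labels)
ann = [
    ["harmful", "harmless", "harmful", "harmless", "harmful"],
    ["harmful", "harmless", "harmless", "harmless", "harmful"],
    ["harmful", "harmful", "harmful", "harmless", "harmful"],
]
print(round(average_pairwise_kappa(ann), 3))
```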

Step 1 - Dry run: At first, we took a subset of 200 memes (100 from each dataset) and asked each annotator to annotate them for harmfulness levels and targets. This step aimed to ensure that each annotator could comprehend the definitions of harmfulness and targets. After this step, the average inter-annotator agreement scores (Cohen's κ) (bobicev-sokolova-2017-inter) across all pairs of annotators were only 0.241 and 0.325 for the two tasks, which is low but expected. The annotators discussed their disagreements and re-annotated the memes. This time, the scores improved to 0.704 and 0.826, which is satisfactory for both tasks. Hence, we decided to begin the final annotation process.

Step 2 - Final annotation: In the final annotation stage, we divided the two datasets into 5 equal subsets and assigned 3 annotators to each subset. This ensures that each meme is annotated 3 times. We also asked the annotators to reject memes that violate the filtering criteria described in Appendix B. This provides an additional level of filtering to ensure adequate quality of the datasets.

Step 3 - Consolidation: After the final annotation, the average inter-annotator agreement scores over the two whole datasets are 0.683 and 0.790. We observed many memes where the annotation of two annotators differed from that of the third one; for example, two annotators independently annotated a meme as partially harmful, while the third annotated it as very harmful. In the consolidation phase, we used majority voting to decide the final label. For cases where all three annotations differed, we employed a fourth annotator to make the final decision.
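
A minimal sketch of this consolidation rule, i.e., majority voting with a fall-back to a fourth annotator when all three labels differ, is given below. The label strings and the tie_breaker interface are illustrative assumptions.

```python
from collections import Counter

def consolidate(labels, tie_breaker=None):
    """Consolidate three annotations of one meme into a final label.
    labels: the three annotators' labels.
    tie_breaker: callable returning the fourth annotator's decision,
    used only when all three labels differ."""
    label, votes = Counter(labels).most_common(1)[0]
    if votes >= 2:                    # at least two annotators agree
        return label
    assert tie_breaker is not None, "all labels differ: need a fourth annotator"
    return tie_breaker()

# e.g., two 'partially harmful' votes beat one 'very harmful' vote
print(consolidate(["partially harmful", "partially harmful", "very harmful"]))
```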

Figure 3: Source and label statistics of the two datasets: (a) Harm-C and (b) Harm-P.

Appendix E Lexical Statistics of Two Datasets

This section analyzes the lexical (word-level) statistics of the Harm-C and Harm-P datasets. Figure 4 shows the length distribution of the meme text for both tasks across the two datasets. Furthermore, Table 2 shows the top-5 most frequent words for each class in the combined validation and test sets of the two datasets. We observe that for the very harmful and partially harmful classes, names of US politicians and COVID-19-related words are frequent. In the target classes, we notice the presence of various class-specific words such as ‘trump’, ‘joe’, ‘obama’, ‘republican’, ‘wuhan’, ‘china’, and ‘islam’. To alleviate the potential bias caused by these class-specific words, we intentionally included harmless memes related to these individuals, groups, and entities, as described in Section 4 of the main manuscript.
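
As an illustration of how such per-class rankings can be produced, the sketch below computes the top-k words by mean tf-idf score per class with scikit-learn; the exact tokenization and preprocessing used for Table 2 may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_words_per_class(texts, labels, k=5):
    """Return the k words with the highest mean tf-idf score for each class."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(texts)                    # (n_memes, vocab_size)
    vocab = np.array(vec.get_feature_names_out())
    out = {}
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        mean_scores = np.asarray(X[rows].mean(axis=0)).ravel()
        top = mean_scores.argsort()[::-1][:k]
        out[cls] = list(zip(vocab[top], mean_scores[top].round(4)))
    return out
```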

Figure 4: Normalized histograms of meme text length per class for the two datasets: (a) Harm-C and (b) Harm-P.