GLARE: Detecting Harmful Memes
by Combining Global and Local Perspectives
(Appendix)
Appendix A Implementation Details and Hyperparameters
We train all models using PyTorch (Paszke et al., 2019) on an NVIDIA Tesla T4 GPU with 16 GB of dedicated memory, with CUDA 10 and cuDNN 11 installed. For the unimodal models, we import all pre-trained weights from the torchvision.models subpackage of the PyTorch framework (http://pytorch.org/docs/stable/torchvision/models.html). The non-pre-trained weights are randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.02. From the dataset statistics (Table 1 in the main manuscript), we observe a label imbalance problem for both the harmfulness intensity ([Very Harmful, Partially Harmful] vs. Harmless) and the target ([Individual, Organization, Community] vs. Entire Society) classification tasks. To deal with this imbalance, we use focal loss (FL) (Lin et al., 2017), which down-weights easy examples and focuses training on hard ones. We train GLARE in a multi-task learning setup, where the loss due to target identification is considered only if the meme is partially harmful or very harmful. We train our models using the Adam optimizer (Kingma and Ba, 2014) with negative log-likelihood (NLL) loss as the objective function. Table 1 lists the hyperparameters used for training.
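As a minimal sketch of this initialization scheme (not the authors' released code), the snippet below loads ImageNet-pretrained weights from torchvision.models and applies the zero-mean Gaussian initialization (std 0.02) to layers that are not pre-trained; the classification-head dimensions are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

# Pre-trained weights come from the torchvision.models subpackage (here VGG19, as used by GLARE).
backbone = models.vgg19(pretrained=True)

def init_non_pretrained(module):
    """Zero-mean Gaussian initialization (std 0.02) for weights that are not pre-trained."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Example: a newly added classification head (dimensions are illustrative, not from the paper).
harmfulness_head = nn.Linear(512, 3)  # 3 classes: very harmful, partially harmful, harmless
harmfulness_head.apply(init_non_pretrained)
```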
Table 1: Hyperparameters used for training the unimodal and multimodal models.
Category | Model | Batch-size | Epochs | Learning Rate | Image Encoder | Text Encoder | #Parameters
Unimodal | TextBERT | 16 | 20 | 0.001 | - | Bert-base-uncased | 110,683,414
Unimodal | VGG19 | 64 | 200 | 0.01 | VGG19 | - | 138,357,544
Unimodal | DenseNet-161 | 32 | 200 | 0.01 | DenseNet-161 | - | 28,681,538
Unimodal | ResNet-152 | 32 | 300 | 0.01 | ResNet-152 | - | 60,192,808
Unimodal | ResNeXt-101 | 32 | 300 | 0.01 | ResNeXt-101 | - | 83,455,272
Multimodal | Late Fusion | 16 | 20 | 0.0001 | ResNet-152 | Bert-base-uncased | 170,983,752
Multimodal | Concat BERT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 170,982,214
Multimodal | MMBT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 169,808,726
Multimodal | ViLBERT CC | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 112,044,290
Multimodal | V-BERT COCO | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 247,782,404
Multimodal | GLARE | 64 | 50 | 0.001 | VGG19 | DistilBERT-base-uncased | 7,608,323
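For concreteness, the following sketch shows one way the focal loss and the masked multi-task objective described above could be implemented in PyTorch; the harmless class index (`harmless_id`) and the task weighting factor (`lambda_target`) are our assumptions, not values reported in the paper.

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss (Lin et al., 2017): down-weights easy, well-classified examples."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, weight=alpha, reduction="none")
    pt = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def multitask_loss(harm_logits, target_logits, harm_labels, target_labels,
                   harmless_id=0, lambda_target=1.0):
    """Harmfulness loss on every meme; target loss only on partially/very harmful memes."""
    loss_harm = focal_loss(harm_logits, harm_labels)
    harmful_mask = harm_labels != harmless_id  # True for partially or very harmful memes
    if harmful_mask.any():
        loss_target = focal_loss(target_logits[harmful_mask], target_labels[harmful_mask])
    else:
        loss_target = harm_logits.new_zeros(())  # no harmful memes in this batch
    return loss_harm + lambda_target * loss_target
```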
Appendix B Filtering
To ensure satisfactory quality of the Harm-C and Harm-P datasets, we impose four well-defined, fine-grained filtering criteria during the data collection and annotation process. The criteria are as follows (an illustrative sketch of the rejection logic is given after the list):

1. The meme text must not be code-mixed or in a non-English language.
2. The meme text must be readable (e.g., blurry or incomplete text is not allowed).
3. The meme must not be unimodal, i.e., it must not contain only textual or only visual content.
4. The meme must not contain several cartoons (we add this criterion because cartoons are often very hard for AI systems to interpret).
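These criteria are checked by human annotators rather than automatically; purely as an illustration of the rejection logic, the sketch below encodes them as boolean flags attached to each candidate meme (the field names are hypothetical).

```python
from dataclasses import dataclass

@dataclass
class MemeCandidate:
    """Annotator-supplied flags for one candidate meme (field names are hypothetical)."""
    text_is_english: bool     # criterion 1: no code-mixed or non-English text
    text_is_readable: bool    # criterion 2: no blurry or incomplete text
    has_text_and_image: bool  # criterion 3: the meme must be multimodal
    contains_cartoons: bool   # criterion 4: no cartoons

def passes_filtering(meme: MemeCandidate) -> bool:
    """A meme is kept only if all four filtering criteria are satisfied."""
    return (meme.text_is_english
            and meme.text_is_readable
            and meme.has_text_and_image
            and not meme.contains_cartoons)
```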
Figure 1 shows some example memes that were rejected during the filtering process because they violated the four criteria mentioned above.
[Figure 1: Example memes rejected during the filtering process.]
Appendix C Annotation Guidelines
C.1 What do we mean by harmful memes?
The intended ‘harm’ can be expressed in an obvious manner, such as by abusing, offending, disrespecting, insulting, demeaning, or disregarding the entity or any socio-cultural or political ideology, belief, principle, or doctrine associated with that entity. Likewise, the ‘harm’ can also take the form of a more subtle attack, such as mocking or ridiculing a person or an idea.
A harmful meme targets a social entity (e.g., an individual, an organization, a community), and is likely to cause calumny, vilification, or defamation, depending on the entity’s background (bias, social background, educational background, etc.). The ‘harm’ caused by a meme can take the form of mental abuse, psycho-physiological injury, proprietary damage, emotional disturbance, or a compromised public image. A harmful meme typically attacks celebrities or well-known organizations, intending to expose their professional demeanor.
C.2 What are the four target categories?
The four target entities are as follows:
1. Individual: A person, usually a celebrity (e.g., a well-known politician, actor, artist, scientist, or environmentalist, such as Donald Trump, Joe Biden, Vladimir Putin, Hillary Clinton, Barack Obama, Chuck Norris, Greta Thunberg, or Michelle Obama).
2. Organization: A group of people with a particular purpose, such as a business, government department, company, institution, or association, e.g., research organizations (e.g., WTO, Google) and political organizations (e.g., the Democratic Party).
3. Community: A social unit whose members share personal, professional, social, cultural, or political attributes such as religious views, country of origin, or gender identity. Communities may share a sense of place situated in a given geographical area (e.g., a country, village, town, or neighborhood) or in virtual space through communication platforms (e.g., online forums based on religion, country of origin, or gender).
4. Society: When a meme promotes conspiracies or hate crimes, it becomes harmful to the general public, i.e., the entire society.
Table 2: Top-5 most frequent words (with relative frequencies) per class in the combined validation and test sets of Harm-C and Harm-P. The first three word columns correspond to the harmfulness classes and the last four to the target classes.
Dataset | Very harmful | Partially harmful | Harmless | Individual | Organization | Community | Society
Harm-C | mask (0.0512) | trump (0.0642) | you (0.0264) | trump (0.0541) | deadline (0.0709) | china (0.0665) | mask (0.0441)
 | trump (0.0404) | president (0.0273) | home (0.0263) | president (0.0263) | associated (0.0709) | chinese (0.0417) | vaccine (0.0430)
 | wear (0.0385) | obama (0.0262) | corona (0.0251) | donald (0.0231) | extra (0.0645) | virus (0.0361) | alcohol (0.0309)
 | thinks (0.0308) | donald (0.0241) | work (0.0222) | obama (0.0217) | ensure (0.0645) | wuhan (0.0359) | temperatures (0.0309)
 | killed (0.0269) | virus (0.0213) | day (0.0188) | covid (0.0203) | qanon (0.0600) | cases (0.0319) | killed (0.0271)
Harm-P | photoshopped (0.0589) | democratic (0.0164) | party (0.02514) | biden (0.0331) | libertarian (0.0358) | liberals (0.0328) | crime (0.0201)
 | married (0.0343) | obama (0.0158) | debate (0.0151) | joe (0.0323) | republican (0.0319) | radical (0.0325) | rights (0.0195)
 | joe (0.0309) | libertarian (0.0156) | president (0.0139) | obama (0.0316) | democratic (0.0293) | islam (0.0323) | gun (0.0181)
 | trump (0.0249) | republican (0.0140) | democratic (0.0111) | trump (0.0286) | green (0.0146) | black (0.0237) | taxes (0.0138)
 | nazis (0.0241) | vote (0.0096) | green (0.0086) | putin (0.0080) | government (0.0097) | mexicans (0.0168) | law (0.0135)
C.3 Characteristics of harmful memes:
• Harmful memes may or may not be offensive, hateful, or biased in nature.
• Harmful memes point out vices, allegations, and other negative aspects of an entity, based on verified or unfounded claims or mockery.
• Harmful memes leave an open-ended connotation to the word ‘community’, including ‘antisocial’ communities such as terrorist groups.
• The harmful content in harmful memes is often implicit and might require critical judgment to establish the potential harm it can cause.
• Harmful memes can be classified on multiple levels based on the intensity of the harm caused, e.g., very harmful and partially harmful.
• One harmful meme can target multiple individuals, organizations, or communities at the same time. In such cases, we asked the annotators to use their best personal judgment.
• Harm can be expressed in the form of sarcasm and/or political satire. Sarcasm is praise which is really an insult; it generally involves malice, the desire to put someone down. Satire, on the other hand, is the ironical exposure of the vices or follies of an individual, a group, an institution, an idea, a society, etc., usually with a view to correcting them.
Appendix D Annotation Process
We use the crowd-sourcing platform pybossa (https://pybossa.com/; c.f. Figure 2) to build an annotation interface that shows each meme and requests annotations for the harmfulness level and the target. Before beginning the annotation process, we asked every annotator to thoroughly go through the annotation guidelines, and we conducted several discussion sessions to confirm that all of them understood exactly what harmful content is and how to differentiate it from humorous, satirical, hateful, and non-harmful content. The average inter-annotator agreement scores (Cohen’s κ; Bobicev and Sokolova, 2017) for Harm-C and Harm-P are and , respectively.
Step 1 - Dry run: At first, we took a subset of 200 memes ( from each dataset) and asked each annotator to annotate them for harmfulness levels and targets. This step aimed to ensure that each annotator could comprehend the definitions of harmfulness and the targets. After this step, the average inter-annotator agreement score (Cohen’s κ; Bobicev and Sokolova, 2017) for the two tasks across all pairs of annotators was only and , which is low but expected. The annotators discussed their disagreements and re-annotated the memes. This time, the score improved to and , which is satisfactory for both tasks. Hence, we decided to begin the final annotation process.
Step 2 - Final annotation: In the final annotation stage, we divided the two datasets into equal subsets and assigned annotators to each subset. This ensures that each meme is annotated times. We also asked the annotators to reject memes that violate the filtering criteria described in Appendix B. This adds another level of filtering to ensure adequate quality of the datasets.
Step 3 - Consolidation: After the final annotation, the average inter-annotator agreement scores over the two whole datasets are and . We observed many memes for which the annotations of two annotators differed from that of the third; for example, two annotators independently annotated a meme as partially harmful, while the third annotated it as very harmful. In the consolidation phase, we used majority voting to decide the final label. For cases where all three annotations differed, we employed a fourth annotator to make the final decision.
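As a rough illustration of the agreement and consolidation steps, the sketch below computes the average pairwise Cohen's κ and resolves labels by majority vote, deferring three-way ties to a fourth annotator; the use of scikit-learn's cohen_kappa_score and the tie-breaking interface are our assumptions, not details from the paper.

```python
from collections import Counter
from itertools import combinations

from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations):
    """Mean Cohen's kappa over all annotator pairs.

    `annotations` is a list of label sequences, one per annotator, aligned by meme.
    """
    scores = [cohen_kappa_score(a, b) for a, b in combinations(annotations, 2)]
    return sum(scores) / len(scores)

def consolidate(labels, fourth_annotator_label=None):
    """Majority vote over three annotations; a three-way tie defers to a fourth annotator."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else fourth_annotator_label

# Two annotators say "partially harmful", one says "very harmful" -> the majority label wins.
print(consolidate(["partially harmful", "partially harmful", "very harmful"]))
```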
Appendix E Lexical Statistics of Two Datasets
This section analyzes the lexical (word-level) statistics of the Harm-C and Harm-P datasets. Figure 4 shows the length distribution of the meme text for both tasks across the two datasets. Furthermore, Table 2 shows the top-5 most frequent words for every class in the combined validation and test sets of the two datasets. We observe that for the very harmful and partially harmful classes, names of US politicians and COVID-19-related words are frequent. In the target classes, we notice the presence of various class-specific words such as ‘trump’, ‘joe’, ‘obama’, ‘republican’, ‘wuhan’, ‘china’, and ‘islam’. To alleviate the potential bias caused by these class-specific words, we intentionally included harmless memes related to these individuals, groups, and entities, as described in more detail in the corresponding section of the main manuscript.
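The per-class word frequencies in Table 2 can be reproduced in spirit with a simple counting pass over the meme texts; the tokenization and stopword handling below are assumptions, since the paper does not specify them.

```python
from collections import Counter

def top_k_words_per_class(records, k=5, stopwords=frozenset()):
    """Return the k most frequent words (with relative frequencies) for each class.

    `records` is assumed to be an iterable of (meme_text, class_label) pairs drawn
    from the combined validation and test sets.
    """
    counters = {}
    for text, label in records:
        tokens = [t for t in text.lower().split() if t.isalpha() and t not in stopwords]
        counters.setdefault(label, Counter()).update(tokens)
    top_k = {}
    for label, counter in counters.items():
        total = sum(counter.values())
        top_k[label] = [(word, round(count / total, 4)) for word, count in counter.most_common(k)]
    return top_k
```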
[Figure 4: Length distribution of the meme text for both tasks across the two datasets.]