
GLARE: Detecting Harmful Memes
by Combining Global and Local Perspectives
(Appendix)

First Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain
Second Author
Affiliation / Address line 1
Affiliation / Address line 2
Affiliation / Address line 3
email@domain

Appendix A Implementation Details and Hyperparameters

We train all the models using PyTorch (paszke2019pytorch) on an NVIDIA Tesla T4 GPU with 16 GB of dedicated memory, with CUDA-10 and cuDNN-11 installed. For the unimodal models, we import all the pre-trained weights from the TORCHVISION.MODELS subpackage of the PyTorch framework (http://pytorch.org/docs/stable/torchvision/models.html). The non-pre-trained weights are randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.02. From the dataset statistics table (Table 1 in the main manuscript), we observe a label imbalance problem for both the harmfulness-intensity ([Very Harmful, Partially Harmful] vs. Harmless) and the target ([Individual, Organization, Community] vs. Entire Society) classification tasks. To deal with this imbalance, we use focal loss (FL) (lin2017focal), which down-weights easy examples and focuses training on hard ones. We train GLARE in a multi-task learning setup, where the loss due to target identification is considered only if the meme is partially harmful or very harmful. We train our models using the Adam optimizer (kingma2014adam) with negative log-likelihood (NLL) loss as the objective function. In Table 1, we list the hyperparameters used for training.
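
For concreteness, the following is a minimal PyTorch sketch of the two loss ingredients described above: a multi-class focal loss for the imbalanced harmfulness task, and masking of the target-identification loss for harmless memes in the multi-task setup. The tensor names, the gamma/alpha values, and the assumption that label 0 denotes "harmless" are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss (lin2017focal): down-weights easy examples.
    gamma and alpha are illustrative defaults, not the paper's settings."""
    log_probs = F.log_softmax(logits, dim=-1)              # (batch, n_classes)
    nll = F.nll_loss(log_probs, labels, weight=alpha, reduction="none")
    pt = log_probs.exp().gather(1, labels.unsqueeze(1)).squeeze(1)  # p of true class
    return ((1.0 - pt) ** gamma * nll).mean()

def multitask_loss(harm_logits, target_logits, harm_labels, target_labels):
    """Joint loss: the target-identification term is counted only for memes
    labelled (partially or very) harmful; label 0 is assumed to mean 'harmless'."""
    loss_harm = focal_loss(harm_logits, harm_labels)
    harmful_mask = harm_labels != 0                         # assumed harmless index
    loss_target = torch.tensor(0.0, device=harm_logits.device)
    if harmful_mask.any():
        loss_target = focal_loss(target_logits[harmful_mask],
                                 target_labels[harmful_mask])
    return loss_harm + loss_target
```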

Model | Batch size | Epochs | Learning rate | Image encoder | Text encoder | #Parameters
Unimodal: TextBERT | 16 | 20 | 0.001 | - | Bert-base-uncased | 110,683,414
Unimodal: VGG19 | 64 | 200 | 0.01 | VGG19 | - | 138,357,544
Unimodal: DenseNet-161 | 32 | 200 | 0.01 | DenseNet-161 | - | 28,681,538
Unimodal: ResNet-152 | 32 | 300 | 0.01 | ResNet-152 | - | 60,192,808
Unimodal: ResNeXt-101 | 32 | 300 | 0.01 | ResNeXt-101 | - | 83,455,272
Multimodal: Late Fusion | 16 | 20 | 0.0001 | ResNet-152 | Bert-base-uncased | 170,983,752
Multimodal: Concat BERT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 170,982,214
Multimodal: MMBT | 16 | 20 | 0.001 | ResNet-152 | Bert-base-uncased | 169,808,726
Multimodal: ViLBERT CC | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 112,044,290
Multimodal: V-BERT COCO | 16 | 10 | 0.001 | Faster RCNN | Bert-base-uncased | 247,782,404
GLARE | 64 | 50 | 0.001 | VGG19 | DistilBERT-base-uncased | 7,608,323
Table 1: Hyperparameters of different baseline models and GLARE.

Appendix B Filtering

To ensure satisfactory quality of the Harm-C and Harm-P datasets, we impose four well-defined, fine-grained filtering criteria during the data collection and annotation process. The criteria are as follows:

  1. The meme text must not be code-mixed or in a non-English language.

  2. The meme text must be readable (e.g., blurry or incomplete text is not allowed).

  3. The meme must not be unimodal, i.e., it must not contain only textual or only visual content.

  4. The meme must not contain cartoons (we add this criterion because cartoons are often very hard for AI systems to interpret).

Figure 1 shows some example memes that were rejected during the filtering process due to the four criteria above.
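
The filtering itself is performed manually by the annotators (see Appendix D). Purely as an illustration of criterion 1, the sketch below shows how one might pre-screen meme text with the off-the-shelf langdetect library; this is not part of the dataset-construction pipeline, and the 0.90 confidence threshold is an arbitrary assumption.

```python
# Illustrative pre-screen for criterion 1 only; the actual filtering is manual.
from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make language detection deterministic

def looks_english(meme_text: str, min_prob: float = 0.90) -> bool:
    """Return True if the meme text is confidently detected as English.
    Code-mixed text tends to yield low-confidence or non-'en' predictions."""
    if not meme_text.strip():
        return False              # empty text would also fail criterion 2/3
    try:
        top = detect_langs(meme_text)[0]
    except Exception:
        return False
    return top.lang == "en" and top.prob >= min_prob
```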

Figure 1: Example memes filtered out during annotation: (a) non-English, (b) blurry text, (c) cartoons, (d) text-only, (e) image-only.

Appendix C Annotation Guidelines

C.1 What do we mean by harmful memes?

The intended ‘harm’ can be expressed in an obvious manner, such as by abusing, offending, disrespecting, insulting, demeaning, or disregarding the entity or any socio-cultural or political ideology, belief, principle, or doctrine associated with it. Likewise, the ‘harm’ can take the form of a more subtle attack, such as mocking or ridiculing a person or an idea.

In its entrenched sense, a harmful meme targets a social entity (e.g., an individual, an organization, a community) and is likely to cause calumny, vilification, or defamation, depending on their background (bias, social background, educational background, etc.). The ‘harm’ caused by a meme can take the form of mental abuse, psycho-physiological injury, proprietary damage, emotional disturbance, or damage to public image. A harmful meme typically attacks celebrities or well-known organizations with the intent of exposing their professional demeanor.

C.2 What are the four target categories?

The four target entities are as follows:

  1. Individual: A person, usually a celebrity (e.g., a well-known politician, actor, artist, scientist, or environmentalist, such as Donald Trump, Joe Biden, Vladimir Putin, Hillary Clinton, Barack Obama, Chuck Norris, Greta Thunberg, or Michelle Obama).

  2. Organization: A group of more than one person with a particular purpose, such as a business, government department, company, institution, or association, e.g., research organizations (WTO, Google) and political organizations (the Democratic Party).

  3. Community: A social unit with commonalities based on personal, professional, social, cultural, or political attributes such as religious views, country of origin, or gender identity. Communities may share a sense of place situated in a given geographical area (e.g., a country, village, town, or neighborhood) or in virtual space through communication platforms (e.g., online forums based on religion, country of origin, or gender).

  4. Society: When a meme promotes conspiracies or hate crimes, it becomes harmful to the general public, i.e., the entire society.

Dataset | Very harmful | Partially harmful | Harmless | Individual | Organization | Community | Society
Harm-C | mask (0.0512) | trump (0.0642) | you (0.0264) | trump (0.0541) | deadline (0.0709) | china (0.0665) | mask (0.0441)
 | trump (0.0404) | president (0.0273) | home (0.0263) | president (0.0263) | associated (0.0709) | chinese (0.0417) | vaccine (0.0430)
 | wear (0.0385) | obama (0.0262) | corona (0.0251) | donald (0.0231) | extra (0.0645) | virus (0.0361) | alcohol (0.0309)
 | thinks (0.0308) | donald (0.0241) | work (0.0222) | obama (0.0217) | ensure (0.0645) | wuhan (0.0359) | temperatures (0.0309)
 | killed (0.0269) | virus (0.0213) | day (0.0188) | covid (0.0203) | qanon (0.0600) | cases (0.0319) | killed (0.0271)
Harm-P | photoshopped (0.0589) | democratic (0.0164) | party (0.02514) | biden (0.0331) | libertarian (0.0358) | liberals (0.0328) | crime (0.0201)
 | married (0.0343) | obama (0.0158) | debate (0.0151) | joe (0.0323) | republican (0.0319) | radical (0.0325) | rights (0.0195)
 | joe (0.0309) | libertarian (0.0156) | president (0.0139) | obama (0.0316) | democratic (0.0293) | islam (0.0323) | gun (0.0181)
 | trump (0.0249) | republican (0.0140) | democratic (0.0111) | trump (0.0286) | green (0.0146) | black (0.0237) | taxes (0.0138)
 | nazis (0.0241) | vote (0.0096) | green (0.0086) | putin (0.0080) | government (0.0097) | mexicans (0.0168) | law (0.0135)
Table 2: Top-5 most frequent words per class for the Harm-C and Harm-P datasets. The Very harmful, Partially harmful, and Harmless columns correspond to the harmfulness classes, and the remaining columns to the target classes. The tf-idf score of each word is given in parentheses.

C.3 Characteristics of harmful memes:

  • Harmful memes may or may not be offensive, hateful or biased in nature.

  • Harmful memes point out vices, allegations, and other negative aspects of an entity based on verified or unfounded claims or mockery.

  • Harmful memes leave an open-ended connotation to the word ‘community’, including ‘antisocial’ communities such as terrorist groups.

  • The harmful content in harmful memes is often implicit and might require critical judgment to establish the potency of the harm it can cause.

  • Harmful memes can be classified on multiple levels, based on the intensity of the harm caused, e.g., very harmful, partially harmful.

  • One harmful meme can target multiple individuals, organizations, or communities at the same time. In such cases, we asked the annotators to use their best personal judgment.

  • Harm can be expressed in the form of sarcasm and/or political satire. Sarcasm is praise which is really an insult; it generally involves malice, the desire to put someone down. Satire, on the other hand, is the ironical exposure of the vices or follies of an individual, a group, an institution, an idea, a society, etc., usually with a view to correcting them.

Figure 2: Snapshot of the PyBossa GUI used for (a) annotation and (b) consolidation.

Appendix D Annotation Process

We use the crowd-sourcing platform PyBossa (https://pybossa.com/; cf. Figure 2) to build an annotation interface that shows each meme and requests annotations for harmfulness level and target. Before beginning the annotation process, we asked every annotator to go through the annotation guidelines thoroughly, and we conducted several discussion sessions to make sure that all of them understood exactly what harmful content is and how to differentiate it from humorous, satirical, hateful, and non-harmful content. The average inter-annotator agreement scores (Cohen's κ) (bobicev-sokolova-2017-inter) for Harm-C and Harm-P are 0.683 and 0.790, respectively.
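
For reference, the pairwise agreement reported here can be computed with scikit-learn's implementation of Cohen's κ, averaged over all annotator pairs; the annotator labels in the toy example below are placeholders, not actual annotations.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(annotations):
    """annotations: list of per-annotator label lists over the same memes.
    Returns the mean Cohen's kappa over all annotator pairs."""
    scores = [cohen_kappa_score(a, b) for a, b in combinations(annotations, 2)]
    return sum(scores) / len(scores)

# toy example with three annotators and five memes (placeholder labels)
ann = [
    ["harmful", "harmless", "harmful", "harmless", "harmful"],
    ["harmful", "harmless", "harmless", "harmless", "harmful"],
    ["harmful", "harmful", "harmful", "harmless", "harmful"],
]
print(round(average_pairwise_kappa(ann), 3))
```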

Step 1 - Dry run: At first, we took a subset of 200 memes (100 from each dataset) and asked each annotator to annotate them for harmfulness levels and targets. This step aimed to ensure that each annotator could comprehend the definitions of harmfulness and targets. After this step, the average inter-annotator agreement scores (Cohen's κ) (bobicev-sokolova-2017-inter) across all pairs of annotators were only 0.241 and 0.325 for the two tasks, which is low but expected. The annotators discussed their disagreements and re-annotated the memes. This time, the scores improved to 0.704 and 0.826, which is satisfactory for both tasks. Hence, we decided to begin the final annotation process.

Step 2 - Final annotation: In the final annotation stage, we divided the two datasets into 5 equal subsets and assigned 3 annotators to each subset. This ensures that each meme is annotated 3 times. We also asked the annotators to reject memes that violate the filtering criteria described in Appendix B. This provides an additional level of filtering to ensure adequate quality of the datasets.

Step 3 - Consolidation: After the final annotation, the average inter-annotator agreement scores over the two whole datasets are 0.683 and 0.790. We observed many memes where the annotation of two annotators differed from that of the third one; for example, two annotators independently annotated a meme as partially harmful, while the third annotated it as very harmful. In the consolidation phase, we used majority voting to decide the final label. For cases where all three annotations differed, we employed a fourth annotator to make the final decision.
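
A minimal sketch of this consolidation rule, i.e., majority voting with a fall-back to a fourth annotator when all three labels differ, is given below. The label strings and the tie_breaker interface are illustrative assumptions.

```python
from collections import Counter

def consolidate(labels, tie_breaker=None):
    """Consolidate three annotations of one meme into a final label.
    labels: the three annotators' labels.
    tie_breaker: callable returning the fourth annotator's decision,
    used only when all three labels differ."""
    label, votes = Counter(labels).most_common(1)[0]
    if votes >= 2:                    # at least two annotators agree
        return label
    assert tie_breaker is not None, "all labels differ: need a fourth annotator"
    return tie_breaker()

# e.g., two 'partially harmful' votes beat one 'very harmful' vote
print(consolidate(["partially harmful", "partially harmful", "very harmful"]))
```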

Figure 3: Source and label statistics of the two datasets: (a) Harm-C and (b) Harm-P.

Appendix E Lexical Statistics of Two Datasets

This section analyzes the lexical (word-level) statistics of the Harm-C and Harm-P datasets. Figure 4 shows the length distribution of the meme text for both tasks across the two datasets. Furthermore, Table 2 shows the top-5 most frequent words for each class in the combined validation and test sets of the two datasets. We observe that for the very harmful and partially harmful classes, names of US politicians and COVID-19-related words are frequent. In the target classes, we notice the presence of various class-specific words such as ‘trump’, ‘joe’, ‘obama’, ‘republican’, ‘wuhan’, ‘china’, and ‘islam’. To alleviate the potential bias caused by these class-specific words, we intentionally included harmless memes related to these individuals, groups, and entities, as described in Section 4 of the main manuscript.
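
As an illustration of how such per-class rankings can be produced, the sketch below computes the top-k words by mean tf-idf score per class with scikit-learn; the exact tokenization and preprocessing used for Table 2 may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_words_per_class(texts, labels, k=5):
    """Return the k words with the highest mean tf-idf score for each class."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(texts)                    # (n_memes, vocab_size)
    vocab = np.array(vec.get_feature_names_out())
    out = {}
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        mean_scores = np.asarray(X[rows].mean(axis=0)).ravel()
        top = mean_scores.argsort()[::-1][:k]
        out[cls] = list(zip(vocab[top], mean_scores[top].round(4)))
    return out
```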

Figure 4: Normalized histograms of meme text length per class for the two datasets: (a) Harm-C and (b) Harm-P.