
Gender Bias in Fake News: An Analysis

Navya Sahadevan1     Deepak P2 1St. Joseph’s College, Devagiri, Kerala, India
2Queen’s University Belfast, UK
[email protected]       [email protected]
(2022)
Abstract.

Data science research into fake news has gathered much momentum in recent years, arguably facilitated by the emergence of large public benchmark datasets. While it has been well established within media studies that gender bias is an issue that pervades news media, there has been very little exploration into the relationship between gender bias and fake news. In this work, we provide the first empirical analysis of gender bias vis-a-vis fake news, leveraging simple and transparent lexicon-based methods over public benchmark datasets. Our analysis establishes the increased prevalence of gender bias in fake news across four facets, viz., abundance, affect, political terms and proximal words. The insights from our analysis provide a strong argument that gender bias needs to be an important consideration in research into fake news.

Ethics, Fake News, Gender Bias, Statistical Analysis
Copyright: ACM. Journal year: 2023. Conference: February 27 - March 3, 2023; Singapore. CCS concepts: Fake News; Gender Bias; Empirical Analysis; Fairness.

1. Introduction

Fake news, a terminology in use since the 1890s (https://www.merriam-webster.com/words-at-play/the-real-story-of-fake-news), refers to false or misleading content presented as news. This has been a subject of intense academic and popular interest lately, especially coinciding with the emergence of online media and the professional management of political activity using it (Johnson, 2016). Disinformation or information disorder, as the phenomenon has been recently referred to, has been increasingly recognized as a grave threat to deliberative democracy (McKay and Tenove, 2021). While the relationship between fake news and politics has rightly been subject to much scrutiny, fake news has recently been observed within the context of the COVID-19 pandemic (Springer and Özdemir, 2022) as well. In this paper, we analyze the relationship between fake news and gender, one that has been recognized within social sciences viz., media studies (Almenar et al., 2021) and politics (Stabile et al., 2019), and needs to be explored within quantitative Natural Language Processing (NLP).

In this paper, for the first time to the best of our knowledge, we consider differences in how gender identities are portrayed within fake and real news within popular public textual fake news benchmark datasets. In particular, our interest is in understanding the relative abundance of gender references, gender-specific trends in the prevalence of affect and political terms, and lexical references in general, and how these vary across fake and real news. Our analysis illustrates the nuanced but consistent nature of the relationship between news veracity and gender, and strongly suggests that gender needs to be an important consideration within critical studies and data science research on fake news.

2. Related Work

It is interesting to note that what may be regarded as the watershed moment for the fake news phenomenon - the 2016 US presidential election (Grinberg et al., 2019) - was a contest between female and male contenders. Yet, analysis focusing on gender has been very limited. A notable study (Stabile et al., 2019) touching on the gender aspect of the 2016 election uses quantitative analysis of tweets surrounding the election, providing evidence that election-time fake news supported stereotypes such as women being unfit for leadership positions, hand in hand with villainizing or trivializing women. In contrast to this work centered on a particular election, our study has a broader scope and considers gender bias over popular benchmark datasets.

Research into gender and fake news located within media studies as well as broadly within social sciences has uncovered interesting insights. A survey study over Spanish respondents (Almenar et al., 2021) found that perceptions of fake news varied across genders in terms of degrees of concern, but remained underpinned by the same problems. An in-depth and comprehensive deliberation of the issues around disinformation within the context of race and gender (Thakur and Madrigal, 2022) - published in 2022 - raises several interesting points. It suggests that disinformation flows from the same patriarchal context as online gender-based violence and that gendered disinformation seeks to reinforce negative views of women. In other words, gender stereotypes and biases have been argued to form another dimension of disinformation. Such gender issues have surfaced even in pandemic disinformation (Sessa, 2020). This backdrop of extant social science research into gender and disinformation motivates our empirical study.

Against the backdrop of such social science literature, our attempt is to take a few early steps in data-driven quantitative studies into analyzing the relationships between gender and textual disinformation using popular public datasets leveraged within data science research on fake news.

3. Research Questions

Our research questions on data-driven analysis of gender vis-a-vis fake news are:

  • RQ1: How do gender groups fare on their relative abundance within fake and real (i.e., non-fake) news?

  • RQ2: How do emotions, political references and sentiments differ around gender mentions across fake and real news?

  • RQ3: How do trends on lexical references surrounding references to gender groups differ across fake and real news?

We explore these questions over large benchmark datasets for fake news research.

4. Analysis Methodology

We start by outlining the basic building blocks we use for our analysis. We intend that our analysis methodology be clearly verifiable by humans and able to trace inferences back to particular mentions within the textual articles. This makes lexicon-based statistical approaches more suitable than ML-based approaches for the task. Thus, our building blocks use simple and transparent lexicon-based analysis strategies. These building blocks, as we will see, are used in a straightforward way to address the separate RQs.

Gender Mentions: The first key component in our method involves the identification of gender references/mentions within (fake and real) textual news articles. We use a gender reference lexicon extracted from the NLTK corpus (https://www.nltk.org/api/nltk.corpus.html), which contains separate lists of male and female gender mentions, including gender-correlated names and pronouns. For example, words such as he and his are part of the male reference list, whereas her and girl are part of the female reference list. The separate lists contain thousands of words each. Lexicon word occurrences within news articles are regarded as the respective gender mentions.
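As a minimal, self-contained sketch of this step (the lexicons below are tiny illustrative stand-ins; the actual NLTK-derived lists contain thousands of entries each):

```python
import re

# Tiny illustrative stand-ins for the NLTK-derived gender lexicons;
# the lists used in the paper contain thousands of entries each.
FEMALE_LEXICON = {"she", "her", "hers", "woman", "girl", "mother", "wife"}
MALE_LEXICON = {"he", "his", "him", "man", "boy", "father", "husband"}

def gender_mentions(text):
    """Return bags of (position, token) female and male gender mentions."""
    tokens = re.findall(r"[a-z']+", text.lower())
    female = [(i, t) for i, t in enumerate(tokens) if t in FEMALE_LEXICON]
    male = [(i, t) for i, t in enumerate(tokens) if t in MALE_LEXICON]
    return female, male

f, m = gender_mentions("She said the governor praised his adviser during her visit.")
```

Retaining token positions keeps each inference traceable back to a specific mention, in line with the transparency goal stated above.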

Emotions, Political References and Sentiments: Similar to the identification of gender mentions, we use the popular word-emotion association lexicon, EmoLex (http://www.saifmohammad.com/), to identify affective terms within textual news articles across eight emotion classes, viz., anger, anticipation, disgust, fear, joy, sadness, surprise and trust. Given our intent of analyzing emotions and sentiments against the backdrop of gender references, we identify emotion mentions within a specified-width word window - the width being a hyperparameter - on either side of the gender references. As an example, within the text excerpt 'there was apparent anger in her face', the emotion word anger from the anger emotion class would be associated with the female gender pronoun her for window sizes $\geq 2$; we did our analysis with window sizes varying from 5 to 10. We use the same methodology for identifying political references, with a political lexicon compiled from various sources to cover all types of political references including current affairs, made publicly available on GitHub (https://github.com/pulkurni/politcal-lexcicon). Towards analyzing sentiments, we leveraged a popular Python lexicon-based sentiment library, VADER (https://pypi.org/project/vaderSentiment/). We aggregate the positive/negative sentiment scores provided for mention-proximal words that appear in VADER's sentiment dictionaries, to arrive at cumulative sentiment polarities around each gender mention.
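The window-based association can be sketched as follows (EMOLEX and GENDER_TERMS below are small toy stand-ins for the actual resources):

```python
import re

# Toy stand-ins for EmoLex entries and the gender lexicon.
EMOLEX = {"anger": {"anger", "furious"}, "fear": {"afraid", "threat"}}
GENDER_TERMS = {"she", "her", "he", "his"}

def proximal_emotions(text, width=5):
    """Associate emotion words within +/- width tokens of each gender mention."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in GENDER_TERMS:
            continue
        # Window excludes the mention itself, clipped at article boundaries.
        window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
        for emotion, words in EMOLEX.items():
            hits.extend((tok, emotion, w) for w in window if w in words)
    return hits

hits = proximal_emotions("there was apparent anger in her face", width=2)
# hits: [("her", "anger", "anger")] - matching the example in the text
```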

Proximal Lexical References: In analyzing the nature of lexical references surrounding gender mentions, we use the same window-based approach as earlier, and associate all words within a fixed width word window to the gender reference.
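A frequency count over such windows, which underlies the proximal-word analysis, can be sketched as follows (illustrative helper; the naming is ours):

```python
import re
from collections import Counter

def proximal_word_counts(articles, gender_terms, width=5):
    """Count all words within +/- width tokens of any gender mention."""
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok in gender_terms:
                counts.update(tokens[max(0, i - width):i]
                              + tokens[i + 1:i + 1 + width])
    return counts

counts = proximal_word_counts(
    ["She said the governor praised his adviser."], {"she", "his"}, width=2)
```

Ranking `counts.most_common(10)` over a dataset subset yields tables of most frequent proximal words of the kind reported later.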

RQ-specific Analyses: For a news article $N$, let the bags of female and male gender mentions be $N_F$ and $N_M$ respectively. The bag of emotion words from emotion class $e$ around female references in $N$, denoted $N_F^e$, would be:

$N_F^e = \cup_{m \in N_F} \{ m_e \mid m_e \in L(E=e) \wedge m_e \in W(m) \}$

where $W(m)$ denotes the adjacency window around the mention $m$, and $L(E=e)$ denotes the lexicon for emotion class $e$; all set notations above are to be interpreted as bag notations. $N_M^e$, the bag of emotion words around male mentions, is defined similarly.

The overall emotion density around female mentions in $N$ is measured as:

$N_F^{ED} = \frac{\sum_{e \in E} |N_F^e|}{|N_F^W|}$

where $E$ is the set of eight emotion classes and $N_F^W = \cup_{m \in N_F} W(m)$ is the bag of all words within windows around female mentions.

For political references, we denote the bag of political words around female mentions as:

$N_F^P = \cup_{m \in N_F} \{ m_p \mid m_p \in \mathcal{PL} \wedge m_p \in W(m) \}$

where $\mathcal{PL}$ is the political lexicon employed. $N_M^P$, the bag of political terms around male mentions, is defined similarly. As in the case of emotion densities, the political reference density is computed as:

$N_F^{PD} = \frac{|N_F^P|}{|N_F^W|}$
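Both densities are simple ratios over the window bag; a minimal sketch with toy bags (all values below are illustrative):

```python
def density(lexicon_hits, window_words):
    """Density of lexicon-marked words among all mention-window words."""
    return len(lexicon_hits) / len(window_words) if window_words else 0.0

# Toy bags: the full window bag N_F^W, plus the subsets matched by the
# emotion lexicon (union over all eight classes) and the political lexicon.
window_bag = ["anger", "in", "face", "vote", "said", "trust", "campaign", "day"]
emotion_bag = ["anger", "trust"]
political_bag = ["vote", "campaign"]

ed = density(emotion_bag, window_bag)    # emotion density N_F^{ED} = 0.25
pd = density(political_bag, window_bag)  # political density N_F^{PD} = 0.25
```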

For sentiment analysis, a sentiment score is computed for each gender mention $m$:

$S(m) = \sum_{w \in W(m)} \mathrm{VADER}(w)$

where $\mathrm{VADER}(w)$ denotes the positive or negative sentiment score for $w$ from VADER ($0.0$ assumed if $w$ does not appear in VADER's dictionary). Each mention is then associated with one of three sentiment labels - Positive, Neutral and Negative - based on the sentiment score. All such news-article-specific measures are suitably aggregated over the real and fake news subsets. The results are then normalised and converted into percentages to address RQ1-3, as we will describe.
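A sketch of the per-mention scoring and labelling (the valence dictionary below is a toy stand-in for VADER's word-level lexicon, and the labelling threshold is our illustrative assumption, not a value stated in the paper):

```python
# Toy valence dictionary standing in for VADER's word-level lexicon
# (in practice one would read it from vaderSentiment's SentimentIntensityAnalyzer).
VALENCE = {"love": 3.2, "happy": 2.7, "attack": -2.1, "hate": -2.7}

def mention_sentiment(window_words, threshold=0.5):
    """Sum word valences over W(m) (0.0 for unknown words) and map to a label."""
    score = sum(VALENCE.get(w, 0.0) for w in window_words)
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"
```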

5. Results and Analysis

Datasets and Setup: In the interest of generality, we conduct our analysis over four popular textual datasets that have been used for fake news research. ISOT (https://www.uvic.ca/ecs/ece/isot/datasets/fake-news/index.php) comprises 40k+ articles split roughly evenly between real and fake. LIAR (Wang, 2017) comprises 12.8k statements labelled for veracity. FakeNewsNet (Shu et al., 2020) (FNN) has 21k news articles collected from Politifact and Gossipcop. The smallest dataset, BuzzFeed (Potthast et al., 2018), comprises a complete sample of news published on Facebook over a week close to the 2016 U.S. election. There is much variety in cardinality, type, distribution and source of news articles across these four datasets, facilitating a well-rounded analysis. Our observed trends were largely invariant to window size variations; the results reported are with window size set to 5.

Data Ver. F M
ISOT Real 25 75
Fake 25 75
LIAR Real 24 76
Fake 23 77
FNN Real 23 77
Fake 23 77
Buzz Real 27 73
Fake 27 73
Table 1. Abundance Results (in %)
Data Ver. Female Male
Emo. Poli. Emo. Poli.
ISOT Real 37.1 62.9 35.6 64.4
Fake 48.8 51.2 46.2 53.8
LIAR Real 42.4 57.6 42.2 57.8
Fake 42.8 57.2 43.0 57.0
FNN Real 42.2 57.8 42.5 57.5
Fake 43.1 56.9 43.1 56.9
Buzz Real 27.3 72.7 23.8 76.2
Fake 30.4 69.6 25.4 74.6
Table 2. Emotion/Political Densities (in %)
Data Ver. Positive Neutral Negative
  F M   F M   F M
ISOT Real 35.2 36.2 35.2 36.2 29.6 27.5
Fake 33.1 33.5 33.1 33.5 33.8 33.0
LIAR Real 36.4 37.0 36.5 37.0 27.1 26.0
Fake 33.8 35.3 33.9 35.3 32.3 29.4
FNN Real 36.9 37.1 36.9 37.1 26.3 25.8
Fake 35.9 36.1 35.9 36.9 28.2 27.9
Buzz Real 35.3 34.9 35.3 34.9 32.8 30.3
Fake 32.1 30.5 32.6 30.8 35.3 38.7
Table 3. Sentiment Profiles (in %)

5.1. RQ1: Abundance Analysis

The relative abundance of genders (in percentage) across fake and real news, computed using a normalised sum-based aggregation of $N_F$ and $N_M$ over fake and real subsets of the datasets, is found to be similar across the various datasets. The results align with contemporary understandings of the severe under-representation of women in news media (Shor et al., 2019). The average female representation ($25 \pm 2.5\%$) and male representation ($75 \pm 2.5\%$) is found to be statistically significant and consistent ($p < 0.05$) in all four datasets. This points to a serious issue, since these datasets are commonly used for constructing ML models for research in this area. The over-representation of males in the design of artificial intelligence models in the media domain could quietly undo substantive parts of decades of advances in gender equality (Leavy, 2018).
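The normalised aggregation behind these percentages amounts to a simple ratio over summed mention counts; a minimal sketch (illustrative helper and toy counts, our naming):

```python
def abundance_pct(female_counts, male_counts):
    """Relative abundance (%) of female vs male mentions over a news subset,
    given per-article mention counts |N_F| and |N_M|."""
    f, m = sum(female_counts), sum(male_counts)
    total = f + m
    return 100.0 * f / total, 100.0 * m / total

# Toy per-article mention counts over a small subset
shares = abundance_pct([2, 3], [10, 5])  # -> (25.0, 75.0)
```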

Real  Female: governor, spokesman, chairman, statement, adviser, meeting, senior, rival, deputy, ambassador
      Male: governor, spokesman, chairman, statement, adviser, meeting, senior, rival, deputy, ambassador
Fake  Female: wife, interview, attack, march, statement, fact, reported, reality, mother, daughter
      Male: interview, fact, march, photo, governor, reported, statement, morning, point, supporters
Table 4. Most Frequent Words

5.2. RQ2: Affect Analysis

The relative normalised emotional-political densities - measured as the percentage of emotional and political words among proximal words - across genders and news veracities are illustrated in Table 2. A clear trend in the results is that females are represented more emotionally and less politically compared with males in both fake and real news, when abundance around gender mentions is considered as an indicator of the nature of representations. This is consistent across datasets, with the exception of FNN, which shows no significant disparities across gender groups. Notably, BuzzFeed contains news predominantly relating to the 2016 US election; hence its proximal words are, as expected, more political than in the other datasets. It is also notable that BuzzFeed shows a very large difference in emotional content for women across real and fake news, clearly consistent with observations that women in politics are not well portrayed (cf. Section 2). There is also an obvious and overarching trend that fake news is more emotional and less political compared with corresponding real news.

The sentiment analysis results appear in Table 3. Two broad trends are unmistakably visible. First, there is a shift from positive and neutral sentiments towards negative sentiments when one moves from the real news statistics to the fake news statistics. Second, this shift is more intense for female mentions. These observations provide evidence of the prevalence of negativity towards females, reflecting extant qualitative observations of gendered narratives targeting women in political fake news (Stabile et al., 2019).

Table 5 illustrates the relative trends across full emotion profiles computed over the eight emotion classes. Apart from the expected deterioration in expressions of trust in fake news, it is also notable that fear and anger are much more prevalent than joy and surprise. This aligns with studies showing that negative emotions aid fake news virality (Corbu et al., 2021). Notably, even though there are slight variations in the percentages, there are no significant differences in trends across genders.

Data Ver. Anger Anticipation Disgust Fear Joy Sadness Surprise Trust
 F M    F M   F M F M F M   F M   F M F M
ISOT Real 10 9 16 14 4 4 12 11 6 5 11 9 4 3 38 44
Fake 12 12 13 13 7 7 14 14 9 8 11 10 5 5 29 32
LIAR Real 7 6 15 13 3 3 8 8 8 7 9 8 4 3 45 51
Fake 8 8 14 13 4 4 10 9 7 5 8 8 4 3 45 51
FNN Real 7 6 15 13 4 3 9 8 7 6 8 7 4 3 47 53
Fake 8 7 14 13 4 4 10 9 7 5 8 7 4 3 45 50
Buzz Real 11 11 10 12 6 6 14 14 6 5 8 11 3 3 34 39
Fake 16 10 11 15 6 8 15 15 6 5 8 9 3 4 34 33
Table 5. Emotion Profiles (in %)

5.3. RQ3: Neighboring Lexicons

Our analysis of lexical references around gender mentions has been much more revealing. In real news, most words around gender mentions were role designations such as adviser, governor and spokesman, and any gendered patterns were overshadowed by such words. This is illustrated in Table 4, which shows the 10 most frequent words around male and female mentions in both real and fake news. In fake news, words around female mentions included gendered roles such as wife, mother and daughter, in sharp contrast to the non-gendered roles such as governor and ambassador seen in real news. This trend deepens further down the frequency ranking, with words such as love and pretty appearing among the top female-proximal references in fake news beyond the top 10. A surprising observation is that attack is the 3rd most common word beside female mentions in fake news, indicating a portrayal of victimization of women in fake news.

This analysis illustrates that sexist stereotypes relating to women - especially those that objectify or victimize them - play an unmistakably larger role in fake news. The lexical references also indicate that fake news is more oriented towards sensationalism (e.g., words such as attack and photo), something that has been well understood in society.

6. Conclusions

We analyzed gender bias using lexicon-based methods over popular fake news datasets to gather quantitative evidence on the relationship between gender and fake news. Our analysis shows that the issue of gender bias - particularly bias against women - is accentuated within fake news on all four aspects of our analysis, viz., abundance, affect, political mentions and word references. This indicates, among other consequences, that gender-agnostic processing of fake news could propagate and/or amplify gender biases, because if data is laden with stereotypical concepts of gender, the resulting application of the technology will perpetuate this bias (Leavy, 2018). Our insights provide a strong argument to consider gender bias across the gamut of fake news research.

While we restricted our analysis to lexicon/dictionary-based methods for transparency, our work provides a solid foundation for further studies, such as ML-based analyses, to uncover details of the relationship between gender bias and fake news. In immediate future work, we intend to study ways of extending this analysis to non-binary genders. Using parsing-oriented NLP (e.g., dependency parsing) to address the same questions is another interesting direction.

References

  • Almenar et al. (2021) Ester Almenar, Sue Aran-Ramspott, Jaume Suau, and Pere Masip. 2021. Gender differences in tackling fake news: Different degrees of concern, but same problems. Media and Communication 9, 1 (2021), 229–238.
  • Corbu et al. (2021) Nicoleta Corbu, Alina Bârgăoanu, Flavia Durach, and Georgiana Udrea. 2021. Fake News Going Viral: The Mediating Effect of Negative Emotions. Media Literacy and Academic Research 4, 2 (2021).
  • Grinberg et al. (2019) Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 US presidential election. Science 363, 6425 (2019), 374–378.
  • Johnson (2016) Dennis W Johnson. 2016. Campaigning in the twenty-first century: A whole new ballgame? Routledge.
  • Leavy (2018) Susan Leavy. 2018. Gender Bias in Artificial Intelligence: The Need for Diversity and Gender Theory in Machine Learning. In Proceedings of the 1st International Workshop on Gender Equality in Software Engineering (Gothenburg, Sweden) (GE ’18). Association for Computing Machinery, New York, NY, USA, 14–16. https://doi.org/10.1145/3195570.3195580
  • McKay and Tenove (2021) Spencer McKay and Chris Tenove. 2021. Disinformation as a threat to deliberative democracy. Political Research Quarterly 74, 3 (2021), 703–717.
  • Potthast et al. (2018) Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2018. A Stylometric Inquiry into Hyperpartisan and Fake News. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 231–240. https://doi.org/10.18653/v1/P18-1022
  • Sessa (2020) Maria Giovanna Sessa. 2020. Misogyny and Misinformation: An analysis of gendered disinformation tactics during the COVID-19 pandemic.
  • Shor et al. (2019) Eran Shor, Arnout Van De Rijt, and Babak Fotouhi. 2019. A large-scale test of gender bias in the media. Sociological science 6 (2019), 526–550.
  • Shu et al. (2020) Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2020. Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big data 8, 3 (2020), 171–188.
  • Springer and Özdemir (2022) Simon Springer and Vural Özdemir. 2022. Disinformation as COVID-19’s twin pandemic: False equivalences, entrenched epistemologies, and causes-of-causes. OMICS: A Journal of Integrative Biology 26, 2 (2022), 82–87.
  • Stabile et al. (2019) Bonnie Stabile, Aubrey Grant, Hemant Purohit, and Kelsey Harris. 2019. Sex, lies, and stereotypes: Gendered implications of fake news for women in politics. Public Integrity 21, 5 (2019), 491–502.
  • Thakur and Madrigal (2022) Dhanaraj Thakur and DeVan Hankerson Madrigal. 2022. Facts and their discontents: A research agenda for online disinformation, race, and gender. (2022).
  • Wang (2017) William Yang Wang. 2017. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 422–426.