ViSAGe: A Global-Scale Analysis of Visual Stereotypes in
Text-to-Image Generation
Abstract
Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world.
However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes.
To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities.
We enrich an existing textual stereotype resource by distinguishing between stereotypical associations that are more likely to have visual depictions, such as ‘sombrero’, from those that are less visually concrete, such as ‘attractive’.
We demonstrate ViSAGe’s utility through a multi-faceted evaluation of T2I generations. First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is especially higher for identities from Africa, South America, and South East Asia. Second, we
assess the stereotypical pull of visual depictions of identity groups, which reveals how the ‘default’ representations of all identity groups in ViSAGe have a pull towards stereotypical depictions, and that this pull is even more prominent for identity groups from the Global South.
CONTENT WARNING: Some examples contain offensive stereotypes.
1 Introduction

Text-to-Image (T2I) models are increasingly being used to generate visual content from textual descriptions Rombach et al. (2022); Ramesh et al. (2021), enabling downstream tasks such as creative design, advertising, marketing, and education. More often than not, these T2I models have been trained on large amounts of web-scale data with minimal curation, if any. As a result, these models often reflect and propagate stereotypes about identities and cultures present in the data Bianchi et al. (2023); Ghosh and Caliskan (2023); Luccioni et al. (2023).
While these studies provide crucial evidence for fairness failures of T2I generations, they have a noticeable lack of coverage of global identity groups and their associated known stereotypes. Some studies offer qualitative insights across different identity groups but focus on only a limited set of identities and known stereotypes Bianchi et al. (2023); Qadri et al. (2023). On the other hand, existing larger scale studies focus primarily on stereotypes in the US or Western society Luccioni et al. (2023), and lack global identities and contexts. Given the widespread adoption of these models in global contexts, it is imperative to build extensive, stratified evaluations that cover broader identity groups to prevent further under-representation of already marginalized groups.
We address the above challenges by presenting a systematic, large-scale, cross-cultural evaluation of regional stereotypes present in the generated images from T2I models. Specifically, we ground our evaluations in an existing geo-culturally broad-coverage, textual resource, SeeGULL Jha et al. (2023) that incorporates societal stereotypes for 175 nationality based identity groups. This grounding in existing social stereotype resources aids the critical distinction between spurious correlations in models and stereotypical tendencies which are necessary for model safety interventions Blodgett et al. (2021); Selvam et al. (2023). We enrich this data with human annotations to incorporate the notion of which stereotypes are more likely to be visually depicted versus those that cannot be. Building on this societal grounding, we conduct a study to investigate the extent of stereotypical and offensive depictions of different nationalities in a scalable manner. We also study how the default representation of a nationality relates to their stereotypical as well as non-stereotypical representations. Figure 1 provides a thematic representation of our approach. Our main contributions are as follows.
-
•
We evaluate Text-to-Image generations for cross-cultural regional stereotypes at scale and with global coverage by leveraging existing resources from textual modality.
-
•
We publicly release the dataset ViSAGe111 https://github.com/google-research-datasets/visage : Visual Stereotypes Around the Globe where we make a critical distinction between ‘visual’ and ‘non-visual’ stereotypes in images. We identify a list of 385 visual attributes, and also introduce a broad-coverage image dataset with annotations of visual markers of stereotypes present in 40,057 image-attribute pairs, representing different identity groups.
-
•
We demonstrate the offensiveness of the generated images and investigate the feasibility of using automated methods employing captioning models to identify visual stereotypes in images at scale.
-
•
We analyze the default representations of identity groups and demonstrate how T2I models disproportionately lean towards their stereotypical representations, even when explicitly prompted otherwise.
The above contributions collectively offer a systematic approach for critically examining the stereotypes in images generated from T2I models.
2 Related Work
Stereotypes in Text-to-Image Models
Cho et al. (2023) show that T2I models reflect specific gender/skin tone biases from training data. Fraser and Kiritchenko (2024) examine gender and racial bias in vision language models. Zhang et al. (2023) study gender presentation in T2I models. Ungless et al. (2023) demonstrate that images of non-cisgender identities are more stereotyped and more sexualised. Bianchi et al. (2023) highlight the presence of stereotypes by using prompts containing attributes and associating the generated visual features with demographic groups. Stable Bias Luccioni et al. (2023) prompts T2I models with a combination of ethnicity/gender and profession and evaluates profession-based stereotypes. Basu et al. (2023) use location-based prompts to quantify geographical representativeness of the generated images. Qadri et al. (2023) identify harmful stereotypes and an ‘outsiders gaze’ in image generation in the South Asian context. We prompt T2I models with ‘identity groups’ around the globe to study the prevalent regional stereotypes in their default visual representation. We identify ‘visual’ stereotypes, related to the concept of ‘imageable’ synsets Yang et al. (2020), to conduct this analysis at scale.
Several efforts have also been made to quantify the harms associated with generative models. Hao et al. (2023) propose a theoretical framework for content moderation and its empirical measurement. Naik and Nushi (2023) and Wang et al. (2023) quantify social biases in T2I generated images. Garcia et al. (2023) and Gustafson et al. (2023) highlight demographic biases in image captioning and classification tasks. We investigate the feasibility of automatically identifying visual stereotypes in image generations, and demonstrate their offensiveness.
Stereotype Benchmarks in Textual Modality
StereoSet Nadeem et al. (2021) and CrowS-Pairs Nangia et al. (2020) are datasets used for detecting stereotypes in NLP prediction and classification-based tasks. SeeGULL Jha et al. (2023) is a stereotype repository with a broad coverage of global stereotypes – containing close to 7000 stereotypes for identity groups spanning 178 countries across 8 different geo-political regions across 6 continents. It comprises of (identity, attribute) pairs, where ‘identity’ denotes global identity groups, and ‘attribute’ denotes the associated descriptive stereotypical adjective/adjective phrase, or a noun/noun phrase, such as (Mexicans, sombrero), (Germans, practical), and (Japanese, polite). In this work, we leverage resources from textual modalities, such as SeeGULL, to ground our evaluation of prevalent stereotypes in the generated images.
3 Our Approach
Some stereotypes can be recognized in terms of clearly visual attributes (e.g.: ‘Mexicans, sombreros’) whereas other stereotypes are defined through non-visual characteristics such as social constructs, and adjectives (e.g.: ‘Chinese, intelligent’). While the latter may have some visual markers, they are difficult to represent accurately in images. Therefore, we follow a nuanced two-step approach. As a first step, we answer the fundamental question of which stereotypes can be objectively represented in an image. We identify inherently ‘visual’ attributes which we subsequently use in the second step to detect stereotypes in the generated images by (i) conducting a large-scale annotation study, and (ii) using automated methods.
3.1 Identifying Visual Stereotypes
To mitigate subjectivity in visual representation and ensure a more reliable evaluation of the generated images from T2I Models for stereotypes, it is crucial to make a distinction between clearly identifiable ‘visual’ stereotypes and difficult to represent ‘non-visual’ stereotypes. We undertake a systematic annotation task to differentiate the two. We use an existing stereotype resource in textual modality, SeeGULL Jha et al. (2023), as a reference to ground our evaluations. The dataset has 1994 unique attributes – both ‘visual’ and ‘non-visual. We deduce visual attributes and then map it back to the identity terms in SeeGULL to obtain visual stereotypes.
Annotating Visual Attributes
The annotators are presented with an attribute and a statement of the form of ‘The attribute [attr] can be visually depicted in an image.’ The attribute presented in the sentence is selected from a list of all unique attributes (noun/noun phrases, and adjective/adjective phrases) from SeeGULL. The annotators are required to assign a Likert scale rating to each attribute indicating the extent to which they agree or disagree with the above statement w.r.t. to the given attribute term. We note that the visual nature of attributes lie on a spectrum. While there may be clearly defined attributes on either extremes, the ‘visual’ and ‘non-visual’ attributes in the middle are more subjective. For our subsequent analysis, we exclude all attributes where any annotator expressed uncertainty or disagreement regarding the visual nature. Consequently, we deem terms where all annotators at least agreed about their visual nature, as ‘visual’ resulting in a selection of 385 out of the original 1994 attributes. Please refer to the Appendix 7.1.1 for details. We recruited annotators based out of the United States, and proficient in English reading and writing. For each attribute, we get annotations from 3 annotators that identify with different geographical origin identities - Asian, European, and North American. Annotators were professional data labelers and were compensated at rates above the prevalent market rates.
Mapping to Visual Stereotypes
Using the identified visual attributes (Appendix 7.1.1), we filter out the associated ‘visual stereotypes’ in SeeGULL. Figure 2 shows the global distribution of visual attributes. It also contains examples of visual stereotypes associated with different countries. We observe that around 30 identities, including Omani, Ukranian, Swiss, Canadian, Mongolian, etc., just have a single visual attribute associated with them. Australians (63) have the highest number of visual attributes associated with them followed by Mexicans (46), Indians (34), New Zealand (31), Ethiopians (27), and Japanese (20). The numbers in bracket indicate the number of visual attributes associated with the identity group in SeeGULL. We use these visual stereotypes for identifying their prevalence in images in the next step.

3.2 Detecting Visual Stereotypes in Text-to-Image Generation
Evaluating generated images for the presence of stereotypes is essential to not amplify or reinforce existing biases in the society. We conduct a large-scale annotation task to detect prevalent visual stereotypes in the generated images of 135 identity groups across the globe. The scale of our evaluation is significantly larger than the prior work – in terms of the global coverage of identity groups, and their associated stereotypes.
We use Stable Diffusion-v1.4 Rombach et al. (2022) as the base Text-to-Image model, denoted as , and generate images representative of various identity groups by defining three distinct prompts as follows: (i) , (ii) , and (iii) , where such that is the set encompassing identity terms extracted from the SeeGULL dataset. For each prompt and each identity term , we generate 5 images, resulting in an output set of images for a combination of prompt and identity term. The image generation process can be denoted as: where is the generated images for a given prompt and identity term .
Detecting Stereotypes through Human Annotations
The annotation task aims to examine whether well-known visual stereotypes about identity groups are reflected in the generated images. We generate 15 images per identity group and present it to the annotators. Each image is accompanied by two sets of attributes: (i) a set of visual attributes stereotypically associated with the identity group (Figure 2), and (ii) an equal number of randomly selected visual non-stereotypical attributes, i.e., attributes that are not stereotypically associated with the identity group. We combine the two sets and randomly show 5 attributes in succession alongside each image, until every attribute has been covered. An additional option of ‘None of the above’ is also displayed with each set. The annotators are asked to select all attribute(s) that they believe are visually depicted in the image. Additionally, they are also asked to draw bounding boxes to highlight specific regions, objects, or other indicators that support their selection of the visual attribute within the image. If annotators believe that no attributes are visually represented, they can choose ‘None of the above.’ Overall, we get annotations for 2,025 identity-image pairs and 40,057 image-attribute pairs. Similar to 3.1, we recruited annotators proficient in English reading and writing, residing in United States but identifying with different nationalities. We get annotations for each attribute-image pair from 3 annotators.
Detecting Stereotypes through Automated Methods
Annotating stereotypes in images at scale can often times be resource-intensive and time-consuming. Therefore, we also investigate the feasibility of using automated techniques for stereotype detection in images. We use the already existing CLIP (Contrastive Language-Image Pre-training) embeddings combined with the BART model to generate image captions. An image is mapped into the CLIP embedding space, and the embeddings are then scored against a cache of n-grams. The top-k embeddings are then sent to BART to generate candidate captions. We get top 50 captions for each image for the same set of the 15 images per identity group previously generated. Building on the already identified visual stereotypes, we identify the stereotypes in captions by performing a string match. To further understand how uniquely a stereotypical attribute is present in the caption of the images of an identity group, we compute a salience score of the attributes w.r.t. the identity group . We use a modified tf-idf metric as follows: . The function represents the smoothed relative frequency of attribute for the identity group ; and the function denotes the inverse document frequency of the attribute term in all the captions ‘C’, reflecting the importance of the attribute across all the captions. A higher salience score of an attribute, indicates a more unique association of the attribute with the identity group. For each identity group, we then extract the most salient associated visual stereotypes as present in the captions.
4 Study 1: Stereotypical Depictions

4.1 Stereotypes Identified through Human Annotations
Do generated images reflect known stereotypes?
We use the identified visual stereotypes for evaluating whether already known stereotypical attributes are visually represented in the images of different identity groups. We only consider identity groups that have more than one associated visual stereotypes for our analysis. We find the likelihood of a stereotypical attribute being present in any image of the identity group as follows. If a stereotypical attribute was shown times per identity group across all 15 images and selected times by the annotators, then . Figure 3 presents examples of the most likely stereotypes associated with certain identity groups as annotated by different annotators. We observe that the attributes with the highest likelihood for an identity group, also align with the known stereotypes present in SeeGULL. For example, ‘dark’, ‘thin’, ‘skinny’, and ‘underweight’, the most likely attributes depicting Sudanese individuals, are also known stereotypes in SeeGULL. The attribute ‘poor’ which has a likelihood measure of 0.5 for Bangladeshi, i.e., approximately 50% of images representing Bangladeshis contained a representation of ’poor’, is also an annotated stereotype in SeeGULL. Similarly, attributes ‘beaner’, ‘brown’, ‘festive, ‘dark skin’, and ‘sombrero’ are stereotypes in SeeGULL for Mexicans, and are also present in the images representing a Mexican person. An Indian person is often represented as ‘dark, ‘brown’, and ‘religious’ which are its associated visual stereotypes in SeeGULL. We note that the terms ‘brown’, ‘black’ etc., in this work, as identified in images do not imply race, and instead are about appearance, or appearance-based observed race Hanna et al. (2020); Schumann et al. (2023), or even the colors themselves.
Are some identity groups depicted more stereotypically than others?
It is crucial to understand if representations of some identity groups tend to be more stereotypical than others. To analyze this better, we compute the ‘stereotypical tendency’ for each identity group. We measure the mean likelihood of a stereotype being present in any image representing an identity group and compare it with the likelihood of a randomly selected non-stereotypical attribute being depicted visually in the same set of images .
Let denote the likelihood of a stereotypical attribute being present in an image, and denote the likelihood of a randomly selected non-stereotypical attribute being present in the image corresponding to an identity group . We select an equal number, , of stereotypical and random attributes for any given identity group for a fair comparison. The likelihood of an image being stereotypical for a given identity group is, then, denoted as , where denotes the -th stereotypical attribute associated with the identity group. Similarly, the likelihood of a random attribute being present in the image of the identity group is given by , where denotes -th attribute selected randomly with the identity group. Computing the ratio between the two for an identity group
gives the ‘stereotypical tendency’ of the generated images. Higher the ratio, greater is the likelihood of its stereotypical representation.
We observe that on average, the visual representation of any identity group is thrice as likely to be stereotypical than non-stereotypical, i.e., the visual stereotypical attributes associated with an identity group are thrice as likely to appear in their visual representation when compared to randomly selected visual non-stereotypical attributes. We also compute for all 135 identity groups. We observe that images representing Togolese, Zimbabwean, Swedes, Danish, etc., only contained stereotypical attributes when compared to random attributes, i.e., is infinite or N/A; whereas images representing Nigerians were 27 times more likely to contain stereotypical attributes than randomly selected non-stereotypes. (Please refer to the Tables 1 and 2 in the Appendix for all scores).

How offensive are the depictions of different identity groups?
Visual representation of some stereotypes can be more offensive than others. For example, the stereotypical depiction of an identity group as ‘poor’ can be considered more offensive when compared to the visual depiction of a stereotype ‘rich’. We investigate the offensiveness of images and whether certain identity groups have a more offensive representation compared to others.
Using the offensiveness rating of stereotypical attributes in SeeGULL, we infer an overall offensiveness score for representation of identity groups. Let denote the offensiveness score of a stereotypical attribute associated with an identity group in SeeGULL. The mean offensiveness score for each identity group is then
where denotes the -th stereotypical attribute identified as being present in the image representing an identity group by the annotators. Figure 8 visualizes the normalized offensiveness score for different identity groups. We observe that the representations of people from countries in Africa, South America, and South East Asia, are comparatively more offensive. Jordanians, Uruguayans, Gabonese, Laotian, and Albanians have the most offensive representation; whereas Australians, Swedes, Danish, Norwegians, and Nepalese have the least offensive representation.
4.2 Stereotypes Identified through Automated Methods
We check the feasibility of using automated methods employing captioning models for stereotype detection, and compare the results with and without using visual stereotypes as a reference. Without visual stereotypes to ground the evaluations, the automated techniques detect non-visual attributes like ‘attractive’, ‘smart’, etc., for identity groups. However, using visual attributes as a reference, our approach uncovers more objectively visual stereotypes for identity groups. These stereotypes also have a high likelihood of being present in the images as marked by the annotators 44.69% of the time. Figure 3 shows the most salient visual stereotypes associated with the identity group. Attributes like ‘sombrero’, ‘dark’, and ‘brown’ were the most salient visual stereotypes for Mexicans; ‘poor’ was highly salient with Bangladeshi, ‘dark’ and ‘thin’ for Sudanese, and ‘elegant’ for French. These attributes were also marked as being present by the annotators. This approach also identified stereotypical attributes which were not necessarily depicted in the images, e.g., attributes like ‘cow’, ‘elephant’ for Indians. This could be a limitation in our automated approach or existing errors/biases in the generated captions themselves. Further analysis is required to tease out error and biases propagated by different components of such a system.
5 Study 2: Stereotypical Pull

Prior research has shown that Text-to-Image models can have a ‘very homogenized lens’ when representing certain identities Bianchi et al. (2023); Qadri et al. (2023). We conduct a deeper analysis to better understand this across the generated images of different identity groups. We define ‘stereotypical pull’ for any identity group as a Text-to-Image model’s inclination to generate images aligning with the stereotypical representations of an identity group when presented with (i) neutral prompts, and (ii) explicit non-stereotypical prompts. This points to the model’s tendency to revert to stereotypical depictions, reflecting its inherent biases.
We use the below sets of prompts, to generate 15 images per prompt for 135 identity groups and demonstrate the prevalence of stereotypical pull.
-
•
Default Representation (d): ‘A/An person; where denotes the identity group’.
-
•
Stereotypical Representation (s): (i)‘A/An person described as ’, (ii) ‘A photo/potrait of a/an person’; where is the visual stereotypical attribute associated with the identity group in SeeGULL.
-
•
Non-Stereotypical Representation (ns): (i) ‘A/An person described as ’, (ii) ‘A photo/potrait of a/an person’; where is a visual attribute not associated with in SeeGULL.
Figure 4 visually demonstrates the ‘stereotypical pull’ for the generated images of the identity groups ‘Bangladeshi’, ‘Swiss’, ‘Ethiopian’, and ‘Australian’. For demonstrative purposes, we select opposing stereotypical and non-stereotypical attributes. The default representation of ‘a Bangladeshi person’, looks very similar to the sets of images of ‘a poor Bangladeshi person’ (stereotype), as well as ‘a rich Bangladeshi person’ (non-stereotype). However, the visual representation of ‘a Swiss person’ is quite distinct from the visual representation of ‘a poor Swiss person’ (non-stereotype) and ‘a rich Swiss person’ (stereotype). We observe this pattern even for physical characteristics, where images of ‘an Ethiopian person’, ‘a skinny Ethiopian person (stereotype)’, and ‘a rich Ethiopian person’ (non-stereotype) are all similar looking when compared to that of ‘an Australian person’, ‘a skinny Australian person’ (non-stereotype), and ‘an overweight Australian person’ (stereotype) which are visually quite distinct. The generative models have a tendency to ‘pull’ the image generation towards stereotypes for certain identities when prompted with neutral prompts, and non-stereotypical prompts.
We also compare the stereotypical pull across identity groups from 8 different regions Jha et al. (2023) and demonstrate this phenomenon quantitatively in Figure 5. We compute the mean pairwise cosine similarity between the CLIP embeddings of default (d) representation of the identity group with (i) stereotypical (s) representation , and (ii) non-stereotypical (ns) images . The X-axis in Figure 5 represents the difference of stereotyped and non-stereotypes image sets from the default representations . We also compute the similarity between the stereotyped (s) and the non-stereotyped (ns) images , represented by the Y-axis. For 121 out of 135 identity groups, the default representation of an identity group has a higher similarity score with the ‘stereotyped’ images compared to the ‘non-stereotyped’ images indicating an overall ‘pull’ towards generating stereotypical looking images. Moreover, for identity groups from global south, the similarity between stereotypes and non-stereotyped image sets is also very high, indicating an overall lack of diversity in the sets of generated images.
6 Conclusion
In this work, we present a comprehensive framework for evaluating Text-to-Image models with a focus on ‘regional stereotypes.’ Leveraging existing stereotype benchmarks in textual resources, we perform a three-fold evaluation of generated images. Firstly, we distinguish between inherently visual stereotypes, which can be represented in an image, and ‘non-visual’ stereotypes. Subsequently, we identify visual stereotypes prevalent in the generated images from T2I models across 135 identity groups by conducting a large-scale annotation task. We will publicly release our dataset ‘ViSAGe’ which contains visual stereotypes and the annotated images featuring highlighted visual markers of these stereotypes. Additionally, we quantify the offensiveness of the generated images across different identity groups, and investigate the feasibility of automatically detecting stereotypes in images. Through an extensive case study, we also demonstrate that generated images of almost all identity groups exhibit a more visually ‘stereotypical’ appearance, even when Text-to-Image models are explicitly prompted with neutral or non-stereotypical attributes. This phenomenon is more prominent for identity groups from the global south. We hope that the presented approach and the dataset will provide valuable insights for understanding and addressing regional stereotypes in visual content generation.
Acknowledgements
We thank Kathy Meier-Hellstern, and Utsav Prabhu for their detailed feedback and Susan Hao for useful discussions. We also thank the anonymous reviewers for the constructive feedback during the review process.
Limitations
Intended Usage
Not all attributes are undesirable in a T2I generation context. Determining whether a T2I model should reflect certain attributes associated with specific identity groups is a non-trivial question and can be most accurately determined by the respective developer and the downstream use case, taking into account the type of harm Shelby et al. (2023) experienced by the end user. Our objective in this work is not to advocate for a stereotype-neutral generation but to equip practitioners with resources and evaluation methods to assesss, on a scalable and effective basis, the extent to which global stereotypes are reproduced in model generations. We also aim to facilitate an understanding of the nature of stereotypes, including their potential offensiveness, through this work. We hope this enables model builders and platforms to better prioritize and appropriately intervene in their model generations based on their use case.
Limitations
Although we cover a wide range of visual stereotypes, annotation about the visual nature of attributes is potentially subjective. While we attempt to account for this subjectivity by (1) using a Likert scale as opposed to binary labels, and (2) capture diversity in our annotator pool w.r.t. gender and geographical region, we recognize that our annotations may still fail to capture other dimensions of subjectivity. Moreover, our evaluation of prevalence of stereotypes in the generated images is limited by the stereotypes present in textual resources like SeeGULL. More regional stereotypes from different textual resources can be included to further expand the overall coverage. We evaluate the images generated using Stable Diffusion V1-4 and believe the results would hold even on other Text-to-Image models, but that needs to be verified in the future work where our analysis may be applied to other generative image models. We chose Stable Diffusion due to its ease of access for academic research, and since our core objective is to demonstrate gaps in existing stereotype bias evaluation approaches, rather than demonstrate biases in any particular or all image generation models. Our work focuses only on regional stereotypes and their presence in the generated images. The proposed framework could be used to experiment with other axes like gender, race, ethnicity, etc., as well. We believe that our approach is extensible to other types of stereotypes, provided such data repositories exist. Future iterations of such data collection and evaluation should take more participatory approach and involve communities with lived experiences on the harms of bias in society.
Ethical Considerations
We evaluate stereotypes in the generated visual depictions of people, based on their identity associated with a geographical location. We acknowledge that this is a complex notion of identity and overlaps with other facets of a person’s identity, such as race, which is not a focus of this work. We also emphasize that this work does not cover all possible visual depictions of stereotypes and thus, it’s methods or dataset should not be used as a standard to deem model generations free of stereotypes. This is an initial step towards a systematic, global-scale evaluations of T2I models for stereotypical depictions, and we hope for future work to build on this to attain more coverage of potential stereotypical and harmful depictions in generations.
References
- Basu et al. (2023) Abhipsa Basu, R Venkatesh Babu, and Danish Pruthi. 2023. Inspecting the geographical representativeness of images from text-to-image models. arXiv preprint arXiv:2305.11080.
- Bianchi et al. (2023) Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. FAccT ’23, page 1493–1504, New York, NY, USA. Association for Computing Machinery.
- Blodgett et al. (2021) Su Lin Blodgett, Gilsinia Lopez, Alexandra Olteanu, Robert Sim, and Hanna Wallach. 2021. Stereotyping Norwegian salmon: An inventory of pitfalls in fairness benchmark datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1004–1015, Online. Association for Computational Linguistics.
- Cho et al. (2023) Jaemin Cho, Abhay Zala, and Mohit Bansal. 2023. Dall-eval: Probing the reasoning skills and social biases of text-to-image generation models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3043–3054.
- Fraser and Kiritchenko (2024) Kathleen Fraser and Svetlana Kiritchenko. 2024. Examining gender and racial bias in large vision–language models using a novel dataset of parallel images. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 690–713, St. Julian’s, Malta. Association for Computational Linguistics.
- Garcia et al. (2023) Noa Garcia, Yusuke Hirota, Yankun Wu, and Yuta Nakashima. 2023. Uncurated image-text datasets: Shedding light on demographic bias. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6957–6966.
- Ghosh and Caliskan (2023) Sourojit Ghosh and Aylin Caliskan. 2023. ‘person’ == light-skinned, western man, and sexualization of women of color: Stereotypes in stable diffusion. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6971–6985, Singapore. Association for Computational Linguistics.
- Gustafson et al. (2023) Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, and Candace Ross. 2023. Facet: Fairness in computer vision evaluation benchmark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20370–20382.
- Hanna et al. (2020) Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, page 501–512, New York, NY, USA. Association for Computing Machinery.
- Hao et al. (2023) Susan Hao, Piyush Kumar, Sarah Laszlo, Shivani Poddar, Bhaktipriya Radharapu, and Renee Shelby. 2023. Safety and fairness for content moderation in generative models. arXiv preprint arXiv:2306.06135.
- Jha et al. (2023) Akshita Jha, Aida Mostafazadeh Davani, Chandan K Reddy, Shachi Dave, Vinodkumar Prabhakaran, and Sunipa Dev. 2023. SeeGULL: A stereotype benchmark with broad geo-cultural coverage leveraging generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9851–9870, Toronto, Canada. Association for Computational Linguistics.
- Luccioni et al. (2023) Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. 2023. Stable bias: Evaluating societal representations in diffusion models. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
- Nadeem et al. (2021) Moin Nadeem, Anna Bethke, and Siva Reddy. 2021. Stereoset: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5356–5371.
- Naik and Nushi (2023) Ranjita Naik and Besmira Nushi. 2023. Social biases through the text-to-image generation lens. arXiv preprint arXiv:2304.06034.
- Nangia et al. (2020) Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel Bowman. 2020. Crows-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967.
- Prabhakaran et al. (2021) Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, and Mark Díaz. 2021. On releasing annotator-level labels and information in datasets. In Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, pages 133–138.
- Qadri et al. (2023) Rida Qadri, Renee Shelby, Cynthia L Bennett, and Emily Denton. 2023. Ai’s regimes of representation: A community-centered study of text-to-image models in south asia. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 506–517.
- Ramesh et al. (2021) Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation.
- Rombach et al. (2022) Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695.
- Schumann et al. (2023) Candice Schumann, Gbolahan O Olanubi, Auriel Wright, Ellis Monk Jr, Courtney Heldreth, and Susanna Ricco. 2023. Consensus and subjectivity of skin tone annotation for ml fairness. arXiv preprint arXiv:2305.09073.
- Selvam et al. (2023) Nikil Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, and Kai-Wei Chang. 2023. The tail wagging the dog: Dataset construction biases of social bias benchmarks. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1373–1386, Toronto, Canada. Association for Computational Linguistics.
- Shelby et al. (2023) Renee Shelby, Shalaleh Rismani, Kathryn Henne, AJung Moon, Negar Rostamzadeh, Paul Nicholas, N’Mah Yilla-Akbari, Jess Gallegos, Andrew Smart, Emilio Garcia, et al. 2023. Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pages 723–741.
- Ungless et al. (2023) Eddie Ungless, Bjorn Ross, and Anne Lauscher. 2023. Stereotypes and smut: The (mis)representation of non-cisgender identities by text-to-image models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7919–7942, Toronto, Canada. Association for Computational Linguistics.
- Wang et al. (2023) Jialu Wang, Xinyue Gabby Liu, Zonglin Di, Yang Liu, and Xin Eric Wang. 2023. T2iat: Measuring valence and stereotypical biases in text-to-image generation. arXiv preprint arXiv:2306.00905.
- Yang et al. (2020) Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. 2020. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the imagenet hierarchy. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 547–558.
- Zhang et al. (2023) Yanzhe Zhang, Lu Jiang, Greg Turk, and Diyi Yang. 2023. Auditing gender presentation differences in text-to-image models. arXiv preprint arXiv:2302.03675.
7 Appendix
7.1 Annotations
All annotations were procured through a partner vendor who handled the recruitment, obtained informed consent, and provided clean, anonymous ratings within each task. Annotators were paid above prevalent market rates, respecting minimum wage laws. They were recruited such that every data point was annotated by at least one non-male identifying person. Annotators were also diverse in region of residence.
7.1.1 Annotating Visual Attributes
The Likert Scale labels for the annotation task were as follows:
-
•
Strongly Agree: When the attribute can be explicitly identified within an image, like objects, colors, or similar visual elements (e.g., ‘hat,’ ‘sombrero,’ ‘short’).
-
•
Agree: When the attribute can be deduced from visual cues, albeit not explicitly depicted in the image (e.g., ‘fashionable,’ ‘poor,’ ‘impoverished’).
-
•
Disagree: When the attribute is challenging to detect visually but may be inferred from visual cues in specific contexts.
-
•
Strongly Disagree: When the attributes cannot be inferred visually, either explicitly or through visual cues, such as ‘kind,’ ‘talkative,’ ‘warmhearted,’ and the like.
-
•
Unsure: When annotators are uncertain about the attribute’s visual nature.
We assessed the visual nature of 1994 attributes present in the SeeGULL dataset. Recognizing the potential limitations associated with majority voting Prabhakaran et al. (2021), we explored different thresholds to determine the degree of visual nature of an attribute. Figure 6 presents one such consensus plot for the ‘visual nature’ of attributes. X-axis demonstrates the consensus labels, and Y-axis indicates the percentage of attributes for which at least 2 out of 3 annotators reached a consensus.
We observe that 20.41% of the attributes had a ‘strongly agree’ consensus rating from at least 2 annotators, suggesting that these attributes were perceived as extremely visual. Some examples of strongly visual attributes include ‘blonde haired’, ‘champagne’, ‘dark skin’, ‘elephant’, and ‘fat’. For 22.16% of the attributes, annotators ‘agreed’ that they were somewhat visual. Attributes like ‘rich, ‘poor’, ‘snake charmer’, and ‘swarthy’ were included in this category. A roughly equal portion, comprising of 20.26% of the attributes were considered non-visual where majority of the annotators ‘disgareed’ about their visual nature. Attributes like ‘professional’, ‘unemployable’, ‘carefree’, ’abusive’, and ‘calm’ were present in this category. Approximately 7.52% of the attributes were deemed extremely non-visual as indicated by the ’strongly disagree’ label. Some examples include ‘good sense of humor’, ‘just’, and ‘unintelligent’; and annotators were unsure regarding the visual nature of 0.2% of the attributes.
For our subsequent analysis, we exclude all attributes where any annotator expressed uncertainty, disagreement, or strong disagreement regarding the visual nature. Consequently, our ‘visual attributes’ dataset consists solely of attributes for which all annotators either agreed or strongly agreed upon their visual nature, resulting in a selection of 385 out of the original 1994 attributes. We categorize these select attributes as ‘visual’ for our study.

7.1.2 Detecting Visual Stereotypes
Figure 7 presents an example of an annotated data point. We release the image, the identity group, the annotated attribute, and the coordinates of the attribute in the image along with a unique annotator ID identifying the annotator of the given image-attribute pair. We also release the data card 222https://github.com/google-research-datasets/visage with more details.

7.2 Additional Results for Stereotypical Tendency of Identity Groups
We compute for all 135 identity groups. TableS 1 and 2 presents the and the likelihood of images being stereotypical over non-stereotypical for all identity groups.
Identity | (stereo, id) | (random, id) | |
---|---|---|---|
Togolese | 0.35 | 0.00 | N/A |
Zimbabwe | 0.32 | 0.00 | N/A |
Malian | 0.29 | 0.00 | N/A |
Guyanese | 0.16 | 0.00 | N/A |
Sierra Leonean | 0.15 | 0.00 | N/A |
Guatemalan | 0.14 | 0.00 | N/A |
Kosovar | 0.12 | 0.00 | N/A |
Iraq | 0.11 | 0.00 | N/A |
Sweden | 0.10 | 0.00 | N/A |
Denmark | 0.10 | 0.00 | N/A |
South Sudanese | 0.09 | 0.00 | N/A |
Gabonese | 0.05 | 0.00 | N/A |
Mauritanian | 0.03 | 0.00 | N/A |
Greece | 0.03 | 0.00 | N/A |
Kuwaiti | 0.03 | 0.00 | N/A |
Jordanian | 0.02 | 0.00 | N/A |
Bhutan | 0.02 | 0.00 | N/A |
Moroccan | 0.01 | 0.00 | N/A |
Ecuadorian | 0.01 | 0.00 | N/A |
Thailand | 0.01 | 0.00 | N/A |
Liberian | 0.33 | 0.00 | 75.00 |
Panamanian | 0.24 | 0.00 | 54.37 |
Lebanese | 0.19 | 0.00 | 51.54 |
Mauritian | 0.18 | 0.00 | 36.83 |
Sudanese | 0.38 | 0.01 | 27.36 |
Nigerian | 0.08 | 0.00 | 27.30 |
Libyan | 0.24 | 0.01 | 25.54 |
Egypt | 0.13 | 0.01 | 24.83 |
Laos | 0.19 | 0.01 | 21.56 |
Myanmar | 0.20 | 0.01 | 16.00 |
Kenya | 0.17 | 0.01 | 13.56 |
Indian | 0.15 | 0.01 | 12.08 |
Djiboutian | 0.24 | 0.02 | 10.95 |
Ireland | 0.06 | 0.01 | 9.59 |
Norwegian | 0.09 | 0.01 | 9.10 |
Chadian | 0.24 | 0.03 | 8.96 |
Saudi Arabian | 0.17 | 0.02 | 8.57 |
Guinean | 0.15 | 0.02 | 7.93 |
Ghanaian | 0.28 | 0.04 | 7.90 |
Cambodian | 0.22 | 0.03 | 7.67 |
Bangladesh | 0.26 | 0.03 | 7.38 |
Britain | 0.16 | 0.02 | 6.75 |
Ethiopia | 0.17 | 0.03 | 6.56 |
United Kingdom | 0.08 | 0.01 | 6.25 |
Nepali | 0.32 | 0.05 | 5.94 |
Malaysian | 0.15 | 0.03 | 5.92 |
Philippines | 0.19 | 0.03 | 5.86 |
Italy | 0.02 | 0.00 | 5.77 |
Rwandan | 0.13 | 0.02 | 5.33 |
Georgian | 0.07 | 0.01 | 5.00 |
Zambian | 0.20 | 0.05 | 4.24 |
Identity | (stereo, id) | (random, id) | |
---|---|---|---|
Bolivian | 0.14 | 0.04 | 3.82 |
Somalis | 0.24 | 0.07 | 3.71 |
Uruguayan | 0.17 | 0.05 | 3.66 |
Senegalese | 0.07 | 0.02 | 3.63 |
Sri Lanka | 0.11 | 0.03 | 3.54 |
Hondurans | 0.19 | 0.06 | 3.17 |
New Zealand | 0.08 | 0.03 | 3.14 |
North Korea | 0.16 | 0.06 | 2.87 |
England | 0.06 | 0.02 | 2.67 |
Albanian | 0.04 | 0.02 | 2.63 |
China | 0.11 | 0.04 | 2.54 |
Nicaraguan | 0.28 | 0.12 | 2.28 |
Ugandan | 0.18 | 0.08 | 2.25 |
Brazil | 0.13 | 0.06 | 2.22 |
Mozambican | 0.12 | 0.05 | 2.21 |
Russia | 0.07 | 0.03 | 2.19 |
Mexico | 0.20 | 0.09 | 2.17 |
Vietnam | 0.15 | 0.08 | 2.02 |
Japan | 0.12 | 0.06 | 2.00 |
United States | 0.08 | 0.04 | 1.89 |
Pakistani | 0.17 | 0.09 | 1.88 |
Congolese | 0.10 | 0.05 | 1.87 |
Australian | 0.06 | 0.03 | 1.82 |
Indonesian | 0.18 | 0.10 | 1.75 |
Cameroonian | 0.06 | 0.03 | 1.71 |
Afghanistan | 0.07 | 0.04 | 1.67 |
Romanian | 0.03 | 0.02 | 1.57 |
Palestinian | 0.10 | 0.06 | 1.53 |
Tanzanian | 0.15 | 0.10 | 1.47 |
Peru | 0.14 | 0.10 | 1.45 |
Algerian | 0.02 | 0.01 | 1.38 |
South African | 0.04 | 0.03 | 1.28 |
Belgium | 0.02 | 0.01 | 1.25 |
Germany | 0.02 | 0.01 | 1.20 |
Iran | 0.12 | 0.10 | 1.16 |
Argentine | 0.07 | 0.07 | 1.01 |
Gambian | 0.07 | 0.07 | 0.96 |
Singapore | 0.03 | 0.03 | 0.94 |
Angolan | 0.08 | 0.08 | 0.93 |
Israel | 0.04 | 0.05 | 0.89 |
France | 0.06 | 0.07 | 0.89 |
Yemen | 0.11 | 0.15 | 0.74 |
Syrian | 0.13 | 0.19 | 0.71 |
Turkey | 0.05 | 0.07 | 0.69 |
Nepal | 0.03 | 0.04 | 0.62 |
Equatorial Guinean | 0.03 | 0.07 | 0.50 |
Colombia | 0.01 | 0.02 | 0.45 |
Venezuela | 0.06 | 0.14 | 0.39 |
Eritrean | 0.04 | 0.09 | 0.39 |
Barundi | 0.00 | 0.01 | 0.21 |
Chilean | 0.03 | 0.15 | 0.19 |
Comorans | 0.02 | 0.09 | 0.18 |
Spain | 0.02 | 0.14 | 0.12 |
Wales | 0.00 | 0.06 | 0.00 |
7.3 Offensiveness of Visual Stereotypes
Figure 8 visualizes the normalized offensiveness score for different identity groups.

7.4 Additional Results for Stereotype Pull
-
•
: Similarity between the default (d) representation of the identity group with the stereotypical (s) representation.
-
•
: Similarity between the default (d) representation of the images with the non-stereotypical (ns) images.
-
•
: Similarity between the stereotyped (s) and the non-stereotyped (ns) images.
Tables 3, 4 and 5 present the mean cosine similarity between the sets of images for all identity groups across all stereotyped and non-stereotyped attributes. For 121 out of 135 identity groups, the default representation of an identity group has a higher similarity score with the ‘stereotyped’ images compared to the ‘non-stereotyped’ images indicating an overall ‘pull’ towards generating stereotypical looking images. Moreover, for identity groups from global south the mean similarity score across all three measures is constantly higher indicating an overall lack of diversity in their representations across the three sets.
Identity Group | S(d, s) | S(d, ns) | S(s, ns) | Mean Sim |
---|---|---|---|---|
South Sudanese | 0.9824 | 0.964 | 0.9609 | 0.9691 |
Nigerien | 0.9807 | 0.961 | 0.9608 | 0.9675 |
North Korean | 0.9702 | 0.968 | 0.956 | 0.9648 |
Djiboutian | 0.9719 | 0.9627 | 0.9596 | 0.9647 |
Myanmar | 0.9720 | 0.9603 | 0.9551 | 0.9624 |
Bolivian | 0.9688 | 0.9598 | 0.9587 | 0.9624 |
Mauritanian | 0.9736 | 0.9575 | 0.9555 | 0.9622 |
Malawian | 0.9837 | 0.9514 | 0.9491 | 0.9614 |
Bhutanese | 0.9786 | 0.9546 | 0.9494 | 0.9609 |
Mongolian | 0.9719 | 0.9554 | 0.9543 | 0.9605 |
Identity Group | S(d, s) | S(d, ns) | S(s, ns) | Mean Sim |
---|---|---|---|---|
Gabonese | 0.9716 | 0.9583 | 0.9484 | 0.9594 |
Ugandan | 0.9721 | 0.9532 | 0.9522 | 0.9592 |
Malian | 0.9672 | 0.9597 | 0.9505 | 0.9591 |
Rwandan | 0.9775 | 0.9512 | 0.9468 | 0.9585 |
Central African | 0.9769 | 0.9472 | 0.9474 | 0.9571 |
Mozambican | 0.9733 | 0.9471 | 0.9462 | 0.9555 |
Saudi Arabian | 0.9747 | 0.948 | 0.9427 | 0.9551 |
Nepalese | 0.971 | 0.9527 | 0.9409 | 0.9549 |
Ethiopian | 0.9632 | 0.9508 | 0.9453 | 0.9531 |
Nepali | 0.9721 | 0.9498 | 0.937 | 0.953 |
Chadian | 0.9581 | 0.9453 | 0.9479 | 0.9504 |
Guinean | 0.9559 | 0.9429 | 0.9509 | 0.9499 |
Sudanese | 0.9659 | 0.9441 | 0.9396 | 0.9499 |
Equatorial Guinean | 0.9718 | 0.9376 | 0.9378 | 0.9491 |
Senegalese | 0.9496 | 0.952 | 0.9452 | 0.9489 |
Tanzanian | 0.9649 | 0.9433 | 0.9385 | 0.9489 |
Botswana | 0.9719 | 0.9376 | 0.9368 | 0.9487 |
Nicaraguan | 0.9673 | 0.9412 | 0.9309 | 0.9465 |
Cambodian | 0.9676 | 0.9377 | 0.9331 | 0.9461 |
Zambian | 0.9645 | 0.9383 | 0.9344 | 0.9457 |
Liberian | 0.961 | 0.932 | 0.9417 | 0.9449 |
Congolese | 0.9592 | 0.9377 | 0.9368 | 0.9446 |
Angolan | 0.9588 | 0.9344 | 0.937 | 0.9434 |
Togolese | 0.9611 | 0.9338 | 0.9351 | 0.9434 |
Eritrean | 0.9595 | 0.9359 | 0.9342 | 0.9432 |
Sierra Leonean | 0.9542 | 0.9408 | 0.9277 | 0.9409 |
Guatemalan | 0.9481 | 0.9437 | 0.9274 | 0.9397 |
Laos | 0.9607 | 0.9313 | 0.9268 | 0.9396 |
Egyptian | 0.9592 | 0.9366 | 0.917 | 0.9376 |
Yemeni | 0.952 | 0.9361 | 0.9246 | 0.9375 |
Omani | 0.9618 | 0.9329 | 0.9161 | 0.9369 |
Afghans | 0.9579 | 0.9333 | 0.9189 | 0.9367 |
Ecuadorian | 0.9633 | 0.9249 | 0.9218 | 0.9367 |
Guyanese | 0.9646 | 0.9287 | 0.9164 | 0.9366 |
Cameroonian | 0.9463 | 0.9331 | 0.9286 | 0.936 |
Gambian | 0.9293 | 0.9506 | 0.9229 | 0.9343 |
Seychellois | 0.9565 | 0.9258 | 0.9167 | 0.933 |
Zimbabwean | 0.9487 | 0.927 | 0.9184 | 0.9314 |
Paraguayan | 0.97 | 0.9159 | 0.9069 | 0.9309 |
Bangladeshi | 0.9503 | 0.9238 | 0.9099 | 0.928 |
Emiratis | 0.9532 | 0.9326 | 0.8949 | 0.9269 |
Salvadoran | 0.9463 | 0.9132 | 0.9212 | 0.9269 |
Kenyan | 0.9387 | 0.9298 | 0.9113 | 0.9266 |
Ghanaian | 0.9471 | 0.9197 | 0.9115 | 0.9261 |
Sri Lankan | 0.9498 | 0.9247 | 0.9023 | 0.9256 |
Costa Rican | 0.9554 | 0.914 | 0.8994 | 0.9229 |
Chinese | 0.9526 | 0.9121 | 0.8967 | 0.9205 |
Moroccan | 0.9156 | 0.9388 | 0.9068 | 0.9204 |
Iranian | 0.9623 | 0.9054 | 0.8921 | 0.92 |
Panamanian | 0.9439 | 0.9096 | 0.9002 | 0.9179 |
Indian | 0.9332 | 0.9217 | 0.8986 | 0.9178 |
Kuwaiti | 0.9558 | 0.9075 | 0.8883 | 0.9172 |
Ivorians | 0.9685 | 0.8862 | 0.8955 | 0.9167 |
Georgian | 0.9456 | 0.9019 | 0.9016 | 0.9164 |
Indonesian | 0.9404 | 0.9182 | 0.8901 | 0.9163 |
Vietnamese | 0.9301 | 0.9093 | 0.9033 | 0.9142 |
Palestinian | 0.9353 | 0.9067 | 0.8935 | 0.9118 |
Libyan | 0.9419 | 0.8963 | 0.8968 | 0.9117 |
Peruvian | 0.9162 | 0.9207 | 0.8919 | 0.9096 |
Identity Group | S(d, s) | S(d, ns) | S(s, ns) | Mean Sim |
---|---|---|---|---|
Nigerian | 0.9349 | 0.9021 | 0.8907 | 0.9092 |
Iraqi | 0.9329 | 0.9014 | 0.8918 | 0.9087 |
Venezuelan | 0.9363 | 0.9056 | 0.8816 | 0.9078 |
Thai | 0.9269 | 0.9081 | 0.8878 | 0.9076 |
Jordanian | 0.9408 | 0.8903 | 0.882 | 0.9044 |
South African | 0.9312 | 0.8963 | 0.8851 | 0.9042 |
Colombian | 0.9212 | 0.9038 | 0.8847 | 0.9033 |
Somalis | 0.9207 | 0.8971 | 0.8893 | 0.9024 |
Tunisian | 0.9385 | 0.8837 | 0.8756 | 0.8992 |
Syrian | 0.9316 | 0.8723 | 0.8891 | 0.8977 |
Pakistani | 0.9198 | 0.8991 | 0.8733 | 0.8974 |
Malaysian | 0.9489 | 0.8773 | 0.8601 | 0.8954 |
Singapore | 0.9435 | 0.8726 | 0.8659 | 0.894 |
Philippine | 0.9092 | 0.9085 | 0.8638 | 0.8938 |
Hondurans | 0.9505 | 0.8709 | 0.8572 | 0.8929 |
Greeks | 0.9238 | 0.89 | 0.8605 | 0.8914 |
Polish | 0.932 | 0.8804 | 0.8553 | 0.8892 |
Chilean | 0.9241 | 0.8854 | 0.8565 | 0.8887 |
Taiwanese | 0.9041 | 0.8894 | 0.8706 | 0.888 |
Israeli | 0.9172 | 0.8789 | 0.8656 | 0.8872 |
Uruguayan | 0.9057 | 0.8723 | 0.8827 | 0.8869 |
Beninois | 0.9588 | 0.8649 | 0.8354 | 0.8864 |
Algerian | 0.8747 | 0.8896 | 0.8942 | 0.8862 |
Ukrainian | 0.8977 | 0.8819 | 0.8756 | 0.8851 |
Mauritian | 0.8907 | 0.8815 | 0.8828 | 0.885 |
Barundi | 0.9206 | 0.8625 | 0.8681 | 0.8838 |
Mexican | 0.9021 | 0.8926 | 0.8558 | 0.8835 |
Japanese | 0.882 | 0.9109 | 0.8552 | 0.8827 |
Netherlanders | 0.9234 | 0.86 | 0.8524 | 0.8786 |
Kosovar | 0.9252 | 0.8679 | 0.8409 | 0.878 |
Danish | 0.9241 | 0.8551 | 0.8536 | 0.8776 |
Russian | 0.9074 | 0.8804 | 0.84 | 0.8759 |
Lithuanian | 0.8924 | 0.8589 | 0.8569 | 0.8694 |
Romanian | 0.9021 | 0.8786 | 0.8217 | 0.8675 |
Albanian | 0.8668 | 0.8633 | 0.8684 | 0.8662 |
Canadian | 0.9444 | 0.8129 | 0.8355 | 0.8643 |
Bulgarian | 0.87 | 0.8578 | 0.8624 | 0.8634 |
Argentine | 0.8789 | 0.875 | 0.8283 | 0.8607 |
Brazilian | 0.8728 | 0.8635 | 0.8415 | 0.8593 |
Serbian | 0.8896 | 0.8512 | 0.8346 | 0.8585 |
Portuguese | 0.8854 | 0.8526 | 0.8371 | 0.8584 |
Belgian | 0.89 | 0.8459 | 0.8316 | 0.8558 |
Austrian | 0.8875 | 0.8563 | 0.8123 | 0.852 |
Norwegian | 0.8669 | 0.8534 | 0.8349 | 0.8517 |
English | 0.8828 | 0.8533 | 0.8172 | 0.8511 |
Italian | 0.8671 | 0.8598 | 0.8226 | 0.8498 |
Turks | 0.8744 | 0.8567 | 0.8132 | 0.8481 |
Swiss | 0.8826 | 0.8344 | 0.8267 | 0.8479 |
French | 0.8836 | 0.8515 | 0.8062 | 0.8471 |
Macedonian | 0.8432 | 0.8721 | 0.8258 | 0.847 |
Spanish | 0.8591 | 0.8521 | 0.8186 | 0.8433 |
Comorans | 0.8712 | 0.8433 | 0.8118 | 0.8421 |
Croatian | 0.8505 | 0.8693 | 0.8021 | 0.8406 |
United States | 0.8798 | 0.8404 | 0.8004 | 0.8402 |
Irish | 0.8619 | 0.8425 | 0.8068 | 0.837 |
Lebanese | 0.8314 | 0.854 | 0.8164 | 0.834 |
Andorran | 0.845 | 0.8184 | 0.8379 | 0.8338 |
Australian | 0.8694 | 0.8341 | 0.7928 | 0.8321 |
German | 0.8274 | 0.8373 | 0.8088 | 0.8245 |
New Zealand | 0.8601 | 0.8219 | 0.7864 | 0.8228 |
British | 0.8423 | 0.7948 | 0.8262 | 0.8211 |
Swedes | 0.8428 | 0.8585 | 0.7604 | 0.8206 |
Luxembourg | 0.781 | 0.8102 | 0.806 | 0.7991 |
Welsh | 0.7808 | 0.8475 | 0.7326 | 0.787 |
United Kingdom | 0.7981 | 0.7563 | 0.778 | 0.7775 |
Finns | 0.7197 | 0.829 | 0.7021 | 0.7503 |