Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent

Jennifer Mickel [email protected] University of Texas at AustinUnited States of America

(2024)

Abstract.

Racial diversity has become increasingly discussed within the AI and algorithmic fairness literature, yet little attention is focused on justifying the choices of racial categories and understanding how people are racialized into these chosen racial categories. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of who comprises the racial categories chosen and how people are racialized into these categories can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of racial categories and the racialization process is misaligned from the actual racialization process and racial categories used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context they are applied.

In this paper, we make two contributions. First, we demonstrate how racial categories with unclear assumptions and little justification can lead to varying datasets that poorly represent groups obfuscated or unrepresented by the given racial categories and models that perform poorly on these groups. Second, we develop a framework, CIRCSheets, for documenting the choices and assumptions in choosing racial categories and the process of racialization into these categories to facilitate transparency in understanding the processes and assumptions made by dataset or model developers when selecting or using these racial categories.

racial categories, racialization, algorithmic fairness, race and ethnicity

^†^†copyright: acmlicensed^†^†journalyear: 2024^†^†doi: XXXXXXX.XXXXXXX^†^†conference: ACM Conference on Fairness, Accountability, and Transparency; June 03–06, 2024; Rio de Janeiro, Brazil

1. Introduction

The utilization of racial and ethnic categories in the development of datasets and models facilitates the inclusion and documentation of diverse perspectives. Racial and ethnic categories are especially crucial for datasets and models in which race and ethnicity serve as relevant factors, may act as confounding variables, or enable the ability to audit for fairness using race and ethnicity for fairness purposes. For example, understanding the racial and/or ethnic target of hate speech is crucial for understanding the impact of hate speech, as hate speech can differ based on the race and/or ethnicity of the target (nielsen2002subtle, ). Similarly, in health, race is correlated with health outcomes (barger2009relative, ), and knowledge of a patient’s race and ethnicity can help contextualize the patient’s experience and health history (okoro2021examining, ). In algorithmic fairness settings, knowledge of an individual’s race and ethnicity allows for auditing of existing datasets and systems, and many fairness toolkits, such as Fairlearn, rely on this data (bird2020fairlearn, ; lee2021landscape, ). Despite the benefit of race and ethnicity, little justification is provided for the racial and ethnic categories chosen and why these categories are most relevant for a dataset or model’s particular domain. Furthermore, even if the choice of racial and ethnic categories is justified, even less discussion of how these racial and ethnic categories are assigned to individuals and what factors influence the racialization of people into these categories is given. Discussion of how people are assigned or racialized into these categories is crucial as the racialization of people into particular racial groups varies based on cultural context (drnovvsek2021comparing, ). Discussing this racialization process allows for understanding how the cultural context(s) and domain(s) affect people’s placement and racialization into racial categories.

The racial and ethnic categorization schema used in datasets and models varies based on numerous factors. Some racial schemas used are binary, as in Black/non-Black, Black/White, and White/non-White, while others use multiple racial categories, as in Asian, Black, Hispanic, and White (abdu2023empirical, ). The racial and ethnic categories selected determine what racial and ethnic experiences are valued and will be traceable. In the binary setting, this often leads to the exclusion of people not racialized into these groups, and people with multiple racial identities are obscured. In the case of White/non-White, the experiences of non-White individuals are treated similarly since they are in the same category, even though it is evident that the experiences of non-White individuals vary drastically. For example, the experiences of Asians and Blacks within the US cultural context vary immensely (chanbonpin2015between, ).

In this paper, we discuss in greater depth the effect of racial categorization choices on datasets and models, and we demonstrate the importance of documenting choices and motivations for racial categories by showcasing how ill-defined racial categories can affect datasets and model performance. Our work is motivated by previous scholarship on racial categories in algorithmic and AI fairness (abdu2023empirical, ; benthall2019racial, ; hanna2020towards, ) as well as by existing documentation frameworks for datasets and models (bender2018data, ; chmielinski2022dataset, ; crisan2022interactive, ; diaz2022crowdworksheets, ; gebru2021datasheets, ; holland2020dataset, ; hutchinson2021towards, ; mitchell2019model, ; pushkarna2022data, ). We extend this work by focusing on how the choice of racial categorization and the racialization of people into the chosen racial categories affects how well-represented people are and, subsequently, dataset quality and model performance. To combat these effects, we develop CIRCSheets, a novel framework allowing developers of datasets or models to document their motivations behind why they selected certain racial categories and consider the effects of their choice in racial categories.

2. Background and Related Work

2.1. Racial Categories: The Status Quo

Racial categories used in datasets and models tend to align with the US cultural context (abdu2023empirical, ; benthall2019racial, ; hanna2020towards, ). Abdu et al. (abdu2023empirical, ) identify two main choices for racial categorization: binary and more than two races. When binary racial categorization is chosen, it often operates under a Black/White axis. If the racial classification selected is more than two races, the racial categorizations tend to echo the US census (abdu2023empirical, ; benthall2019racial, ). The common categories used with multiple racial categories were Asian, Black, Hispanic, and White (abdu2023empirical, ).

The use of racial categories in datasets and models can help ensure a wide variety of perspectives are represented and considered. Furthermore, the presence of racial categories aids in analyzing, testing, and auditing datasets and models for disparities between racial groups. Without racial categories, these analyses along the axis of race would be challenging to conduct (sandvig2014auditing, ; vecchione2021algorithmic, ). Unfortunately, poorly defined racial categories can hinder actualizing these benefits (nugraheni2021family, ; rebbeck2022distinct, ). This can occur if a racial category comprises multiple groups whose experiences of racialization vary because the racial group no longer serves as a meaningful proxy for people’s lived experiences within those groups.

An example of a racial category that comprises multiple groups who are racialized differently in the US cultural context is White. Individuals of Middle Eastern and North African (MENA) descent are categorized as White within the US despite many members of MENA not perceiving themselves to be White (marks2020collecting, ). Furthermore, within the US cultural context, their lived experience and racialization differ from people of European ancestry (maghbouleh2022middle, ). Having MENA as part of the White racial category obfuscates the experiences of members of MENA within datasets and models, preventing researchers from observing disparate health outcomes of this group (awad2022lack, ). Practitioners and researchers cannot see if a model performs poorly on MENA or if a dataset accounts for the experiences of people who are part of MENA. Most existing fairness toolkits require demographic information to audit algorithms, so practitioners who use these tools cannot audit their models for information on how the model performs on MENA (lee2021landscape, ).

A racial category can obfuscate people within that category when a multiracial ethnicity is treated as a racial category. For example, Latinx is a multiracial ethnicity, and the experiences of Latinxs can vary drastically based on the cultural context they are in and their race. For example, in the US, the experiences of lighter-skinned and darker-skinned Latinxs differ (uzogara2021belongs, ). Darker-skinned Latinxs racialized as Black in the US cultural context experience anti-Black discrimination from Latinxs and Whites (hernandez2022racial, ). Placing all Latinxs into the Latinx category would obfuscate the experiences of darker-skinned Latinxs and prevent researchers and practitioners from observing whether datasets include darker-skinned Latinxs and if models perform poorly on darker-skinned Latinxs.

2.2. Race and Ethnicity

Race and ethnicity, although similar, are two different concepts. Racial groups are differentiated by physical differences in certain social constructs (bashi1997theory, ; omi2014racial, ). Whereas ethnic groups, are differentiated based on social practices such as ”language, religion, rituals, and other patterns of behavior” (bashi1997theory, ; omi2014racial, ; yu202027, , pp. 106). Often, ethnic categories are treated as racial categories, which can pose a problem when an ethnicity is not synonymous with a race, as in the case of panethnicities (defined in Section 3.3). For example, some Afro-Latinx individuals identify or are racialized as Latinx ethnically and Black racially (hordge2020out, ). This can lead to obfuscation for Afro-Latinxs and members of other multiracial panethnicities because it is unclear whether an individual’s racial identity takes precedence over their ethnic identity or vice versa.

Race and ethnicity, although they have no biological determinant, have real impacts on people’s lives, ranging from their health to education to work (mcchesney2015teaching, ; mukhopadhyay2013real, ; berger2015more, ; carter2017racial, ; wheeler2017racial, ). Documenting race and ethnicity within datasets and models allows us to see how models perform on various races and ethnicities and helps audit the model for disparate impact. Furthermore, practitioners can train models using loss functions or other techniques that utilize race to help mitigate the oppression people of various racial and ethnic groups experience. Loss functions, used to train models, can be designed to help fulfill these goals (keswani2021towards, ). Without knowledge of race and ethnicity, it is incredibly challenging to audit for disparate performance along the axes of race and ethnicity.

2.3. Racialization

Racialization refers to the process by which racial meaning is given to people (murji2005racialization, ). Factors of physical difference, such as skin color and eye shape, among others, affect how people are racialized, as do accents (clair2015sociology, ; omi2014racial, ; shuck2006racializing, ). The process of racialization varies depending on cultural context, and relevant features in one context may be irrelevant in another (telles2014black, ). For example, the racial identification of Latinx adolescents and young adults shifts from adolescence to young adulthood and varies depending on generational time in the US, demonstrating that the process of racialization within the US and Latin American countries varies substantially enough for their racial identities to change (irizarry2023race, ). Furthermore, as time spent in the US increases, an individual’s racial identity is less likely to shift (irizarry2023race, ).

Self-racial identification and external racialization differ. For example, the responses of Puerto Ricans and Dominicans to the race question on the 2010 US Census differ drastically, with respondents interpreting the question of race differently and using different aspects of race to answer the question (roth2010racial, ). This leads to racial self-identification that differs from how Puerto Ricans and Dominicans would be racialized based on their phenotype within the US (roth2010racial, ). This is due, in part, to different cultural contexts between the US, Puerto Rico, and the Dominican Republic (roth2010racial, ). For example, in the Dominican Republic, Black is used to describe Haitians (itzigsohn2005immigrant, ). This leads to the racial self-identification of Hispanics on the US Census racialized as Black in the US cultural context to be a poor proxy for their physical features (telles2018latinos, ).

Salient features of racialization can differ based on the cultural context one is in. In the US, skin color plays a large role in racializing people into racial categories (monk2021beholding, ; omi2014racial, ). In Latin America, physical features other than skin color, such as hair texture and facial structure, play a part in racializing someone as Black, causing Latinx individuals with similar skin tones to be racialized differently due to other physical features such as hair texture and facial features (hernandez2022racial, ). Utilizing racial categories without discussing how people are racialized prevents us from understanding who comprises these racial categories and what factors affect whether people are racialized into particular categories and can lead to harm if we transpose different understandings of racial categories and racialization.

2.4. Racial Categories: Contextual Relevance and History

The choice of racial categories in datasets and models is influenced by an array of sociotechnical factors, ranging from technical factors, such as model limitations, to contextual relevance, such as cultural context (abdu2023empirical, ). Datasets and models developed within the US cultural context tend to utilize racial groups relevant to the US cultural context but provide little justification for these choices (abdu2023empirical, ). Sometimes the US census is used as justification, as in Andrus et al. (andrus2022demographic, ) or prior work, as in Yang et al. (yang2020towards, ), but most position cultural context as a sufficient justification of racial categories, as in Borradaile et al. (borradaile2020whose, ).

Race has been central to political life in the United States (omi2014racial, ). This is evident through political discourse, legal history, and the US Census (omi2014racial, ). The census has been used as a tool to encode these values (abdu2023empirical, ), which is evident when observing the history of racial categories within the US Census. As an example, the Census of 1890 had four categories to classify people with African ancestry out of a total of eight categories (schor2017counting, ). This preoccupation with blackness in 1890 reflects the political climate within the southern states at the time (lee1993racial, ). The US Census of 1960 also reflects the political climate of the time, as Hawaii became a state in 1959 and Hawaiian and part-Hawaiian were added as racial categories to the US Census for the first time (lee1993racial, ; o2000irreconcilable, ). Observing the racial categories in the census over the years showcases how race within the US cultural context has shifted. Before 1860, the racial categories the census included were along the axis of Black and White, but as Asian immigrants immigrated to the US, Asian racial categories were added (hochschild2008racial, ).

Racialization for certain groups varies depending on the context and domain. For example, the racialization of Filipinos varies by context (ocampo2016latinos, ). Some Filipinos identify culturally as Latinx rather than Asian, but within educational contexts, they tend to be treated as Asian rather than Latinx (ocampo2016latinos, ; ocampo2013really, ). This is seen in the literature for some studies racialize Filipinos as Asian, as in Baluran et al. (baluran2023life, ) and Irizarry et al. (irizarry2015utilizing, ) while others racialize Filipinos as Hispanic, as in Treviño (trevino1987standardized, ).

In addition to the context associated with the domain one operates in, racialization is affected based on the cultural context (drnovvsek2021comparing, ). For example, the experience of Central-East European immigrants differs between the UK and Japan (drnovvsek2021comparing, ). In addition to this, the experiences of certain groups within a racial category vary. For example, East Asians and South Asians are both racialized as Asian, yet their experiences differ, which leads Americans of Chinese descent to have a higher life expectancy than Americans of South Indian descent (baluran2023life, ).

Racial categories also differ based on country. Farquharson (farquharson2007racial, ) discusses the racial formation of racial categories in the US, South Africa, and Australia, all of which are settler colonial states and identify race along a Black/White axis. Despite this, within each cultural context, people are racialized into the Black category differently. In South Africa, people of African ancestry who are mixed are racialized as colored, while in the US, they would be considered Black (daya2013panel, ; khanna2010if, ). In Australia, the Aboriginal peoples are racialized as Black, while in the US, they would not be (farquharson2007racial, ). Lack of justification regarding racial categories prevents critical analysis of the sociological foundation of racial categories.

2.5. Researcher Justifications

Abdu et al. (abdu2023empirical, ) identify five existing categories of racial category justification in the algorithmic fairness literature. Researcher justifications fall under data availability, technical factors, appeals to prior scientific work, epistemic concerns, and contextual relevance (abdu2023empirical, ).

The first two categories of justifications, data availability and technical factors, focus on limiting factors that affect racial category justification. Data availability affects the racial categories researchers can choose because the choice of racial categories was made earlier during the data curation process. Furthermore, researchers and practitioners must rely on the information regarding racial categories and racialization provided with the data. In many cases, this means no information is provided (abdu2023empirical, ). Technical factors can affect the racial categories chosen because the model or algorithm may require or be limited to a certain number of features, as in the case of Friedler et al. (friedler2019comparative, ) where their model required a binary racial category as the algorithm’s sensitive attribute.

The last three categories of justification appeal to prior scientific work, epistemic concerns, and contextual relevance, focus on justifications related to the goal of the dataset or model and the domain(s) and cultural context(s) in which the dataset and model will be used. Appeals to prior scientific work utilize existing literature as justification for the racial categories used (abdu2023empirical, ). Justifications regarding epistemic concerns centering racial categories with greater scientific rigor, such as describing what features constitute a person’s placement into a particular racial category (abdu2023empirical, ). Cultural context refers to the racial categories that are relevant in particular societies (abdu2023empirical, ). Oftentimes, there is an assumption of collective understanding that the racial categories chosen are salient for a certain cultural context. For example, datasets developed in the US cultural context, as in Borradaile et al. (borradaile2020whose, ), will justify their choice of racial categories by saying they are relevant to the US context.

3. How racial/ethnic categories can affect datasets and models

With the usage of racial and ethnic categories during dataset and model development, it is often unclear who fits into these categories due to the lack of discussion regarding assumptions about who is racialized into these categories. The cultural relevance and demographic makeup of these categories, as well as the multi-dimensionality of race and ethnicity, can impact a dataset’s quality and a model’s performance. Section 3.2 demonstrates how different demographic distributions, possible in broad or ill-specified racial and ethnic categories, can affect model accuracy on a group level.

3.1. The Effect of Cultural (Ir)relevance

Cultural relevance is crucial when selecting racial categories. Racial categories vary depending on cultural context (farquharson2007racial, ). If the racial categories selected for a cultural context are irrelevant to the domain(s) and context(s) they will be deployed in, the benefit of racial categories is lost, as racial categories lose their meaning when irrelevant.

Some racial categories, such as Black, may exist in multiple different cultural contexts, but the people placed into this category change depending on the context. A poorly defined definition of Black, which occurs when there is little to no discussion of how people are racialized into the category of Black, can lead to the usage of varying definitions of Black, especially if a dataset or model is used in a variety of cultural contexts. This has occurred in the US where people have been categorized as Black even though they would be racialized as white (schor2017counting, ).

To illustrate this effect, imagine a dataset or model is developed for the cultural contexts of the US, South Africa, and Australia, where Black is a culturally relevant racial category (farquharson2007racial, ). The developers are aware that Black as a racial category exists in each of these contexts and select the racialization process for the Black racial category to be culturally relevant to Australia, which refers to the Aboriginal people as Black (farquharson2007racial, ). The developers make this selection without conveying the racialization process of people into the Black category. Another group decides to use the dataset or model in the US or South Africa without understanding that people racialized into the Black category within this dataset or model are Aboriginal. This can lead to downstream issues or harm as the Black category is not relevant to the US or South African context since the racialization process differs from that of Australia. To prevent this from occurring, it is crucial to understand how people are racialized into each racial category of a dataset or model to understand if those racialization processes are culturally relevant to the domain(s) users of the dataset or model want to utilize it for.

3.2. The Effect of Distribution Shift in Broad Categories

Abdu et al. (abdu2023empirical, ) identify two main choices for racial categorization: binary and more than two races. Previous work using binary racial categorization utilizes Black/White, Black/non-Black, or white/non-White (abdu2023empirical, ). Non-Black and non-White are broad categories, and the possible sociodemographic distributions can vary drastically. With these racial categorization schemas, it becomes unclear which groups comprise non-Black and non-White. The non-White category could be comprised solely of Latinx individuals, or it could be comprised of both Black and Latinx individuals. Understanding the composition of broad racial categories and who can be included in these categories is crucial. Otherwise, dataset quality and model performance metrics might differ if the distributions within these broad categories shift.

Classifier	Metric	Asian	African American	Caucasian	Hispanic	Native American	Other	All	Groups Trained On
Everyone	TPR (%)	100.0	73.6	50.6	42.2	100.0	42.9	62.9	62.9
	FPR (%)	0.0	39.7	15.2	22.8	0.0	24.4	27.8	27.8
	PPV (%)	100.0	66.8	70.9	59.4	100.0	54.5	67.0	67.0
	FDR (%)	0.0	33.2	29.1	40.6	0.0	45.5	33.0	33.0
	Acc (%)	100.0	67.2	70.3	61.8	100.0	62.3	67.8	67.8
Black/ White	TPR (%)	100.0	74.2	51.7	44.4	100.0	46.4	63.9	66.3
	FPR (%)	0.0	41.0	15.6	22.8	0.0	24.4	28.5	29.7
	PPV (%)	100.0	66.2	70.8	60.6	100.0	56.5	66.8	67.4
	FDR (%)	0.0	33.8	29.2	39.4	0.0	43.5	33.2	32.6
	Acc (%)	100.0	66.9	70.5	62.7	100.0	63.8	67.9	68.4
White/non-White (Hispanic + Other)	TPR (%)	100.0	71.8	44.4	35.6	100.0	42.9	59.5	42.6
	FPR (%)	0.0	35.1	14.4	15.8	0.0	19.5	24.4	15.2
	PPV (%)	100.0	68.9	69.3	64.0	100.0	60.0	68.6	67.3
	FDR (%)	0.0	31.1	30.7	36.0	0.0	40.0	31.4	32.7
	Acc (%)	100.0	68.5	68.2	62.7	100.0	65.2	68.0	66.9
White/non-White (Hispanic)	TPR (%)	100.0	69.7	44.4	35.6	100.0	35.7	57.9	42.6
	FPR (%)	0.0	34.8	12.3	15.8	0.0	19.5	23.5	13.0
	PPV (%)	100.0	68.5	72.5	64.0	100.0	55.6	68.9	70.9
	FDR (%)	0.0	31.5	27.5	36.0	0.0	44.4	31.1	29.1
	Acc (%)	100.0	67.6	69.4	62.7	100.0	62.3	67.7	68.1
Black/non-Black (Hispanic + Other)	TPR (%)	100.0	73.6	49.4	46.7	100.0	50.0	63.2	69.0
	FPR (%)	0.0	41.3	16.5	21.1	0.0	24.4	28.8	36.7
	PPV (%)	100.0	65.9	68.8	63.6	100.0	58.3	66.3	65.3
	FDR (%)	0.0	34.1	31.2	36.4	0.0	41.7	33.7	34.7
	Acc (%)	100.0	66.5	69.1	64.7	100.0	65.2	67.4	66.1
Black/non-Black (Hispanic)	TPR (%)	100.0	73.9	50.0	46.7	100.0	50.0	63.6	70.7
	FPR (%)	0.0	41.6	16.9	21.1	0.0	24.4	29.1	38.4
	PPV (%)	100.0	65.8	68.5	63.6	100.0	58.3	66.2	65.6
	FDR (%)	0.0	34.2	31.5	36.4	0.0	41.7	33.8	34.4
	Acc (%)	100.0	66.5	69.1	64.7	100.0	65.2	67.4	66.2

Table 1. Recidivism prediction performance is measured by the true positive rate (TPR), false positive rate (FPR), positive predictive value (PPV), false discovery rate (FDR), which is (1 - PPV), and accuracy (Acc). The groups in parentheses next to the category in ”Classifiers” refer to the groups the logistic regression models are trained on. For example, Black/non-Black (Hispanic + Other) means that the non-Black category consisted of those in the categories of Hispanic or Other and the model was trained on Black, Hispanic, and Other data points. The bolded numbers correspond to the classifier with the highest percentage for that particular metric, and the italicized numbers correspond to the classifier with the lowest percentage for that particular metric. If something occurs three or more times, it is not bolded or italicized even if it meets the criteria. Asian and Native American are the same for each classifier, so none of those metrics are bolded or italicized.

To demonstrate the impact of this, we use the dataset associated with COMPAS, an algorithm used to predict the recidivism risk of defendants, to train a logistic regression classifier using varying distributions of data based on the racial and ethnic categories in the dataset (angwin2022machine, ). Our logistic regression classifiers are trained race-blind and use a threshold of 0.5. We test the logistic regression classifiers on each demographic group individually, all demographic groups, and the demographic groups trained on. Our results are showcased in Table 1 which demonstrates that performance metrics vary based on the data each logistic regression model was trained on. The overall accuracy for all groups between the classifiers is within 1%, but, per group, the difference between accuracies can range almost three times that for Hispanic and Other and two times that for African American and Caucasian. This means that the choice of racial categorization schema, racial categories, and who is racialized into these categories can have a real effect on whether someone is more likely to be correctly predicted to rescind. The true positive rate varies within 5%, and the false positive rate varies within 7% across all groups. These figures only increase when looking at each group individually. African American, Hispanic, and Other have higher false positive rates to begin with, so individuals in these groups would be more affected by this variation in false positive rates. The positive predictive value and false discovery rate vary by 2.6% for all groups and up to almost double that for Hispanic (4.6%) and Other (5.5%).

This variation also occurs within racial categorization schemas. For White/non-White, the performance metrics can vary around 5% when comparing Everyone, White/Black, White/non-White (Hispanic + Other), and White/non-White (Hispanic), which would all be valid distributions under the White/non-White categorization. Similar variation occurs for Black/non-Black when comparing Everyone, White/Black, Black/non-Black (Hispanic + Other), and Black/non-Black (Hispanic) which would all be valid distributions under the Black/non-Black categorization. Even in more specific racial and ethnic categories like Asian, Black, Latinx, Indigenous, Pacific Islander, and White this can transpire, for different distributions of various ethnic groups or racial groups can occur in these categories, which can also lead to variation in dataset quality and model performance.

3.3. The Effect of Racial Multi-Dimensionality and Panethnicity

Existing usage of racial categories in datasets and models rarely allows for multiracial and panethnic identities. Due to technical limitations (abdu2023empirical, ), each person is assigned a singular racial category and is rarely assigned more than one racial category. This leads multiracial individuals and their experiences to be obfuscated in either a racial category that comprises part of their racial experience or an ’Other’ category where other multiracial individuals are placed, often with differing experiences of race (ford2021monoracial, ; remedios2013finally, ). This manifests in models as Wolfe et al. (wolfe2022evidence, ) demonstrate that multiracial people are more likely to be assigned a racial or ethnic label of a minority group rather than a majority group.

Panethnic identities are, similarly, seldom adequately represented in the racial and ethnic categories used in datasets and models (irizarry2023race, ). Panethnicity refers to the identity that forms when different ethnic or tribal groups build institutions and identities across these ethnic groups’ boundaries, leading to panethnicities comprised of people of various racial identities (okamoto2014panethnicity, ). There are numerous panethnicities, and Latinx is an example of a panethnicity (mora2022identifies, ).

When panethnicities are included as a category in the chosen racial/ethnic categories selected or used by practitioners, the panethnic categories tend to be treated as a racial category regardless of the other racial identities members of panethnic groups may have. This leads the racial identities of members of this panethnicity to be unaccounted for and causes members of a panethnicity to be treated similarly due to their categorization, obfuscating the varying experiences of people that can be associated, in part, with their racial identity (lopez2013killing, ). This is readily seen within the US cultural context when Latinx as a category is used to solely represent the experiences of Latinx individuals, negatively affecting Afro-Latinxs, as their identities are obfuscated since often they are unable to select a racial category that best describes their racial identity and experience. Many Afro-Latinxs are not accepted as Latinx by their lighter-skinned peers, leading some Afro-Latinx individuals to find solidarity in Black communities where they feel more accepted (hordge2020out, ). Placing Afro-Latinxs solely in the Latinx category would prevent datasets and models from being able to account for these experiences of Afro-Latinxs.

4. CIRCSheets: a documentation framework for Considerations in Racial Categorization Selection

We present CIRCSheets, a framework to articulate the choices of racial categories to better position and understand the effect of racial categorization choices made in developing a dataset or model. This framework allows for an improved understanding of the assumptions and choices made by the users and developers of datasets and models, helping future dataset and model users understand whether the racial categories are relevant to their use case. Furthermore, our framework helps facilitate understanding surrounding the aspects of racialization and cultural context(s) considered when making choices of racial categorization and the previous assumptions made. This documentation allows for an improved understanding of the effect of the racial categories chosen and their racialization process while decreasing the likelihood of misaligned interpretations of the racial categorizations and the racialization processes from the creators of the dataset or model.

4.1. Categories

Considerations

•

Consider how data availability and technical implementation affect how race and ethnicity can be represented in the dataset and/or model.
•

Consider the domain(s) for which the dataset or model is developed for and how this affects the racial categories salient to these domain(s) and the racialization process(es).
•

Consider how well the chosen racial categories represent the population(s) represented by the dataset or the population(s) affected by the model.

Documentation Questions

(1)

What are the racial categories utilized?
(2)

What is the motivation behind using these racial categories?
(3)

Are multiracial ethnic categories utilized?
(4)

If multiracial ethnic categories are used, what is the motivation behind using these categories, and are they being treated as racial categories?
(5)

Are people who select multiple racial categories considered multiracial? Are people who select one or more ethnic categories and one racial category considered multiracial?
(6)

If so, what category are they placed into, and are other people who select multiple different racial and/or ethnic categories also placed into that same category? If not, what category are they placed in, and does ethnicity take priority over race?
(7)

For models, what is the technical implementation of the racial and/or ethnic categories?
(8)

How do ethnic groups fit into these racial categories?
(9)

Can people be obfuscated by these racial categories? If so, do these groups experience erasure and is the model or dataset likely to interact with them?

4.2. Racialization

Considerations

•

Consider what contexts the dataset or model will be used in and how this affects racialization.
•

Consider what factors will be used in the racialization process and who determines an individual’s racial identity.
•

Consider what the most relevant factors of racialization are within the context(s) the dataset or model operates within.

Documentation questions

(1)

Who determines an individual’s racial categorization? Is it the individual?
(2)

Are physical characteristics asked of an individual?
(3)

Is cultural background asked of an individual?
(4)

In what ways could the existing racial information be partial or incorrect? What impact could this have on the dataset or model?
(5)

If using an existing dataset and no racialization information exists, what was the source of the dataset, what cultural context was it developed in, and is there any existing scholarship on the racialization choices of that dataset?

4.3. Cultural Context

Considerations

•

Consider how racial identification can change in the chosen cultural context(s) of your dataset or model.
•

Within the cultural context(s) the dataset or model operates in, consider what groups experience marginalization and how the choice of racial categories can affect what groups have visibility in the dataset or model.
•

For data collection and dataset development, consider what viewpoints associated with racial identification you want to be represented within your dataset.

Documentation questions

(1)

What cultural context(s) is this dataset or model developed for?
(2)

Will this dataset or model be used in different cultural context(s)?
(3)

If the dataset or model is used in different cultural context(s) or domains, is there any misrepresentation that can occur due to changes in racialization or racial categories within these different cultural contexts and domains?

4.4. Multi-racial and pan-ethnicity

Considerations

•

Consider how multiracial individuals and multiracial panethnicities are represented within racial categories and whether the representation of these ethnicities can lead to obfuscation between people of different races within those panethnicities.
•

Consider representing racial categories and ethnicities separately.
•

Consider the representation of multiracial individuals within the dataset or model and whether this reflects their lived experiences within society.
•

Consider whether technical limitations influence whether multiracial individuals can be adequately represented within models.

Documentation questions

(1)

How are multiracial individuals and multiracial panethnicities categorized within the dataset or model?
(2)

Can more than one racial or ethnic category be selected?
(3)

Do the categories given to panethnic individuals effectively communicate their racial and ethnic identities?
(4)

Are there any individuals, such as Afro-Latinxs, who may be inadequately represented by the racial categorizations chosen?

4.5. Knowledge and Positionality

Considerations

•

Consider consulting community members and stakeholders about what racial categories best represent them and how erasure can manifest with fewer racial categories.
•

Consider the epistemic goal of the dataset or model and how choices in racial categories contribute to this goal.
•

Consider how the lived experiences of the dataset or model developers and researchers contribute to which racial categories are chosen.
•

When developing a dataset, consider what racial categories of annotators and examples in the dataset are relevant.

Documentation questions

(1)

What are the cultural backgrounds and cultural knowledge of the dataset or model developers? How familiar and/or knowledgeable are they with the cultural context(s) of the dataset or model they are developing?
(2)

If CIRCSheets is completed by people other than the original dataset or model developers, what are their cultural backgrounds? How familiar and/or knowledgeable are they with the dataset or model’s cultural context(s)?
(3)

If annotators or crowd workers are used to develop a dataset or provide feedback to a model, what are their cultural backgrounds? How familiar and/or knowledgeable are they with the cultural context(s) of the instances they annotate?
(4)

What stakeholders, community members, or other resources were consulted when selecting the racial categories?

5. Case Study

To demonstrate CIRCSheets in action, we apply our framework to the dataset associated with COMPAS using existing knowledge available about these datasets (angwin2022machine, ; kennedy2022introducing, ).

6. Conclusion

In this work, we discuss the importance of racial and ethnic categories and demonstrate the effect these choices can have on dataset quality and model performance with different interpretations of racial categories and racialization processes. Therefore, to facilitate understanding of the racial categories and racialization processes used, we develop CIRCSheets as a documentation tool for developers to communicate their assumptions, motivations, and racialization understanding, as well as, potential pitfalls. This documentation allows future users to better understand the racial and ethnic categories documented and how people are placed into these categories, assisting them in determining whether they can use this information in future tasks, such as auditing datasets and models or deploying models to consumers. Dataset and model users can also use CIRCSheets to communicate their own understanding of existing racial categories when information regarding the racial categories and racialization process in existing datasets and models is unclear or does not exist.

References

[1] Amina A Abdu, Irene V Pasquetto, and Abigail Z Jacobs. An empirical analysis of racial categories in the algorithmic fairness literature. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1324–1333, 2023.
[2] McKane Andrus and Sarah Villeneuve. Demographic-reliant algorithmic fairness: Characterizing the risks of demographic data collection in the pursuit of fairness. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1709–1721, 2022.
[3] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. In Ethics of data and analytics, pages 254–264. Auerbach Publications, 2022.
[4] Germine H Awad, Nadia N Abuelezam, Kristine J Ajrouch, and Matthew Jaber Stiffler. Lack of arab or middle eastern and north african health data undermines assessment of health disparities. American Journal of Public Health, 112(2):209–212, 2022.
[5] Darwin A Baluran. Life expectancy, life disparity, and differential racialization among chinese, asian indians, and filipinos in the united states. SSM-Population Health, 21:101306, 2023.
[6] Steven D Barger, Carrie J Donoho, and Heidi A Wayment. The relative contributions of race/ethnicity, socioeconomic status, health, and social relationships to life satisfaction in the united states. Quality of Life Research, 18:179–189, 2009.
[7] Vilna Bashi and Antonio McDaniel. A theory of immigration and racial stratification. Journal of Black Studies, 27(5):668–682, 1997.
[8] Emily M Bender and Batya Friedman. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6:587–604, 2018.
[9] Sebastian Benthall and Bruce D Haynes. Racial categories in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 289–298, 2019.
[10] Maximus Berger and Zoltán Sarnyai. “more than skin deep”: stress neurobiology and mental health consequences of racial discrimination. Stress, 18(1):1–10, 2015.
[11] Sarah Bird, Miro Dudík, Richard Edgar, Brandon Horn, Roman Lutz, Vanessa Milan, Mehrnoosh Sameki, Hanna Wallach, and Kathleen Walker. Fairlearn: A toolkit for assessing and improving fairness in ai. Microsoft, Tech. Rep. MSR-TR-2020-32, 2020.
[12] Glencora Borradaile, Brett Burkhardt, and Alexandria LeClerc. Whose tweets are surveilled for the police: an audit of a social-media monitoring tool via log files. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 570–580, 2020.
[13] Robert T Carter, Michael Y Lau, Veronica Johnson, and Katherine Kirkinis. Racial discrimination and health outcomes among racial/ethnic minorities: A meta-analytic review. Journal of Multicultural Counseling and Development, 45(4):232–259, 2017.
[14] Kim D Chanbonpin. Between black and white: The coloring of asian americans. Wash. U. Global Stud. L. Rev., 14:637, 2015.
[15] Kasia S Chmielinski, Sarah Newman, Matt Taylor, Josh Joseph, Kemi Thomas, Jessica Yurkofsky, and Yue Chelsea Qiu. The dataset nutrition label (2nd gen): Leveraging context to mitigate harms in artificial intelligence. arXiv preprint arXiv:2201.03954, 2022.
[16] Matthew Clair and Jeffrey S Denis. Sociology of racism. The international encyclopedia of the social and behavioral sciences, 19(2015):857–63, 2015.
[17] Anamaria Crisan, Margaret Drouhard, Jesse Vig, and Nazneen Rajani. Interactive model cards: A human-centered approach to model documentation. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 427–439, 2022.
[18] Michelle Daya, Lize Van Der Merwe, Ushma Galal, Marlo Möller, Muneeb Salie, Emile R Chimusa, Joshua M Galanter, Paul D Van Helden, Brenna M Henn, Chris R Gignoux, et al. A panel of ancestry informative markers for the complex five-way admixed south african coloured population. PloS one, 8(12):e82224, 2013.
[19] Mark Díaz, Ian Kivlichan, Rachel Rosen, Dylan Baker, Razvan Amironesei, Vinodkumar Prabhakaran, and Emily Denton. Crowdworksheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 2342–2351, 2022.
[20] Špela Drnovšek Zorko and Miloš Debnár. Comparing the racialization of central-east european migrants in japan and the uk. Comparative Migration Studies, 9(1):1–17, 2021.
[21] Karen Farquharson. Racial categories in three nations: Australia, south africa and the united states. In Proceedings of ‘Public sociologies: lessons and trans-Tasman Comparisons’, the Annual Conference of The Australian Sociological Association (TASA), 2007.
[22] Karly S Ford, Ashley N Patterson, and Marc P Johnston-Guerrero. Monoracial normativity in university websites: Systematic erasure and selective reclassification of multiracial students. Journal of Diversity in Higher Education, 14(2):252, 2021.
[23] Sorelle A Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P Hamilton, and Derek Roth. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338, 2019.
[24] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé Iii, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64(12):86–92, 2021.
[25] Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 501–512, 2020.
[26] Tanya Katerí Hernández. Racial innocence: Unmasking Latino anti-Black bias and the struggle for equality. Beacon Press, 2022.
[27] Jennifer L Hochschild and Brenna Marea Powell. Racial reorganization and the united states census 1850–1930: Mulattoes, half-breeds, mixed parentage, hindoos, and the mexican race. Studies in American Political Development, 22(1):59–96, 2008.
[28] Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. The dataset nutrition label. Data Protection and Privacy, 12(12):1, 2020.
[29] Elizabeth Hordge-Freeman and Edlin Veras. Out of the shadows, into the dark: Ethnoracial dissonance and identity formation among afro-latinxs. Sociology of Race and Ethnicity, 6(2):146–160, 2020.
[30] Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, and Margaret Mitchell. Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 560–575, 2021.
[31] Yasmiyn Irizarry. Utilizing multidimensional measures of race in education research: The case of teacher perceptions. Sociology of Race and Ethnicity, 1(4):564–583, 2015.
[32] Yasmiyn Irizarry, Ellis P Monk Jr, and Ryon J Cobb. Race-shifting in the united states: Latinxs, skin tone, and ethnoracial alignments. Sociology of Race and Ethnicity, 9(1):37–55, 2023.
[33] Jose Itzigsohn, Silvia Giorguli, and Obed Vazquez. Immigrant incorporation and racial identity: Racial self-identification among dominican immigrants. Ethnic and Racial Studies, 28(1):50–78, 2005.
[34] Brendan Kennedy, Mohammad Atari, Aida Mostafazadeh Davani, Leigh Yeh, Ali Omrani, Yehsong Kim, Kris Coombs, Shreya Havaldar, Gwenyth Portillo-Wightman, Elaine Gonzalez, et al. Introducing the gab hate corpus: defining and applying hate-based rhetoric to social media posts at scale. Language Resources and Evaluation, pages 1–30, 2022.
[35] Vijay Keswani, Matthew Lease, and Krishnaram Kenthapadi. Towards unbiased and accurate deferral to multiple experts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 154–165, 2021.
[36] Nikki Khanna. “if you’re half black, you’re just black”: Reflected appraisals and the persistence of the one-drop rule. The Sociological Quarterly, 51(1):96–121, 2010.
[37] Michelle Seng Ah Lee and Jat Singh. The landscape and gaps in open source fairness toolkits. In Proceedings of the 2021 CHI conference on human factors in computing systems, pages 1–13, 2021.
[38] Sharon M Lee. Racial classifications in the us census: 1890–1990. Ethnic and racial studies, 16(1):75–94, 1993.
[39] Nancy López. Killing two birds with one stone? why we need two separate questions on race and ethnicity in the 2020 census and beyond. Latino Studies, 11:428–438, 2013.
[40] Neda Maghbouleh, Ariela Schachter, and René D Flores. Middle eastern and north african americans may not be perceived, nor perceive themselves, to be white. Proceedings of the National Academy of Sciences, 119(7):e2117940119, 2022.
[41] Rachel Marks and Nicholas Jones. Collecting and tabulating ethnicity and race responses in the 2020 census. United States Census Bureau, 2020.
[42] Kay Young McChesney. Teaching diversity: The science you need to know to explain why race is not biological. SAGE Open, 5(4):2158244015611712, 2015.
[43] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency, pages 220–229, 2019.
[44] Ellis P Monk Jr, Michael H Esposito, and Hedwig Lee. Beholding inequality: Race, gender, and returns to physical attractiveness in the united states. American Journal of Sociology, 127(1):194–241, 2021.
[45] G Cristina Mora, Reuben Perez, and Nicholas Vargas. Who identifies as “latinx”? the generational politics of ethnoracial labels. Social Forces, 100(3):1170–1194, 2022.
[46] Carol C Mukhopadhyay, Rosemary Henze, and Yolanda T Moses. How real is race?: A sourcebook on race, culture, and biology. Rowman & Littlefield, 2013.
[47] Karim Murji and John Solomos. Racialization: Studies in theory and practice. Oxford University Press, USA, 2005.
[48] Laura Beth Nielsen. Subtle, pervasive, harmful: Racist and sexist remarks in public as hate speech. Journal of Social issues, 58(2):265–280, 2002.
[49] Suryadewi E Nugraheni and Julia F Hastings. Family-based caregiving: Does lumping asian americans together do more harm than good? Journal of Social, Behavioral, and Health Sciences, 15(1):87–102, 2021.
[50] Anthony C Ocampo. ” am i really asian?”: Educational experiences and panethnic identification among second–generation filipino americans. Journal of Asian American Studies, 16(3):295–324, 2013.
[51] Anthony Christian Ocampo. The Latinos of Asia: How Filipino Americans break the rules of race. Stanford University Press, 2016.
[52] Dina Okamoto and G. Cristina Mora. Panethnicity. Annual Review of Sociology, 40(1):219–239, 2014.
[53] Olihe N Okoro, Vibhuti Arya, Caroline A Gaither, and Adati Tarfa. Examining the inclusion of race and ethnicity in patient cases. American journal of pharmaceutical education, 85(9):8583, 2021.
[54] Eric Steven O’Malley. Irreconcilable rights and the question of hawaiian statehood. Geo. LJ, 89:501, 2000.
[55] Michael Omi and Howard Winant. Racial formation in the United States. Routledge, 2014.
[56] Mahima Pushkarna, Andrew Zaldivar, and Oddur Kjartansson. Data cards: Purposeful and transparent dataset documentation for responsible ai. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1776–1826, 2022.
[57] Timothy R Rebbeck, Brandon Mahal, Kara N Maxwell, Isla P Garraway, and Kosj Yamoah. The distinct impacts of race and genetic ancestry on health. Nature medicine, 28(5):890–893, 2022.
[58] Jessica D Remedios and Alison L Chasteen. Finally, someone who “gets” me! multiracial people value others’ accuracy about their race. Cultural Diversity and Ethnic Minority Psychology, 19(4):453, 2013.
[59] Clara E Rodriguez. Changing race: Latinos, the census, and the history of ethnicity in the United States, volume 41. NYU Press, 2000.
[60] Wendy D Roth. Racial mismatch: The divergence between form and function in data for monitoring racial discrimination of hispanics. Social Science Quarterly, 91(5):1288–1311, 2010.
[61] Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and discrimination: converting critical concerns into productive inquiry, 22(2014):4349–4357, 2014.
[62] Paul Schor. Counting Americans : how the US Census classified the nation. Oxford University Press, New York, NY, 2017.
[63] Gail Shuck. Racializing the nonnative english speaker. Journal of Language, Identity, and Education, 5(4):259–276, 2006.
[64] Edward Telles. Latinos, race, and the us census. The ANNALS of the American Academy of Political and Social Science, 677(1):153–164, 2018.
[65] Edward Telles and Tianna Paschel. Who is black, white, or mixed race? how skin color, status, and nation shape racial classification in latin america. American Journal of Sociology, 120(3):864–907, 2014.
[66] Fernando M Treviño. Standardized terminology for hispanic populations. American Journal of Public Health, 77(1):69–72, 1987.
[67] Ekeoma E Uzogara. Who belongs in america? latinxs’ skin tones, perceived discrimination, and opposition to multicultural policies. Cultural diversity and ethnic minority psychology, 27(3):354, 2021.
[68] Briana Vecchione, Karen Levy, and Solon Barocas. Algorithmic auditing and social justice: Lessons from the history of audit studies. In Equity and Access in Algorithms, Mechanisms, and Optimization, pages 1–9. 2021.
[69] Sarahn M Wheeler and Allison S Bryant. Racial and ethnic disparities in health and health care. Obstetrics and Gynecology Clinics, 44(1):1–11, 2017.
[70] Robert Wolfe, Mahzarin R Banaji, and Aylin Caliskan. Evidence for hypodescent in visual semantic ai. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1293–1304, 2022.
[71] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the imagenet hierarchy. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 547–558, 2020.
[72] Henry Yu. 27 Ethnicity, pages 106–110. New York University Press, New York, USA, 2020.