
Measuring Model Biases in the Absence of Ground Truth

Anonymous

Abstract

Recent advances in computer vision have led to the development of image classification models that can predict tens of thousands of object classes. Training these models can require millions of examples, leading to a demand for potentially billions of annotations. In practice, however, images are typically sparsely annotated, which can lead to problematic biases in the distribution of ground truth labels that are collected. This potential for annotation bias may then limit the utility of ground-truth-dependent fairness metrics (e.g., Equalized Odds).

To circumvent this problem, in this work we introduce a new framing to the measurement of fairness and bias that does not rely on ground truth labels. Instead, we treat the model predictions for a given image as a set of labels, analogous to a “bag of words” approach used in Natural Language Processing (NLP) [Jurafsky2009]. This allows us to explore different association metrics between prediction sets in order to detect patterns of bias and automatically surface potential stereotypes that the model has learned. We apply this approach to examine the relationship between sensitive labels (in our case, gendered labels like “man” and “woman”) and all other labels in the Open Images Dataset (OID). We demonstrate how the statistical properties (especially normalization) of the different association metrics can lead to different sets of labels being flagged for “gender bias”. We conclude by demonstrating that pointwise mutual information normalized by joint probability (nPMI) is able to detect many labels with significant gender bias, while remaining relatively insensitive to marginal label frequency. This metric can therefore be useful for understanding how model predictions are associated within an image dataset without the need for expensive ground truth labelling. Finally, we announce an open-source nPMI visualization tool built on TensorBoard.
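As a sketch, assuming the standard normalized-PMI formulation (PMI divided by the negative log of the joint probability), the metric for a pair of predicted labels $x$ and $y$ can be written as

\[
\mathrm{nPMI}(x, y) \;=\; \frac{\ln \frac{p(x, y)}{p(x)\, p(y)}}{-\ln p(x, y)},
\]

where $p(x)$ and $p(y)$ are the marginal frequencies with which each label appears in per-image prediction sets and $p(x, y)$ is their co-occurrence frequency. Under this formulation the score is bounded in $[-1, 1]$: $-1$ for labels that never co-occur, $0$ for independent labels, and $1$ for labels that always co-occur.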