
Cohort Shapley value for algorithmic fairness

Masayoshi Mase
Hitachi, Ltd
   Art B. Owen
Stanford University
   Benjamin B. Seiler
Stanford University
(May 2021)
Abstract

Cohort Shapley value is a model-free method of variable importance grounded in game theory that does not use any unobserved and potentially impossible feature combinations. We use it to evaluate algorithmic fairness, using the well known COMPAS recidivism data as our example. This approach allows one to identify for each individual in a data set the extent to which they were adversely or beneficially affected by their value of a protected attribute such as their race. The method can do this even if race was not one of the original predictors and even if it does not have access to a proprietary algorithm that has made the predictions. The grounding in game theory lets us define aggregate variable importance for a data set consistently with its per subject definitions. We can investigate variable importance for multiple quantities of interest in the fairness literature including false positive predictions.

1 Introduction

Machine learning is now commonly used to make consequential decisions about people, affecting hiring decisions, loan applications, medical treatments, criminal sentencing and more. It is important to understand and explain these decisions. A critical part of understanding a decision is quantifying the importance of the variables used to make it. When the decisions are about people and some of the variables describe protected attributes of those people, such as race and gender, then variable importance has a direct bearing on algorithmic fairness.

As we describe below, most variable importance measures work by changing a subset of the input variables to a black box. We survey such methods below and argue against that approach. The first problem is that the resulting analysis can depend on some very unreasonable variable combinations. A second problem with changing inputs to a black box is that it necessarily attributes zero importance to a variable that the algorithm did not use. If a protected variable is not actually used in the black box then the algorithm would automatically be considered fair. However this ‘fairness through unawareness’ approach is not reliable, as Adler et al. (2018) and many others have noted. Information about the protected variables can leak in through others with which they are associated. The practice known as ‘redlining’ involves deliberate exploitation of such associations. When studying fairness we must have the possibility of studying a variable not included in the black box. In this paper we develop and illustrate an approach to algorithmic fairness that does not use impossible values and can detect redlining.

1.1 Variable Importance

Variable importance measures have a long history and there has been a recent surge in interest motivated by problems of explainable AI. The global sensitivity analysis literature studies black box functions used in engineering and climate models among others. Much of that work is based on a functional ANOVA model of the function relating the output to inputs. Variance explained is partitioned using Sobol’ indices (Sobol’, 1993). Saltelli et al. (2008) is an introductory textbook and Razavi et al. (2021) is a current survey of the field. Wei et al. (2015) provide a comprehensive survey of variable importance measures in statistics. They include 197 references of which 24 are themselves surveys. Molnar (2018) surveys variable importance measures used in explainable AI. Prominent among these are SHAP (Lundberg and Lee, 2017) and LIME (Ribeiro et al., 2016). LIME makes a local linear approximation to a black box function $f(\cdot)$ and one can then take advantage of the easier interpretability of linear models. SHAP is based on Shapley value from cooperative game theory (Shapley, 1953), which we describe in more detail below.

Ordinarily we can compare two prediction methods by waiting to see how accurate they are on unknown future data or, if necessary, by using holdout data. This does not carry over to deciding whether LIME or SHAP or some other method has made a better explanation of a past decision. Suppose that we want to understand something like why a given applicant was turned down for a loan. When $f(\cdot)$ is available to us in closed form then there is no doubt at all about whether the loan would have been offered under any assortment of hypothetical feature combinations that we choose. We can compare methods by how they define importance. Choosing among definitions is a different activity, essentially a philosophical one, and we face tradeoffs. Our problem is one of identifying the causes of given effects when we have perfect knowledge of $f(\cdot)$ for all input variable combinations. This is different from the more common learning task of quantifying the effects of given causes. That distinction was made by Holland (1988) in studying causal inference. It goes back at least to Mill (1843).

A problem with many, but not all, measures of variable importance is that they are based on changing some feature values from one level to another, while holding other features constant and then looking at the changes to $f(\cdot)$. When the underlying features are highly correlated, then some of these combinations can be quite implausible, casting doubt on any variable importance methods that use them. In extreme cases, the combinations can be physically impossible (e.g., systolic blood pressure below diastolic) or logically impossible (e.g., birth date after graduation date). The use of these combinations has also been criticized by Hooker and Mentch (2019).

To avoid using impossible values, the cohort Shapley method (code at https://github.com/cohortshapley/cohortshapley) from Mase et al. (2019) uses only observed data values. For any target subject $t\in\{1,\dots,n\}$ and any feature variable $j=1,\dots,d$ we obtain a Shapley value $\phi_{j}=\phi_{j}(t)$ that measures the impact of the value of feature $j$ on $f(\cdot)$ for subject $t$. As we describe below, those impacts can be positive or negative. Using one of the Shapley axioms we will be able to aggregate from individual subjects to an impact measure for the entire data set. We can also disaggregate from the entire data set to a subset, such as all subjects with the protected level of a protected variable.

1.2 Contributions

Refer to caption
Figure 1: Histograms of the cohort Shapley impact of race on whether a person in the COMPAS data is predicted to reoffend. Orange bars represent Black subjects; blue represent White subjects.

In this paper we use the cohort Shapley method from Mase et al. (2019) to measure variable importance. One of the features of cohort Shapley is that it can attribute importance to a variable that is not actually used in $f(\cdot)$. This is controversial and not all authors approve, but it is essential in the present context of algorithmic fairness.

Our contribution is to provide a method of quantifying bias at both individual and group levels with axiomatic consistency between the individual and group measures and without requiring any impossible or unobserved combinations of input variables. The approach can even be used when the prediction algorithm is not available as for instance when it is proprietary.

We investigate the COMPAS data (Angwin et al., 2016), which include predictions of who is likely to commit a crime. There is great interest in seeing whether the algorithm is unfair to Black subjects. In the given context, the prediction code $f(\cdot)$ is a proprietary algorithm, unavailable to us, and so we cannot change any of the input values to it. What we have instead are the predictions on a set of subjects. Cohort Shapley can work with the predictions because it does not require any hypothetical feature combinations. Figure 1 shows cohort Shapley impacts for the race of the subjects in the COMPAS data, computed in a way that we describe below. As it turns out, the impact of race on whether a subject was predicted to reoffend was always positive for Black subjects and always negative for White subjects. For some other measures that we show, the histograms overlap. The average impact for Black subjects was $0.067$ and the average for White subjects was $-0.101$. For context, the response values were $1$ for those predicted to reoffend and $0$ for those predicted not to reoffend.

An outline of this paper is as follows. Section 2 introduces our notation, defines Shapley value, and presents a few of the variable importance measures and fairness definitions from the literature. Section 3 describes the COMPAS recidivism data set. Section 4 has our analysis of that data, including a Bayesian bootstrap for uncertainty quantification. Section 5 has our conclusions. An appendix presents additional COMPAS results beyond the ones we selected for discussion.

2 Notation and definitions

For subjects $i=1,\dots,n$, the value of feature $j=1,\dots,d$ is $x_{ij}\in\mathcal{X}_{j}$. We consider categorical variables $x_{ij}$. Continuously distributed values can be handled as discussed in Section 5. The features for subject $i$ are encoded in $\boldsymbol{x}_{i}\in\mathcal{X}=\prod_{j=1}^{d}\mathcal{X}_{j}$. For each subject there is a response value $y_{i}\in\mathbb{R}$. The feature indices belong to the set $1{:}d\equiv\{1,\dots,d\}$ and similarly the subject indices are in $1{:}n\equiv\{1,\dots,n\}$. For $u\subseteq 1{:}d$ the tuple $\boldsymbol{x}_{u}$ is $(x_{j})_{j\in u}\in\mathcal{X}_{u}=\prod_{j\in u}\mathcal{X}_{j}$, and $\boldsymbol{x}_{iu}=(x_{ij})_{j\in u}$.

There is also an algorithmic prediction $\hat{y}_{i}$. It is usual in variable importance problems to have $\hat{y}_{i}=f(\boldsymbol{x}_{i})$ for a function $f(\cdot)$ that may be difficult to interpret (e.g., a black box). This $f(\cdot)$ is often an approximation of $\mathbb{E}(y\mid\boldsymbol{x})$ though it need not be of that form. Cohort Shapley uses only the values $\hat{y}_{i}$ so it does not require access to $f(\cdot)$. Since it only uses a vector of values, one per subject, it can be used to find the important variables in $y_{i}$, or $\hat{y}_{i}$ or combinations such as the residual $y_{i}-\hat{y}_{i}$.

In many settings $y_{i}$ and $\hat{y}_{i}$ are both binary. The value $y_{i}=1$ may mean that subject $i$ is worthy of a loan, or is predicted to commit a crime, or should be sent to intensive care, and $\hat{y}_{i}\in\{0,1\}$ is an estimate of $y_{i}$. In this case, measures such as $\mathrm{FP}_{i}=1\{\hat{y}_{i}=1\ \&\ y_{i}=0\}$ describing a false positive are of interest.

Here are some typographic conveniences that we use. When $j\not\in u$, then $u+j$ is $u\cup\{j\}$. For $u\subseteq 1{:}d$ we use $-u$ for $1{:}d\setminus u$. The subscript to $\boldsymbol{x}$ may contain a comma or not depending on what is clearer. For instance, we use $\boldsymbol{x}_{iu}$ but also $\boldsymbol{x}_{i,-u}$.

2.1 Shapley values

Shapley value (Shapley, 1953) is used in game theory to define a fair allocation of rewards to a team that has cooperated to produce something of value. Many variable importance problems can be formulated as a team of input variables generating an output value or an output variance explained. We then want to apportion importance or impact to the individual variables.

Suppose that a team of $d$ members produce a value $\mathrm{val}(1{:}d)$, and that we have at our disposal the value $\mathrm{val}(u)$ that would have been produced by the team $u$, for all $2^{d}$ teams $u\subseteq 1{:}d$. Let $\phi_{j}$ be the reward for player $j$. It is convenient to work with incremental values $\mathrm{val}(j\mid u)=\mathrm{val}(u+j)-\mathrm{val}(u)$ for sets $u$ with $j\not\in u$.

Shapley introduced quite reasonable criteria:

  1) Efficiency: $\sum_{j=1}^{d}\phi_{j}=\mathrm{val}(1{:}d)$.

  2) Symmetry: If $\mathrm{val}(i\mid u)=\mathrm{val}(j\mid u)$ for all $u\subseteq 1{:}d\setminus\{i,j\}$, then $\phi_{i}=\phi_{j}$.

  3) Dummy: If $\mathrm{val}(j\mid u)=0$ for all $u\subseteq 1{:}d\setminus\{j\}$, then $\phi_{j}=0$.

  4) Additivity: If $\mathrm{val}(u)$ and $\mathrm{val}^{\prime}(u)$ lead to values $\phi_{j}$ and $\phi_{j}^{\prime}$, then the game producing $(\mathrm{val}+\mathrm{val}^{\prime})(u)$ has values $\phi_{j}+\phi^{\prime}_{j}$.

He found that the unique valuation that satisfies all four of these criteria is

$$\phi_{j}=\frac{1}{d}\sum_{u\subseteq -j}\binom{d-1}{|u|}^{-1}\mathrm{val}(j\mid u) \qquad (1)$$

Formula (1) is not very intuitive. Another way to explain Shapley value is as follows. We could build a team from $\varnothing$ to $1{:}d$ in $d$ steps, adding one member at a time. There are $d!$ different orders in which to add team members. The Shapley value $\phi_{j}$ is the increase in value coming from the addition of member $j$, averaged over all $d!$ different orders. From equation (1) we see that Shapley value does not change if we add or subtract the same quantity from all $\mathrm{val}(u)$. It can be convenient to make $\mathrm{val}(\varnothing)=0$.
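To make the definition concrete, here is a minimal sketch in Python of the exact computation in equation (1) for a user-supplied value function. The function name `shapley_values` and its interface are our own illustrative choices, and exact enumeration over all subsets is only practical for small $d$.

```python
from itertools import combinations
from math import comb


def shapley_values(val, d):
    """Exact Shapley values phi_1..phi_d from equation (1).

    val takes a frozenset u of feature indices in {0, ..., d-1} and
    returns the scalar val(u).  Uses O(d 2^d) evaluations of val.
    """
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            for u in combinations(others, size):
                u = frozenset(u)
                increment = val(u | {j}) - val(u)        # val(j | u)
                phi[j] += increment / (d * comb(d - 1, size))
    return phi
```

One can check the efficiency axiom numerically: the returned values sum to $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$.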

We discuss some variable importance measures based on Shapley value below. We use the framework from Sundararajan and Najmi (2020). They survey many uses of Shapley value in explainable AI.

2.2 Changing variables

Most mechanisms for investigating variable importance proceed by changing some, but not all, of the input variables to $f(\cdot)$. For $u\subseteq 1{:}d$ the hybrid point $\boldsymbol{w}=\boldsymbol{x}_{u}{:}\boldsymbol{z}_{-u}$ has

$$w_{j}=\begin{cases}x_{j},&j\in u\\ z_{j},&j\not\in u.\end{cases}$$

We can investigate changes to $\boldsymbol{x}_{u}$ by examining $f(\boldsymbol{x}_{u}{:}\boldsymbol{z}_{-u})-f(\boldsymbol{x})$ for various points $\boldsymbol{x},\boldsymbol{z}\in\mathcal{X}$.

Global sensitivity analysis (Razavi et al., 2021) works with an analysis of variance (ANOVA) defined in terms of $\boldsymbol{x}\in\mathcal{X}$ with independent components for which $\sigma^{2}=\mathrm{var}(f(\boldsymbol{x}))<\infty$. The sets $\mathcal{X}_{j}$ can be discrete or continuous. It is common there to measure the importance of a subset of variables by variance explained. For instance, the lower Sobol’ index for variables $\boldsymbol{x}_{u}$ is $\underline{\tau}^{2}_{u}=\mathrm{var}(\mathbb{E}(f(\boldsymbol{x})\mid\boldsymbol{x}_{u}))$, which is usually normalized to $\underline{\tau}^{2}_{u}/\sigma^{2}$. The upper Sobol’ index is $\overline{\tau}^{2}_{u}=\sigma^{2}-\underline{\tau}^{2}_{-u}$, that is, everything not explained by $\boldsymbol{x}_{-u}$ is attributed to $\boldsymbol{x}_{u}$. These measures are very natural in settings where the components of $\boldsymbol{x}$ can all vary freely and independently, but such is usually not the case for inputs to a black box machine learning model.

The quantitative input influence (QII) method of Datta et al. (2016) uses a model in which the features are statistically independent, each drawn from its marginal distribution.

In baseline Shapley (see Sundararajan and Najmi, 2020), there is a baseline tuple $\boldsymbol{x}_{b}\in\mathcal{X}$ of $\boldsymbol{x}$, such as the sample average $\bar{\boldsymbol{x}}$. Then the value for variable subset $u$ and subject $t$ is $f(\boldsymbol{x}_{t,u}{:}\boldsymbol{x}_{b,-u})$. We get the same Shapley values using $\mathrm{val}(u)=f(\boldsymbol{x}_{t,u}{:}\boldsymbol{x}_{b,-u})-f(\boldsymbol{x}_{b})$. Then the total value to explain for subject $t$ is $f(\boldsymbol{x}_{t})-f(\boldsymbol{x}_{b})$. The counterfactuals are changes in some subset of the values in $\boldsymbol{x}_{t}$. Important variables are those that move the value the most from the baseline $f(\boldsymbol{x}_{b})$ to subject $t$’s value. Shapley’s combination provides a principled weighting of the effect of changing $x_{b,j}$ to $x_{t,j}$ given some subset $u$ of variables that have also been changed.
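As a small illustration (not the authors’ code), the baseline Shapley value function can be plugged into the `shapley_values` sketch above. Here `f`, `x_t` and `x_b` are assumed to be a prediction function, the target subject’s features and the chosen baseline.

```python
import numpy as np


def baseline_shapley(f, x_t, x_b):
    """Sketch of baseline Shapley: val(u) = f(x_{t,u} : x_{b,-u}) - f(x_b)."""
    x_t = np.asarray(x_t, dtype=float)
    x_b = np.asarray(x_b, dtype=float)

    def val(u):
        w = x_b.copy()           # hybrid point: baseline values outside u
        for j in u:
            w[j] = x_t[j]        # target subject's values on u
        return f(w) - f(x_b)

    return shapley_values(val, len(x_t))
```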

Conditional expectation Shapley (Sundararajan and Najmi, 2020) uses a joint distribution $D$ on $\mathcal{X}$ and then for subject $t$ takes $\mathrm{val}(u)=\mathbb{E}_{D}(f(\boldsymbol{x})\mid\boldsymbol{x}_{u}=\boldsymbol{x}_{t,u})$. SHAP (Lundberg and Lee, 2017) is a conditional expectation Shapley using a distribution $D$ in which the features are independent. Cohort Shapley (Mase et al., 2019), which we describe next, is very nearly conditional expectation Shapley with $D$ equal to the empirical distribution.

2.3 Changing knowledge

In cohort Shapley the counterfactual does not involve replacing $\boldsymbol{x}_{tu}$ by $\boldsymbol{x}_{bu}$. It is instead about concealing the values of $\boldsymbol{x}_{tu}$. It is what Kumar et al. (2020) call a conditional method because it requires specification of a conditional distribution on the features. The version we present here is a form of conditional expectation Shapley using the empirical distribution of the data.

For every pair of values $x_{tj},x_{ij}\in\mathcal{X}_{j}$ we can declare them to be similar or not. In the present context we take similarity to just be $x_{tj}=x_{ij}$. For a target subject $t\in 1{:}n$ and set $u\subseteq 1{:}d$ we define the cohort $C_{u}=C_{tu}=\{i\in 1{:}n\mid\boldsymbol{x}_{iu}=\boldsymbol{x}_{tu}\}$. These are the subjects who match subject $t$ on all of the variables $j\in u$. They may or may not match the target on $j\not\in u$. None of the cohorts is empty because they all contain $t$. The value function in cohort Shapley is the cohort mean

$$\mathrm{val}(u)=\frac{1}{|C_{tu}|}\sum_{i\in C_{tu}}\hat{y}_{i}.$$

Here $\hat{y}_{i}=f(\boldsymbol{x}_{i})$ but we write it this way because in practice we might not have $f(\cdot)$ at our disposal, just the predictions $\hat{y}_{i}$.

The incremental value $\mathrm{val}(j\mid u)$ is then how much the cohort mean moves when we reveal $x_{tj}$ in addition to the previously revealed variables $\boldsymbol{x}_{tu}$. Having specified the values, the Shapley formula provides attributions. Note that $\mathrm{val}(\varnothing)$ is the plain average of all $\hat{y}_{i}$ and Shapley value is unchanged by subtracting $\mathrm{val}(\varnothing)$ from all of the $\mathrm{val}(u)$. Furthermore, when there are many variables, then very commonly $C_{t,1{:}d}=\{t\}$ and then the total value to be explained, $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$, is simply $\hat{y}_{t}-(1/n)\sum_{i=1}^{n}\hat{y}_{i}$.
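The following is a minimal sketch of the cohort value function and the resulting per-subject attributions, reusing the hypothetical `shapley_values` helper above and taking similarity to be exact equality on categorical features; the released cohortshapley package supports more general similarity definitions.

```python
import numpy as np


def cohort_shapley_for_subject(X, yhat, t):
    """Cohort Shapley impacts for target subject t.

    X is an (n, d) array of categorical features and yhat a length-n
    vector of predictions, responses, or residuals to be attributed.
    """
    X = np.asarray(X)
    yhat = np.asarray(yhat, dtype=float)
    n, d = X.shape

    def val(u):
        mask = np.ones(n, dtype=bool)
        for j in u:                          # subjects matching t on every j in u
            mask &= (X[:, j] == X[t, j])
        return yhat[mask].mean()             # cohort mean; never empty, contains t

    return shapley_values(val, d)
```

Here $\mathrm{val}(\varnothing)$ is the overall mean of the $\hat{y}_{i}$, and by the remarks above the attributions sum to $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$.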

2.4 Definitions of fairness

Just as there are many ways to define variable importance, there are multiple ways to define what fairness means. Some of those definitions are mutually incompatible and some of them differ from legal definitions. Here we present a few of the issues. See Corbett-Davies and Goel (2018), Chouldechova and Roth (2018), Berk et al. (2018), and Friedler et al. (2019) for surveys. We do not make assertions about which definitions are preferable.

For $y,\hat{y}\in\{0,1\}$, let $n_{y\hat{y}}$ be the number of subjects with $y_{i}=y$ and $\hat{y}_{i}=\hat{y}$. These four counts and their derived properties can be computed for any subset of the subjects. The false positive rate (FPR) is $n_{01}/n_{0\bullet}$, where a bullet indicates that we are summing over the levels of that index. We ignore uninteresting corner cases such as $n_{0\bullet}=0$; when there are no subjects with $y=0$ then we have no interest in the proportion of them with $\hat{y}=1$. The false negative rate (FNR) is $n_{10}/n_{1\bullet}$. The prevalence of the trait under study is $p=n_{1\bullet}/n_{\bullet\bullet}$. The positive predictive value (PPV) is $n_{11}/n_{\bullet 1}$. As Chouldechova (2017, equation (2.6)) notes, these values satisfy

$$\mathrm{FPR}=\frac{p}{1-p}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,(1-\mathrm{FNR}). \qquad (2)$$

See also Kleinberg et al. (2016).

Equation (2) shows how some natural definitions of fairness conflict. FPR and FNR describe $\hat{y}\mid y$, while PPV describes $y\mid\hat{y}$. If two subsets of subjects have the same PPV but different prevalences $p$, then they cannot also match up on FPR and FNR. Fairness in $y\mid\hat{y}$ terms and fairness in $\hat{y}\mid y$ terms can only coincide in trivial settings such as when $\hat{y}=y$ always, or empirically unusual settings with equal prevalence between subjects having different values of a protected attribute.
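Identity (2) is easy to verify numerically from the four counts; the short check below is illustrative only and the counts are made up.

```python
def confusion_rates(n00, n01, n10, n11):
    """FPR, FNR, PPV and prevalence p from the counts n_{y yhat}."""
    fpr = n01 / (n00 + n01)                      # n_{01} / n_{0.}
    fnr = n10 / (n10 + n11)                      # n_{10} / n_{1.}
    ppv = n11 / (n01 + n11)                      # n_{11} / n_{.1}
    p = (n10 + n11) / (n00 + n01 + n10 + n11)    # prevalence
    return fpr, fnr, ppv, p


fpr, fnr, ppv, p = confusion_rates(n00=50, n01=10, n10=15, n11=25)
assert abs(fpr - p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)) < 1e-12
```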

Hardt et al. (2016) write about measures of demographic parity, equalized odds and equal opportunity. Demographic parity requires $\Pr(\hat{y}=1\mid\boldsymbol{x}_{p})=\Pr(\hat{y}=1)$ where $\boldsymbol{x}_{p}$ specifies the levels of one (or more) protected variables. Equalized odds requires $\Pr(\hat{y}=1\mid y=y^{\prime},\boldsymbol{x}_{p})=\Pr(\hat{y}=1\mid y=y^{\prime})$ for $y^{\prime}\in\{0,1\}$, so that false positive rates and true positive rates are both equal across groups. Because $\mathrm{FNR}=1-\mathrm{TPR}$ the false negative rates are also equal. Equal opportunity is defined in terms of a preferred outcome. When $y=0$ is preferred, it requires that $\Pr(\hat{y}=1\mid y=0,\boldsymbol{x}_{p})=\Pr(\hat{y}=1\mid y=0)$. That is, the false positive rate is unaffected by the protected variables.

Chouldechova (2017) considers a score $s$ (such as $\hat{y}$) to be well calibrated if $\Pr(y=1\mid s,\boldsymbol{x}_{p})=\Pr(y=1\mid s)$ for all levels of $s$. A score attains predictive parity if $\Pr(y=1\mid s>s_{*},\boldsymbol{x}_{p})=\Pr(y=1\mid s>s_{*})$ for all thresholds $s_{*}$. The distinction between well calibratedness and predictive parity is relevant when there are more than two levels for the score.

There is some debate about when or whether using protected variables can lead to improved fairness. See Xiang (2020), who gives a summary of legal issues surrounding fairness, and Corbett-Davies et al. (2017), who study whether imposing calibration or other criteria might adversely affect the groups they are meant to help.

2.5 Aggregation and disaggregation

The additivity axiom of Shapley value is convenient for us. It means that the importances for the residual $y_{i}-\hat{y}_{i}$ are simply the differences of the importances for $y_{i}$ and $\hat{y}_{i}$. We can aggregate importances from subjects to the whole data set by summing or, more interpretably, averaging over $t=1,\dots,n$. We can also disaggregate from the whole data set to subsets of special interest by averaging cohort Shapley values over target subjects $t\in v\subset 1{:}n$.
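In code, aggregation and disaggregation are just averages of the per-subject impacts. A sketch, assuming `phi` is an $(n,d)$ array of cohort Shapley values with one row per target subject:

```python
import numpy as np


def mean_impact(phi, group=None):
    """Average per-subject impacts over all subjects or over a boolean mask."""
    phi = np.asarray(phi)
    if group is None:
        return phi.mean(axis=0)                  # aggregate to the whole data set
    return phi[np.asarray(group)].mean(axis=0)   # disaggregate to a subset

# By additivity, impacts for the residual y - yhat are the differences of the
# impacts computed separately for y and for yhat:  phi_resid = phi_y - phi_yhat.
```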

3 COMPAS data

COMPAS is a tool from Northpointe Inc. for judging the risk that a criminal defendant will commit another crime (re-offend) within two years. Each subject is rated into one of ten deciles with higher deciles considered higher risk of reoffending. Angwin et al. (2016) investigated whether that algorithm was biased against Black people. They obtained data for subjects in Broward County, Florida, including the COMPAS decile, the subjects’ race, age, gender, number of prior crimes and whether the crime for which they were charged is a felony or not. Angwin et al. (2016) describe how they processed their data, including how they found followup data on offences committed and how they matched those to subjects for whom they had prior COMPAS scores. They also note that race was not one of the variables used in the COMPAS predictions.

The example is controversial. Angwin et al. (2016) find that COMPAS is biased because it gave a higher rate of false positives for Black subjects and a higher rate of false negatives for White subjects. Flores et al. (2016) and Dieterich et al. (2016) disagree, raising the issue of $\hat{y}\mid y$ fairness versus $y\mid\hat{y}$ fairness. The prevalence of reoffences differed between Black and White subjects, forcing $y\mid\hat{y}$ and $\hat{y}\mid y$ notions to be incompatible. Flores et al. (2016) also questioned whether the subjects in the data set were at comparable stages in the legal process to those for which COMPAS was designed.

Following Chouldechova (2017) we focus on just Black and White subjects. That provides a sample of 5278 subjects from among the original 6172 subjects. As in that paper we record the number of prior convictions as a categorical variable with five levels: 0, 1–3, 4–6, 7–10 and $>$10. Following Angwin et al. (2016), we record the subjects’ ages as a categorical variable with three levels: $<$25, 25–45, $>$45. Also, following Chouldechova (2017), we consider the prediction $\hat{y}_{i}$ to be $1$ if subject $i$ is in deciles 5–10 and $\hat{y}_{i}=0$ for subject $i$ in deciles 1–4.
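A hedged sketch of this preprocessing using pandas follows; the column and race labels (`race`, `priors_count`, `age`, `decile_score`, "African-American", "Caucasian") follow the published ProPublica extract and should be checked against the file actually used.

```python
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")   # ProPublica-style file name

# Keep only Black and White subjects, as in Chouldechova (2017).
df = df[df["race"].isin(["African-American", "Caucasian"])]

# Prior convictions as a five-level categorical variable.
df["priors_cat"] = pd.cut(df["priors_count"],
                          bins=[-1, 0, 3, 6, 10, float("inf")],
                          labels=["0", "1-3", "4-6", "7-10", ">10"])

# Age as a three-level categorical variable.
df["age_cat"] = pd.cut(df["age"], bins=[0, 24, 45, 200],
                       labels=["<25", "25-45", ">45"])

# Binary prediction: deciles 5-10 are predicted to reoffend.
df["yhat"] = (df["decile_score"] >= 5).astype(int)
```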

4 Exploration of the COMPAS data

In this section we use cohort Shapley to study some fairness issues in the COMPAS data, especially individual level metrics. We have selected what we found to be the strongest and most interesting findings. The larger set of figures and tables from which these are drawn are in the appendix. The appendix also compares some conventional group fairness metrics for Black and White subjects.

We have computed Shapley impacts for these responses: $y_{i}$, $\hat{y}_{i}$, $y_{i}-\hat{y}_{i}$, $\mathrm{FP}_{i}=1\{y_{i}=0\,\&\,\hat{y}_{i}=1\}$ and $\mathrm{FN}_{i}=1\{y_{i}=1\,\&\,\hat{y}_{i}=0\}$. If $\mathrm{FP}_{i}=1$ then subject $i$ received a false positive prediction. Note that the sample average value of $\mathrm{FP}_{i}$ satisfies

$$\hat{\mathbb{E}}(\mathrm{FP}_{i})=\frac{n_{01}}{n_{\bullet\bullet}}=\mathrm{FPR}\times\frac{n_{0\bullet}}{n_{\bullet\bullet}}=\mathrm{FPR}\times(1-p).$$

Being wrongly predicted to reoffend is an adverse outcome that we study. Its expected value is the FPR times the fraction of non-reoffending subjects. Similarly, $\hat{\mathbb{E}}(\mathrm{FN}_{i})=\mathrm{FNR}\times p$.

4.1 Graphical analysis

Figure 2 shows histograms of Shapley impacts of race for the subjects in the COMPAS data. The first panel there reproduces the data from Figure 1, showing a positive impact for every Black subject and a negative one for every White subject for the prediction $\hat{y}$. For the actual response $y$, the histograms overlap slightly. By additivity of Shapley value the impacts for $y-\hat{y}$ can be found by subtracting the impact for $\hat{y}$ from that for $y$ for each subject $i=1,\dots,n$. The histograms of $y_{i}-\hat{y}_{i}$ show that the impact of race on the residual is typically positive for White subjects and negative for Black subjects.

The distributions overlap for equal opportunity (the false positive rate, FPR) and for the false negative rate (FNR). There we observe a small number of adversely affected White subjects and beneficially affected Black subjects. We can inspect these overlapping subjects further to understand which conditions correspond to exceptional cases. We can also inspect the tails of the distributions to understand the extreme cases where bias is most likely to be observed.

Figure 3 shows histograms of Shapley impacts for race but now they are color coded by the subjects’ gender. We see that the impact on y^\hat{y} is bimodal by race for both male and female subjects, but the effect is larger in absolute value for male subjects. The impacts for the response, the residual and false negative and positive values do not appear to be bimodal by race for female subjects, but they do for male subjects. The case of FN is perhaps different. The modes in Figure 2 are close together and not so apparent in Figure 3 which has slightly different bin boundaries. It is clear from these figures that the race differences we see are much stronger among male subjects.

Refer to caption
Figure 2: Histogram of cohort Shapley value of race factor on COMPAS recidivism data.
Refer to caption
Figure 3: Histogram of cohort Shapley value of race factor for each gender on COMPAS recidivism data.

The appendix includes some figures that each include 25 histograms. They are Shapley impact histograms for all five features, color coded by each of the five features. There is one such figure for each of $y$, $\hat{y}$, $y-\hat{y}$, $\mathrm{FP}$ and $\mathrm{FN}$.

4.2 Tabular summary

Tables 2 through 6 in the appendix record mean Shapley impacts for the five predictors and our response measures, disaggregated by race and gender.

Variable White Black Male Female
race_factor  0.054 -0.035 -0.001  0.003
gender_factor -0.001  0.001  0.020 -0.083
Variable White-Male White-Female Black-Male Black-Female
race_factor  0.057  0.042 -0.036 -0.030
gender_factor  0.023 -0.083  0.018 -0.083
Table 1: Mean cohort Shapley impact of groups on residual $y-\hat{y}$.

We take a particular interest in the residual $y_{t}-\hat{y}_{t}$. It equals $1$ for false negatives, $-1$ for false positives and $0$ when the prediction was correct. Table 1 shows the subset of variables with the largest mean impacts on the residual $y_{i}-\hat{y}_{i}$.

What we see there is that revealing that a subject is Black tells us that, on average, that subject’s residual $y_{t}-\hat{y}_{t}$ is decreased by 3.6%. Revealing that the subject is White increases the residual by 5.4%. Revealing race makes very little difference to the residual averaged over male or over female subjects. Revealing gender makes quite a large difference of −8.3% for female subjects and +2.0% for male subjects.

To judge the uncertainty in the values in Table 1 we applied the Bayesian bootstrap of Rubin (1981). That algorithm randomly reweights each data point by a unit mean exponential random variable. By never fully deleting any observation it allows one to also bootstrap individual subjects’ Shapley values, though there is not space to do so in this article. Figure 4 shows violin plots of 1000 bootstrapped cohort Shapley values.

Refer to caption
Figure 4: Bayesian bootstrap violin plot of aggregated cohort Shapley race factor impact of groups on the residual $y-\hat{y}$. Red crosses represent aggregated cohort Shapley values of groups without the bootstrap.
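A sketch of one Bayesian bootstrap replicate under the reweighting just described; the weighted cohort value function is our own illustrative variant of the sketch in Section 2.3, not the authors’ implementation, and it reuses the hypothetical `shapley_values` helper from Section 2.1.

```python
import numpy as np

rng = np.random.default_rng(2021)


def bootstrap_replicate(X, yhat, t):
    """One Bayesian bootstrap replicate of subject t's cohort Shapley values."""
    X = np.asarray(X)
    yhat = np.asarray(yhat, dtype=float)
    n, d = X.shape
    w = rng.exponential(scale=1.0, size=n)   # unit-mean weights; nothing deleted

    def val(u):
        mask = np.ones(n, dtype=bool)
        for j in u:
            mask &= (X[:, j] == X[t, j])
        return np.average(yhat[mask], weights=w[mask])   # weighted cohort mean

    return shapley_values(val, d)
```

Repeating this over many weight draws, and averaging the per-subject values over a group for each draw, yields bootstrap distributions like those summarized in Figure 4.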

4.3 FPR and FNR revisited

The risk of being falsely predicted to reoffend involves two factors: having $y_{i}=0$ and having $\hat{y}_{i}=1$ given that $y_{i}=0$. FPR is commonly computed only over subjects with $y_{i}=0$. Accordingly, in this section we study it by subsetting the subjects to $\{i\mid y_{i}=0\}$ and finding cohort Shapley values of $\hat{y}_{i}$ for the features.
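In terms of the earlier sketches, this amounts to restricting the data before computing the attributions; `cohort_shapley_for_subject` is the hypothetical helper from Section 2.3.

```python
import numpy as np


def subset_cohort_shapley(X, y, yhat, keep_value=0):
    """Cohort Shapley of yhat within the subset {i : y_i == keep_value}."""
    keep = np.asarray(y) == keep_value
    Xs, ys = np.asarray(X)[keep], np.asarray(yhat)[keep]
    return np.array([cohort_shapley_for_subject(Xs, ys, t)
                     for t in range(len(ys))])

# keep_value=0 gives the false positive analysis; keep_value=1 gives the
# false negative analysis.
```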

Figure 5 shows the cohort Shapley impact of the race factor, grouped by race, for the FPR and FNR analyses. The distributions of impacts conditioned on race are separated in the figure. This conditional analysis shows a stronger disadvantage for Black subjects.

Refer to caption
Figure 5: Histogram of cohort Shapley value of race factor on COMPAS recidivism data, for false positives subsetting to $y_{i}=0$ and for false negatives subsetting to $y_{i}=1$.

5 Conclusions

Variable importance in statistics and machine learning ordinarily considers three issues: whether changing $x_{j}$ has a causal future impact on $y$, whether omitting $x_{j}$ from a data set degrades prediction accuracy and whether changing $x_{j}$ has an important mechanical effect on $f(\boldsymbol{x})$ (Jiang and Owen, 2003). These are all different from each other and the third choice is the most useful one for explaining the decisions of a prediction method.

Cohort Shapley (Mase et al., 2019) looks at a fourth issue: whether learning the value of $x_{tj}$ in a sample is informative about the prediction $\hat{y}_{t}$. It is well suited to settings such as the COMPAS data from Broward County, where: the original algorithm is not available to researchers, the protected variable of greatest interest was not included in the model, and the set of subjects for which fairness is of interest is different from the set on which the algorithm was trained.

Cohort Shapley does not address counterfactuals, such as whether subject $t$ would have had $\hat{y}_{t}=0$ but for the fact that $x_{tj}=B$ instead of $W$. At face value, such counterfactuals directly address the issue of interest. Unfortunately there can be many other variables and combinations of variables that would also have changed the outcome, providing equally plausible explanations. Taking account of all such combinations can bring in implausible or even impossible variable combinations.

Cohort Shapley avoids using unobserved combinations of variables. In a fairness setting we might find that every combination of variables is logically and physically possible, though some combinations may still be quite unlikely, such as the largest number of prior offences at the youngest of ages.

We do not offer any conclusions on the fairness of the COMPAS algorithm. There is a problem of missing variables that requires input from domain experts.

If some such variable is not in our data set then accounting for it could change the magnitude or even the sign of some of the effects we see. Some other issues that we believe require input from domain experts are: choosing the appropriate set of subjects to study, determining which responses are relevant to fairness and deciding which combination of protected and unprotected variables is most suitable to include. We do however believe that, given a set of subjects and a set of variables, cohort Shapley supports many different graphical and numerical investigations. It can do so for variables not used in a black box at hand, and it does not even require access to the black box function.

Cohort Shapley value can be used to detect bias in an algorithm, circumventing some of the limitations described above. As with any testing method, there is the possibility of false positives and false negatives. Any bias that is detected must be followed up by further examination. The positive social outcomes of our method arise from providing a tool that can detect and illustrate biases that might otherwise have been missed. Negative social outcomes can also arise from false positives and false negatives due to missing variables.

Acknowledgments

This work was supported by the US National Science Foundation under grant IIS-1837931 and by a grant from Hitachi, Ltd.

References

  • Adler et al., (2018) Adler, P., Falk, C., Friedler, S. A., Nix, T., Rybeck, G., Scheidegger, C., Smith, B., and Venkatasubramanian, S. (2018). Auditing black-box models for indirect influence. Knowledge and Information Systems, 54(1):95–122.
  • Angwin et al., (2016) Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). Machine bias: there’s software used across the country to predict future criminals. and it’s biased against blacks.
  • Berk et al., (2018) Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. (2018). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research.
  • Chouldechova, (2017) Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163.
  • Chouldechova and Roth, (2018) Chouldechova, A. and Roth, A. (2018). The frontiers of fairness in machine learning. Technical report, arXiv:1810.08810.
  • Corbett-Davies and Goel, (2018) Corbett-Davies, S. and Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
  • Corbett-Davies et al., (2017) Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., and Huq, A. (2017). Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd acm sigkdd international conference on knowledge discovery and data mining, pages 797–806.
  • Datta et al., (2016) Datta, A., Sen, S., and Zick, Y. (2016). Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In 2016 IEEE symposium on security and privacy (SP), pages 598–617. IEEE.
  • Dieterich et al., (2016) Dieterich, W., Mendoza, C., and Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Technical report, Northpoint Inc.
  • Flores et al., (2016) Flores, A. W., Bechtel, K., and Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: A rejoinder to machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Federal Probation, 80(2):38–46.
  • Friedler et al., (2019) Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. (2019). A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338.
  • Hardt et al., (2016) Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. In Thirtieth Conference on Neural Information Processing Systems (NIPS 2016).
  • Holland, (1988) Holland, P. W. (1988). Causal inference, path analysis and recursive structural equations models. Technical Report 88–81, ETS Research Report Series.
  • Hooker and Mentch, (2019) Hooker, G. and Mentch, L. (2019). Please stop permuting features: An explanation and alternatives. Technical report, arXiv:1905.03151.
  • Jiang and Owen, (2003) Jiang, T. and Owen, A. B. (2003). Quasi-regression with shrinkage. Mathematics and Computers in Simulation, 62(3-6):231–241.
  • Kleinberg et al., (2016) Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. Technical report, arXiv:1609.05807.
  • Kumar et al., (2020) Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., and Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. In The 37th International Conference on Machine Learning (ICML 2020).
  • Lundberg and Lee, (2017) Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.
  • Mase et al., (2019) Mase, M., Owen, A. B., and Seiler, B. B. (2019). Explaining black box decisions by Shapley cohort refinement. Technical report, arXiv:1911.00467.
  • Mill, (1843) Mill, J. S. (1843). A system of logic. Parker, London.
  • Molnar, (2018) Molnar, C. (2018). Interpretable machine learning: A Guide for Making Black Box Models Explainable. Leanpub.
  • Razavi et al., (2021) Razavi, S., Jakeman, A., Saltelli, A., Prieur, C., Iooss, B., Borgonovo, E., Plischke, E., Piano, S. L., Iwanaga, T., Becker, W., Tarantola, S., Guillaume, J. H. A., Jakeman, J., Gupta, H., Milillo, N., Rabitti, G., Chabridon, V., Duan, Q., Sun, X., Smith, S., Sheikholeslami, R., Hosseini, N., Asadzadeh, M., Puy, A., Kucherenko, S., and Maier, Holger, R. (2021). The future of sensitivity analysis: An essential discipline for systems modeling and policy support. Environmental Modelling & Software, 137:104954.
  • Ribeiro et al., (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, New York. ACM.
  • Rubin, (1981) Rubin, D. B. (1981). The Bayesian bootstrap. The annals of statistics, 9(1):130–134.
  • Saltelli et al., (2008) Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S. (2008). Global sensitivity analysis: the primer. John Wiley & Sons.
  • Shapley, (1953) Shapley, L. S. (1953). A value for n-person games. In Kuhn, H. W. and Tucker, A. W., editors, Contribution to the Theory of Games II (Annals of Mathematics Studies 28), pages 307–317. Princeton University Press, Princeton, NJ.
  • Sobol’, (1993) Sobol’, I. M. (1993). Sensitivity estimates for nonlinear mathematical models. Mathematical Modeling and Computational Experiment, 1:407–414.
  • Sundararajan and Najmi, (2020) Sundararajan, M. and Najmi, A. (2020). The many Shapley values for model explanation. In The 37th International Conference on Machine Learning (ICML 2020).
  • Wei et al., (2015) Wei, P., Lu, Z., and Song, J. (2015). Variable importance analysis: A comprehensive review. Reliability Engineering & System Safety, 142:399–432.
  • Xiang, (2020) Xiang, A. (2020). Reconciling legal and technical approaches to algorithmic bias. Tennessee Law Review, 88(3):2021.

Appendix A Appendix

This appendix includes conventional group fairness metrics and cohort Shapley results for the whole distribution of individual bias in the COMPAS data example.

Figure 6 shows some conventional group fairness metrics for the COMPAS data set. Horizontal bars show group specific means and the vertical dashed lines show population means. We see that Black subjects had a higher average value of $\hat{y}$ than White subjects. Black subjects also had a higher average of $y$ but a lower average residual $y-\hat{y}$. Using $B$ and $W$ to denote the two racial groups, $\hat{\mathbb{E}}(y-\hat{y}\mid B)\doteq-0.035$ and $\hat{\mathbb{E}}(y-\hat{y}\mid W)\doteq 0.054$. The FPR was higher for Black subjects and the FNR was higher for White subjects.

Refer to caption
Figure 6: Group fairness metrics on COMPAS recidivism data.

Figure 7 shows a 5×5 matrix of histograms of individualized bias (impact) on demographic parity of the prediction. The columns represent the variables whose biases we examine and the rows represent the conditioning variables used for grouping when we generate the histograms. The colors in the figure indicate categories of the conditioning variables. Figure 8 shows histograms of individualized bias on demographic parity of the response. Figure 9 shows histograms of individualized bias on the residual. Figure 10 shows histograms of individualized bias on false positives, which corresponds to equal opportunity. Figure 11 shows histograms of individualized bias on false negatives.

Refer to caption
Figure 7: Histograms of cohort Shapley value indicating individualized bias for demographic parity of prediction on a variable for each category in a conditioned variable.
Refer to caption
Figure 8: Histograms of cohort Shapley value indicating individualized bias for demographic parity of response on a variable for each category in a conditioned variable.
Refer to caption
Figure 9: Histograms of cohort Shapley value indicating individualized bias for residual on a variable for each category in a conditioned variable.
Refer to caption
Figure 10: Histograms of cohort Shapley value indicating individualized bias for equal opportunity (false positive rate) on a variable for each category in a conditioned variable.
Refer to caption
Figure 11: Histograms of cohort Shapley value indicating individualized bias for false negative rate on a variable for each category in a conditioned variable.
Table 2: Mean cohort Shapley impact of groups on prediction $\hat{y}$.
Variable White Black Male Female
priors_count -0.024  0.016  0.006 -0.023
crime_factor -0.004  0.003  0.001 -0.004
age_factor -0.018  0.012  0.000 -0.001
race_factor -0.101  0.067  0.002 -0.006
gender_factor -0.001  0.000  0.000 -0.002
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.023 -0.027  0.024 -0.019
crime_factor -0.002 -0.010  0.003  0.001
age_factor -0.017 -0.020  0.011  0.015
race_factor -0.112 -0.065  0.072  0.045
gender_factor -0.008  0.025  0.006 -0.025
Table 3: Mean cohort Shapley impact of groups on response $y$.
Variable White Black Male Female
priors_count -0.018  0.012  0.004 -0.018
crime_factor -0.003  0.002  0.001 -0.003
age_factor -0.010  0.007  0.000 -0.001
race_factor -0.048  0.032  0.001 -0.003
gender_factor -0.002  0.001  0.021 -0.085
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.017 -0.021  0.017 -0.015
crime_factor -0.001 -0.006  0.002  0.000
age_factor -0.010 -0.009  0.007  0.007
race_factor -0.055 -0.023  0.035  0.014
gender_factor  0.015 -0.058  0.024 -0.107
Table 4: Mean cohort Shapley impact of groups on residual $y-\hat{y}$.
Variable White Black Male Female
priors_count  0.007 -0.004 -0.001  0.006
crime_factor  0.001 -0.001 -0.000  0.001
age_factor  0.008 -0.005 -0.000  0.000
race_factor  0.054 -0.035 -0.001  0.003
gender_factor -0.001  0.001  0.020 -0.083
Variable White-Male White-Female Black-Male Black-Female
priors_count  0.007  0.007 -0.006  0.005
crime_factor  0.001  0.003 -0.001 -0.000
age_factor  0.007  0.011 -0.004 -0.009
race_factor  0.057  0.042 -0.036 -0.031
gender_factor  0.023 -0.083  0.018 -0.083
Table 5: Mean cohort Shapley impact of groups on false positive.
Variable White Black Male Female
priors_count -0.002  0.002  0.000 -0.002
crime_factor -0.001  0.000  0.000 -0.001
age_factor -0.006  0.004  0.000 -0.000
race_factor -0.033  0.022  0.001 -0.002
gender_factor  0.001 -0.000 -0.011  0.044
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.002 -0.003  0.002 -0.001
crime_factor -0.000 -0.002  0.000  0.001
age_factor -0.005 -0.008  0.003  0.007
race_factor -0.034 -0.022  0.023  0.015
gender_factor -0.013  0.047 -0.009  0.042
Table 6: Mean cohort Shapley impact of groups on false negative.
Variable White Black Male Female
priors_count  0.004 -0.003 -0.001  0.004
crime_factor  0.001 -0.001 -0.000  0.001
age_factor  0.002 -0.001 -0.000  0.000
race_factor  0.021 -0.014 -0.000  0.001
gender_factor -0.001  0.000  0.010 -0.039
Variable White-Male White-Female Black-Male Black-Female
priors_count  0.005  0.004 -0.004  0.004
crime_factor  0.001  0.002 -0.001  0.000
age_factor  0.001  0.002 -0.001 -0.002
race_factor  0.021  0.020 -0.014 -0.015
gender_factor  0.010 -0.037  0.009 -0.041