
Cohort Shapley value for algorithmic fairness

Masayoshi Mase
Hitachi, Ltd
   Art B. Owen
Stanford University
   Benjamin B. Seiler
Stanford University
(May 2021)
Abstract

Cohort Shapley value is a model-free method of variable importance grounded in game theory that does not use any unobserved and potentially impossible feature combinations. We use it to evaluate algorithmic fairness, using the well known COMPAS recidivism data as our example. This approach allows one to identify for each individual in a data set the extent to which they were adversely or beneficially affected by their value of a protected attribute such as their race. The method can do this even if race was not one of the original predictors and even if it does not have access to a proprietary algorithm that has made the predictions. The grounding in game theory lets us define aggregate variable importance for a data set consistently with its per subject definitions. We can investigate variable importance for multiple quantities of interest in the fairness literature including false positive predictions.

1 Introduction

Machine learning is now commonly used to make consequential decisions about people, affecting hiring decisions, loan applications, medical treatments, criminal sentencing and more. It is important to understand and explain these decisions. A critical part of understanding a decision is quantifying the importance of the variables used to make it. When the decisions are about people and some of the variables describe protected attributes of those people, such as race and gender, then variable importance has a direct bearing on algorithmic fairness.

As we describe below, most variable importance measures work by changing a subset of the input variables to a black box. We survey such methods below and argue against that approach. The first problem is that the resulting analysis can depend on some very unreasonable variable combinations. A second problem with changing inputs to a black box is that it necessarily attributes zero importance to a variable that the algorithm did not use. If a protected variable is not actually used in the black box then the algorithm would automatically be considered fair. However this ‘fairness through unawareness’ approach is not reliable, as Adler et al. (2018) and many others have noted. Information about the protected variables can leak in through others with which they are associated. The practice known as ‘redlining’ involves deliberate exploitation of such associations. When studying fairness we must have the possibility of studying a variable not included in the black box. In this paper we develop and illustrate an approach to algorithmic fairness that does not use impossible values and can detect redlining.

1.1 Variable Importance

Variable importance measures have a long history and there has been a recent surge in interest motivated by problems of explainable AI. The global sensitivity analysis literature studies black box functions used in engineering and climate models among others. Much of that work is based on a functional ANOVA model of the function relating the output to inputs. Variance explained is partitioned using Sobol’ indices (Sobol’, 1993). Saltelli et al. (2008) is an introductory textbook and Razavi et al. (2021) is a current survey of the field. Wei et al. (2015) provide a comprehensive survey of variable importance measures in statistics. They include 197 references of which 24 are themselves surveys. Molnar (2018) surveys variable importance measures used in explainable AI. Prominent among these are SHAP (Lundberg and Lee, 2017) and LIME (Ribeiro et al., 2016). LIME makes a local linear approximation to a black box function $f(\cdot)$ and one can then take advantage of the easier interpretability of linear models. SHAP is based on Shapley value from cooperative game theory (Shapley, 1953), which we describe in more detail below.

Ordinarily we can compare two prediction methods by waiting to see how accurate they are on unknown future data or, if necessary, by using holdout data. This does not carry over to deciding whether LIME or SHAP or some other method has made a better explanation of a past decision. Suppose that we want to understand something like why a given applicant was turned down for a loan. When $f(\cdot)$ is available to us in closed form then there is no doubt at all about whether the loan would have been offered under any assortment of hypothetical feature combinations that we choose. We can compare methods by how they define importance. Choosing among definitions is a different activity, essentially a philosophical one, and we face tradeoffs. Our problem is one of identifying the causes of given effects when we have perfect knowledge of $f(\cdot)$ for all input variable combinations. This is different from the more common learning task of quantifying the effects of given causes. That distinction was made by Holland (1988) in studying causal inference. It goes back at least to Mill (1843).

A problem with many, but not all, measures of variable importance is that they are based on changing some feature values from one level to another, while holding other features constant and then looking at the changes to $f(\cdot)$. When the underlying features are highly correlated, then some of these combinations can be quite implausible, casting doubt on any variable importance methods that use them. In extreme cases, the combinations can be physically impossible (e.g., systolic blood pressure below diastolic) or logically impossible (e.g., birth date after graduation date). The use of these combinations has also been criticized by Hooker and Mentch (2019).

To avoid using impossible values, the cohort Shapley method (code at https://github.com/cohortshapley/cohortshapley) from Mase et al. (2019) uses only observed data values. For any target subject $t\in\{1,\dots,n\}$ and any feature variable $j=1,\dots,d$ we obtain a Shapley value $\phi_{j}=\phi_{j}(t)$ that measures the impact of the value of feature $j$ on $f(\cdot)$ for subject $t$. As we describe below, those impacts can be positive or negative. Using one of the Shapley axioms we will be able to aggregate from individual subjects to an impact measure for the entire data set. We can also disaggregate from the entire data set to a subset, such as all subjects with the protected level of a protected variable.

1.2 Contributions

Refer to caption
Figure 1: Histograms of the cohort Shapley impact of race on whether a person in the COMPAS data is predicted to reoffend. Orange bars represent Black subjects; blue represent White subjects.

In this paper we use the cohort Shapley method from Mase et al. (2019) to measure variable importance. One of the features of cohort Shapley is that it can attribute importance to a variable that is not actually used in $f(\cdot)$. This is controversial and not all authors approve, but it is essential in the present context of algorithmic fairness.

Our contribution is to provide a method of quantifying bias at both individual and group levels with axiomatic consistency between the individual and group measures and without requiring any impossible or unobserved combinations of input variables. The approach can even be used when the prediction algorithm is not available as for instance when it is proprietary.

We investigate the COMPAS data (Angwin et al., 2016), which include predictions of who is likely to commit a crime. There is great interest in seeing whether the algorithm is unfair to Black subjects. In the given context, the prediction code $f(\cdot)$ is a proprietary algorithm, unavailable to us, and so we cannot change any of the input values to it. What we have instead are the predictions on a set of subjects. Cohort Shapley can work with the predictions because it does not require any hypothetical feature combinations. Figure 1 shows cohort Shapley impacts for the race of the subjects in the COMPAS data, computed in a way that we describe below. As it turns out, the impact of race on whether a subject was predicted to reoffend was always positive for Black subjects and always negative for White subjects. For some other measures that we show, the histograms overlap. The average impact for Black subjects was $0.067$ and the average for White subjects was $-0.101$. For context, the response values were $1$ for those predicted to reoffend and $0$ for those predicted not to reoffend.

An outline of this paper is as follows. Section 2 introduces our notation, defines Shapley value, and presents a few of the variable importance measures and fairness definitions from the literature. Section 3 describes the COMPAS recidivism data set. Section 4 has our analysis of that data, including a Bayesian bootstrap for uncertainty quantification. Section 5 has our conclusions. An appendix presents additional COMPAS results beyond the ones we selected for discussion.

2 Notation and definitions

For subjects $i=1,\dots,n$, the value of feature $j=1,\dots,d$ is $x_{ij}\in\mathcal{X}_{j}$. We consider categorical variables $x_{ij}$. Continuously distributed values can be handled as discussed in Section 5. The features for subject $i$ are encoded in $\boldsymbol{x}_{i}\in\mathcal{X}=\prod_{j=1}^{d}\mathcal{X}_{j}$. For each subject there is a response value $y_{i}\in\mathbb{R}$. The feature indices belong to the set $1{:}d\equiv\{1,\dots,d\}$ and similarly the subject indices are in $1{:}n\equiv\{1,\dots,n\}$. For $u\subseteq 1{:}d$ the tuple $\boldsymbol{x}_{u}$ is $(x_{j})_{j\in u}\in\mathcal{X}_{u}=\prod_{j\in u}\mathcal{X}_{j}$, and $\boldsymbol{x}_{iu}=(x_{ij})_{j\in u}$.

There is also an algorithmic prediction $\hat{y}_{i}$. It is usual in variable importance problems to have $\hat{y}_{i}=f(\boldsymbol{x}_{i})$ for a function $f(\cdot)$ that may be difficult to interpret (e.g., a black box). This $f(\cdot)$ is often an approximation of $\mathbb{E}(y\mid\boldsymbol{x})$ though it need not be of that form. Cohort Shapley uses only the values $\hat{y}_{i}$ so it does not require access to $f(\cdot)$. Since it only uses a vector of values, one per subject, it can be used to find the important variables in $y_{i}$, or $\hat{y}_{i}$ or combinations such as the residual $y_{i}-\hat{y}_{i}$.

In many settings $y_{i}$ and $\hat{y}_{i}$ are both binary. The value $y_{i}=1$ may mean that subject $i$ is worthy of a loan, or is predicted to commit a crime, or should be sent to intensive care, and $\hat{y}_{i}\in\{0,1\}$ is an estimate of $y_{i}$. In this case, measures such as $\mathrm{FP}_{i}=1\{\hat{y}_{i}=1\ \&\ y_{i}=0\}$ describing a false positive are of interest.

Here are some typographic conveniences that we use. When $j\not\in u$, then $u+j$ is $u\cup\{j\}$. For $u\subseteq 1{:}d$ we use $-u$ for $1{:}d\setminus u$. The subscript to $\boldsymbol{x}$ may contain a comma or not depending on what is clearer. For instance, we use $\boldsymbol{x}_{iu}$ but also $\boldsymbol{x}_{i,-u}$.

2.1 Shapley values

Shapley value (Shapley, 1953) is used in game theory to define a fair allocation of rewards to a team that has cooperated to produce something of value. Many variable importance problems can be formulated as a team of input variables generating an output value or an output variance explained. We then want to apportion importance or impact to the individual variables.

Suppose that a team of $d$ members produce a value $\mathrm{val}(1{:}d)$, and that we have at our disposal the value $\mathrm{val}(u)$ that would have been produced by the team $u$, for all $2^{d}$ teams $u\subseteq 1{:}d$. Let $\phi_{j}$ be the reward for player $j$. It is convenient to work with incremental values $\mathrm{val}(j\mid u)=\mathrm{val}(u+j)-\mathrm{val}(u)$ for sets $u$ with $j\not\in u$.

Shapley introduced quite reasonable criteria:

  1) Efficiency: $\sum_{j=1}^{d}\phi_{j}=\mathrm{val}(1{:}d)$.

  2) Symmetry: If $\mathrm{val}(i\mid u)=\mathrm{val}(j\mid u)$ for all $u\subseteq 1{:}d\setminus\{i,j\}$, then $\phi_{i}=\phi_{j}$.

  3) Dummy: If $\mathrm{val}(j\mid u)=0$ for all $u\subseteq 1{:}d\setminus\{j\}$, then $\phi_{j}=0$.

  4) Additivity: If $\mathrm{val}(u)$ and $\mathrm{val}^{\prime}(u)$ lead to values $\phi_{j}$ and $\phi_{j}^{\prime}$, then the game producing $(\mathrm{val}+\mathrm{val}^{\prime})(u)$ has values $\phi_{j}+\phi^{\prime}_{j}$.

He found that the unique valuation that satisfies all four of these criteria is

$$\phi_{j}=\frac{1}{d}\sum_{u\subseteq -j}\binom{d-1}{|u|}^{-1}\mathrm{val}(j\mid u) \qquad (1)$$

Formula (1) is not very intuitive. Another way to explain Shapley value is as follows. We could build a team from $\varnothing$ to $1{:}d$ in $d$ steps, adding one member at a time. There are $d!$ different orders in which to add team members. The Shapley value $\phi_{j}$ is the increase in value coming from the addition of member $j$, averaged over all $d!$ different orders. From equation (1) we see that Shapley value does not change if we add or subtract the same quantity from all $\mathrm{val}(u)$. It can be convenient to make $\mathrm{val}(\varnothing)=0$.
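To make the definition concrete, here is a minimal sketch in Python of the exact computation in equation (1) for a user-supplied value function. The function name `shapley_values` and its interface are our own illustrative choices, and exact enumeration over all subsets is only practical for small $d$.

```python
from itertools import combinations
from math import comb


def shapley_values(val, d):
    """Exact Shapley values phi_1..phi_d from equation (1).

    val takes a frozenset u of feature indices in {0, ..., d-1} and
    returns the scalar val(u).  Uses O(d 2^d) evaluations of val.
    """
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            for u in combinations(others, size):
                u = frozenset(u)
                increment = val(u | {j}) - val(u)        # val(j | u)
                phi[j] += increment / (d * comb(d - 1, size))
    return phi
```

One can check the efficiency axiom numerically: the returned values sum to $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$.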

We discuss some variable importance measures based on Shapley value below. We use the framework from Sundararajan and Najmi (2020). They survey many uses of Shapley value in explainable AI.

2.2 Changing variables

Most mechanisms for investigating variable importance proceed by changing some, but not all, of the input variables to $f(\cdot)$. For $u\subseteq 1{:}d$ the hybrid point $\boldsymbol{w}=\boldsymbol{x}_{u}{:}\boldsymbol{z}_{-u}$ has

$$w_{j}=\begin{cases}x_{j},&j\in u\\ z_{j},&j\not\in u.\end{cases}$$

We can investigate changes to $\boldsymbol{x}_{u}$ by examining $f(\boldsymbol{x}_{u}{:}\boldsymbol{z}_{-u})-f(\boldsymbol{x})$ for various points $\boldsymbol{x},\boldsymbol{z}\in\mathcal{X}$.

Global sensitivity analysis (Razavi et al., 2021) works with an analysis of variance (ANOVA) defined in terms of $\boldsymbol{x}\in\mathcal{X}$ with independent components for which $\sigma^{2}=\mathrm{var}(f(\boldsymbol{x}))<\infty$. The sets $\mathcal{X}_{j}$ can be discrete or continuous. It is common there to measure the importance of a subset of variables by variance explained. For instance, the lower Sobol’ index for variables $\boldsymbol{x}_{u}$ is $\underline{\tau}^{2}_{u}=\mathrm{var}(\mathbb{E}(f(\boldsymbol{x})\mid\boldsymbol{x}_{u}))$, which is usually normalized to $\underline{\tau}^{2}_{u}/\sigma^{2}$. The upper Sobol’ index is $\overline{\tau}^{2}_{u}=\sigma^{2}-\underline{\tau}^{2}_{-u}$, that is, everything not explained by $\boldsymbol{x}_{-u}$ is attributed to $\boldsymbol{x}_{u}$. These measures are very natural in settings where the components of $\boldsymbol{x}$ can all vary freely and independently, but such is usually not the case for inputs to a black box machine learning model.

The quantitative input influence (QII) method of Datta et al. (2016) uses a model in which the features are statistically independent, each drawn from its marginal distribution.

In baseline Shapley (see Sundararajan and Najmi, 2020), there is a baseline tuple $\boldsymbol{x}_{b}\in\mathcal{X}$ of $\boldsymbol{x}$, such as the sample average $\bar{\boldsymbol{x}}$. Then the value for variable subset $u$ and subject $t$ is $f(\boldsymbol{x}_{t,u}{:}\boldsymbol{x}_{b,-u})$. We get the same Shapley values using $\mathrm{val}(u)=f(\boldsymbol{x}_{t,u}{:}\boldsymbol{x}_{b,-u})-f(\boldsymbol{x}_{b})$. Then the total value to explain for subject $t$ is $f(\boldsymbol{x}_{t})-f(\boldsymbol{x}_{b})$. The counterfactuals are changes in some subset of the values in $\boldsymbol{x}_{t}$. Important variables are those that move the value the most from the baseline $f(\boldsymbol{x}_{b})$ to subject $t$’s value. Shapley’s combination provides a principled weighting of the effect of changing $x_{b,j}$ to $x_{t,j}$ given some subset $u$ of variables that have also been changed.
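As a small illustration (not the authors’ code), the baseline Shapley value function can be plugged into the `shapley_values` sketch above. Here `f`, `x_t` and `x_b` are assumed to be a prediction function, the target subject’s features and the chosen baseline.

```python
import numpy as np


def baseline_shapley(f, x_t, x_b):
    """Sketch of baseline Shapley: val(u) = f(x_{t,u} : x_{b,-u}) - f(x_b)."""
    x_t = np.asarray(x_t, dtype=float)
    x_b = np.asarray(x_b, dtype=float)

    def val(u):
        w = x_b.copy()           # hybrid point: baseline values outside u
        for j in u:
            w[j] = x_t[j]        # target subject's values on u
        return f(w) - f(x_b)

    return shapley_values(val, len(x_t))
```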

Conditional expectation Shapley (Sundararajan and Najmi, 2020) uses a joint distribution $D$ on $\mathcal{X}$ and then for subject $t$ takes $\mathrm{val}(u)=\mathbb{E}_{D}(f(\boldsymbol{x})\mid\boldsymbol{x}_{u}=\boldsymbol{x}_{t,u})$. SHAP (Lundberg and Lee, 2017) is a conditional expectation Shapley using a distribution $D$ in which the features are independent. Cohort Shapley (Mase et al., 2019), which we describe next, is very nearly conditional expectation Shapley with $D$ equal to the empirical distribution.

2.3 Changing knowledge

In cohort Shapley the counterfactual does not involve replacing $\boldsymbol{x}_{tu}$ by $\boldsymbol{x}_{bu}$. It is instead about concealing the values of $\boldsymbol{x}_{tu}$. It is what Kumar et al. (2020) call a conditional method because it requires specification of a conditional distribution on the features. The version we present here is a form of conditional expectation Shapley using the empirical distribution of the data.

For every pair of values $x_{tj},x_{ij}\in\mathcal{X}_{j}$ we can declare them to be similar or not. In the present context we take similarity to just be $x_{tj}=x_{ij}$. For a target subject $t\in 1{:}n$ and set $u\subseteq 1{:}d$ we define the cohort $C_{u}=C_{tu}=\{i\in 1{:}n\mid\boldsymbol{x}_{iu}=\boldsymbol{x}_{tu}\}$. These are the subjects who match subject $t$ on all of the variables $j\in u$. They may or may not match the target on $j\not\in u$. None of the cohorts is empty because they all contain $t$. The value function in cohort Shapley is the cohort mean

$$\mathrm{val}(u)=\frac{1}{|C_{tu}|}\sum_{i\in C_{tu}}\hat{y}_{i}.$$

Here $\hat{y}_{i}=f(\boldsymbol{x}_{i})$ but we write it this way because in practice we might not have $f(\cdot)$ at our disposal, just the predictions $\hat{y}_{i}$.

The incremental value $\mathrm{val}(j\mid u)$ is then how much the cohort mean moves when we reveal $x_{tj}$ in addition to the previously revealed variables $\boldsymbol{x}_{tu}$. Having specified the values, the Shapley formula provides attributions. Note that $\mathrm{val}(\varnothing)$ is the plain average of all $\hat{y}_{i}$ and Shapley value is unchanged by subtracting $\mathrm{val}(\varnothing)$ from all of the $\mathrm{val}(u)$. Furthermore, when there are many variables, then very commonly $C_{t,1{:}d}=\{t\}$ and then the total value to be explained, $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$, is simply $\hat{y}_{t}-(1/n)\sum_{i=1}^{n}\hat{y}_{i}$.
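The following is a minimal sketch of the cohort value function and the resulting per-subject attributions, reusing the hypothetical `shapley_values` helper above and taking similarity to be exact equality on categorical features; the released cohortshapley package supports more general similarity definitions.

```python
import numpy as np


def cohort_shapley_for_subject(X, yhat, t):
    """Cohort Shapley impacts for target subject t.

    X is an (n, d) array of categorical features and yhat a length-n
    vector of predictions, responses, or residuals to be attributed.
    """
    X = np.asarray(X)
    yhat = np.asarray(yhat, dtype=float)
    n, d = X.shape

    def val(u):
        mask = np.ones(n, dtype=bool)
        for j in u:                          # subjects matching t on every j in u
            mask &= (X[:, j] == X[t, j])
        return yhat[mask].mean()             # cohort mean; never empty, contains t

    return shapley_values(val, d)
```

Here $\mathrm{val}(\varnothing)$ is the overall mean of the $\hat{y}_{i}$, and by the remarks above the attributions sum to $\mathrm{val}(1{:}d)-\mathrm{val}(\varnothing)$.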

2.4 Definitions of fairness

Just as there are many ways to define variable importance, there are multiple ways to define what fairness means. Some of those definitions are mutually incompatible and some of them differ from legal definitions. Here we present a few of the issues. See Corbett-Davies and Goel (2018), Chouldechova and Roth (2018), Berk et al. (2018), and Friedler et al. (2019) for surveys. We do not make assertions about which definitions are preferable.

For $y,\hat{y}\in\{0,1\}$, let $n_{y\hat{y}}$ be the number of subjects with $y_{i}=y$ and $\hat{y}_{i}=\hat{y}$. These four counts and their derived properties can be computed for any subset of the subjects. The false positive rate (FPR) is $n_{01}/n_{0\bullet}$, where a bullet indicates that we are summing over the levels of that index. We ignore uninteresting corner cases such as $n_{0\bullet}=0$; when there are no subjects with $y=0$ then we have no interest in the proportion of them with $\hat{y}=1$. The false negative rate (FNR) is $n_{10}/n_{1\bullet}$. The prevalence of the trait under study is $p=n_{1\bullet}/n_{\bullet\bullet}$. The positive predictive value (PPV) is $n_{11}/n_{\bullet 1}$. As Chouldechova (2017, equation (2.6)) notes, these values satisfy

$$\mathrm{FPR}=\frac{p}{1-p}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,(1-\mathrm{FNR}). \qquad (2)$$

See also Kleinberg et al. (2016).

Equation (2) shows how some natural definitions of fairness conflict. FPR and FNR describe $\hat{y}\mid y$, while PPV describes $y\mid\hat{y}$. If two subsets of subjects have the same PPV but different prevalences $p$, then they cannot also match up on FPR and FNR. Fairness in $y\mid\hat{y}$ terms and fairness in $\hat{y}\mid y$ terms can only coincide in trivial settings such as when $\hat{y}=y$ always, or empirically unusual settings with equal prevalence between subjects having different values of a protected attribute.
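Identity (2) is easy to verify numerically from the four counts; the short check below is illustrative only and the counts are made up.

```python
def confusion_rates(n00, n01, n10, n11):
    """FPR, FNR, PPV and prevalence p from the counts n_{y yhat}."""
    fpr = n01 / (n00 + n01)                      # n_{01} / n_{0.}
    fnr = n10 / (n10 + n11)                      # n_{10} / n_{1.}
    ppv = n11 / (n01 + n11)                      # n_{11} / n_{.1}
    p = (n10 + n11) / (n00 + n01 + n10 + n11)    # prevalence
    return fpr, fnr, ppv, p


fpr, fnr, ppv, p = confusion_rates(n00=50, n01=10, n10=15, n11=25)
assert abs(fpr - p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)) < 1e-12
```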

Hardt et al. (2016) write about measures of demographic parity, equalized odds and equal opportunity. Demographic parity requires $\Pr(\hat{y}=1\mid\boldsymbol{x}_{p})=\Pr(\hat{y}=1)$ where $\boldsymbol{x}_{p}$ specifies the levels of one (or more) protected variables. Equalized odds requires $\Pr(\hat{y}=1\mid y=y^{\prime},\boldsymbol{x}_{p})=\Pr(\hat{y}=1\mid y=y^{\prime})$ for $y^{\prime}\in\{0,1\}$, so that false positive rates and true positive rates are both equal across groups. Because $\mathrm{FNR}=1-\mathrm{TPR}$ the false negative rates are also equal. Equal opportunity is defined in terms of a preferred outcome. When $y=0$ is preferred, it requires that $\Pr(\hat{y}=1\mid y=0,\boldsymbol{x}_{p})=\Pr(\hat{y}=1\mid y=0)$. That is, the false positive rate is unaffected by the protected variables.

Chouldechova (2017) considers a score $s$ (such as $\hat{y}$) to be well calibrated if $\Pr(y=1\mid s,\boldsymbol{x}_{p})=\Pr(y=1\mid s)$ for all levels of $s$. A score attains predictive parity if $\Pr(y=1\mid s>s_{*},\boldsymbol{x}_{p})=\Pr(y=1\mid s>s_{*})$ for all thresholds $s_{*}$. The distinction between well calibratedness and predictive parity is relevant when there are more than two levels for the score.

There is some debate about when or whether using protected variables can lead to improved fairness. See Xiang (2020), who gives a summary of legal issues surrounding fairness, and Corbett-Davies et al. (2017), who study whether imposing calibration or other criteria might adversely affect the groups they are meant to help.

2.5 Aggregation and disaggregation

The additivity axiom of Shapley value is convenient for us. It means that the importances for the residual $y_{i}-\hat{y}_{i}$ are simply the differences of the importances for $y_{i}$ and $\hat{y}_{i}$. We can aggregate importances from subjects to the whole data set by summing or, more interpretably, averaging over $t=1,\dots,n$. We can also disaggregate from the whole data set to subsets of special interest by averaging cohort Shapley values over target subjects $t\in v\subset 1{:}n$.
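In code, aggregation and disaggregation are just averages of the per-subject impacts. A sketch, assuming `phi` is an $(n,d)$ array of cohort Shapley values with one row per target subject:

```python
import numpy as np


def mean_impact(phi, group=None):
    """Average per-subject impacts over all subjects or over a boolean mask."""
    phi = np.asarray(phi)
    if group is None:
        return phi.mean(axis=0)                  # aggregate to the whole data set
    return phi[np.asarray(group)].mean(axis=0)   # disaggregate to a subset

# By additivity, impacts for the residual y - yhat are the differences of the
# impacts computed separately for y and for yhat:  phi_resid = phi_y - phi_yhat.
```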

3 COMPAS data

COMPAS is a tool from Northpointe Inc. for judging the risk that a criminal defendant will commit another crime (re-offend) within two years. Each subject is rated into one of ten deciles with higher deciles considered higher risk of reoffending. Angwin et al. (2016) investigated whether that algorithm was biased against Black people. They obtained data for subjects in Broward County, Florida, including the COMPAS decile, the subjects’ race, age, gender, number of prior crimes and whether the crime for which they were charged is a felony or not. Angwin et al. (2016) describe how they processed their data, including how they found followup data on offences committed and how they matched those to subjects for whom they had prior COMPAS scores. They also note that race was not one of the variables used in the COMPAS predictions.

The example is controversial. Angwin et al. (2016) find that COMPAS is biased because it gave a higher rate of false positives for Black subjects and a higher rate of false negatives for White subjects. Flores et al. (2016) and Dieterich et al. (2016) disagree, raising the issue of $\hat{y}\mid y$ fairness versus $y\mid\hat{y}$ fairness. The prevalence of reoffences differed between Black and White subjects, forcing $y\mid\hat{y}$ and $\hat{y}\mid y$ notions to be incompatible. Flores et al. (2016) also questioned whether the subjects in the data set were at comparable stages in the legal process to those for which COMPAS was designed.

Following Chouldechova (2017) we focus on just Black and White subjects. That provides a sample of 5278 subjects from among the original 6172 subjects. As in that paper we record the number of prior convictions as a categorical variable with five levels: 0, 1–3, 4–6, 7–10 and $>$10. Following Angwin et al. (2016), we record the subjects’ ages as a categorical variable with three levels: $<$25, 25–45, $>$45. Also, following Chouldechova (2017), we consider the prediction $\hat{y}_{i}$ to be $1$ if subject $i$ is in deciles 5–10 and $\hat{y}_{i}=0$ for subject $i$ in deciles 1–4.
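A hedged sketch of this preprocessing using pandas follows; the column and race labels (`race`, `priors_count`, `age`, `decile_score`, "African-American", "Caucasian") follow the published ProPublica extract and should be checked against the file actually used.

```python
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")   # ProPublica-style file name

# Keep only Black and White subjects, as in Chouldechova (2017).
df = df[df["race"].isin(["African-American", "Caucasian"])]

# Prior convictions as a five-level categorical variable.
df["priors_cat"] = pd.cut(df["priors_count"],
                          bins=[-1, 0, 3, 6, 10, float("inf")],
                          labels=["0", "1-3", "4-6", "7-10", ">10"])

# Age as a three-level categorical variable.
df["age_cat"] = pd.cut(df["age"], bins=[0, 24, 45, 200],
                       labels=["<25", "25-45", ">45"])

# Binary prediction: deciles 5-10 are predicted to reoffend.
df["yhat"] = (df["decile_score"] >= 5).astype(int)
```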

4 Exploration of the COMPAS data

In this section we use cohort Shapley to study some fairness issues in the COMPAS data, especially individual level metrics. We have selected what we found to be the strongest and most interesting findings. The larger set of figures and tables from which these are drawn are in the appendix. The appendix also compares some conventional group fairness metrics for Black and White subjects.

We have computed Shapley impacts for these responses: $y_{i}$, $\hat{y}_{i}$, $y_{i}-\hat{y}_{i}$, $\mathrm{FP}_{i}=1\{y_{i}=0\,\&\,\hat{y}_{i}=1\}$ and $\mathrm{FN}_{i}=1\{y_{i}=1\,\&\,\hat{y}_{i}=0\}$. If $\mathrm{FP}_{i}=1$ then subject $i$ received a false positive prediction. Note that the sample average value of $\mathrm{FP}_{i}$ satisfies

$$\hat{\mathbb{E}}(\mathrm{FP}_{i})=\frac{n_{01}}{n_{\bullet\bullet}}=\mathrm{FPR}\times\frac{n_{0\bullet}}{n_{\bullet\bullet}}=\mathrm{FPR}\times(1-p).$$

Being wrongly predicted to reoffend is an adverse outcome that we study. Its expected value is the FPR times the fraction of non-reoffending subjects. Similarly, $\hat{\mathbb{E}}(\mathrm{FN}_{i})=\mathrm{FNR}\times p$.

4.1 Graphical analysis

Figure 2 shows histograms of Shapley impacts of race for the subjects in the COMPAS data. The first panel there reproduces the data from Figure 1, showing a positive impact for every Black subject and a negative one for every White subject for the prediction $\hat{y}$. For the actual response $y$, the histograms overlap slightly. By additivity of Shapley value the impacts for $y-\hat{y}$ can be found by subtracting the impact for $\hat{y}$ from that for $y$ for each subject $i=1,\dots,n$. The histograms of $y_{i}-\hat{y}_{i}$ show that the impact of race on the residual is typically positive for White subjects and negative for Black subjects.

The distributions overlap for equal opportunity (the false positive rate, FPR) and for the false negative rate (FNR). There we observe a small number of adversely affected White subjects and beneficially affected Black subjects. We can inspect these overlapping subjects further to understand which conditions correspond to exceptional cases. We can also inspect the tails of the distributions to understand the extreme cases where bias is most likely to be observed.

Figure 3 shows histograms of Shapley impacts for race but now they are color coded by the subjects’ gender. We see that the impact on y^\hat{y} is bimodal by race for both male and female subjects, but the effect is larger in absolute value for male subjects. The impacts for the response, the residual and false negative and positive values do not appear to be bimodal by race for female subjects, but they do for male subjects. The case of FN is perhaps different. The modes in Figure 2 are close together and not so apparent in Figure 3 which has slightly different bin boundaries. It is clear from these figures that the race differences we see are much stronger among male subjects.

Refer to caption
Figure 2: Histogram of cohort Shapley value of race factor on COMPAS recidivism data.
Refer to caption
Figure 3: Histogram of cohort Shapley value of race factor for each gender on COMPAS recidivism data.

The appendix includes some figures that each include 25 histograms. They are Shapley impact histograms for all five features, color coded by each of the five features. There is one such figure for each of $y$, $\hat{y}$, $y-\hat{y}$, $\mathrm{FP}$ and $\mathrm{FN}$.

4.2 Tabular summary

Tables 2 through 6 in the appendix record mean Shapley impacts for the five predictors and our response measures, disaggregated by race and gender.

Variable White Black Male Female
race_factor  0.054 -0.035 -0.001  0.003
gender_factor -0.001  0.001  0.020 -0.083
Variable White-Male White-Female Black-Male Black-Female
race_factor  0.057  0.042 -0.036 -0.030
gender_factor  0.023 -0.083  0.018 -0.083
Table 1: Mean cohort Shapley impact of groups on residual $y-\hat{y}$.

We take a particular interest in the residual $y_{t}-\hat{y}_{t}$. It equals $1$ for false negatives, $-1$ for false positives and $0$ when the prediction was correct. Table 1 shows the subset of variables with the largest mean impacts on the residual $y_{i}-\hat{y}_{i}$.

What we see there is that revealing that a subject is Black tells us that, on average, that subject’s residual $y_{t}-\hat{y}_{t}$ is decreased by 3.6%. Revealing that the subject is White increases the residual by 5.4%. Revealing race makes very little difference to the residual averaged over male or over female subjects. Revealing gender makes quite a large difference of −8.3% for female subjects and +2.0% for male subjects.

To judge the uncertainty in the values in Table 1 we applied the Bayesian bootstrap of Rubin (1981). That algorithm randomly reweights each data point by a unit mean exponential random variable. By never fully deleting any observation it allows one to also bootstrap individual subjects’ Shapley values, though there is not space to do so in this article. Figure 4 shows violin plots of 1000 bootstrapped cohort Shapley values.

Refer to caption
Figure 4: Bayesian bootstrap violin plot of aggregated cohort Shapley race factor impact of groups on the residual $y-\hat{y}$. Red crosses represent aggregated cohort Shapley values of groups without the bootstrap.
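A sketch of one Bayesian bootstrap replicate under the reweighting just described; the weighted cohort value function is our own illustrative variant of the sketch in Section 2.3, not the authors’ implementation, and it reuses the hypothetical `shapley_values` helper from Section 2.1.

```python
import numpy as np

rng = np.random.default_rng(2021)


def bootstrap_replicate(X, yhat, t):
    """One Bayesian bootstrap replicate of subject t's cohort Shapley values."""
    X = np.asarray(X)
    yhat = np.asarray(yhat, dtype=float)
    n, d = X.shape
    w = rng.exponential(scale=1.0, size=n)   # unit-mean weights; nothing deleted

    def val(u):
        mask = np.ones(n, dtype=bool)
        for j in u:
            mask &= (X[:, j] == X[t, j])
        return np.average(yhat[mask], weights=w[mask])   # weighted cohort mean

    return shapley_values(val, d)
```

Repeating this over many weight draws, and averaging the per-subject values over a group for each draw, yields bootstrap distributions like those summarized in Figure 4.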

4.3 FPR and FNR revisited

The risk of being falsely predicted to reoffend involves two factors: having $y_{i}=0$ and having $\hat{y}_{i}=1$ given that $y_{i}=0$. FPR is commonly computed only over subjects with $y_{i}=0$. Accordingly, in this section we study it by subsetting the subjects to $\{i\mid y_{i}=0\}$ and finding cohort Shapley values of $\hat{y}_{i}$ for the features.
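In terms of the earlier sketches, this amounts to restricting the data before computing the attributions; `cohort_shapley_for_subject` is the hypothetical helper from Section 2.3.

```python
import numpy as np


def subset_cohort_shapley(X, y, yhat, keep_value=0):
    """Cohort Shapley of yhat within the subset {i : y_i == keep_value}."""
    keep = np.asarray(y) == keep_value
    Xs, ys = np.asarray(X)[keep], np.asarray(yhat)[keep]
    return np.array([cohort_shapley_for_subject(Xs, ys, t)
                     for t in range(len(ys))])

# keep_value=0 gives the false positive analysis; keep_value=1 gives the
# false negative analysis.
```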

Figure 5 shows the cohort Shapley impact of the race factor, grouped by race, for the FPR and FNR analyses. The distributions of impacts conditioned on race are separated in the figure. This conditional analysis shows a stronger disadvantage for Black subjects.

Refer to caption
Figure 5: Histogram of cohort Shapley value of race factor on COMPAS recidivism data, for false positives subsetting to $y_{i}=0$ and for false negatives subsetting to $y_{i}=1$.

5 Conclusions

Variable importance in statistics and machine learning ordinarily considers three issues: whether changing $x_{j}$ has a causal future impact on $y$, whether omitting $x_{j}$ from a data set degrades prediction accuracy and whether changing $x_{j}$ has an important mechanical effect on $f(\boldsymbol{x})$ (Jiang and Owen, 2003). These are all different from each other and the third choice is the most useful one for explaining the decisions of a prediction method.

Cohort Shapley (Mase et al., 2019) looks at a fourth issue: whether learning the value of $x_{tj}$ in a sample is informative about the prediction $\hat{y}_{t}$. It is well suited to settings such as the COMPAS data from Broward County, where: the original algorithm is not available to researchers, the protected variable of greatest interest was not included in the model, and the set of subjects for which fairness is of interest is different from the set on which the algorithm was trained.

Cohort Shapley does not address counterfactuals, such as whether subject $t$ would have had $\hat{y}_{t}=0$ but for the fact that $x_{tj}=B$ instead of $W$. At face value, such counterfactuals directly address the issue of interest. Unfortunately there can be many other variables and combinations of variables that would also have changed the outcome, providing equally plausible explanations. Taking account of all such combinations can bring in implausible or even impossible variable combinations.

Cohort Shapley avoids using unobserved combinations of variables. In a fairness setting we might find that every combination of variables is logically and physically possible, though some combinations may still be quite unlikely, such as the largest number of prior offences at the youngest of ages.

We do not offer any conclusions on the fairness of the COMPAS algorithm. There is a problem of missing variables that requires input from domain experts.

If some such variable is not in our data set then accounting for it could change the magnitude or even the sign of some of the effects we see. Some other issues that we believe require input from domain experts are: choosing the appropriate set of subjects to study, determining which responses are relevant to fairness and deciding which combination of protected and unprotected variables is most suitable to include. We do however believe that, given a set of subjects and a set of variables, cohort Shapley supports many different graphical and numerical investigations. It can do so for variables not used in a black box at hand, and it does not even require access to the black box function.

Cohort Shapley value can be used to detect bias in an algorithm, circumventing some of the limitations described above. As with any testing method, there is the possibility of false positives and false negatives. Any bias that is detected must be followed up by further examination. The positive social outcomes of our method arise from providing a tool that can detect and illustrate biases that might otherwise have been missed. Negative social outcomes can also arise from false positives and false negatives due to missing variables.

Acknowledgments

This work was supported by the US National Science Foundation under grant IIS-1837931 and by a grant from Hitachi, Ltd.

References

  • Adler et al., (2018) Adler, P., Falk, C., Friedler, S. A., Nix, T., Rybeck, G., Scheidegger, C., Smith, B., and Venkatasubramanian, S. (2018). Auditing black-box models for indirect influence. Knowledge and Information Systems, 54(1):95–122.
  • Angwin et al., (2016) Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). Machine bias: there’s software used across the country to predict future criminals. and it’s biased against blacks.
  • Berk et al., (2018) Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. (2018). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research.
  • Chouldechova, (2017) Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163.
  • Chouldechova and Roth, (2018) Chouldechova, A. and Roth, A. (2018). The frontiers of fairness in machine learning. Technical report, arXiv:1810.08810.
  • Corbett-Davies and Goel, (2018) Corbett-Davies, S. and Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
  • Corbett-Davies et al., (2017) Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., and Huq, A. (2017). Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd acm sigkdd international conference on knowledge discovery and data mining, pages 797–806.
  • Datta et al., (2016) Datta, A., Sen, S., and Zick, Y. (2016). Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In 2016 IEEE symposium on security and privacy (SP), pages 598–617. IEEE.
  • Dieterich et al., (2016) Dieterich, W., Mendoza, C., and Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Technical report, Northpoint Inc.
  • Flores et al., (2016) Flores, A. W., Bechtel, K., and Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: A rejoinder to machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Federal Probation, 80(2):38–46.
  • Friedler et al., (2019) Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. (2019). A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338.
  • Hardt et al., (2016) Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. In Thirtieth Conference on Neural Information Processing Systems (NIPS 2016).
  • Holland, (1988) Holland, P. W. (1988). Causal inference, path analysis and recursive structural equations models. Technical Report 88–81, ETS Research Report Series.
  • Hooker and Mentch, (2019) Hooker, G. and Mentch, L. (2019). Please stop permuting features: An explanation and alternatives. Technical report, arXiv:1905.03151.
  • Jiang and Owen, (2003) Jiang, T. and Owen, A. B. (2003). Quasi-regression with shrinkage. Mathematics and Computers in Simulation, 62(3-6):231–241.
  • Kleinberg et al., (2016) Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. Technical report, arXiv:1609.05807.
  • Kumar et al., (2020) Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., and Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. In The 37th International Conference on Machine Learning (ICML 2020).
  • Lundberg and Lee, (2017) Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.
  • Mase et al., (2019) Mase, M., Owen, A. B., and Seiler, B. B. (2019). Explaining black box decisions by Shapley cohort refinement. Technical report, arXiv:1911.00467.
  • Mill, (1843) Mill, J. S. (1843). A system of logic. Parker, London.
  • Molnar, (2018) Molnar, C. (2018). Interpretable machine learning: A Guide for Making Black Box Models Explainable. Leanpub.
  • Razavi et al., (2021) Razavi, S., Jakeman, A., Saltelli, A., Prieur, C., Iooss, B., Borgonovo, E., Plischke, E., Piano, S. L., Iwanaga, T., Becker, W., Tarantola, S., Guillaume, J. H. A., Jakeman, J., Gupta, H., Milillo, N., Rabitti, G., Chabridon, V., Duan, Q., Sun, X., Smith, S., Sheikholeslami, R., Hosseini, N., Asadzadeh, M., Puy, A., Kucherenko, S., and Maier, Holger, R. (2021). The future of sensitivity analysis: An essential discipline for systems modeling and policy support. Environmental Modelling & Software, 137:104954.
  • Ribeiro et al., (2016) Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, New York. ACM.
  • Rubin, (1981) Rubin, D. B. (1981). The Bayesian bootstrap. The annals of statistics, 9(1):130–134.
  • Saltelli et al., (2008) Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S. (2008). Global sensitivity analysis: the primer. John Wiley & Sons.
  • Shapley, (1953) Shapley, L. S. (1953). A value for n-person games. In Kuhn, H. W. and Tucker, A. W., editors, Contribution to the Theory of Games II (Annals of Mathematics Studies 28), pages 307–317. Princeton University Press, Princeton, NJ.
  • Sobol’, (1993) Sobol’, I. M. (1993). Sensitivity estimates for nonlinear mathematical models. Mathematical Modeling and Computational Experiment, 1:407–414.
  • Sundararajan and Najmi, (2020) Sundararajan, M. and Najmi, A. (2020). The many Shapley values for model explanation. In The 37th International Conference on Machine Learning (ICML 2020).
  • Wei et al., (2015) Wei, P., Lu, Z., and Song, J. (2015). Variable importance analysis: A comprehensive review. Reliability Engineering & System Safety, 142:399–432.
  • Xiang, (2020) Xiang, A. (2020). Reconciling legal and technical approaches to algorithmic bias. Tennessee Law Review, 88(3):2021.

Appendix A Appendix

This appendix includes conventional group fairness metrics and cohort Shapley results for the whole distribution of individual bias in the COMPAS data example.

Figure 6 shows some conventional group fairness metrics for the COMPAS data set. Horizontal bars show group specific means and the vertical dashed lines show population means. We see that Black subjects had a higher average value of $\hat{y}$ than White subjects. Black subjects also had a higher average of $y$ but a lower average residual $y-\hat{y}$. Using $B$ and $W$ to denote the two racial groups, $\hat{\mathbb{E}}(y-\hat{y}\mid B)\doteq-0.035$ and $\hat{\mathbb{E}}(y-\hat{y}\mid W)\doteq 0.054$. The FPR was higher for Black subjects and the FNR was higher for White subjects.

Refer to caption
Figure 6: Group fairness metrics on COMPAS recidivism data.

Figure 7 shows a 5×5 matrix of histograms of individualized bias (impact) on demographic parity of the prediction. The columns represent the variables whose biases we examine and the rows represent the conditioning variables used for grouping when we generate the histograms. The colors in the figure indicate categories of the conditioning variables. Figure 8 shows histograms of individualized bias on demographic parity of the response. Figure 9 shows histograms of individualized bias on the residual. Figure 10 shows histograms of individualized bias on false positives, which corresponds to equal opportunity. Figure 11 shows histograms of individualized bias on false negatives.

Refer to caption
Figure 7: Histograms of cohort Shapley value indicating individualized bias for demographic parity of prediction on a variable for each category in a conditioned variable.
Refer to caption
Figure 8: Histograms of cohort Shapley value indicating individualized bias for demographic parity of response on a variable for each category in a conditioned variable.
Refer to caption
Figure 9: Histograms of cohort Shapley value indicating individualized bias for residual on a variable for each category in a conditioned variable.
Refer to caption
Figure 10: Histograms of cohort Shapley value indicating individualized bias for equal opportunity (false positive rate) on a variable for each category in a conditioned variable.
Refer to caption
Figure 11: Histograms of cohort Shapley value indicating individualized bias for false negative rate on a variable for each category in a conditioned variable.
Table 2: Mean cohort Shapley impact of groups on prediction $\hat{y}$.
Variable White Black Male Female
priors_count -0.024  0.016  0.006 -0.023
crime_factor -0.004  0.003  0.001 -0.004
age_factor -0.018  0.012  0.000 -0.001
race_factor -0.101  0.067  0.002 -0.006
gender_factor -0.001  0.000  0.000 -0.002
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.023 -0.027  0.024 -0.019
crime_factor -0.002 -0.010  0.003  0.001
age_factor -0.017 -0.020  0.011  0.015
race_factor -0.112 -0.065  0.072  0.045
gender_factor -0.008  0.025  0.006 -0.025
Table 3: Mean cohort Shapley impact of groups on response $y$.
Variable White Black Male Female
priors_count -0.018  0.012  0.004 -0.018
crime_factor -0.003  0.002  0.001 -0.003
age_factor -0.010  0.007  0.000 -0.001
race_factor -0.048  0.032  0.001 -0.003
gender_factor -0.002  0.001  0.021 -0.085
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.017 -0.021  0.017 -0.015
crime_factor -0.001 -0.006  0.002  0.000
age_factor -0.010 -0.009  0.007  0.007
race_factor -0.055 -0.023  0.035  0.014
gender_factor  0.015 -0.058  0.024 -0.107
Table 4: Mean cohort Shapley impact of groups on residual $y-\hat{y}$.
Variable White Black Male Female
priors_count  0.007 -0.004 -0.001  0.006
crime_factor  0.001 -0.001 -0.000  0.001
age_factor  0.008 -0.005 -0.000  0.000
race_factor  0.054 -0.035 -0.001  0.003
gender_factor -0.001  0.001  0.020 -0.083
Variable White-Male White-Female Black-Male Black-Female
priors_count  0.007  0.007 -0.006  0.005
crime_factor  0.001  0.003 -0.001 -0.000
age_factor  0.007  0.011 -0.004 -0.009
race_factor  0.057  0.042 -0.036 -0.031
gender_factor  0.023 -0.083  0.018 -0.083
Table 5: Mean cohort Shapley impact of groups on false positive.
Variable White Black Male Female
priors_count -0.002  0.002  0.000 -0.002
crime_factor -0.001  0.000  0.000 -0.001
age_factor -0.006  0.004  0.000 -0.000
race_factor -0.033  0.022  0.001 -0.002
gender_factor  0.001 -0.000 -0.011  0.044
Variable White-Male White-Female Black-Male Black-Female
priors_count -0.002 -0.003  0.002 -0.001
crime_factor -0.000 -0.002  0.000  0.001
age_factor -0.005 -0.008  0.003  0.007
race_factor -0.034 -0.022  0.023  0.015
gender_factor -0.013  0.047 -0.009  0.042
Table 6: Mean cohort Shapley impact of groups on false negative.
Variable White Black Male Female
priors_count  0.004 -0.003 -0.001  0.004
crime_factor  0.001 -0.001 -0.000  0.001
age_factor  0.002 -0.001 -0.000  0.000
race_factor  0.021 -0.014 -0.000  0.001
gender_factor -0.001  0.000  0.010 -0.039
Variable White-Male White-Female Black-Male Black-Female
priors_count  0.005  0.004 -0.004  0.004
crime_factor  0.001  0.002 -0.001  0.000
age_factor  0.001  0.002 -0.001 -0.002
race_factor  0.021  0.020 -0.014 -0.015
gender_factor  0.010 -0.037  0.009 -0.041