Fairness-Aware Graph Neural Networks: A Survey
Abstract.
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques is discussed in terms of whether it focuses on improving fairness during a pre-processing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight their advantages and underlying intuition. We also introduce an intuitive taxonomy for fairness evaluation metrics, including graph-level, neighborhood-level, embedding-level, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
1. Introduction
Graph Neural Networks (GNNs) have become increasingly important in recent years and have been successfully applied in many areas, since much real-world data is naturally represented as graphs, such as social networks, financial data, and molecular data. Despite the initial success of GNNs, learning fair and unbiased GNN models remains a fundamentally important and challenging problem. Due to the inherent nature of GNNs and how they leverage the graph structure to perform aggregation over neighborhoods, ensuring fairness and controlling the accuracy-fairness trade-off is significantly more difficult than for non-graph models that use i.i.d. data.
Since neighborhoods lie at the heart of all GNN-based methods (Wu et al., 2020), the fairness of the trained GNN models and the resulting learned embeddings is fundamentally tied to the neighborhoods used to iteratively train these models (Zhang et al., 2023; Chen et al., 2022; Hussain et al., 2022). Furthermore, it is often very difficult to design GNNs that mitigate the unfairness and bias issues arising from the graph structure, the input features, and, most importantly, the fundamental GNN assumptions and design, which make this a far more challenging and complex problem than traditional bias and unfairness mitigation for i.i.d. data (Song et al., 2022a; Xu et al., 2023b; Salganik et al., 2022). In fact, GNNs are largely designed to leverage such bias and unfairness in the data to achieve superior accuracy at the expense of fairness (Kose and Shen, 2022c; Singer and Radinsky, 2022; Kose and Shen, 2022b, a; Liu et al., 2023b; Dong et al., 2023; Kose and Shen, 2022d; Wang et al., 2023). As an example, a GNN-based recommender may suggest fewer job opportunities to individuals of a specific gender or ethnic group. This is because most graph data is highly skewed towards one or more groups and often even follows a rough power-law relationship, as observed across a variety of domains (Newman, 2003; Watts and Strogatz, 1998). Therefore, fairness in such models is both practically and theoretically important for developing GNN models that are significantly more fair while remaining accurate (Liu et al., 2022a) for downstream prediction tasks such as node classification (Ma et al., 2021; Loveland et al., 2022b; Agarwal et al., 2021; Zhu et al., 2023), link prediction (Buyl and De Bie, 2020; Li et al., 2020; Patro et al., 2020; Spinelli et al., 2021; Rahman et al., 2019), and link classification (Chen et al., 2022).
In this work, we discuss the fairness issues that arise in GNNs and survey the techniques to improve fairness in GNNs. We highlight three fundamental facets that can lead to bias when training GNN models. First, the underlying graph structure used for training is often biased, e.g., when considering an attribute of a node (representing an individual) such as political views, we often observe significant homophily among the neighbors of nodes. In fact, in such data, there are often very tightly-knit communities of individuals that all retweet or follow each other. Second, the features given as input to GNNs can also be biased and unfair in a variety of ways. Such features when used independently may essentially have all the unfairness issues of traditional i.i.d. data. Third, the underlying mechanism used for aggregation and training of GNNs is inherently biased, and this is a much more difficult issue to resolve compared to traditional fairness on i.i.d. data. Overall, fairness issues in GNNs arise due to various factors such as biased training data including both the input features along with the graph structure, as well as the training and aggregation mechanisms that lie at the heart of GNNs. Addressing these issues requires careful consideration of the data, model, and evaluation metrics to ensure fair and unbiased predictions.
1.1. Summary of main contributions
The key contributions of this work are as follows:
-
(1)
A comprehensive survey of existing work on bias and unfairness mitigation techniques for GNNs. We also survey graph fairness metrics and summarize existing graph datasets used in the literature by the domain the graph originates from (e.g., social networks), along with the task each dataset can be used for and the dataset statistics and characteristics useful for various graph settings.
-
(2)
We introduce a few intuitive taxonomies for bias mitigation in GNNs and survey existing methods using these taxonomies. The taxonomy categorizes techniques based on whether the approach mitigates unfairness at the pre-processing stage, training stage, or at the post-processing stage by debiasing the learned embeddings directly. Methods are also categorized by the type of input graph data supported such as whether the graph is homogeneous, bipartite, heterogeneous, or temporal, as well as by the underlying graph learning task for which the method was designed.
-
(3)
We identify key open problems and challenges that are important for future work to address in this rapidly emerging but critically important field.
1.2. Scope of this article
In this article, we focus on examining and categorizing various fairness techniques for graph neural networks. We do not attempt to survey the abundance of work on fairness in graph mining (Dong et al., 2022b; Zhang et al., 2022) and graph machine learning in general (Choudhary et al., 2022). In contrast, we focus solely on fair GNN models as opposed to general graph fairness. In some cases, techniques we survey may have been used in a different context. Regardless of the context, we examine the general applicability and benefits of these techniques when used for improving fairness in GNN models.
2. Problem Formulation
Given a graph $G = (V, E)$ consisting of a set of nodes $V$ along with a set of edges $E \subseteq V \times V$ that encode dependencies between pairs of nodes in $V$. Furthermore, every node $i \in V$ typically has a $d$-dimensional feature vector $\mathbf{x}_i \in \mathbb{R}^d$ associated with it. This can be represented compactly as a node feature matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n = |V|$. We also have one or more sensitive attributes $\mathbf{s} = [\, s_1 \ \cdots \ s_n \,]$ where $s_i$ is the sensitive attribute value of node $i$. The graph is encoded as a sparse adjacency matrix $\mathbf{A} \in \{0,1\}^{n \times n}$ where $A_{ij} = 1$ if $(i,j) \in E$ and $A_{ij} = 0$ otherwise. GNN functions operate over the local neighborhoods of the nodes in the graph, that is, the neighborhood for node $i$ is defined as $N_i = \{ j \in V : (i,j) \in E \}$. Hence, $N_i$ is the set of nodes adjacent to $i$. From $N_i$, we define the multiset of features from the neighborhood of $i$ as $\mathbf{X}_{N_i} = \{\!\{ \mathbf{x}_j : j \in N_i \}\!\}$.
A key challenge of ensuring fairness in this setting with respect to the sensitive attribute $\mathbf{s}$ is that this sensitive information is often encoded in the graph’s adjacency matrix $\mathbf{A}$ and even the feature matrix $\mathbf{X}$. Both $\mathbf{A}$ and $\mathbf{X}$ are fundamental to the training of GNNs. In terms of the graph structure $\mathbf{A}$, this occurs when the sensitive attribute values of the neighborhood of a node are overwhelmingly the same. This implies the presence of homophily in $G$, where nodes sharing the same sensitive attribute value are more likely to be connected. Conversely, the feature matrix $\mathbf{X}$ may also be highly correlated with the sensitive attribute $\mathbf{s}$, especially when diffused over the graph structure $\mathbf{A}$:
(1) $\quad \mathbf{Z} = \mathrm{GNN}(\mathbf{A}, \mathbf{X})$
where GNN can be any GNN layer. More formally, let $\phi$ denote a local diffusion (propagation) function that operates over the neighborhood of a node. Then for node $i$, we have $\mathbf{z}_i = \phi(\mathbf{x}_i, \mathbf{X}_{N_i})$. The majority of GNNs can be categorized into convolutional (Eq. 2), attentional (Eq. 3), or message-passing (Eq. 4) (Bronstein et al., 2021). In particular,
(2) $\quad \mathbf{z}_i = \phi\Big(\mathbf{x}_i,\ \bigoplus_{j \in N_i} c_{ij}\, \psi(\mathbf{x}_j)\Big)$
(3) $\quad \mathbf{z}_i = \phi\Big(\mathbf{x}_i,\ \bigoplus_{j \in N_i} a(\mathbf{x}_i, \mathbf{x}_j)\, \psi(\mathbf{x}_j)\Big)$
(4) $\quad \mathbf{z}_i = \phi\Big(\mathbf{x}_i,\ \bigoplus_{j \in N_i} \psi(\mathbf{x}_i, \mathbf{x}_j)\Big)$
where $\phi$ and $\psi$ are neural networks (e.g., MLPs with ReLU activations), $c_{ij}$ are fixed weights, $a(\cdot,\cdot)$ is a learnable attention function, and $\bigoplus$ is any permutation-invariant aggregator such as sum, mean, max, among others (Rossi et al., 2018).
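To make the aggregation flavors concrete, the following is a minimal NumPy sketch of the convolutional form in Eq. 2, assuming mean aggregation ($c_{ij} = 1/|N_i|$), $\psi$ as the identity, and $\phi$ as concatenation followed by a linear layer and ReLU; real GNN layers differ in these choices:

```python
import numpy as np

def conv_gnn_layer(A, X, W):
    """One convolutional-flavor GNN layer (Eq. 2) with fixed weights c_ij = 1/deg(i)
    (mean aggregation) and a ReLU nonlinearity.
    A: (n, n) adjacency matrix, X: (n, d) features, W: (2d, d') weight matrix."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    agg = (A @ X) / deg                              # sum_j c_ij * psi(x_j) with psi = identity
    return np.maximum(0, np.hstack([X, agg]) @ W)    # phi: concatenate, then linear + ReLU

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.eye(3)
W = np.random.default_rng(0).normal(size=(6, 4))
print(conv_gnn_layer(A, X, W).shape)   # (3, 4)
```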
The fair GNN learning problem is to learn a low-dimensional fair embedding matrix $\mathbf{Z} \in \mathbb{R}^{n \times d'}$ of the nodes such that $d' \ll n$. Most importantly, the embeddings must encode different properties of the graph structure along with the input features. Typically, it is assumed that two nodes connected in the graph have similar embeddings. Furthermore, the embeddings must be independent of the sensitive attributes such that no information is revealed about the sensitive attribute $\mathbf{s}$ from the learned embeddings $\mathbf{Z}$. This problem is often very challenging since the graph structure $\mathbf{A}$, and more specifically, the neighborhoods of nodes in the graph (and/or the input features $\mathbf{X}$), are often strongly correlated with the sensitive attribute $\mathbf{s}$. Therefore, the goal is often to balance the trade-off between fairness and accuracy.
2.1. Taxonomy of GNN Fairness Techniques
Techniques for developing GNNs that are fair and unbiased with respect to the properties above can be categorized as pre-processing, in-training, post-processing, and hybrid.
-
(1)
Pre-processing (Sec. 4.1): Using a pre-processing technique to remove bias or unfairness present in the graph structure or input features before using GNNs.
-
(2)
In-training (Sec. 4.2): By modifying the objective function of GNNs to learn fair and unbiased embeddings during training. This can be done by adding constraints or regularization terms to the objective function, or by adding fairness-aware attention weights to GATs.
-
(3)
Post-processing (Sec. 4.3): Using a post-processing technique to remove bias from the resulting embeddings of a GNN model. This can be simply adjusting the embeddings to be independent of the sensitive attribute.
-
(4)
Hybrid (Sec. 4.4): A hybrid technique combines two or more of the previous techniques to ensure a better and more robust degree of fairness with respect to the sensitive attribute(s). An example of this might be to use a preprocessing technique such as rewiring the graph to ensure exact neighborhood fairness (Chen et al., 2022) and then using an in-training technique that adds a fairness constraint to the objective of a GNN model to further ensure fairness of the learned embeddings.
Fairness techniques for GNNs are summarized and categorized according to our proposed taxonomy in Table 1. Notably, we propose a simple and intuitive taxonomy that categorizes fairness techniques for GNNs based on the (i) type of input graph supported such as homogeneous, bipartite, or heterogeneous, (ii) type of unfairness mitigation technique based on whether bias/unfairness mitigation is performed as a pre-processing routine, during training, or as a post-processing technique after learning the embeddings, and the (iii) graph learning task such as node classification, link prediction, or link classification.
2.2. Graph Tasks
There are three fundamental tasks that practical applications of GNNs leverage, depending on whether the application is node-based, edge-based, or graph-based. These three general graph machine learning tasks can be formulated as:
(5) $\quad \hat{y}_i = f(\mathbf{z}_i)$
(6) $\quad \hat{y}_{ij} = f(\mathbf{z}_{ij})$
(7) $\quad \hat{y}_G = f(\mathbf{z}_G)$
where $\mathbf{z}_i$, $\mathbf{z}_{ij}$, and $\mathbf{z}_G$ are the final embeddings of node $i$, potential edge $(i,j)$, and graph $G$, respectively. An example of a node-based application is node classification, whereas examples of edge-based applications include link prediction and link classification (Rossi et al., 2012). For graph-based tasks, the most common application is graph classification.
Table 1. Summary of fairness techniques for GNNs, categorized by the type of input graph supported, the stage of the unfairness mitigation technique, and the graph learning task.

| Method | Homogeneous graph | Bipartite graph | Heterogeneous graph | Knowledge graph | Hypergraph | Temporal graph | Pre-processing | In-training | Post-processing | Link prediction | Link classification | Node classification | Clustering | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Spinelli et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Laclau et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Wang et al., 2022b) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Chen et al., 2022) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| (Li et al., 2020) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Patro et al., 2020) | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Wu et al., 2021) | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Buyl and De Bie, 2020) | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Buyl and Bie, 2021) | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Kose and Shen, 2023) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
| (Rahman et al., 2019) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Loveland et al., 2022a) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Current et al., 2022) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Saxena et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ |
| (Liu et al., 2023a) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Zhu et al., 2023) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Lin et al., 2023) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Bose and Hamilton, 2019) | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Masrour et al., 2020) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Fisher et al., 2020) | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Dai and Wang, 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Palowitch and Perozzi, 2020) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Agarwal et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| (Khajehnejad et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ |
| (Kang et al., 2020) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| (Dong et al., 2021) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
| (Cao et al., 2023) | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| (Arduini et al., 2020) | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| (Wei et al., 2022) | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| (Song et al., 2022b) | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
3. Fairness Evaluation Metrics
We now present metrics for evaluating fairness at different fundamental levels. We categorize these fundamental fairness evaluation metrics into graph-level fairness metrics (Sec. 3.1), neighborhood-level fairness metrics (Sec. 3.2), embedding-level fairness metrics (Sec. 3.3), and prediction-level fairness metrics (Sec. 3.4).
3.1. Graph-level Fairness Metrics
We first present metrics for evaluating fairness at the graph level. Intuitively, graph-level fairness metrics consider the bias that arises from the graph structure for a specific sensitive attribute $\mathbf{s}$. These metrics are largely based on the notion of homophily that is assumed by the vast majority of graph models. Homophily is the notion that nodes that are neighbors (adjacent) are more likely to share the same attribute value. Note that these fairness evaluation metrics are independent of the trained model and its predictions.
One such simple metric for measuring the homophily in a graph is as follows:
Definition 1 (Homophily Ratio $h$).
Given a graph $G$ and a sensitive attribute $\mathbf{s}$ with $k$ unique values, let $e_{uv}$ be defined as
(8) $\quad e_{uv} = \dfrac{|\{(i,j) \in E \,:\, s_i = u,\ s_j = v\}|}{|E|}$
Intuitively, $e_{uv}$ is the frequency that two nodes connected by an edge in $G$ have attribute values $u$ and $v$. Then, the homophily ratio of $G$ is:
(9) $\quad h = \sum_{u=1}^{k} e_{uu}$
where $h \in [0, 1]$. At the extremes, a graph with $h = 1$ implies that all edges in $G$ connect nodes that have the same sensitive attribute value, and therefore, are highly biased, whereas for $h = 0$, we have the opposite where all edges in $G$ connect nodes with completely different sensitive attribute values.
There is also another commonly used metric based on the notion of assortativity:
Definition 2 (Assortativity Coefficient $r$).
Given a graph $G$ and a sensitive attribute $\mathbf{s}$ with $k$ unique values, let $e_{uv}$ be defined as
(10) $\quad e_{uv} = \dfrac{|\{(i,j) \in E \,:\, s_i = u,\ s_j = v\}|}{|E|}$
where $e_{uv}$ is the fraction of edges in $G$ that connect two nodes with attribute values $u$ and $v$. Notice that $\sum_{u}\sum_{v} e_{uv} = 1$. Let $a_u = \sum_{v} e_{uv}$ and $b_v = \sum_{u} e_{uv}$, then the assortativity coefficient of $G$ is:
(11) $\quad r = \dfrac{\sum_{u} e_{uu} - \sum_{u} a_u b_u}{1 - \sum_{u} a_u b_u}$
where $r \in [-1, 1]$. Intuitively, $r = 1$ implies that all edges in $G$ are between nodes with the same sensitive attribute value whereas $r$ at its minimum (negative) value implies the opposite, that is, all edges in $G$ are between nodes with different sensitive attribute values.
These graph-level metrics are important for understanding fairness with respect to only the graph structure and sensitive attributes. More importantly, suppose a pre-processing fairness approach is used over the initial graph $G$ to make it more fair, resulting in a modified graph $G'$. The graph-level fairness evaluation metrics can then be computed over the modified graph $G'$ to evaluate whether it is more fair compared to the original graph $G$, or to a graph derived from another approach. These evaluation metrics can also be used internally during the training process.
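To make the two graph-level metrics concrete, the following is a minimal sketch, assuming an undirected edge list and an integer-coded sensitive attribute, that computes the homophily ratio $h$ of Eq. 9 and the assortativity coefficient $r$ of Eq. 11:

```python
import numpy as np

def graph_level_fairness(edges, s, k):
    """Homophily ratio h (Eq. 9) and assortativity coefficient r (Eq. 11)
    of a graph with respect to a sensitive attribute.

    edges : iterable of (i, j) node pairs (each undirected edge listed once)
    s     : array where s[i] is the integer-coded sensitive value of node i
    k     : number of unique sensitive attribute values
    """
    e = np.zeros((k, k))                  # e[u, v]: fraction of edges between groups u and v
    for i, j in edges:
        e[s[i], s[j]] += 1
        e[s[j], s[i]] += 1                # count both directions so e is symmetric
    e /= e.sum()

    h = np.trace(e)                       # fraction of edges within the same group
    a, b = e.sum(axis=1), e.sum(axis=0)   # marginal group frequencies
    r = (np.trace(e) - a @ b) / (1.0 - a @ b)
    return h, r

# Toy example: edges mostly stay within the same group => structurally biased graph.
edges = [(0, 1), (2, 3), (0, 2)]
s = np.array([0, 0, 1, 1])
print(graph_level_fairness(edges, s, k=2))   # (0.667, 0.333)
```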
3.2. Neighborhood-level Fairness Metrics
We now formally present a neighborhood fairness metric that can be leveraged prior to training a graph neural network model to determine the overall localized fairness in the graph with respect to one or more sensitive attributes. This metric indicates the impact of the neighborhoods on the fairness of the learned embeddings. In other words, it reveals the overall local fairness when a GNN-based approach is used, since these methods all leverage neighborhoods for learning the embeddings of the nodes in the graph. Therefore, this metric can reveal the overall fairness a priori to training a large-scale GNN model, and based on this, one can leverage existing or future state-of-the-art approaches to mitigate the identified fairness issues revealed by the neighborhood fairness metric. More formally, the entropy-based neighborhood fairness metric is defined as follows:
Definition 3 (Local Node Neighborhood Fairness).
Let $\mathbf{p}_i = [\, p_{i1} \ \cdots \ p_{ik} \,]$ be the vector of the frequency of the sensitive attribute values of the neighbors of node $i$ such that $p_{iu} = |N_i^{(u)}| / |N_i|$ where $N_i^{(u)}$ is the subset of $N_i$ with sensitive attribute value $u$. Then the neighborhood fairness metric quantifying the localized fairness of a neighborhood $N_i$ of a node $i$ is:
(12) $\quad f_i = \mathbb{H}(\mathbf{p}_i) = -\sum_{u=1}^{k} p_{iu} \log p_{iu}$
where $\mathbf{p}_i$ is the probability distribution vector of node $i$. Intuitively, when $f_i = \log k$ (maximum entropy), then the neighborhood of $i$ is said to be completely fair, as no information is revealed from the neighborhood of $i$ about the sensitive attribute value of $i$. In other words, when $f_i = \log k$, the neighborhood leaks no information about the sensitive attribute of $i$ (maximum fairness). Conversely, when $f_i = 0$, then knowing $N_i$ reveals significant information about the sensitive attribute (least uncertainty). Hence, $f_i = 0$ indicates a neighborhood with minimum fairness (max unfairness) whereas $f_i = \log k$ indicates a neighborhood with maximum fairness.
Notice that maximum neighborhood fairness is achieved when $\mathbf{p}_i = [\, 1/k \ \cdots \ 1/k \,]$, that is, $\mathbf{p}_i$ is the uniform probability distribution, therefore revealing no information about the sensitive attribute value of node $i$. Conversely, maximum neighborhood unfairness is achieved when $p_{iu} = 1$ for some value $u$ (and 0 otherwise), indicating that the sensitive attribute value of $i$ is deterministic, that is, able to be predicted with no uncertainty. Using Definition 3, the overall neighborhood fairness metric of a graph is defined as follows:
Definition 4 (Neighborhood Fairness).
The neighborhood fairness of a graph $G$ is
(13) $\quad \mathcal{F}(G) = \dfrac{1}{|V|} \sum_{i \in V} f_i$
where $\mathcal{F}(G)$ is an intuitive metric characterizing the inherent fairness of $G$ over all the local neighborhoods, thus capturing the local fairness of the graph with respect to the sensitive attribute $\mathbf{s}$.
Since neighborhoods lie at the heart of all GNN-based methods (Wu et al., 2020), the fairness of the trained GNN models and the resulting node embeddings are fundamentally tied to the neighborhoods used to train these models.
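As an illustration, the following is a minimal sketch of the neighborhood-level metrics above, assuming an undirected edge list and integer-coded sensitive values; the local scores are normalized by $\log k$ so that 1 corresponds to a maximally fair neighborhood, and the graph-level score in Eq. 13 is taken here as a simple mean of the local scores:

```python
import numpy as np
from collections import defaultdict

def neighborhood_fairness(edges, s, k):
    """Entropy-based local neighborhood fairness (Eq. 12) for every node
    and its graph-level average (Eq. 13), normalized to [0, 1]."""
    nbrs = defaultdict(list)
    for i, j in edges:                               # undirected edge list
        nbrs[i].append(j)
        nbrs[j].append(i)

    local = {}
    for i, neigh in nbrs.items():
        counts = np.bincount(s[neigh], minlength=k).astype(float)
        p = counts / counts.sum()                    # sensitive-value distribution of N_i
        ent = -(p[p > 0] * np.log(p[p > 0])).sum()   # Shannon entropy of the neighborhood
        local[i] = ent / np.log(k)                   # 1 = maximally fair, 0 = maximally unfair
    return local, float(np.mean(list(local.values())))

edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
s = np.array([0, 0, 1, 1])
local, overall = neighborhood_fairness(edges, s, k=2)
print(local, overall)
```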
3.3. Embedding-level Fairness Metrics
To measure the fairness of the learned embeddings, one can leverage the notion of representation bias (RB). This metric enables one to understand whether the node embeddings given as output by some arbitrary approach can be leveraged by an adversary to recover the sensitive attribute values of the nodes in the graph. More formally, for a classifier $g$, let $\hat{P}(s_i = u \mid \mathbf{z}_i)$ denote the estimated probability that node $i$ with embedding $\mathbf{z}_i$ has sensitive attribute value $u$. Then the representation bias (RB) score (Buyl and De Bie, 2020) is:
(14) $\quad \mathrm{RB} = \sum_{u=1}^{k} \dfrac{|V_u|}{|V|}\, \mathrm{AUC}\big( \{ \hat{P}(s_i = u \mid \mathbf{z}_i),\ \mathbb{1}[s_i = u] \}_{i \in V} \big)$
where $V_u = \{ i \in V : s_i = u \}$ and $s_i$ is the sensitive attribute value for node $i$. Eq. 14 uses the weighted one-vs-rest AUC score to measure prediction performance. Intuitively, if a model learns fair embeddings, then the classifier trained using the node embeddings should perform poorly (close to random if truly independent). However, if we are able to predict the sensitive attribute of a node with high accuracy using only the learned embeddings, then they are obviously not independent.
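A minimal sketch of such an embedding-level audit is shown below, assuming embeddings Z and sensitive labels s as NumPy arrays; a logistic-regression attacker stands in for the classifier $g$, and the weighted one-vs-rest AUC follows Eq. 14:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def representation_bias(Z, s):
    """Representation bias (Eq. 14): weighted one-vs-rest AUC of an attacker
    classifier that tries to recover s from the embeddings Z.
    A score near 0.5 suggests the embeddings leak little sensitive information."""
    Z_tr, Z_te, s_tr, s_te = train_test_split(Z, s, test_size=0.3, random_state=0, stratify=s)
    clf = LogisticRegression(max_iter=1000).fit(Z_tr, s_tr)
    proba = clf.predict_proba(Z_te)
    if proba.shape[1] == 2:                          # binary sensitive attribute
        return roc_auc_score(s_te, proba[:, 1])
    return roc_auc_score(s_te, proba, multi_class="ovr", average="weighted")

# Toy usage with random embeddings (should be close to 0.5).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 16))
s = rng.integers(0, 2, size=500)
print(representation_bias(Z, s))
```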
3.4. Prediction-level Fairness Metrics
3.4.1. Statistical Parity (SP)
The statistical parity (SP) metric (also called demographic parity, or DP) measures the difference between the group-level selection rates of the largest and the smallest groups. More formally, given the prediction $\hat{y}$ along with the sensitive attribute value $s$, $\Delta_{\mathrm{SP}}$ is:
(15) $\quad \Delta_{\mathrm{SP}} = \big| P(\hat{y} = 1 \mid s = 0) - P(\hat{y} = 1 \mid s = 1) \big|$
where $\Delta_{\mathrm{SP}} = 0$ implies all groups have the same selection rates, and thus, complete fairness. Statistical parity measures the preferential treatment gap between the groups. However, $\Delta_{\mathrm{SP}}$ does not consider whether the individual is qualified or not since it does not consider the ground-truth $y$. Note that Eq. 15 is defined for sensitive attributes with only two groups, though it is easy to generalize to $k$ groups by considering the difference between the largest and smallest group-level selection rate across all values of the sensitive attribute:
(16) $\quad \Delta_{\mathrm{SP}} = \max_{u} P(\hat{y} = 1 \mid s = u) - \min_{v} P(\hat{y} = 1 \mid s = v)$
The above formulation also generalizes to the link prediction task. More formally, statistical parity for a link prediction function $g(i, j)$ is:
(17) $\quad \Delta_{\mathrm{SP}} = \big| P(g(i,j) = 1 \mid s_i = s_j) - P(g(i,j) = 1 \mid s_i \neq s_j) \big|$
where $s_i \neq s_j$ implies that nodes $i$ and $j$ belong to different groups whereas $s_i = s_j$ implies they belong to the same group. Hence, in the case of link prediction, we only consider whether the sensitive attribute values are the same or not, since an edge either exists or not, $g(i,j) \in \{0, 1\}$.
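The following is a minimal sketch of Eqs. 16 and 17, assuming hard binary predictions and integer-coded group labels as NumPy arrays:

```python
import numpy as np

def statistical_parity(y_hat, s):
    """Delta_SP (Eq. 16): gap between the largest and smallest
    group-level selection rates P(y_hat = 1 | s = u)."""
    rates = [y_hat[s == u].mean() for u in np.unique(s)]
    return max(rates) - min(rates)

def link_statistical_parity(y_hat_link, s_i, s_j):
    """Dyadic Delta_SP (Eq. 17): compare predicted link rates for
    intra-group pairs (s_i == s_j) vs. inter-group pairs (s_i != s_j)."""
    same = s_i == s_j
    return abs(y_hat_link[same].mean() - y_hat_link[~same].mean())

y_hat = np.array([1, 0, 1, 1, 0, 0])
s     = np.array([0, 0, 0, 1, 1, 1])
print(statistical_parity(y_hat, s))   # |2/3 - 1/3| = 1/3
```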
3.4.2. Equal Opportunity (EO)
The equal opportunity metric requires non-discrimination only within the “advantaged” outcome group. More formally, given the ground-truth $y$, the prediction $\hat{y}$, along with the sensitive attribute value $s$, $\Delta_{\mathrm{EO}}$ is:
(18) $\quad \Delta_{\mathrm{EO}} = \big| P(\hat{y} = 1 \mid y = 1, s = 0) - P(\hat{y} = 1 \mid y = 1, s = 1) \big|$
where lower values of $\Delta_{\mathrm{EO}}$ imply better fairness. The above equal opportunity metric is a relaxation of the equalized odds metric (Hardt et al., 2016) that measures the difference of true positive, true negative, false positive, and false negative rates between the groups. More formally, given the ground-truth $y$, the prediction $\hat{y}$, along with the sensitive attribute value $s$, $\Delta_{\mathrm{EOdds}}$ is:
(19) $\quad \Delta_{\mathrm{EOdds}} = \sum_{y \in \{0, 1\}} \big| P(\hat{y} = 1 \mid y, s = 0) - P(\hat{y} = 1 \mid y, s = 1) \big|$
where $\Delta_{\mathrm{EOdds}} = 0$ implies all groups have the same true positive, true negative, false positive, and false negative rates, and are therefore fair.
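A minimal sketch of Eqs. 18 and 19 for a binary sensitive attribute follows, assuming every (y, s) combination appears at least once in the data:

```python
import numpy as np

def equal_opportunity(y_true, y_hat, s):
    """Delta_EO (Eq. 18): difference in true positive rates
    P(y_hat = 1 | y = 1, s = u) between the two groups."""
    tpr = [y_hat[(y_true == 1) & (s == u)].mean() for u in (0, 1)]
    return abs(tpr[0] - tpr[1])

def equalized_odds(y_true, y_hat, s):
    """Delta_EOdds (Eq. 19): sum of the TPR and FPR gaps between groups."""
    gap = 0.0
    for y in (0, 1):                       # y = 1 gives the TPR gap, y = 0 the FPR gap
        rates = [y_hat[(y_true == y) & (s == u)].mean() for u in (0, 1)]
        gap += abs(rates[0] - rates[1])
    return gap

y_true = np.array([1, 1, 0, 1, 1, 0])
y_hat  = np.array([1, 0, 0, 1, 1, 1])
s      = np.array([0, 0, 0, 1, 1, 1])
print(equal_opportunity(y_true, y_hat, s), equalized_odds(y_true, y_hat, s))
```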
4. GNN Fairness Techniques
For graph fairness, techniques generally take one of three entry points for their mitigation – modifying the graph before training with pre-processing (Sec. 4.1), modifying the training process (Sec. 4.2), or modifying the outputs with post-processing (Sec. 4.3) – or they combine two or more mitigation techniques at different stages in a hybrid approach (Sec. 4.4). We summarize and categorize GNN fairness techniques in Table 1 using the proposed taxonomy that categorizes fairness techniques for GNNs based on the (i) type of input graph supported (e.g., homogeneous, bipartite, heterogeneous), (ii) type of unfairness mitigation technique based on whether bias mitigation is performed during pre-processing, training, or post-processing, and (iii) graph learning task such as node classification, link prediction, or link classification.
[Figure 1: Illustration of the difficulty of guaranteeing exact fairness with respect to every neighborhood in a graph; making one neighborhood fair can impact the surrounding neighborhoods.]
4.1. Pre-Processing
Pre-processing techniques remove bias or unfairness before GNN training occurs, by targeting the input graph structure $\mathbf{A}$, the input features $\mathbf{X}$, or both. For instance, work by Spinelli et al. (2021) proposed a pre-processing approach that randomly removes edges from the graph before training to debias the resulting GNN model. More recently, Chen et al. (2022) developed a GNN fairness framework based on the proposed notion of neighborhood fairness. The framework consists of two main components: the first component constructs unbiased and fair neighborhoods by adding and removing edges to ensure each neighborhood is unbiased with respect to the sensitive attribute while preserving structures important for prediction tasks such as link prediction and classification. The second component provides additional flexibility by enabling the fair neighborhoods to be modified via a function to capture certain application- or data-dependent constraints. These fair neighborhoods are then leveraged by any arbitrary GNN model to learn fair embeddings for downstream graph learning tasks. An intuitive illustration showing the difficulty of guaranteeing exact fairness with respect to the neighborhoods is shown in Figure 1. In particular, we see that even in this simple example, it becomes impossible to ensure fairness with respect to each neighborhood in the graph, since making one neighborhood fair can impact the surrounding neighborhoods. It is also straightforward to see that an iterative optimization approach to solve this would incur significant computational cost without any guarantee of fairness across each neighborhood. Moreover, the neighborhoods with the largest impact on fairness are often the ones that have the most impact when training GNNs, since they are the ones that connect to many other nodes; therefore, updating such a neighborhood (and even its embedding, when this approach is used during GNN training) impacts the embeddings of all other connected nodes. Current et al. (2022) studied a few graph modification strategies that perform either microscopic or macroscopic edits to the input graph. One of the proposed methods adds a new node for each existing node to balance biases in the graph, whereas the other methods only include a fixed number of existing nodes and include weights for the edges as a means to debias the graph for GNN training. Another approach called FairAdj (Li et al., 2020) seeks to learn a fair adjacency matrix for a downstream link prediction task by updating the normalized adjacency matrix while keeping the original graph unchanged. This approach rewires the normalized adjacency to preserve structural constraints for fairness while also trying to preserve accuracy as much as possible. Furthermore, they introduce dyadic fairness, which expects the prediction of a link between two nodes to be statistically independent of their sensitive attributes, hence, $P(g(i,j) \mid s_i = s_j) = P(g(i,j) \mid s_i \neq s_j)$. Yang et al. (2022) proposed data reparation through optimal transport techniques to obtain dyadic fairness. Similarly, Laclau et al. (2021) proposed a repairing procedure for the graph adjacency matrix with a trade-off between group and individual fairness.
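To illustrate the general shape of a structural pre-processing step, the following is a minimal, hypothetical sketch in the spirit of biased edge dropout (Spinelli et al., 2021): intra-group (homophilous) edges are dropped with a higher probability than inter-group edges before training. The probabilities p_same and p_diff are illustrative hyperparameters, and the exact sampling scheme in the cited work differs.

```python
import numpy as np

def biased_edge_dropout(edges, s, p_same=0.5, p_diff=0.1, seed=0):
    """Pre-processing sketch: drop edges between nodes that share the same
    sensitive value (s[i] == s[j]) more aggressively than inter-group edges,
    reducing homophily with respect to the sensitive attribute."""
    rng = np.random.default_rng(seed)
    kept = []
    for i, j in edges:
        p_drop = p_same if s[i] == s[j] else p_diff
        if rng.random() >= p_drop:
            kept.append((i, j))
    return kept

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
s = np.array([0, 0, 0, 1])
print(biased_edge_dropout(edges, s))
```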
4.2. In-Training
Most GNN-based bias mitigation techniques have focused on modifying the objective function of GNNs to learn fair and unbiased embeddings during training. This can be through the addition of constraints or regularization to the objective function, adding attention weights to GATs that focus on fairness weighting, or careful sampling of the explicit neighborhoods used for updating the embedding via aggregation, as well as many other ways to mitigate bias during training, which are discussed in detail below. For instance, DegFairGCN (Liu et al., 2023a) considers two groups of nodes based on low and high degree when performing neighbor aggregations, namely, $V_{\mathrm{low}} = \{ i \in V : |N_i| < t \}$ and $V_{\mathrm{high}} = \{ i \in V : |N_i| \geq t \}$ where $V_{\mathrm{low}}$ is the low-degree node group, $V_{\mathrm{high}}$ is the high-degree node group, and $t$ is a degree threshold for creating such groups. Using these two groups, they modify the neighborhood aggregation to treat the two groups differently, attempting to debias them accordingly during training. Instead of using traditional sensitive attributes for fairness evaluation, that work used node degree as the sensitive attribute, which is problematic. FairEdit leverages both greedy edge additions and deletions to improve fairness in GNNs (Loveland et al., 2022a). Recent work by Kose and Shen (2023) developed an attention mechanism that mitigates bias called FairGAT. The fairness-aware attention mechanism can be leveraged in other attention-oriented GNNs as well (Lee et al., 2019). All the approaches discussed thus far have focused on reducing bias in the neighborhood used for aggregation, either by sampling, modifying, or reweighting the nodes in the neighborhood right before it is used for aggregation during training. However, performing aggregation using these locally “fair” neighborhoods has even fewer guarantees than approaches that modify the graph structure before training; see Figure 1 and the discussion in Sec. 4.1 for intuition. Other work by Palowitch and Perozzi (2020) introduced a neural network architecture component called MONET that performs training-time debiasing by ensuring the embeddings are trained on a hyperplane orthogonal to the metadata. Agarwal et al. (2021) developed a novel objective function to account for fairness and stability called NIFTY. They also introduce a layer-wise weight normalization to enforce fairness in the GNN architecture. Further, Buyl and De Bie (2020) proposed a Bayesian method called DeBayes that leverages a biased prior to learn debiased embeddings. Dong et al. (2021) introduced a rank-based approach called REDRESS for mitigating individual unfairness in GNNs, where the goal is to ensure GNNs infer similar predictions for individual nodes that are similar to one another. The approach jointly optimizes the utility maximization of GNNs and rank-based individual fairness in an end-to-end fashion.
Zhu et al. (2023) proposed a fairness-aware message-passing framework for GNNs called GMMD for node classification that seeks to jointly optimize both representation fairness and graph smoothing. Similarly, Lin et al. (2023) developed a balanced message-passing approach for GNNs called BeMap. This approach uses a sampling strategy to balance the number of 1-hop neighbors of each type for every node in the graph, which is in principle similar to the first component of the neighborhood fairness framework of Chen et al. (2022), though performed during training. Dong et al. (2022a) developed an approach called Edits for mitigating both attribute-based bias as well as structural bias in GNNs based on the Wasserstein distance. However, attribute and structural debiasing are handled independently of one another, as opposed to being jointly considered, which is important since GNNs are trained by leveraging both. More recently, He et al. (2023) proposed an efficient approach called FairMILE for ensuring fairness in GNNs via a multi-level framework that leverages graph coarsening to obtain base embeddings and then refines these to obtain an embedding for each node of the graph. There are also many other in-training approaches that leverage an adversarial framework, incorporating the objective that an adversarial model should not be able to accurately predict the sensitive attribute (Wang et al., 2022a; Liu et al., 2022b; Khajehnejad et al., 2020; Singh et al., 2021).
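Many of these in-training methods share the same skeleton: a task loss plus a differentiable fairness penalty. The following is a minimal, hypothetical PyTorch sketch of such a regularized objective (it is not the specific loss of any one method above); `model`, `A`, `X`, and the index tensors are assumed to be defined elsewhere:

```python
import torch
import torch.nn.functional as F

def fairness_regularized_loss(logits, y, s, lam=0.5):
    """In-training sketch: binary task loss plus a differentiable
    statistical-parity penalty that matches the mean predicted positive
    rate across the two sensitive groups. `lam` trades accuracy for fairness."""
    task_loss = F.binary_cross_entropy_with_logits(logits, y.float())
    p = torch.sigmoid(logits)                              # soft selection rates
    sp_gap = (p[s == 0].mean() - p[s == 1].mean()).abs()   # differentiable Delta_SP surrogate
    return task_loss + lam * sp_gap

# Usage inside a training loop (model, A, X, train_idx, y, s assumed defined):
#   logits = model(A, X)[train_idx].squeeze()
#   loss = fairness_regularized_loss(logits, y[train_idx], s[train_idx])
#   loss.backward(); optimizer.step()
```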
There have also been a number of works focused on GNN-based recommendation (Xu et al., 2023a; Wu et al., 2022; Salganik et al., 2022; Medda et al., 2023; Liu et al., 2022b; Chizari et al., 2023; Wu et al., 2023). For the closely related task of recommendation, Patro et al. (2020) studied fairness in recommender systems involving customers and producers, and proposed an approach called FairRec that is based on fair allocation of indivisible goods. FairRec guarantees at least the maximin share of exposure for most producers and envy-freeness up to one good for every customer. Li et al. (2021) proposed an adversarial in-training method to learn fair user embeddings for fair recommendations. Separately, Li et al. (2022) designed a framework for fair sequential recommendations, which can both be trained end-to-end and learn fairness-aware preference graph embeddings. There have also been some recent works that exploit communities to obtain fair link predictions in complex networks, such as HM-EIICT (Saxena et al., 2021). Tsioutsiouliklis et al. (2021) developed algorithms for fairness in PageRank, first requiring fairness in the proportion of PageRank score assigned to each group, and then fairness in the personalized PageRank derived for each node.
Recent work has also considered mitigating fairness issues in a wide range of different types of graphs, including hypergraphs (Wei et al., 2022), heterogeneous information networks (Cao et al., 2023; Zeng et al., 2021), knowledge graphs (Wang et al., 2022a; Vannur et al., 2021), and even temporal networks (Song et al., 2022b; Xu et al., 2021). For hypergraphs, Wei et al. (2022) proposed HyperGCL and show that their method for augmenting hypergraphs improves fairness in representation learning. Cao et al. (2023) proposed FairHELP for deriving fair embeddings for heterogeneous information networks. Most recently, temporal networks have been considered by Song et al. (2022b), who propose an approach for improving individual fairness for dynamic GNNs such as EvolveGCN. For this, they introduce a simple regularization-based method to achieve individual fairness in the dynamic graph setting for GNNs. Other papers have identified node degree as a source of bias (Tang et al., 2020; Kang et al., 2022; Jiang et al., 2022). In particular, Kang et al. (2022) define node-degree disparity in terms of the Rawlsian difference principle, and propose a RawlsGCN-Graph pre-processing method and a RawlsGCN-Grad in-processing method for fair predictive accuracy. Recent work by Loveland et al. (2022b) investigated fairness in GNNs when neighborhoods in the graph are heterophilous as opposed to homophilous. In this setting, they find that several fairness metrics can be significantly improved when leveraging heterophilous GNNs that naturally handle disassortativity.
4.3. Post-Processing
Post-processing techniques take the output embeddings of a GNN model and remove bias from them. Techniques include adding filters or otherwise removing information about the sensitive attribute from those embeddings. Masrour et al. (2020) addressed the problem of link prediction homophily with post-processing, as well as an adversarial framework. Fisher et al. (2020) developed techniques for debiasing knowledge graph embeddings. Dai and Wang (2021) proposed FairGNN, a framework for fair node classification using GNNs given limited sensitive attribute information. Bose and Hamilton (2019) investigated incorporating compositional fairness constraints into graph embeddings, creating sensitive attribute filters that can be optionally applied after training. FairGO (Wu et al., 2021) was proposed to ensure fairness by taking the original embeddings from a method and then leveraging a composition of filters that transform them into a new filtered embedding space to improve fairness. This transformation leverages adversarial learning over a user-centric graph to obfuscate the sensitive attribute in the underlying embeddings. Kose et al. (2023) designed a fairness-aware filter to reduce the bias in the learned embeddings from GNNs by essentially removing the sensitive information; this filter can also be incorporated into other GNN designs. They also theoretically demonstrate its effectiveness compared to the fairness-agnostic embeddings that arise without the fairness-aware filter.
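As a simple illustration of the post-processing idea, the sketch below removes the linear direction in embedding space that best separates the two sensitive groups; this is a generic linear-projection debiasing step under the stated assumptions, not the specific filter of any one cited method.

```python
import numpy as np

def project_out_sensitive(Z, s):
    """Post-processing sketch: project embeddings onto the orthogonal
    complement of the direction connecting the two group means, removing
    the most obvious linear encoding of a binary sensitive attribute."""
    mu0, mu1 = Z[s == 0].mean(axis=0), Z[s == 1].mean(axis=0)
    v = mu1 - mu0
    v /= np.linalg.norm(v)                    # unit "sensitive" direction
    return Z - np.outer(Z @ v, v)             # remove the component along v

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=200)
Z = rng.normal(size=(200, 8)) + 2.0 * s[:, None]   # embeddings that leak s in every dimension
Z_fair = project_out_sensitive(Z, s)
print(np.abs(Z_fair[s == 1].mean(0) - Z_fair[s == 0].mean(0)).max())  # near 0 after projection
```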
4.4. Hybrid
A hybrid technique combines two or more techniques used at different stages (e.g., pre-processing, during training, and post-processing) to ensure a better and more robust degree of fairness with respect to the sensitive attribute(s). Few papers explicitly propose hybrid methods, but combining techniques from different stages is a natural way to further improve fairness and robustness. An example of this might be to use a pre-processing technique such as rewiring the graph to ensure exact neighborhood fairness (Chen et al., 2022) and then an in-training technique that adds a fairness constraint to the objective of a GNN model to further ensure fairness of the learned embeddings. While there have not been many hybrid fairness techniques for GNNs, one work by Kang et al. (2020) focused on a related graph learning method for graph mining tasks. In particular, InFoRM first modifies the graph structure to remove bias, then attempts to debias the mining model by solving an optimization problem, and finally solves a similar problem for debiasing the results of the mining model. It is straightforward to combine many of the GNN fairness techniques discussed in the previous sections to obtain novel hybrid approaches that provide even more fair GNNs and results, with potentially stronger fairness guarantees as well.
Table 2. Graph datasets commonly used for evaluating the fairness of GNNs, organized by domain.

| Domain | Dataset | $|V|$ | $|E|$ | Sensitive attribute(s) ($k$) | $d$ | Description | Task |
|---|---|---|---|---|---|---|---|
| rec | ML-100K | 2,625 | 100K | gender (2), age (7), job (21) | 12 | user-by-movie | LP |
| | ML-1M | 10,040 | 1M | gender (2), age (7), job (21) | 11 | user-by-movie | LP |
| | LastFM | 49,900 | 518,647 | gender (2), age (3) | | user-by-artist | LP |
| | Amazon | 334,863 | 925,872 | product category (4) | | user-by-product | LP |
| | Yelp | 12,683 | 211,721 | food genre (4) | 14 | user-by-business | LP |
| social | Pokec | 1.63M | 30.6M | gender (2), region (2) | 59 | friendship | LP |
| | Pokec-n | 66,569 | 729,129 | region (2) | 59 | friendship | NC |
| | Pokec-z | 67,797 | 882,765 | region (2) | 59 | friendship | NC |
| | Twitter∗ | 81,306 | 1,768,149 | political view (2) | 1,364 | who-follows-whom | LP |
| | | 1,034 | 26,749 | gender (2) | 224 | friendship | LP |
| | fb-Ok97 | 3,111 | 73,230 | gender (2) | 8 | friendship | LP, NC |
| | fb-UNC28 | 4,018 | 65,287 | gender (2) | 8 | friendship | LP, NC |
| | fb-Rice | 1,205 | 42,443 | age (2) | 2 | friendship | LP, NC |
| | Google+ | 4,938 | 547,923 | gender (2) | 5 | friendship | LP |
| | NBA | 403 | 10,621 | country (2) | 96 | who-follows-whom | NC |
| | fb-Gender | 7,315 | 89,733 | gender (2) | | friendship | LP |
| | Retweet-pol | 18,470 | 61,157 | political view (2) | | friendship | LP |
| | Dutch school | 26 | 221 | gender (2) | | friendship | LP |
| | Epinion | 8,806 | 157,887 | | | user-trusts-user | LP |
| | Ciao | 7,317 | 85,205 | | | user-trusts-user | LP |
| | Filmtrust | 3,579 | 35,494 | | | user-trusts-user | LP |
| collab | Citeseer | 3,327 | 4,732 | topic (6) | 3,703 | coauthorship | LP |
| | Cora | 2,708 | 5,429 | topic (7) | 1,433 | coauthorship | LP |
| | Pubmed | 19,717 | 44,338 | topic (3) | 500 | coauthorship | LP |
| | DBLP | 3,980 | 6,965 | continent (5), gender (2) | | coauthorship | LP |
| | Hospital-cont | 75 | 1,139 | job (4) | | proximity | LC |
| web | WebKB | 265 | 530 | topic (5) | 500 | web graph | LP |
| | Chameleon | 2,277 | 31,371 | gen. node degree (2) | 2,325 | web graph | NC |
| | Squirrel | 5,201 | 198,353 | gen. node degree (2) | 2,089 | web graph | NC |
| sim | German | 1,000 | 21,742 | gender (2) | 27 | client similarity | NC |
| | Recidivism | 18,876 | 311,870 | race (2) | 18 | defendant sim. | NC |
| | Credit | 30,000 | 1,421,858 | age (2) | 13 | individual sim. | NC |
| cit | EMNLP | 2,600 | 7,969 | gen. node degree (2) | 8 | citation network | NC |
5. Datasets
We summarize datasets commonly used for evaluating the fairness of GNNs in Table 2. Notably, the datasets are organized by application domain, including recommendation (bipartite graphs), social networks, collaboration networks, web graphs, similarity graphs, and citation networks. Furthermore, $|V|$ and $|E|$ are the number of nodes and edges, the value in parentheses after each sensitive attribute is the number of unique values $k$ it takes, and $d$ is the number of input features; the number of sensitive attributes can be read off from the list given for each dataset. While most work focuses on a single sensitive attribute, there are some datasets with multiple sensitive attributes that can be used for fairness techniques designed for multiple sensitive attributes. Furthermore, we also summarize the various fair graph learning tasks that each dataset has been used for, which include link prediction (LP), node classification (NC), and link classification (LC). Notably, there is only a single link classification dataset used in the literature. This may be because such data is typically proprietary and closest to real-world settings found in industry, and because evaluating fairness for link classification requires one or more sensitive attributes on the nodes or links.
6. Open Problems & Challenges
In this section, we discuss open problems and highlight important challenges for future work.
6.1. Feature vs. Feature-less Setting
Previous work on developing fairness techniques for GNNs has mainly focused on graphs with features. The reason for this is quite simple. GNNs require an initial feature matrix, and therefore, most work has simply used datasets that naturally come with an input feature matrix. However, this severely limits the graph datasets used for evaluation to those that have one or more sensitive attributes, as well as an entire feature matrix. Most importantly though, input features are often highly correlated with the sensitive attribute, and therefore potentially (and often) add multiple confounding factors when evaluating fairness techniques that are mostly focused on the structure of the graph, as opposed to the correlation of input features with respect to the sensitive attribute.
For these fundamental reasons, one should also consider a new setting when evaluating techniques for improving fairness in GNNs, which we call the feature-less setting. In this proposed setting, we do not include any underlying features as input, and instead initialize the feature matrix either uniformly at random or based on the graph structure, e.g., using SVD or an unsupervised embedding approach such as node2vec (Grover and Leskovec, 2016) or DeepGL (Rossi et al., 2018). This new feature-less setting may actually be more useful, as most graphs do not naturally come with input features (see NetworkRepository by Rossi and Ahmed (2015)), and therefore, it opens the door for evaluation on a much larger scale and gives rise to entirely new use cases and practical applications for such approaches. Nevertheless, one can also study graphs with features under this setting by simply ignoring them. Studying both the feature and feature-less setting allows for a better evaluation and understanding of the approach under different conditions and controlling for different factors that may influence the fairness, accuracy, and the underlying conclusions that are drawn from the experiments. Understanding how previous work performs in this setting remains an open problem.
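As a concrete illustration of this setting, the sketch below (with illustrative choices of method and dimensionality d) initializes a node feature matrix either uniformly at random or from the truncated SVD of the adjacency matrix; an unsupervised embedding method such as node2vec could be substituted for the SVD step.

```python
import numpy as np

def featureless_init(A, d=64, method="svd", seed=0):
    """Feature-matrix initialization for the feature-less setting:
    either i.i.d. random features or the top-d left singular vectors of
    the (dense) adjacency matrix A, scaled by their singular values."""
    if method == "random":
        return np.random.default_rng(seed).normal(size=(A.shape[0], d))
    U, S, _ = np.linalg.svd(A, full_matrices=False)   # structural features from A
    return U[:, :d] * S[:d]

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X0 = featureless_init(A, d=2)
print(X0.shape)   # (3, 2)
```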
6.2. Theoretical Limits
Understanding the theoretical limits in terms of the fairness and accuracy trade-offs and deriving theoretical guarantees for such techniques is fundamentally important. Despite this, theoretically analyzing existing fairness techniques for GNNs remains a largely open problem for future work.
6.3. Link Classification
While most work has focused on developing techniques for either node classification or link prediction, the problem of link classification remains largely unexplored. In this task, we are given a small fraction of link labels for training and need to predict the remaining held-out labels of the links. This is fundamentally different from link prediction since in link classification we are given the entire graph, along with a sensitive attribute on the nodes that is never seen by the algorithm, and need to correctly infer the label of each link (which already exists in the graph). Link classification in GNNs is often a multi-class problem where the number of unique labels to infer is much larger than two, the simplest binary link classification setting.
6.4. Heterogeneous and Temporal Networks
Most work has only focused on developing fairness techniques for GNNs on simple graphs. However, it is unclear how such techniques perform when the graph is heterogeneous, that is, when nodes and edges may be of completely different types. Similarly, ensuring fairness when the graph structure and possibly even its attributes are changing over time remains an open and challenging problem. These different types of networks may also require new fairness metrics tailored to their structure for evaluating such techniques.
6.5. Large Language Models
Recently, GNNs have found applications in language models (Meng et al., 2021). Fairness in such models is of fundamental importance due to their wide-scale use in many applications, yet very challenging due to how these models are trained. Future work should focus on developing fair and unbiased GNN-based language models.
7. Conclusion
Given the importance of GNNs due to their representational power and state-of-the-art predictive performance, this paper has surveyed techniques for improving the fairness of GNNs. We presented a taxonomy for fairness in GNNs that categorizes techniques based on the type of input graph supported, the type of fairness technique (pre-processing, in-training, post-processing), and the graph learning task. We also introduced an intuitive taxonomy for graph fairness evaluation metrics, including graph-level, neighborhood-level, embedding-level, and prediction-level fairness metrics for GNNs. Furthermore, we summarized the graph datasets useful for benchmarking GNN fairness techniques and categorized them according to the domain and graph learning task. As discussed in Section 6, there remains significant work to do. One important and largely open problem is understanding the theoretical limits of the fairness-accuracy trade-off and deriving theoretical guarantees for such techniques. Theoretically analyzing existing fairness techniques across the different categories of approaches (pre-processing, in-training, and post-processing) also remains a largely open problem for future work.
References
- Agarwal et al. (2021) Chirag Agarwal, Himabindu Lakkaraju, and Marinka Zitnik. 2021. Towards a unified framework for fair and stable graph representation learning. In Uncertainty in Artificial Intelligence. PMLR, 2114–2124.
- Arduini et al. (2020) Mario Arduini, Lorenzo Noci, Federico Pirovano, Ce Zhang, Yash Raj Shrestha, and Bibek Paudel. 2020. Adversarial learning for debiasing knowledge graph embeddings. arXiv preprint arXiv:2006.16309 (2020).
- Bose and Hamilton (2019) Avishek Bose and William Hamilton. 2019. Compositional fairness constraints for graph embeddings. In International Conference on Machine Learning. PMLR, 715–724.
- Bronstein et al. (2021) Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. 2021. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv:2104.13478 (2021).
- Buyl and Bie (2021) Maarten Buyl and Tijl De Bie. 2021. The kl-divergence between a graph model and its fair i-projection as a fairness regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 351–366.
- Buyl and De Bie (2020) Maarten Buyl and Tijl De Bie. 2020. DeBayes: a Bayesian Method for Debiasing Network Embeddings. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 1220–1229.
- Cao et al. (2023) Meng Cao, Jianqing Song, Jinliang Yuan, Baoming Zhang, and Chongjun Wang. 2023. FairHELP: Fairness-Aware Heterogeneous Information Network Embedding for Link Prediction. In International Conference on Database Systems for Advanced Applications. Springer, 320–330.
- Chen et al. (2022) April Chen, Ryan Rossi, Nedim Lipka, Jane Hoffswell, Gromit Chan, Shunan Guo, Eunyee Koh, Sungchul Kim, and Nesreen K Ahmed. 2022. Graph Learning with Localized Neighborhood Fairness. arXiv preprint arXiv:2212.12040 (2022).
- Chizari et al. (2023) Nikzad Chizari, Keywan Tajfar, and María N Moreno-García. 2023. Bias Assessment Approaches for Addressing User-Centered Fairness in GNN-Based Recommender Systems. Information 14, 2 (2023), 131.
- Choudhary et al. (2022) Manvi Choudhary, Charlotte Laclau, and Christine Largeron. 2022. A Survey on Fairness for Machine Learning on Graphs. arXiv preprint arXiv:2205.05396 (2022).
- Current et al. (2022) Sean Current, Yuntian He, Saket Gurukar, and Srinivasan Parthasarathy. 2022. FairMod: Fair Link Prediction and Recommendation via Graph Modification. arXiv preprint arXiv:2201.11596 (2022).
- Dai and Wang (2021) Enyan Dai and Suhang Wang. 2021. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 680–688.
- Dong et al. (2021) Yushun Dong, Jian Kang, Hanghang Tong, and Jundong Li. 2021. Individual fairness for graph neural networks: A ranking based approach. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 300–310.
- Dong et al. (2022a) Yushun Dong, Ninghao Liu, Brian Jalaian, and Jundong Li. 2022a. Edits: Modeling and mitigating data bias for graph neural networks. In Proceedings of the ACM Web Conference 2022. 1259–1269.
- Dong et al. (2022b) Yushun Dong, Jing Ma, Chen Chen, and Jundong Li. 2022b. Fairness in Graph Mining: A Survey. arXiv preprint arXiv:2204.09888 (2022).
- Dong et al. (2023) Yushun Dong, Binchi Zhang, Yiling Yuan, Na Zou, Qi Wang, and Jundong Li. 2023. Reliant: Fair knowledge distillation for graph neural networks. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM). SIAM, 154–162.
- Fisher et al. (2020) Joseph Fisher, Arpit Mittal, Dave Palfrey, and Christos Christodoulopoulos. 2020. Debiasing knowledge graph embeddings. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 7332–7345.
- Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 855–864.
- Hardt et al. (2016) Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in neural information processing systems 29 (2016).
- He et al. (2023) Yuntian He, Saket Gurukar, and Srinivasan Parthasarathy. 2023. Efficient Fair Graph Representation Learning Using a Multi-level Framework. In Companion Proceedings of the ACM Web Conference 2023. 298–301.
- Hussain et al. (2022) Hussain Hussain, Meng Cao, Sandipan Sikdar, Denis Helic, Elisabeth Lex, Markus Strohmaier, and Roman Kern. 2022. Adversarial Inter-Group Link Injection Degrades the Fairness of Graph Neural Networks. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 975–980.
- Jiang et al. (2022) Zhimeng Jiang, Xiaotian Han, Chao Fan, Zirui Liu, Na Zou, Ali Mostafavi, and Xia Hu. 2022. FMP: Toward Fair Graph Message Passing against Topology Bias. arXiv preprint arXiv:2202.04187 (2022).
- Kang et al. (2020) Jian Kang, Jingrui He, Ross Maciejewski, and Hanghang Tong. 2020. InFoRM: Individual Fairness on Graph Mining. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA, USA) (KDD ’20). Association for Computing Machinery, New York, NY, USA, 379–389.
- Kang et al. (2022) Jian Kang, Yan Zhu, Yinglong Xia, Jiebo Luo, and Hanghang Tong. 2022. RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network. In Proceedings of the ACM Web Conference 2022. 1214–1225.
- Khajehnejad et al. (2021) Ahmad Khajehnejad, Moein Khajehnejad, Mahmoudreza Babaei, Krishna P Gummadi, Adrian Weller, and Baharan Mirzasoleiman. 2021. CrossWalk: Fairness-enhanced Node Representation Learning. arXiv preprint arXiv:2105.02725 (2021).
- Khajehnejad et al. (2020) Moein Khajehnejad, Ahmad Asgharian Rezaei, Mahmoudreza Babaei, Jessica Hoffmann, Mahdi Jalili, and Adrian Weller. 2020. Adversarial graph embeddings for fair influence maximization over social networks. arXiv preprint arXiv:2005.04074 (2020).
- Kose and Shen (2022a) O Deniz Kose and Yanning Shen. 2022a. Fair node representation learning via adaptive data augmentation. arXiv preprint arXiv:2201.08549 (2022).
- Kose and Shen (2022b) O Deniz Kose and Yanning Shen. 2022b. Fairness-aware adaptive network link prediction. In 2022 30th European Signal Processing Conference (EUSIPCO). IEEE, 677–681.
- Kose and Shen (2022c) O Deniz Kose and Yanning Shen. 2022c. Fairnorm: Fair and fast graph neural network training. arXiv preprint arXiv:2205.09977 (2022).
- Kose and Shen (2022d) Oyku Deniz Kose and Yanning Shen. 2022d. Fast&fair: Training acceleration and bias mitigation for GNNs. Transactions on Machine Learning Research (2022).
- Kose and Shen (2023) O Deniz Kose and Yanning Shen. 2023. FairGAT: Fairness-aware Graph Attention Networks. arXiv preprint arXiv:2303.14591 (2023).
- Kose et al. (2023) O Deniz Kose, Yanning Shen, and Gonzalo Mateos. 2023. Fairness-Aware Graph Filter Design. arXiv preprint arXiv:2303.11459 (2023).
- Laclau et al. (2021) Charlotte Laclau, Ievgen Redko, Manvi Choudhary, and Christine Largeron. 2021. All of the fairness for edge prediction with optimal transport. In International Conference on Artificial Intelligence and Statistics. PMLR, 1774–1782.
- Lee et al. (2019) John Boaz Lee, Ryan A Rossi, Sungchul Kim, Nesreen K Ahmed, and Eunyee Koh. 2019. Attention models in graphs: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD) 13, 6 (2019), 1–25.
- Li et al. (2022) Cheng-Te Li, Cheng Hsu, and Yang Zhang. 2022. FairSR: Fairness-aware Sequential Recommendation through Multi-Task Learning with Preference Graph Embeddings. ACM Transactions on Intelligent Systems and Technology (TIST) 13, 1 (2022), 1–21.
- Li et al. (2020) Peizhao Li, Yifei Wang, Han Zhao, Pengyu Hong, and Hongfu Liu. 2020. On dyadic fairness: Exploring and mitigating bias in graph connections. In International Conference on Learning Representations.
- Li et al. (2021) Yunqi Li, Hanxiong Chen, Shuyuan Xu, Yingqiang Ge, and Yongfeng Zhang. 2021. Towards personalized fairness based on causal notion. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1054–1063.
- Lin et al. (2023) Xiao Lin, Jian Kang, Weilin Cong, and Hanghang Tong. 2023. BeMap: Balanced Message Passing for Fair Graph Neural Network. arXiv preprint arXiv:2306.04107 (2023).
- Liu et al. (2022b) Haifeng Liu, Nan Zhao, Xiaokun Zhang, Hongfei Lin, Liang Yang, Bo Xu, Yuan Lin, and Wenqi Fan. 2022b. Dual constraints and adversarial learning for fair recommenders. Knowledge-Based Systems 239 (2022), 108058.
- Liu et al. (2022a) Yazheng Liu, Xi Zhang, and Sihong Xie. 2022a. Trade less Accuracy for Fairness and Trade-off Explanation for GNN. In 2022 IEEE International Conference on Big Data (Big Data). IEEE, 4681–4690.
- Liu et al. (2023a) Zemin Liu, Trung-Kien Nguyen, and Yuan Fang. 2023a. On Generalized Degree Fairness in Graph Neural Networks. arXiv:2302.03881 (2023).
- Liu et al. (2023b) Zheyuan Liu, Chunhui Zhang, Yijun Tian, Erchi Zhang, Chao Huang, Yanfang Ye, and Chuxu Zhang. 2023b. Fair Graph Representation Learning via Diverse Mixture of Experts. In The Web Conference.
- Loveland et al. (2022a) Donald Loveland, Jiayi Pan, Aaresh Farrokh Bhathena, and Yiyang Lu. 2022a. FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing. arXiv preprint arXiv:2201.03681 (2022).
- Loveland et al. (2022b) Donald Loveland, Jiong Zhu, Mark Heimann, Ben Fish, Michael T Schaub, and Danai Koutra. 2022b. On graph neural network fairness in the presence of heterophilous neighborhoods. arXiv preprint arXiv:2207.04376 (2022).
- Ma et al. (2021) Jiaqi Ma, Junwei Deng, and Qiaozhu Mei. 2021. Subgroup generalization and fairness of graph neural networks. Advances in Neural Information Processing Systems 34 (2021), 1048–1061.
- Masrour et al. (2020) Farzan Masrour, Tyler Wilson, Heng Yan, Pang-Ning Tan, and Abdol Esfahanian. 2020. Bursting the filter bubble: Fairness-aware network link prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 841–848.
- Medda et al. (2023) Giacomo Medda, Francesco Fabbri, Mirko Marras, Ludovico Boratto, Mihnea Tufis, and Gianni Fenu. 2023. GNNUERS: Fairness Explanation in GNNs for Recommendation via Counterfactual Reasoning. arXiv preprint arXiv:2304.06182 (2023).
- Meng et al. (2021) Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Fei Wu, and Jiwei Li. 2021. GNN-LM: Language modeling based on global contexts via GNN. arXiv preprint arXiv:2110.08743 (2021).
- Newman (2003) Mark EJ Newman. 2003. The structure and function of complex networks. SIAM Review 45, 2 (2003), 167–256.
- Palowitch and Perozzi (2020) John Palowitch and Bryan Perozzi. 2020. Debiasing graph representations via metadata-orthogonal training. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 435–442.
- Patro et al. (2020) Gourab K Patro, Arpita Biswas, Niloy Ganguly, Krishna P Gummadi, and Abhijnan Chakraborty. 2020. FairRec: Two-sided fairness for personalized recommendations in two-sided platforms. In Proceedings of The Web Conference 2020. 1194–1204.
- Rahman et al. (2019) Tahleen Rahman, Bartlomiej Surma, Michael Backes, and Yang Zhang. 2019. Fairwalk: Towards fair graph embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI).
- Rossi and Ahmed (2015) Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI. https://networkrepository.com
- Rossi et al. (2012) Ryan A Rossi, Luke K McDowell, David William Aha, and Jennifer Neville. 2012. Transforming graph data for statistical relational learning. Journal of Artificial Intelligence Research 45 (2012), 363–441.
- Rossi et al. (2018) Ryan A Rossi, Rong Zhou, and Nesreen K Ahmed. 2018. Deep Inductive Graph Representation Learning. IEEE Transactions on Knowledge and Data Engineering 32, 3 (2018), 438–452.
- Salganik et al. (2022) Rebecca Salganik, Fernando Diaz, and Golnoosh Farnadi. 2022. Analyzing the Effect of Sampling in GNNs on Individual Fairness. arXiv preprint arXiv:2209.03904 (2022).
- Saxena et al. (2021) Akrati Saxena, George Fletcher, and Mykola Pechenizkiy. 2021. HM-EIICT: Fairness-aware link prediction in complex networks using community information. Journal of Combinatorial Optimization (2021), 1–18.
- Singer and Radinsky (2022) Uriel Singer and Kira Radinsky. 2022. EqGNN: Equalized Node Opportunity in Graphs. In Proceedings of the AAAI conference on artificial intelligence, Vol. 36. 8333–8341.
- Singh et al. (2021) Aditya Singh, Anubhav Gupta, Hardik Wadhwa, Siddhartha Asthana, and Ankur Arora. 2021. Temporal debiasing using adversarial loss based gnn architecture for crypto fraud detection. In 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 391–396.
- Song et al. (2022a) Weihao Song, Yushun Dong, Ninghao Liu, and Jundong Li. 2022a. GUIDE: Group equality informed individual fairness in graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1625–1634.
- Song et al. (2022b) Zixing Song, Yueen Ma, and Irwin King. 2022b. Individual Fairness in Dynamic Financial Networks. In NeurIPS 2022 Workshop: New Frontiers in Graph Learning.
- Spinelli et al. (2021) Indro Spinelli, Simone Scardapane, Amir Hussain, and Aurelio Uncini. 2021. FairDrop: Biased edge dropout for enhancing fairness in graph representation learning. IEEE Transactions on Artificial Intelligence 3, 3 (2021), 344–354.
- Tang et al. (2020) Xianfeng Tang, Huaxiu Yao, Yiwei Sun, Yiqi Wang, Jiliang Tang, Charu Aggarwal, Prasenjit Mitra, and Suhang Wang. 2020. Investigating and mitigating degree-related biases in graph convolutional networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1435–1444.
- Tsioutsiouliklis et al. (2021) Sotiris Tsioutsiouliklis, Evaggelia Pitoura, Panayiotis Tsaparas, Ilias Kleftakis, and Nikos Mamoulis. 2021. Fairness-aware pagerank. In Proceedings of the Web Conference 2021. 3815–3826.
- Vannur et al. (2021) Lingraj S Vannur, Balaji Ganesan, Lokesh Nagalapatti, Hima Patel, and MN Tippeswamy. 2021. Data augmentation for fairness in personal knowledge base population. In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD Workshops. Springer, 143–152.
- Wang et al. (2022b) Nan Wang, Lu Lin, Jundong Li, and Hongning Wang. 2022b. Unbiased graph embedding with biased graph observations. In Proceedings of the ACM Web Conference 2022. 1423–1433.
- Wang et al. (2023) Xuemin Wang, Tianlong Gu, Xuguang Bao, and Liang Chang. 2023. Fair and Privacy-Preserving Graph Neural Network. In International Conference on Database Systems for Advanced Applications. Springer, 731–735.
- Wang et al. (2022a) Yihe Wang, Mohammad Mahdi Khalili, and Xiang Zhang. 2022a. Towards Fair Representation Learning in Knowledge Graph with Stable Adversarial Debiasing. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 901–909.
- Watts and Strogatz (1998) Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 6684 (1998), 440–442.
- Wei et al. (2022) Tianxin Wei, Yuning You, Tianlong Chen, Yang Shen, Jingrui He, and Zhangyang Wang. 2022. Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative. arXiv preprint arXiv:2210.03801 (2022).
- Wu et al. (2022) Kun Wu, Jacob Erickson, Wendy Hui Wang, and Yue Ning. 2022. Equipping Recommender Systems with Individual Fairness via Second-order Proximity Embedding. In 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 171–175.
- Wu et al. (2021) Le Wu, Lei Chen, Pengyang Shao, Richang Hong, Xiting Wang, and Meng Wang. 2021. Learning fair representations for recommendation: A graph-based perspective. In Proceedings of the Web Conference 2021. 2198–2208.
- Wu et al. (2023) Yao Wu, Jian Cao, and Guandong Xu. 2023. FASTER: A Dynamic Fairness-assurance Strategy for Session-based Recommender Systems. ACM Transactions on Information Systems (2023).
- Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 1 (2020), 4–24.
- Xu et al. (2023a) Can Xu, Yin Zhang, Hongyang Chen, Ligang Dong, and Weigang Wang. 2023a. A fairness-aware graph contrastive learning recommender framework for social tagging systems. Information Sciences 640 (2023), 119064.
- Xu et al. (2021) Jiahui Xu, Ling Chen, Mingqi Lv, Chaoqun Zhan, Sanjian Chen, and Jian Chang. 2021. HighAir: A hierarchical graph neural network-based air quality forecasting method. arXiv preprint arXiv:2101.04264 (2021).
- Xu et al. (2023b) Paiheng Xu, Yuhang Zhou, Bang An, Wei Ai, and Furong Huang. 2023b. GFairHint: Improving Individual Fairness for Graph Neural Networks via Fairness Hint. arXiv preprint arXiv:2305.15622 (2023).
- Yang et al. (2022) Moyi Yang, Junjie Sheng, Xiangfeng Wang, Wenyan Liu, Bo Jin, Jun Wang, and Hongyuan Zha. 2022. Obtaining Dyadic Fairness by Optimal Transport. arXiv preprint arXiv:2202.04520 (2022).
- Zeng et al. (2021) Ziqian Zeng, Rashidul Islam, Kamrun Naher Keya, James Foulds, Yangqiu Song, and Shimei Pan. 2021. Fair representation learning for heterogeneous information networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 15. 877–887.
- Zhang et al. (2023) He Zhang, Xingliang Yuan, Quoc Viet Hung Nguyen, and Shirui Pan. 2023. On the interaction between node fairness and edge privacy in graph neural networks. arXiv preprint arXiv:2301.12951 (2023).
- Zhang et al. (2022) Wenbin Zhang, Jeremy C Weiss, Shuigeng Zhou, and Toby Walsh. 2022. Fairness amidst non-iid graph data: A literature review. arXiv preprint arXiv:2202.07170 (2022).
- Zhu et al. (2023) Huaisheng Zhu, Guoji Fu, Zhimeng Guo, Zhiwei Zhang, Teng Xiao, and Suhang Wang. 2023. Fairness-aware Message Passing for Graph Neural Networks. arXiv preprint arXiv:2306.11132 (2023).