
Reinforced Causal Explainer for Graph Neural Networks

Xiang Wang, Yingxin Wu, An Zhang, Fuli Feng, Xiangnan He, Tat-Seng Chua X. Wang, Y. Wu, F. Feng, X. He are with University of Science and Technology of China; CCCD Key Lab of Ministry of Culture and Tourism. E-mail: [email protected], [email protected], [email protected], [email protected].
A. Zhang and T. Chua are with National University of Singapore. E-mail: [email protected], [email protected]. A. Zhang is the corresponding author.
Abstract

Explainability is crucial for probing graph neural networks (GNNs), answering questions like “Why does the GNN model make a certain prediction?”. Feature attribution is a prevalent technique that highlights the explanatory subgraph in the input graph, which plausibly leads the GNN model to make its prediction. Various attribution methods have been proposed to exploit gradient-like or attention scores as the attributions of edges, and then select the salient edges with top attribution scores as the explanation. However, most of these works make an untenable assumption — that the selected edges are linearly independent — thus leaving the dependencies among edges largely unexplored, especially their coalition effect. We demonstrate unambiguous drawbacks of this assumption: it makes the explanatory subgraph unfaithful and verbose. To address this challenge, we propose a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer). It frames the explanation task as a sequential decision process — an explanatory subgraph is successively constructed by adding a salient edge to connect the previously selected subgraph. Technically, its policy network predicts the action of edge addition, and gets a reward that quantifies the action’s causal effect on the prediction. Such a reward accounts for the dependency between the newly-added edge and the previously-added edges, thus reflecting whether they collaborate and form a coalition to pursue better explanations. RC-Explainer is trained via policy gradient to optimize the reward stream of edge sequences. As such, RC-Explainer is able to generate faithful and concise explanations, and has better generalization power to unseen graphs. When explaining different GNNs on three graph classification datasets, RC-Explainer achieves better or comparable performance to state-of-the-art approaches w.r.t. two quantitative metrics (predictive accuracy and contrastivity), and safely passes sanity checks and visual inspections. Codes and datasets are available at https://github.com/xiangwang1223/reinforced_causal_explainer.

Index Terms:
Graph Neural Networks, Feature Attribution, Explainable Methods, Cause-Effect.

1 Introduction

Graph neural networks (GNNs) [1, 2] have exhibited impressive performance in a variety of tasks, where the data are graph-structured. Their success comes mainly from the powerful representation learning, which incorporates graph structure in an end-to-end fashion. Alongside performance, explainability becomes central to the practical impact of GNNs, especially in real-world applications on fairness, security, and robustness [3, 4, 5]. Aiming to answer questions like “Why did the GNN model make a certain prediction?”, we focus on post-hoc [6], local [7], model-agnostic [8] explainability — that is, considering the target GNN model as a black-box (i.e., post-hoc), an explainer interprets its predictions of individual instances (i.e., local), and is applicable to any GNN (i.e., model-agnostic). Towards this end, a prevalent paradigm is feature attribution and selection [9, 10, 11, 12]. Typically, given an input graph, it distributes the model’s outcome prediction to the input features (i.e., edges), and then selects the salient substructure (i.e., the subset of edges) as an explanatory subgraph. Such an explanatory subgraph is expected to provide insight into the model workings.

Towards high-quality feature attribution, it is essential to uncover the relationships between the input features and outcome predictions underlying the GNN model. Most existing explainers reveal the relationships via (1) gradient-like signals w.r.t. edges [13, 14, 15], which are obtained by backpropagating the model outcome to the graph structure; (2) masks or attention scores of edges [16, 17, 18], which are derived from the masking functions or attention networks to approximate the target prediction via the fractional (masked or attentive) graph; or (3) prediction changes on perturbed edges [19, 20, 4], which are fetched by perturbing the graph structure, such as leaving subgraphs out and auditing the outcome change [19] or computing the Shapley values [4]. Subsequently, the subset of edges with top attributions composes an explanatory subgraph most influential to the model’s decision.

Nonetheless, we argue that these explainers are prone to generating suboptimal explanations since two key factors remain largely unexplored:

  • Causal effects of edges. It is crucial to specify the edges that are the plausible causation of the model outcome, rather than edges that are irrelevant to or spuriously correlated with the outcome [21, 22]. However, as the explainers using gradient- and attention-like scores typically approach the input-outcome relationships from an associational standpoint, they hardly distinguish causal from noncausal effects of edges. Take Figure 1 as a running example, where SA [13] and GNNExplainer [16] explain why the GIN model [23] predicts the molecule graph as mutagenic. As the nitrogen-carbon (N-C) bond often connects with the nitro group (NO2), it is spuriously correlated with the mutagenic property, and is thus ranked top by SA. Feeding such spurious edges solely into the model, however, hardly recovers the target prediction, and thus fails to faithfully reflect the model’s decision.

  • Dependencies among edges. Most explainers ignore the edge dependencies when probing the edge attributions and constructing the explanatory subgraph. One key reason is that they draw attributions of edges independently [24, 19, 25]. In fact, an edge usually collaborates with other edges to approach the decision boundary. Such highly-dependent edges form a coalition that can frame a prototype memorized by the model to make decisions. Considering SA’s explanation, compared to the individual N-C bond, its combination with the carbon-carbon (C=C) bond brings no unique information about the model prediction, as only a marginal improvement on predictive accuracy is achieved (when removing the counterparts, the single N-C bond and its combination with the C=C bond are predicted as mutagenic with probabilities of 0.31 and 0.35, respectively). In contrast, two nitrogen-oxygen (N=O) bonds form a nitro group (NO2), which is a typical coalition responsible for the mutagenic property [3] and pursues a higher influence on the prediction than individual bonds (the probability of being mutagenic increases from 0.72, when feeding the first N=O bond solely into GIN, to 0.95, when feeding the first two N=O bonds together into GIN). Clearly, the N=O bonds within NO2 ought to be better post-hoc explanations.

Figure 1: A real example of explaining the mutagenicity classification of a molecule graph. (a-c) show explanations of SA, GNNExplainer, and RC-Explainer, respectively, where the important edges are highlighted with red color and the top-3 edges are listed. Best viewed in color.

In this work, we explore the edges’ causal effects and dependencies, in order to generate explanations that are faithful to the model’s decision-making process and consistent with human cognition. This is challenging, but can be addressed by introducing the screening strategy [26] equipped with causality [27]. Specifically, the screening strategy frames the explanation task as sequentially adding edges — that is, it starts from an empty set as the explanatory subgraph, and incrementally adds edges into the subgraph, one edge at a time step. During the screening, causality provides a principled way to assess the dependencies between the previously added edges and the edge candidate being selected, answering the question “What would happen to the prediction if we added this edge to the GNN’s input?”. The basis is performing causal attribution on the edge candidate as its individual causal effect (ICE) [27, 28], given the previous selections. Formally, it compares the model outcomes under treatment (i.e., the GNN takes the coalition of the edge and the previous selection as input) and control (i.e., the GNN takes the previous selection only). A positive causal attribution indicates that the edge coalition offers unique information strongly relevant to the prediction; otherwise, the edge is redundant or irrelevant.

Towards this end, we propose a reinforcement learning (RL) agent, Reinforced Causal Explainer (RC-Explainer), to achieve the causal screening strategy. The insight is that an RL agent with a stochastic policy can automatically determine where to search, given the uncertainty information of the learned policy, which can be updated promptly by the stream of reward signals. Technically, RC-Explainer uses a simple neural network as the policy network to learn the probability of edge candidates being selected, and then selects one potential edge as the action at each step. Such a sequence of edges forms the policy rollout and receives rewards that consist of the causal attributions of the added edges and intermediate subgraphs. As such, we can exhibit the dependencies of the explained edges, and highlight the influence of edge coalitions. We resort to policy gradient to optimize the policy network. With a global understanding of the GNN, our RC-Explainer is able to offer explanations for each graph instance from a model-level view, and generalize to unseen graphs. Experiments on three datasets showcase the effectiveness of our explanations, which are more consistent, concise, and faithful to the predictions compared with existing methods.

Contributions of this work are summarized as follows:

  • We emphasize the importance of edges’ causal effects and dependencies to reveal the edge attributions and build the explanatory subgraph, so as to interpret the GNN predictions faithfully and concisely.

  • We frame the explanation as a sequential decision process and develop an RL agent, RC-Explainer, to take the cause-effect look at the edge dependencies.

  • We conduct extensive experiments on three datasets, showing the effectiveness of our RC-Explainer w.r.t. predictive accuracy, contrastivity, sanity checks, and visual inspections.

2 Related Work

In the literature on explainers, there are several dichotomies for categorizing explanations — between post-hoc and intrinsic [6, 29, 30] (i.e., the target model is explained post hoc by an additional explainer method or is inherently interpretable), between local and global [7] (i.e., the explainer offers explanations for data instances individually or holistically), and between model-agnostic and model-specific [8] (i.e., the explainer is applicable across model types or customized for a specific model). We focus on post-hoc, local, and model-agnostic explainability in this work.

2.1 Explainability in Non-Graph Neural Networks

As a prevalent technique, feature attribution [9, 10, 11, 12] has shown great potential to generate post-hoc explanations for neural networks, especially convolutional neural networks (CNNs). In general, the existing attribution methods roughly fall into three groups:

  • One research line [31, 9, 32, 10, 8] decomposes the model outcome to the input features via backpropagation, such that the gradient-like signals are viewed as the importance of input features. For example, Gradient [31] uses pure gradients w.r.t. input features. Grad-CAM [9] additionally leverages the layer-wise context to improve gradients.

  • Another research line [33, 34] introduces the trainable attention or masking networks and generates attention scores on input features. The networks are trained to approximate the decision boundary of the model via the attentive or masked features. For example, L2X [33] learns to generate feature masks, with the aim of maximizing the mutual information between the masked features and the target predictions.

  • Some works [7, 20] perform the input perturbations and monitor the changes on model behaviors (e.g., prediction, loss), so as to reveal the input-outcome relationships. The basic idea is that the model outcomes are highly likely to significantly change if essential features are occluded. For example, CXPlain [20] learns the marginal effect of a feature by occluding it.

2.2 Explainability in Graph Neural Networks

Compared to the extensive studies on CNNs, explainability of GNNs is less explored and remains a challenging open problem. Inspired by the methods devised for CNNs, recent works explain GNNs via:

  • Gradient-like signals w.r.t. the graph structure [13, 14, 15]: For example, SA [13] adopts gradients of the GNN’s loss w.r.t. the adjacency matrix as edge scores, while Grad-CAM [14] has been extended to GNNs.

  • Masks or attention scores of structural features [16, 17, 18]: The basic idea is to maximize the mutual information between the fractional (attentive) graph and the target prediction. For instance, GNNExplainer [16] customizes masks on the adjacency matrix for each graph independently. Later, PGExplainer [17] trains a neural network to generate masks for multiple graphs collectively. More recently, ReFine [18] first pre-trains an attention network over class-wise graphs to latch on the global explanation view, and then fine-tunes the local explanations for individual graphs.

  • Prediction changes on structure perturbations [20, 19, 35, 4]: To get the node attributions, PGM-Explainer [19] applies random perturbations on nodes and learns a Bayesian network upon the perturbation-prediction data to identify significant nodes. More recently, SubgraphX [4] employs the Monte Carlo tree search algorithm to explore different subgraphs and uses Shapley values to measure each subgraph’s importance.

Note that XGNN [3] focuses on model-level explanations, rather than local explanations for individual predictions. Moreover, it fails to preserve the local fidelity of individual graphs, as its explanation may not be a substructure existing in the input graph. In contrast, our RC-Explainer explains each graph with a global view of the GNN model, which can preserve the local fidelity.

As suggested in the previous work [19], most of these works assume that the features are attributed independently, but ignore their dependencies, especially the coalition effect. Our work differs from them — we reformulate the explanation generation as a sequential decision process, accounting for the edge relationships and causal effects at each step, towards more faithful and concise explanations.

3 Preliminaries

We first summarize the background of GNNs, and then describe the task of generating local, post-hoc explanations.

3.1 Background of GNNs

We represent a graph-structured data instance as $\mathcal{G}=\{e \mid e\in\mathcal{E}\}$, where an edge $e=(v,u)\in\mathcal{E}$ involves two nodes $v,u\in\mathcal{V}$, so as to highlight the structural features (i.e., the presence of an edge and its endpoints). Typically, each node $v$ is assigned a $d$-dimensional feature $\mathbf{x}_{v}\in\mathbb{R}^{d}$.

Various GNNs [36, 1, 37, 2] have been proposed to incorporate such structural features into the representation learning in an end-to-end fashion, so as to facilitate the downstream prediction tasks. Following the supervised learning paradigm, we can systematize the GNN model $f$ as a combination of two modules $f_{2}\circ f_{1}$, where $f_{1}$ is the encoder to generate representations and $f_{2}$ is the predictor to output the predictions. Clearly, representation learning in the encoder is at the core of GNNs, which typically involves two crucial components:

  • Representation aggregation, which distills the vectorized information from the adjacent nodes to update the representations of ego nodes recursively:

    $\mathbf{a}^{(l)}_{v} = \text{AGGREGATE}^{(l)}(\{\mathbf{z}^{(l-1)}_{u} \mid u\in\mathcal{N}_{v}\})$,
    $\mathbf{z}^{(l)}_{v} = \text{UPDATE}^{(l)}(\mathbf{z}^{(l-1)}_{v}, \mathbf{a}^{(l)}_{v})$,   (1)

    where $\mathbf{a}^{(l)}_{v}$ is the aggregation of information propagated from $v$'s neighbors $\mathcal{N}_{v}=\{u \mid (v,u)\in\mathcal{E}\}$; $\mathbf{z}^{(0)}_{v}$ is initialized with $\mathbf{x}_{v}$, and $\mathbf{z}^{(l)}_{v}$ is $v$'s representation after $l$ layers; $\text{AGGREGATE}^{(l)}(\cdot)$ and $\text{UPDATE}^{(l)}(\cdot)$ denote the aggregation and update functions, respectively.

  • Representation readout, which finalizes the representations of node vv or graph 𝒢\mathcal{G} for the prediction tasks:

    $\mathbf{z}_{v} = \text{EDGE-READOUT}(\{\mathbf{z}^{(l)}_{v} \mid l=0,\cdots,L\})$,   (2)
    $\mathbf{z}_{\mathcal{G}} = \text{GRAPH-READOUT}(\{\mathbf{z}_{v} \mid v\in\mathcal{V}\})$,   (3)

    where $\mathbf{z}_{v}$ is used for the node-level tasks (e.g., node classification, link prediction), while $\mathbf{z}_{\mathcal{G}}$ is used for the graph-level tasks (e.g., graph classification, graph regression, graph matching); $\text{EDGE-READOUT}(\cdot)$ and $\text{GRAPH-READOUT}(\cdot)$ represent the readout functions of nodes and graphs, respectively.

Having obtained the powerful representations, the predictor performs predictions, such as applying an inner product over two node representations to do link prediction, or employing neural networks on a single node representation to perform node classification.
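To make Equations (1)-(3) concrete, below is a minimal sketch of one message-passing layer with sum aggregation and a mean graph readout. The layer sizes, the sum/mean choices, and the toy graph are illustrative assumptions rather than the exact architectures of the GNNs studied in this paper.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One layer of Equation (1): sum-AGGREGATE over neighbors, then an UPDATE MLP."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.update = nn.Linear(2 * in_dim, out_dim)

    def forward(self, z, edge_index):
        # z: [num_nodes, in_dim]; edge_index: [2, num_edges] listing (source, target) pairs
        src, dst = edge_index
        agg = torch.zeros_like(z)
        agg.index_add_(0, dst, z[src])                                # a_v: sum of neighbor messages
        return torch.relu(self.update(torch.cat([z, agg], dim=-1)))  # z_v^(l)

def graph_readout(z):
    """Equation (3): a simple mean readout over node representations."""
    return z.mean(dim=0)

# Toy usage on a 4-node graph with bidirectional edges.
x = torch.randn(4, 8)                           # node features x_v
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
encoder = SimpleGNNLayer(8, 16)
z_nodes = encoder(x, edge_index)                # f_1: node representations
z_graph = graph_readout(z_nodes)                # z_G, fed into the predictor f_2
```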

Without loss of generality, we consider the graph classification problem in this work. Let $f:\mathbb{G}\rightarrow\{1,\cdots,C\}$ be a trained GNN model, which classifies a graph instance $\mathcal{G}\in\mathbb{G}$ into $C$ classes:

$\hat{y}_{c}=f(\mathcal{G})=f_{2}\circ f_{1}(\mathcal{G})$,   (4)

where $\hat{y}_{c}$ is the predicted class being explained, which is assigned with the largest probability $p_{\theta}(\hat{y}_{c}|\mathcal{G})$; $f$ is parameterized by $\theta$, including the parameters of the encoder and the predictor.

3.2 Task Description

Explainability of GNNs aims to answer questions like “Given a graph instance $\mathcal{G}$ of interest, what determines the GNN model $f$ to make a certain output $\hat{y}_{c}$?”. A prevalent technique to offer local, post-hoc, and model-agnostic explanations is feature attribution [9, 10, 11, 12], which decomposes the prediction to the input features. As such, each input feature of the graph is associated with an attribution score to indicate how much it contributes to the prediction.

Formally, the task of an explainer is to derive the top $K$ important edges and construct a faithful explanatory subgraph $\mathcal{G}^{*}_{K}=\{e^{*}_{1},\cdots,e^{*}_{K}\}\subseteq\mathcal{G}$, such that $\mathcal{G}^{*}_{K}$ offers evidence supporting $f$'s prediction $\hat{y}_{c}$, where $e^{*}_{k}$ is the edge ranked at the $k$-th position. In this work, we follow the prior studies [16, 17, 14] and focus mainly on the structural features (i.e., the existence of an edge and its endpoints), leaving the identification of salient content features (i.e., node features) to future work.

4 Methodology

Figure 2: Illustration of the proposed RC-Explainer framework. (a) Sequential decision process to generate the explanatory subgraph; (b) One step from the state 𝒢1\mathcal{G}_{1} to the state 𝒢2\mathcal{G}_{2}. Best viewed in color.

Towards generating an explanatory subgraph, we first frame the attribution of the holistic subgraph from the causality standpoint and emphasize its limitations. We then propose the attribution of the edge sequence (termed causal screening), which sequentially selects one edge by measuring its causal effect on the target prediction and its dependencies on the previously-selected edges. To efficiently achieve the idea of causal screening, we devise a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer).

4.1 Causal Attribution of A Holistic Subgraph

Formally, we can construct the explanatory subgraph $\mathcal{G}^{*}_{K}$ by maximizing an attribution metric $A(\cdot)$:

$\mathcal{G}^{*}_{K}=\arg\max_{\mathcal{G}_{K}\subseteq\mathcal{G}}A(\mathcal{G}_{K}|\hat{y}_{c})$,   (5)

which is optimized over all possible combinations of $K$ edges, $\{\mathcal{G}_{K} \mid \mathcal{G}_{K}\subseteq\mathcal{G},|\mathcal{G}_{K}|=K\}$; $A:\mathbb{G}\rightarrow\mathbb{R}$ measures the contribution of each candidate subgraph $\mathcal{G}_{K}$ to the target prediction $\hat{y}_{c}=f(\mathcal{G})$.

Towards causal explainability, $A(\mathcal{G}_{K}|\hat{y}_{c})$ needs to quantify the causal effect of $\mathcal{G}_{K}$. We can directly manipulate the value of the input graph variable and investigate how the model prediction changes. Such an operation is the intervention in causal inference [38, 27], which is founded on the $do(\cdot)$ calculus. It cuts off all the incoming links of a variable and forcefully assigns the variable a certain value, which is then no longer affected by its causal parents. For instance, $do(G=\mathcal{G})$ fixes the value of the graph variable $G$ as a specific graph instance $\mathcal{G}$, short for $do(\mathcal{G})$.

Through interventions, we formulate the individual causal effect (ICE) [38, 27] in the attribution function $A(\mathcal{G}_{K}|\hat{y}_{c})$. In particular, we view the input graph variable $G$ as our control variable and conduct two interventions: $do(G=\mathcal{G}_{K})$ and $do(G=\emptyset)$, which separately indicate that the input receives treatment (i.e., feeding $\mathcal{G}_{K}$ into the GNN) and control (i.e., feeding an uninformative reference into the GNN). The ICE is the difference between the potential outcomes under treatment and control: $Y(do(\mathcal{G}_{K}))-Y(do(\emptyset))$, where $Y$ is the prediction variable. However, this difference leaves the target prediction $\hat{y}_{c}$ untouched, and thus easily distills degenerate explanations from $\mathcal{G}$. Hence, we introduce $I(G;Y)$ to estimate the mutual information [39, 40] between $G$ and $Y$, which is defined as:

$I(G;Y)=H(Y)-H(Y|G)$,

where $H(Y)$ and $H(Y|G)$ are the entropy and conditional entropy terms, respectively. Involving the target prediction $\hat{y}_{c}$, we reformulate the ICE of $\mathcal{G}_{K}$ as:

$A(\mathcal{G}_{K}|\hat{y}_{c})=I(do(\mathcal{G}_{K});\hat{y}_{c})-I(do(\emptyset);\hat{y}_{c})$,   (6)

where $I(do(\mathcal{G}_{K});\hat{y}_{c})=I(\mathcal{G}_{K};\hat{y}_{c})$ quantifies the amount of information pertinent to the target prediction $\hat{y}_{c}$ held in the intervened graph $\mathcal{G}_{K}$; analogously, $I(do(\emptyset);\hat{y}_{c})=I(\emptyset;\hat{y}_{c})$.
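As a concrete reading of Equation (6), the following is a minimal sketch that scores a candidate subgraph with the frozen GNN, using the conditional-entropy estimate adopted later in Section 4.2 (i.e., $H(\hat{y}_{c}|\mathcal{G}')=-p_{\theta}(\hat{y}_{c}|\mathcal{G})\log p_{\theta}(\hat{y}_{c}|\mathcal{G}')$). The helper model_prob, which feeds an edge subset into the frozen GNN and returns $p_{\theta}(\hat{y}_{c}|\cdot)$, is a hypothetical wrapper, not part of a released API.

```python
import math

def holistic_attribution(model_prob, full_graph, subgraph, target_class):
    """Sketch of Equation (6): ICE of a holistic subgraph G_K w.r.t. the target prediction.

    With H(y_c | G') = -p(y_c | G) * log p(y_c | G'), the difference of the two
    interventions reduces to p(y_c | G) * log( p(y_c | G_K) / p(y_c | empty) ).
    """
    p_full = model_prob(full_graph, target_class)     # p_theta(y_c | G)
    p_sub = model_prob(subgraph, target_class)        # under do(G = G_K)
    p_empty = model_prob(frozenset(), target_class)   # under do(G = empty set)
    return p_full * (math.log(p_sub) - math.log(p_empty))
```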

Limitations. Nonetheless, directly optimizing Equation (5) faces two obstacles: (1) this optimization is generally NP-hard [33, 41], since the constraint $|\mathcal{G}_{K}|=K$ casts the task as a combinatorial optimization problem, where the number of possible subgraphs $\{\mathcal{G}_{K}\subseteq\mathcal{G}\}$ is super-exponential in the number of edges; and (2) it only exhibits the contribution of a subgraph to the prediction holistically. However, highlighting the importance of each component edge is preferable to showing a holistic subgraph solely.

4.2 Causal Screening of An Edge Sequence

To address these limitations, we propose the causal screening strategy to assess the causal effect of an edge sequence instead. The basic idea is integrating the screening strategy [26] with the cause-effect estimation of edges. Specifically, the explanatory subgraph starts from the empty set, and incorporates the salient edges incrementally, one edge at a time. Formally, the objective function is:

$e^{*}_{k}=\arg\max_{e_{k}\in\mathcal{O}_{k}}A(e_{k}|\mathcal{G}^{*}_{k-1},\hat{y}_{c}),\quad k=1,2,\cdots,K$,   (7)

where $\mathcal{G}^{*}_{k-1}=\{e^{*}_{1},\cdots,e^{*}_{k-1}\}$ is the set of the first $(k-1)$ added edges after $(k-1)$ steps, and $\mathcal{G}_{0}=\emptyset$ at the initial step; at the $k$-th step, $e^{*}_{k}$ is the edge selected from the pool of edge candidates $\mathcal{O}_{k}=\mathcal{G}\setminus\mathcal{G}^{*}_{k-1}$. Having established $\mathcal{G}^{*}_{k}$ by merging $\mathcal{G}^{*}_{k-1}$ and $e^{*}_{k}$, we repeat this procedure until finding all top-$K$ edges as the final explanatory subgraph.

At each step, $A(e_{k}|\mathcal{G}^{*}_{k-1},\hat{y}_{c})$ guides the edge selection, which estimates the causal effect of $e_{k}$, conditioning on the previously selected edges $\mathcal{G}^{*}_{k-1}$. Specifically, given $\mathcal{G}^{*}_{k-1}$, we perform two interventions: $do(\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})$ represents that the input graph receives treatment (i.e., the GNN takes the combination of $e_{k}$ and $\mathcal{G}^{*}_{k-1}$ as input); whereas $do(\mathcal{G}^{*}_{k-1})$ denotes that the input graph is under control (i.e., the GNN takes $\mathcal{G}^{*}_{k-1}$ solely). Afterward, we define the ICE of $e_{k}$ as:

$A(e_{k}|\mathcal{G}^{*}_{k-1},\hat{y}_{c}) = I(do(\mathcal{G}^{*}_{k-1}\cup\{e_{k}\});\hat{y}_{c})-I(do(\mathcal{G}^{*}_{k-1});\hat{y}_{c}) = -H(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})+H(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}) = -p_{\theta}(\hat{y}_{c}|\mathcal{G})\log\frac{p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1})}{p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})}$,   (8)

where $H(\hat{y}_{c}|\mathcal{G}^{*}_{k-1})=-p_{\theta}(\hat{y}_{c}|\mathcal{G})\log{p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1})}$ is the conditional entropy term; analogously, $H(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})=-p_{\theta}(\hat{y}_{c}|\mathcal{G})\log{p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})}$; $p_{\theta}(\hat{y}_{c}|\mathcal{G})$, $p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1})$, and $p_{\theta}(\hat{y}_{c}|\mathcal{G}^{*}_{k-1}\cup\{e_{k}\})$ are the prediction probabilities of the target class $\hat{y}_{c}$ when feeding $\mathcal{G}$, $\mathcal{G}^{*}_{k-1}$, and $\mathcal{G}^{*}_{k-1}\cup\{e_{k}\}$ into the model $f$, respectively. As a result, Equation (8) explicitly assesses $e_{k}$'s interactions with the previously added edges. A positive influence will be assigned to $e_{k}$ if it can collaborate with $\mathcal{G}^{*}_{k-1}$ and pursue a prediction more similar to $\hat{y}_{c}$; otherwise, a negative influence indicates that $e_{k}$ is not suitable to participate in the interpretation at step $k$. Hence, it allows us to answer causality-related questions like “Given the previously selected edges, what is the causal effect of an edge on the target prediction?”.

One straightforward solution to optimizing Equation (7) is a greedy sequential exhaustive search (see the sketch below). One step consists of first calculating the ICE scores of all edge candidates, and then adding the edge with the most significant score to connect the previously added edges. This step is repeated $K$ times. Such greedy sequential exhaustive search is at the core of many feature selection algorithms [42] and has shown great success.
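A minimal sketch of this greedy search follows, assuming model_prob is a hypothetical wrapper that feeds a subset of edges into the frozen GNN and returns $p_{\theta}(\hat{y}_{c}|\cdot)$, and that graphs are treated as plain sets of edges.

```python
import math

def edge_ice(model_prob, full_graph, selected, edge, target_class):
    """Equation (8): ICE of adding `edge` on top of the previously selected edges."""
    p_full = model_prob(full_graph, target_class)
    p_prev = model_prob(selected, target_class)
    p_new = model_prob(selected | {edge}, target_class)
    return -p_full * math.log(p_prev / p_new)

def greedy_causal_screening(model_prob, full_graph, target_class, K):
    """Parameter-free baseline for Equation (7): pick the edge with the largest ICE, K times."""
    selected = set()
    for _ in range(K):
        candidates = full_graph - selected
        best = max(candidates,
                   key=lambda e: edge_ice(model_prob, full_graph, selected, e, target_class))
        selected.add(best)
    return selected
```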

Limitations. However, there are two inherent limitations in the exhaustive search: (1) this parameter-free approach explains every graph instance individually, and hence is insufficient to offer a global understanding of the GNN, such as class-wise explanations [17, 18]; (2) the computational complexity of interpreting a graph is $O((2|\mathcal{G}|-K)\times K/2)$, which becomes a bottleneck when explaining large-scale graphs that involve massive numbers of edges, such as social networks. As a result, these limitations hinder the exhaustive search from being efficiently and widely used in real-world scenarios.

4.3 Reinforced Causal Explainer (RC-Explainer)

To remedy these limitations, we revisit the sequential selection of edges from the viewpoint of reinforcement learning (RL) [43, 44]. We then devise an RL agent, RC-Explainer, whose policy network learns how to conduct the causal screening. As the policy network explores the post-hoc explanations over the population of all training graphs, RC-Explainer is able to systematize a global view of important patterns and hold a global understanding of the model’s workings. Moreover, benefiting from the RL scheme, RC-Explainer can efficiently determine the edge sequence with significant causal attributions.

4.3.1 Causal Screening as Reinforcement Learning

Following previous work [43], we frame the causal screening process as a Markov Decision Process (MDP) $M=\{\mathcal{S},\mathcal{A},P,R\}$. Herein, $\mathcal{S}=\{s_{k}\}$ is the set of states abstracting edge sequences during exploration, and $\mathcal{A}=\{a_{k}\}$ is the set of actions, each of which adds an edge to the current edge sequence at a step. Under the Markov property [43, 45], $P(s_{k}|s_{k-1},a_{k})$ is the transition dynamics that specifies the new edge sequence $s_{k}$ after making an action of edge addition $a_{k}$ on the prior state $s_{k-1}$. $R(s_{k-1},a_{k})$ quantifies the reward after making the action $a_{k}$ from the prior state $s_{k-1}$. As such, the trajectory $(s_{0},a_{1},r_{1},s_{1},\cdots,a_{K},r_{K},s_{K})$ naturally depicts the generation of an edge sequence, where each step-wise reward $r_{k}$ reflects $a_{k}$'s causal effect and its coalition effect with the previously-added edges. Note that different graphs, together with their model predictions, describe different environments to be explored by the RL agent. Here we elaborate the foregoing key elements for one target graph $\mathcal{G}$ and its prediction $\hat{y}_{c}$ as follows.

State Space. At step $k$, the state $s_{k}$ indicates the subgraph $\mathcal{G}_{k}$ consisting of the explored edges: $s_{k}=\mathcal{G}_{k}$, where the initial state $s_{0}=\mathcal{G}_{0}=\emptyset$. Figure 2 illustrates the changes in varying states.

Action Space. Observing the state $s_{k-1}=\mathcal{G}_{k-1}$, the available action space $\mathcal{A}_{k}$ is the complement of $\mathcal{G}_{k-1}$, formally $\mathcal{A}_{k}=\mathcal{G}\setminus\mathcal{G}_{k-1}$. The RL agent picks up a salient edge $e_{k}$ from $\mathcal{A}_{k}$ and connects it to the previous selection $\mathcal{G}_{k-1}$. That is, this action of edge addition can be represented as $a_{k}=e_{k}$ for simplicity.

State Transition Dynamics. Having made the action $a_{k}=e_{k}$ at step $k$, the transition of the state $s_{k}$ is merging $e_{k}$ into the previous state $s_{k-1}$: $\mathcal{G}_{k}=\mathcal{G}_{k-1}\cup\{e_{k}\}$.

Reward Design. To measure the quality of the action aka_{k} at step kk and guide the further explorations, we consider two factors into the reward design:

  • Validity of action $a_{k}=e_{k}$, which reflects the ICE of $e_{k}$ (cf. Equation (8)). A positive attribution suggests that the newly-added edge $e_{k}$ contributes uniquely and positively to the explanation — that is, it not only serves as a plausible causal determinant of the target prediction $\hat{y}_{c}$, but also cooperates with the previous edges $\mathcal{G}_{k-1}$ as an effective coalition. In contrast, a vanishing (near-zero) or negative attribution reveals that the edge is redundant to the previous edges or fails to explain the target prediction.

  • Validity of state $s_{k}=\mathcal{G}_{k}$, which justifies the predictive ability of the current explanatory subgraph. Here we assign a positive score ($+1$) to $\mathcal{G}_{k}$ that can successfully explain the target prediction, while we use a negative score ($-1$) to penalize $\mathcal{G}_{k}$ for the wrong prediction.

In a nutshell, the reward $R(s_{k-1},a_{k})$ for the intervention action $a_{k}=e_{k}$ at state $s_{k-1}=\mathcal{G}_{k-1}$ is formulated as:

$R(\mathcal{G}_{k-1},e_{k})=\begin{cases}A(e_{k}|\mathcal{G}_{k-1},\hat{y}_{c})+1, & \text{if } f_{\theta}(\mathcal{G}_{k-1}\cup\{e_{k}\})=\hat{y}_{c}\\ A(e_{k}|\mathcal{G}_{k-1},\hat{y}_{c})-1, & \text{otherwise.}\end{cases}$
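A minimal sketch of one environment step (state transition plus this reward) is given below, reusing the hypothetical edge_ice and model_prob helpers from the sketch in Section 4.2, plus a hypothetical model_predict wrapper that returns the frozen GNN's predicted class for an edge subset.

```python
def environment_step(model_predict, model_prob, full_graph, prev_edges, new_edge, target_class):
    """One MDP step: merge the chosen edge into the state and compute R(G_{k-1}, e_k)."""
    new_state = prev_edges | {new_edge}                       # transition: G_k = G_{k-1} U {e_k}
    ice = edge_ice(model_prob, full_graph, prev_edges, new_edge, target_class)
    bonus = 1.0 if model_predict(new_state) == target_class else -1.0
    return new_state, ice + bonus                             # (s_k, r_k)
```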

4.3.2 Policy Network

Having outlined the causal screening environment, we now present the policy network $q_{\phi}$ of RC-Explainer to explore in the environment. Specifically, it takes the pair of the intermediate state $\mathcal{G}_{k-1}$ and the target prediction $\hat{y}_{c}$ as the input, and aims to determine the next action $e_{k}$:

$e_{k}\sim q_{\phi}(\mathcal{G}_{k-1},\hat{y}_{c})$,   (9)

where $\phi$ summarizes the trainable parameters of the policy network; $e_{k}$ is yielded with the probability $P_{\phi}(e_{k}|\mathcal{G}_{k-1},\hat{y}_{c})$ of being added to the explanatory subgraph.

Representation Learning of Action Candidates. With the space of action candidates $\mathcal{A}_{k}=\mathcal{G}\setminus\mathcal{G}_{k-1}$, our policy network first learns a representation for each action candidate $a_{k}\in\mathcal{A}_{k}$. As introduced in Section 4.3.1, $a_{k}$ is performed on the edge $e_{k}$, whose two endpoints are denoted by $v$ and $u$, i.e., $e_{k}=(v,u)$. We employ another GNN model $g$ over the full graph $\mathcal{G}$ to create the node representations of $u$ and $v$, and then combine them to obtain the action representation of $e_{k}$ as follows:

$\mathbf{z}_{e_{k}}=\text{MLP}_{1}([\mathbf{z}_{v}\,||\,\mathbf{z}_{u}\,||\,\mathbf{x}_{e_{k}}])$,   (10)

where $g$ yields $\mathbf{z}_{v}\in\mathbb{R}^{d^{\prime}}$ and $\mathbf{z}_{u}\in\mathbb{R}^{d^{\prime}}$ to separately represent $v$ and $u$ (cf. Equation (3)); $\mathbf{x}_{e_{k}}\in\mathbb{R}^{d_{2}}$ is the pre-existing feature of $e_{k}$, which can be ignored when no feature is available; $||$ is the concatenation operator. We use an MLP with one hidden layer, $\mathbf{W}^{(2)}\sigma(\mathbf{W}^{(1)}[\mathbf{z}_{v}\,||\,\mathbf{z}_{u}\,||\,\mathbf{x}_{e_{k}}])$, to obtain $\mathbf{z}_{e_{k}}\in\mathbb{R}^{d^{\prime\prime}}$, where $\sigma$ is a ReLU nonlinearity. Note that the model parameters $\mu$ of $g$ are trainable, while the model parameters $\theta$ of the target GNN model $f$ are fixed.

Selection of Action. Having established the representations of action candidates, we aim to select one action from the space and perform it. Instead of trying candidates exhaustively, the policy network learns the importance of making an action $a_{k}=e_{k}$ to the current state $s_{k-1}=\mathcal{G}_{k-1}$:

$p_{e_{k}}=\text{MLP}_{2,c}([\mathbf{z}_{e_{k}}\,||\,\mathbf{z}_{\mathcal{G}_{k-1}}])$,   (11)

where $\mathbf{z}_{\mathcal{G}_{k-1}}$ is the representation of the current explanatory subgraph $\mathcal{G}_{k-1}$, which aggregates the representations of its compositional nodes via Equation (3). We use a class-specific MLP with one hidden layer, $\mathbf{W}^{(4,c)}\sigma(\mathbf{W}^{(3)}[\mathbf{z}_{e_{k}}\,||\,\mathbf{z}_{\mathcal{G}_{k-1}}])$, to get the scalar $p_{e_{k}}$, where $c$ corresponds to the target class $\hat{y}_{c}$.

Thereafter, we apply a softmax function over all action candidates $\mathcal{A}_{k}$ to convert their importance scores into a probability distribution. Formally, the probability of $e_{k}$ being selected as the action is:

$P_{\phi}(e_{k}|\mathcal{G}_{k-1},\hat{y}_{c})=\text{SOFTMAX}_{\mathcal{A}_{k}}(p_{e_{k}})$,   (12)

where $\phi$ collects the parameters of $g_{\mu}$, $\text{MLP}_{1}$, and $\{\text{MLP}_{2,c}\}_{c=1}^{C}$. It is worth mentioning that the class-wise MLPs latch on class-wise knowledge across the training graphs with the same prediction.
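The following is a minimal sketch of Equations (10)-(12), scoring every candidate edge against the current subgraph representation with a per-class head. Hidden sizes, the ReLU MLPs, and the tensor layout are illustrative assumptions; the released code contains the exact architecture.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """Sketch of the policy head: Eq. (10) edge encoding, Eq. (11) class-specific scoring
    against the current subgraph, Eq. (12) softmax over the candidates."""
    def __init__(self, node_dim, edge_feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.mlp1 = nn.Sequential(
            nn.Linear(2 * node_dim + edge_feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        self.mlp2 = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim + node_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, 1))
            for _ in range(num_classes)])                       # one head per target class

    def forward(self, z_nodes, cand_edge_index, cand_edge_feats, z_subgraph, target_class):
        src, dst = cand_edge_index                              # candidate edges A_k
        z_edge = self.mlp1(torch.cat([z_nodes[src], z_nodes[dst], cand_edge_feats], dim=-1))  # Eq. (10)
        ctx = z_subgraph.unsqueeze(0).expand(z_edge.size(0), -1)
        scores = self.mlp2[target_class](torch.cat([z_edge, ctx], dim=-1)).squeeze(-1)        # Eq. (11)
        return torch.softmax(scores, dim=0)                     # Eq. (12): distribution over A_k
```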

4.3.3 Policy Gradient Training

SGD cannot be directly used for the optimization, since the discrete sampling within the policy network blocks gradients. To solve this issue, we adopt a policy gradient-based training framework, REINFORCE [46, 43], to optimize the policy network as follows:

$\max_{\phi}\;\mathbb{E}_{\mathcal{G}\in\mathcal{O}}\,\mathbb{E}_{k}[R(\mathcal{G}_{k-1},e_{k})\log{P_{\phi}(e_{k}|\mathcal{G}_{k-1},\hat{y}_{c})}]$.   (13)

Obviously, this training framework encourages the actions with large rewards — sequentially finding the edges with the salient causal attributions to construct the explanatory subgraph.
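A minimal sketch of the corresponding training step is shown below, assuming the step-wise log-probabilities come from the EdgeScorer sketch above, the rewards come from the environment_step sketch in Section 4.3.1 and are treated as constants (detached scalars), and the optimizer setup and variable names are illustrative.

```python
import torch

def reinforce_loss(log_probs, rewards):
    """Negated Equation (13) for one episode: minimize -sum_k R(G_{k-1}, e_k) * log P_phi(e_k | ...)."""
    loss = 0.0
    for log_p, r in zip(log_probs, rewards):
        loss = loss - r * log_p            # gradients flow only through log_p
    return loss

# Toy usage inside a training loop (sketch):
# probs = scorer(z_nodes, cand_edges, cand_feats, z_subgraph, target_class)   # Eq. (12)
# dist = torch.distributions.Categorical(probs)
# idx = dist.sample()                                  # index of the chosen candidate edge
# log_probs.append(dist.log_prob(idx))
# state, r = environment_step(...); rewards.append(r)  # reward from Section 4.3.1
# loss = reinforce_loss(log_probs, rewards)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```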

As a result, our RC-Explainer inherits the desired characteristics of causal screening, which considers the causal effects of edges and the dependencies among edges. Furthermore, RC-Explainer is optimized on all graphs in the training set; thus, it holds a global view of the target model’s inner workings and achieves better generalization ability to generate explanations for unseen graph instances.

4.4 Discussion

4.4.1 Time Complexity

When explaining one graph $\mathcal{G}$, the time cost mainly comes from two components: (1) the learning of node representations, and (2) the step-wise representation learning of edge candidates and action prediction. Specifically, using a trainable GNN model $g_{\mu}$ to generate node representations has computational complexity $O(\sum_{l=1}^{L}|\mathcal{G}|\times d_{l}\times d_{l-1})$, where $d_{l}$ is the representation dimension at the $l$-th layer. At each step $k$, the time cost is $O(|\mathcal{A}_{k}|\times 2d^{\prime}\times d^{\prime\prime})$ for creating representations of edge candidates, and $O(|\mathcal{A}_{k}|\times d^{\prime\prime})$ for predicting one action. As a result, the cost of generating the explanation $\mathcal{G}^{*}_{K}$ is $O(\sum_{k=1}^{K}|\mathcal{A}_{k}|\times(2d^{\prime}\times d^{\prime\prime}+d^{\prime\prime}))$. In total, the time complexity of the whole training episode is $O(\sum_{\mathcal{G}\in\mathcal{O}}(\sum_{l=1}^{L}|\mathcal{G}|\times d_{l}\times d_{l-1}+\sum_{k=1}^{K}|\mathcal{A}_{k}|\times(2d^{\prime}\times d^{\prime\prime}+d^{\prime\prime})))$, where $\mathcal{O}$ is the training set.

4.4.2 Potential Limitations

We list two potential limitations that RC-Explainer might face: expensive time complexity and the out-of-distribution issue.

  • Although RC-Explainer benefits from reinforcement learning and is more efficient than the exhaustive search, it might still suffer from high computational cost when generating the sampling probabilities over a large action space. See Section 5.2.4 for the evaluations w.r.t. empirical time complexity. This limits RC-Explainer’s application to large-scale graphs. Hence we leave the pruning of the action space and the improvement of scalability to future work, such as pre-judging the probability of edges being selected and extracting a small set of edges as the action candidates.

  • The causal attributions of a holistic subgraph (cf. Equation (6)) and an edge sequence (cf. Equation (8)) are founded upon the feature removal principle [47] — that is, the complement of the subgraph is discarded, while the target model takes only the subgraph as input. Nonetheless, feature removal pushes subgraphs off the distribution of the full graphs, thus causing the out-of-distribution (OOD) issue. The OOD subgraph hardly follows the degree distribution [48] or graph sizes [49], and possibly violates domain constraints [45]. As a result, the OOD issue could introduce a spurious correlation between the true importance of a subgraph and the model prediction, making the ICE estimation unfaithful and unreliable. See Section 5.2.5 for failure cases where the explanations suffer from the OOD effect. We will explore solutions to the OOD issue in future work, such as using counterfactual generation to complete the subgraph and make it conform to the original distribution.

5 Experiments

To demonstrate the effectiveness of RC-Explainer, we investigate the following research question: Can RC-Explainer provide more reasonable explanations than the state-of-the-art explainer methods?

TABLE I: Dataset statistics with model configurations.
                 Mutagenicity    REDDIT-MULTI-5K    Visual Genome
Graphs#          4,337           4,999              4,443
Classes#         2               5                  5
Avg. Nodes#      30.32           508.52             35.32
Avg. Edges#      30.77           594.87             18.04
Target GNNs      GIN             k-GNN              APPNP
Layers#          2               3                  2
Accuracy         0.806           0.644              0.640
(a) Mutagenicity   (b) REDDIT-MULTI-5K   (c) Visual Genome
Figure 3: Accuracy curves of all explainers over different selection ratios. Best viewed in color.

5.1 Experimental Settings

5.1.1 Dataset Description.

We use three benchmark datasets in the experiments:

  • Mutagenicity [50, 51] has 4,337 molecule graphs, where each graph is labeled with one of two labels, mutagenic and non-mutagenic, based on the mutagenic effect on a bacterium. We trained a GIN model [23, 52] to perform the binary classification.

  • REDDIT-MULTI-5K [53] has 4,999 social networks labeled with five different classes to indicate the topics of question/answer communities. Upon them, we trained a k-GNN model [54] as a classifier.

  • Visual Genome [55] is a collection of images. Each image is coupled with a scene graph, where nodes are objects with bounding boxes and edges are the relations between objects. Following the previous work [14], we extract 4,443 (image, scene graph) pairs covering five classes: stadium, street, farm, surfing, forest. A GNN model, APPNP [56], is trained as a classifier, where the regional images are treated as the node features.

Table I shows the statistics of datasets with the configurations of GNN models. We follow prior studies [14, 16] and partition each dataset into training, validation, and testing sets with the ratio of 80%:10%:10%. Having obtained the trained GNNs on the training sets, we train the parametric explainers on the training sets, and tune their hyper-parameters on the validation sets. We run the explainers five times on the testing sets to report the average results.

5.1.2 Evaluation Metrics.

It is of crucial importance to evaluate the explanations quantitatively. However, the ground-truth knowledge of explanations is usually unavailable in real-world datasets. Hence, many prior works [33, 57, 58, 12] have proposed metrics to assess explanations without the ground truth. Here we use two widely-adopted metrics:

  • Predictive Accuracy [33, 57] (ACC@$\mu$) measures whether using the explanatory subgraph can successfully recover the target prediction:

    $\text{ACC}(\mu)=\mathbb{E}_{\mathcal{G}\sim\mathbb{G}}[\mathbb{I}(f(\mathcal{G}),f(\mathcal{G}^{*}_{K}))]$,   (14)

    where $\mu$ is the selection ratio (e.g., 5%), $K=\lceil\mu\times|\mathcal{G}|\rceil$ is the size of the explanatory subgraph, and $\mathbb{I}(\cdot)$ is the indicator function checking whether $f(\mathcal{G})$ equals $f(\mathcal{G}^{*}_{K})$. Moreover, we also report the ACC curve over the selection ratios $[0.1,0.2,\cdots,1.0]$ and denote the area under the curve as ACC-AUC (see the sketch after this list).

  • Contrastivity [14] (CST) quantifies the invariance between class-discriminative explanations. It is built upon the intuition that a reasonable method should yield differing explanations in response to changes in the target prediction. Here we formulate it as the Spearman rank correlation between class-discriminative edge scores:

    $\text{CST}=\mathbb{E}_{\mathcal{G}\sim\mathbb{G}}\mathbb{E}_{s\neq\hat{y}}[|\rho(\Phi(\mathcal{G},s),\Phi(\mathcal{G},\hat{y}))|]$,   (15)

    where $s$ means we permute the label $\hat{y}$ of the graph $\mathcal{G}$ being interpreted; $\Phi(\mathcal{G},\hat{y})$ denotes the attribution scores of all edges; $|\rho(\cdot)|$ is the absolute Spearman rank correlation, which measures the invariance between the edge attributions in response to the label changes.
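A minimal sketch of computing ACC@$\mu$ and ACC-AUC follows; the model_predict and explainer_topk helpers are hypothetical wrappers around the frozen GNN and the explainer, graphs are treated as sets of edges, and ACC-AUC is taken as the mean ACC over the evenly-spaced selection ratios.

```python
import math

RATIOS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)

def acc_at_ratio(model_predict, explainer_topk, graphs, ratio):
    """ACC@mu (Equation (14)): fraction of graphs whose top-K subgraph recovers the prediction."""
    hits = 0
    for g in graphs:
        k = math.ceil(ratio * len(g))            # K = ceil(mu * |G|), g is a set of edges
        subgraph = explainer_topk(g, k)          # top-k edges chosen by the explainer
        hits += int(model_predict(subgraph) == model_predict(g))
    return hits / len(graphs)

def acc_auc(model_predict, explainer_topk, graphs, ratios=RATIOS):
    """ACC-AUC: area under the ACC curve, i.e., the mean ACC over the selection ratios."""
    accs = [acc_at_ratio(model_predict, explainer_topk, graphs, r) for r in ratios]
    return sum(accs) / len(accs)
```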

Moreover, as suggested in the prior study [12], the explanations generated by a reasonable explainer should be dependent on the target model — that is, they should present a faithful understanding of how the target model works. Hence we also conduct a sanity check:

  • Sanity Check on model randomization compares the attribution scores on the trained GNN $f$ with those on an untrained GNN $\tilde{f}$ with randomly-initialized parameters. Here we frame it as the rank correlation between these two attributions:

    $\text{SC}=\mathbb{E}_{\mathcal{G}\sim\mathbb{G}}[|\rho(\Phi(\mathcal{G},f(\mathcal{G})),\Phi(\mathcal{G},\tilde{f}(\mathcal{G})))|]$.   (16)

    Similar attributions imply that the explainer is insensitive to properties of the model and thus fails to pass the check.

5.1.3 Baselines.

We compare RC-Explainer with state-of-the-art explainers, covering gradient-based (SA [13], Grad-CAM [14]), masking-based (GNNExplainer [16]), attention-based (PGExplainer [17]), and perturbation-based (CXPlain [20], PGM-Explainer [19]) methods.

TABLE II: Predictive accuracy of explanations derived from explainers. The best performance is highlighted, while the second-best performance is underlined.
                             SA      Grad-CAM  GNNExplainer  PGExplainer  CXPlain  PGM-Explainer  RC-Explainer
ACC@10% ↑   Mutagenicity     0.568   0.608     0.607         0.674        0.868    0.603          0.986
            REDDIT-MULTI-5K  0.158   0.158     0.125         0.175        0.279    0.300          0.472
            Visual Genome    0.695   0.894     0.735         0.580        0.578    0.535          0.976
ACC-AUC ↑   Mutagenicity     0.767   0.764     0.872         0.899        0.886    0.737          0.964
            REDDIT-MULTI-5K  0.362   0.399     0.429         0.412        0.439    0.377          0.521
            Visual Genome    0.829   0.917     0.802         0.757        0.775    0.754          0.901
TABLE III: Other quantitative analyses for explainers w.r.t. contrastivity, sanity check, and time complexity. Symbol $(\cdot)$ indicates the rank of RC-Explainer over all methods.
                                  SA      Grad-CAM  GNNExplainer  PGExplainer  CXPlain  PGM-Explainer  RC-Explainer
CST ↓             Mutagenicity    0.975   0.768     0.690         0.202        0.587    0.343          0.311 (2)
                  REDDIT-MULTI-5K 0.513   0.211     0.945         0.145        0.664    0.061          0.481 (4)
                  Visual Genome   0.462   0.426     0.417         0.421        0.320    0.403          0.306 (1)
SC ↓              Mutagenicity    0.221   0.254     0.124         0.278        0.327    0.597          0.248 (3)
                  REDDIT-MULTI-5K 0.183   0.537     0.040         0.123        0.696    0.829          0.465 (4)
                  Visual Genome   0.321   0.375     0.676         0.831        0.266    0.588          0.309 (2)
Time (per graph)  Mutagenicity    0.011   0.010     2.57          0.026        1.74     1.19           0.680
                  REDDIT-MULTI-5K 0.008   0.008     2.14          0.021        13.4     64.2           23.8
                  Visual Genome   0.015   0.015     2.71          0.028        1.77     1.58           0.339

5.1.4 Parameter Settings.

To facilitate reproducibility, we release codes and datasets at https://github.com/xiangwang1223/reinforced_causal_explainer, and summarize the hyperparameter settings in the supplementary material. For nonparametric methods (SA and Grad-CAM), we use the codes released by the original papers [13, 14]. For the other parametric methods, we conduct a grid search to confirm the optimal settings for each method. To be more specific, the optimizer is set as Adam, the learning rate is tuned in $\{10^{-3},10^{-2},10^{-1}\}$, and the weight decay is searched in $\{10^{-5},10^{-4},10^{-3}\}$. Other model-specific hyperparameters are set as follows: for GNNExplainer, the weight of mutual information is fixed as 0.5; for PGExplainer, the temperature for reparameterization is 0.1; for PGM-Explainer, the number of perturbations is tuned in $\{10,100,1000\}$ with different perturbation modes.

Figure 4: Selected explanations for each explainer, where the top 20% edges are highlighted. Note that some edges have the same nodes. In Visual Genome, the objects involved in the edges are blurred based on the edge attributions; meanwhile, in Mutagenicity, a darker color of a bond indicates the larger attribution for the prediction.

5.2 Evaluation of Explanations

5.2.1 Evaluation w.r.t. Predictive Accuracy

Table II reports the empirical results w.r.t. ACC@10% and ACC-AUC, and Figure 3 demonstrates the ACC curves over different selection ratios. We find that:

  • RC-Explainer outperforms the compared baselines by a large margin across all three datasets when only 10% of edges are selected as the explanatory subgraphs. For example, it achieves significant improvements over the strongest baselines w.r.t. ACC@10% by 13.59% and 25.51% in Mutagenicity and REDDIT-MULTI-5K, respectively. This verifies the rationality and effectiveness of RC-Explainer. We ascribe these improvements to two key characteristics of causal screening: (1) assessing the causal effects of edges enables us to better distinguish causal relationships from spurious correlations between input edges and model outputs, thus encouraging causal explainability instead of statistical interpretability; (2) benefiting from the screening strategy, RC-Explainer can reveal the dependencies between the candidate edges and the previous selections, which allows us to identify and make explicit the coalition effect of edges, and reduce the redundancy in explanations.

  • The explanatory subgraphs derived from RC-Explainer are influential to the model predictions. In particular, in Mutagenicity and Visual Genome, RC-Explainer’s ACC-AUC scores are 0.964 and 0.901, close to the optimal fidelity. This validates that RC-Explainer faithfully reflects the workings of the target GNNs.

  • As Figure 3 shows, increasing the selection ratios might decrease RC-Explainer’s predictive accuracy. This again justifies the rationality of our causal screening: when the causal determinants or coalitions have been selected, adding more noncausal or redundant edges has only a negligible impact on the accuracy.

  • Analyzing the ACC curves and ACC-AUC scores in Visual Genome, we find that RC-Explainer only achieves comparable performance to Grad-CAM. One possible reason is that node features could be more informative about scenes than the existence of edges. Taking Figure 4 as an example, the visual features of the road node might be sufficient to predict the street scene. Hence, Grad-CAM benefits from the context-enhanced gradients that capture the node features, and thus achieves high-quality explanations. This inspires us to further investigate the cooperation of node features and graph structure in the explanation generation.

  • In general, the line of causal explainability (i.e., CXPlain, PGM-Explainer, RC-Explainer) performs better than the line of statistical explainability (i.e., SA, Grad-CAM, GNNExplainer, PGExplainer). This is because using gradients, masks, or attentions is prone to capturing spurious input-output correlations and missing the causation. This is consistent with prior studies [20, 19].

  • Within the line of causal explainability, RC-Explainer outperforms CXPlain and PGM-Explainer, suggesting that screening an edge in combination with the previous selection estimates the edges’ causal effects more accurately. This emphasizes the effectiveness of considering edge dependencies.

5.2.2 Evaluation w.r.t. Contrastivity

We move on to the contrastivity reported in Table III to check how explanations differ when the target predictions change. We find that: (1) Over all approaches, RC-Explainer achieves the lowest and the second-lowest CST scores in Visual Genome and Mutagenicity, respectively. This indicates that the explanations of RC-Explainer are class-discriminative, thus offering an understanding of the decision boundary between different classes. For instance, in REDDIT-MULTI-5K, different patterns of user behaviors can describe various topics of communities. See Section 5.2.5 for more supporting pieces of evidence; (2) Jointly inspecting the ACC curves in Figure 3, we observe that RC-Explainer will randomly pick up edges, once the most influential edges have been selected. These randomly-added edges could increase the ranking correlations of class-specific explanations, thus influencing the overall CST scores of RC-Explainer.

Figure 5: Showcasing the sequential decision process of explaining one graph in Mutagenicity.

5.2.3 Evaluation w.r.t. Sanity Check

We then focus on the sanity checks presented in Table III and find that: (1) RC-Explainer achieves relatively low SC scores and safely passes the checks, indicating that the attributions on the trained and untrained GNNs differ significantly. This evidently shows that our feature attribution is dependent on the target GNN, which is necessary for faithfully interpreting the inner workings of the target model; (2) jointly analyzing the SC scores across the three datasets, we find that RC-Explainer is less sensitive to the target model’s status in Visual Genome. We conjecture that the untrained GNN still works as a kernel function that filters some useful information, thus somehow reflecting the edge importance. We leave the justification of possible reasons for future work.

5.2.4 Evaluation w.r.t. Time Complexity

We present the actual runtime of all approaches in Table III, which reports the inference time to generate an explanation. Our findings are: (1) in Mutagenicity and Visual Genome, RC-Explainer computes the attribution scores significantly faster than GNNExplainer and PGM-Explainer, because the policy network in RC-Explainer is shared across all graph instances, while GNNExplainer needs to retrain the attention network for each instance and PGM-Explainer requires a large number of random perturbations to generate explanations; (2) on REDDIT-MULTI-5K, which contains large-scale graphs, RC-Explainer is slower due to the huge action space; we can alleviate this issue by pruning the action space, which is left to future work; (3) gradient-based methods generate explanations much faster than most parametric methods.

5.2.5 Evaluation w.r.t. Visual Inspection

In this section, we conduct the visual inspections of two examples in Mutagenicity and Visual Genome to give an intuitive impression of explanations. As demonstrated in Figure 4, we find that:

  • RC-Explainer is able to capture the edges that plausibly cause the GNN’s prediction. For example, when interpreting the prediction of Street in Visual Genome, it assigns the most convincing edges, (car, on, road) and (van, in, street), the largest attribution scores. In contrast, the other methods easily distribute attention on tree- or human-related edges, as tree and human frequently co-occur with street. However, using such spurious correlations as causation fails to interpret the prediction reliably. This again verifies the importance of causal interpretability.

  • The sequential decision process enables RC-Explainer to measure the relationships between candidate edges and previous selections, thus identifying the coalition of edges and avoiding redundant edges. For instance, when explaining the prediction of mutagenic in Mutagenicity, RC-Explainer can highlight the functional groups, such as the two NO2 chemical groups, responsible for the mutagenic property [59]. In contrast, at most one NO2 group is captured by the baselines, as they hardly consider the coalition effect of an edge combination. As for the non-mutagenic molecules, RC-Explainer tends to pick bonds like carbon-hydrogen (C-H). This makes sense because the GNN models will not predict the subgraphs with these chemical bonds as mutagenic.

  • Moreover, we probe the sequential decision process of generating explanations of mutagenic. Figure 5 illustrates the process, with step-wise rewards and probabilities of edges being added. Clearly, at the first two steps, RC-Explainer selects the N=O bonds from two groups; afterward, it completes the NO2 groups by adding the other N=O bonds. Compared to the influential N=O bonds, others like C-H contribute less to the prediction.

  • We also present two failure cases of RC-Explainer in Figure 6 to illustrate the potential out-of-distribution issue. In these Visual Genome cases, RC-Explainer first picks the most convincing edges, (light, on, road) and (surfboard, on, water), when μ = 0.1, which can almost recover the target predictions of Street and Surfing. It then chooses (light, on, road) and (footprint, on, sand) when μ = 0.2, since these edges frequently occur with the target scenes. However, the later edges are not only redundant or spurious, introducing noise into the explanations, but also disjoint from the previous selections, pushing the subgraphs out of the original distribution. Such cases plausibly cause the fluctuations in accuracy, especially the steep drop between μ = 0.1 and μ = 0.2 in Figure 3.

Figure 6: Showcasing the failure cases of RC-Explainer.
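To make the step-wise selection in Figure 5 concrete, the following is a minimal sketch of a greedy inference-time rollout. The policy and target_gnn callables, their signatures, and the use of the target-class probability gain as the step-wise reward are simplifying assumptions for illustration; the released repository contains the actual implementation.

```python
import torch

def greedy_rollout(policy, target_gnn, graph, target_class, num_steps):
    """Grow an explanatory subgraph one edge at a time (inference-time sketch).
    `policy(graph, selected)` is assumed to return one score per edge given the
    current selection; `target_gnn(graph, selected)` is assumed to return class
    probabilities for the subgraph induced by the boolean mask `selected`."""
    selected = torch.zeros(graph.num_edges, dtype=torch.bool)
    prev_prob, trace = 0.0, []
    for _ in range(num_steps):
        scores = policy(graph, selected)                      # score every candidate edge
        scores = scores.masked_fill(selected, float('-inf'))  # forbid re-selecting an edge
        probs = torch.softmax(scores, dim=0)                  # step-wise action probabilities
        action = int(torch.argmax(probs))                     # greedy choice at inference time
        selected[action] = True
        cur_prob = float(target_gnn(graph, selected)[target_class])
        trace.append((action, float(probs[action]), cur_prob - prev_prob))
        prev_prob = cur_prob
    return selected, trace  # edge mask + (edge, probability, reward) per step
```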

6 Conclusion

In this work, we focused on explaining the predictions made by graph neural networks. We framed explanation generation as a sequential decision process and proposed a novel reinforcement learning agent, RC-Explainer. It pursues better explanations of GNN predictions by considering both the causal effects of edges and the dependencies among them. As such, the policy network of RC-Explainer learns a causal screening strategy to efficiently yield an influential sequence of edges. We conducted extensive experiments to evaluate the explanations from different dimensions, including predictive accuracy, contrastivity, sanity checks, and visual inspections.

Although performing interventions is effective for evaluating the causal effects of graph structures, the intervened substructures might lie outside the training distribution and cause an out-of-distribution issue [29]. In the future, we will exploit counterfactual generation and reasoning to address this issue. Moreover, we will explore feature coalitions and interactions from a game-theoretic perspective.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (2020AAA0106000), the National Natural Science Foundation of China (U19A2079, U21B2026).

References

  • [1] W. L. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in NeurIPS, 2017, pp. 1024–1034.
  • [2] V. P. Dwivedi, C. K. Joshi, T. Laurent, Y. Bengio, and X. Bresson, “Benchmarking graph neural networks,” CoRR, vol. abs/2003.00982, 2020.
  • [3] H. Yuan, J. Tang, X. Hu, and S. Ji, “XGNN: towards model-level explanations of graph neural networks,” in KDD, R. Gupta, Y. Liu, J. Tang, and B. A. Prakash, Eds., 2020, pp. 430–438.
  • [4] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji, “On explainability of graph neural networks via subgraph explorations,” in ICML, 2021.
  • [5] J. Zeng, X. Wang, J. Liu, Y. Chen, Z. Liang, T. Chua, and Z. L. Chua, “Shadewatcher: Recommendation-guided cyber threat analysis using system audit records,” in IEEE Symposium on Security and Privacy, 2022.
  • [6] C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019.
  • [7] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’: Explaining the predictions of any classifier,” in KDD, 2016, pp. 1135–1144.
  • [8] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning important features through propagating activation differences,” in ICML, vol. 70, 2017, pp. 3145–3153.
  • [9] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in ICCV, 2017, pp. 618–626.
  • [10] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in ICML, vol. 70, 2017, pp. 3319–3328.
  • [11] A. Chattopadhyay, P. Manupriya, A. Sarkar, and V. N. Balasubramanian, “Neural network attributions: A causal perspective,” in ICML, vol. 97, 2019, pp. 981–990.
  • [12] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, “Towards better understanding of gradient-based attribution methods for deep neural networks,” in ICLR, 2018.
  • [13] F. Baldassarre and H. Azizpour, “Explainability techniques for graph convolutional networks,” CoRR, vol. abs/1905.13686, 2019.
  • [14] P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, and H. Hoffmann, “Explainability methods for graph convolutional neural networks,” in CVPR, 2019, pp. 10772–10781.
  • [15] T. Schnake, O. Eberle, J. Lederer, S. Nakajima, K. T. Schütt, K.-R. Müller, and G. Montavon, “Higher-order explanations of graph neural networks via relevant walks,” arXiv, 2020.
  • [16] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec, “Gnnexplainer: Generating explanations for graph neural networks,” in NeurIPS, 2019, pp. 9240–9251.
  • [17] D. Luo, W. Cheng, D. Xu, W. Yu, B. Zong, H. Chen, and X. Zhang, “Parameterized explainer for graph neural network,” in NeurIPS, 2020.
  • [18] X. Wang, Y. Wu, A. Zhang, X. He, and T.-S. Chua, “Towards multi-grained explainability for graph neural networks,” NeurIPS, vol. 34, 2021.
  • [19] M. N. Vu and M. T. Thai, “Pgm-explainer: Probabilistic graphical model explanations for graph neural networks,” in NeurIPS, 2020.
  • [20] P. Schwab and W. Karlen, “Cxplain: Causal explanations for model interpretation under uncertainty,” in NeurIPS, 2019, pp. 10220–10230.
  • [21] R. Moraffah, M. Karami, R. Guo, A. Raglin, and H. Liu, “Causal interpretability for machine learning-problems, methods and evaluation,” SIGKDD Explorations, vol. 22, no. 1, pp. 18–33, 2020.
  • [22] J. Pearl, “Theoretical impediments to machine learning with seven sparks from the causal revolution,” in WSDM, 2018, p. 3.
  • [23] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in ICLR, 2019.
  • [24] S. M. Lundberg and S. Lee, “A unified approach to interpreting model predictions,” in NeurIPS, 2017, pp. 4765–4774.
  • [25] C. Frye, D. de Mijolla, L. Cowton, M. Stanley, and I. Feige, “Shapley-based explainability on the data manifold,” in ICLR, 2021.
  • [26] C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos, “Local causal and markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation.” JMLR, vol. 11, no. 1, 2010.
  • [27] J. Pearl, Causality. Cambridge University Press, 2009.
  • [28] ——, Causality: Models, Reasoning, and Inference.   Cambridge University Press, 2000.
  • [29] Y. Wu, X. Wang, A. Zhang, X. He, and T.-S. Chua, “Discovering invariant rationales for graph neural networks,” in ICLR, 2022.
  • [30] Y. Li, X. Wang, J. Xiao, W. Ji, and T.-S. Chua, “Invariant grounding for video question answering,” in CVPR, 2022.
  • [31] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” in ICLR, 2014.
  • [32] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PloS one, vol. 10, no. 7, 2015.
  • [33] J. Chen, L. Song, M. J. Wainwright, and M. I. Jordan, “Learning to explain: An information-theoretic perspective on model interpretation,” in ICML, 2018, pp. 882–891.
  • [34] S. Bang, P. Xie, H. Lee, W. Wu, and E. Xing, “Explaining a black-box using deep variational information bottleneck approach,” arXiv, 2019.
  • [35] Q. Huang, M. Yamada, Y. Tian, D. Singh, D. Yin, and Y. Chang, “Graphlime: Local interpretable model explanations for graph neural networks,” CoRR, vol. abs/2001.06216, 2020.
  • [36] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in ICML, vol. 70, 2017, pp. 1263–1272.
  • [37] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in ICLR, 2018.
  • [38] J. Pearl, M. Glymour, and N. P. Jewell, Causal inference in statistics: A primer.   John Wiley & Sons, 2016.
  • [39] T. M. Cover and J. A. Thomas, Elements of Information Theory.   Wiley, 2001.
  • [40] M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, R. D. Hjelm, and A. C. Courville, “Mutual information neural estimation,” in ICML, vol. 80, 2018, pp. 530–539.
  • [41] S. Zhu, I. Ng, and Z. Chen, “Causal discovery with reinforcement learning,” in ICLR, 2020.
  • [42] I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov, “Algorithms for large scale markov blanket discovery,” in FLAIRS, 2003, pp. 376–381.
  • [43] J. You, B. Liu, Z. Ying, V. S. Pande, and J. Leskovec, “Graph convolutional policy network for goal-directed molecular graph generation,” in NeurIPS, 2018, pp. 6412–6422.
  • [44] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
  • [45] Q. Liu, M. Allamanis, M. Brockschmidt, and A. L. Gaunt, “Constrained graph variational autoencoders for molecule design,” in NeurIPS, 2018, pp. 7806–7815.
  • [46] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in NeurIPS, 1999, pp. 1057–1063.
  • [47] I. Covert, S. M. Lundberg, and S. Lee, “Feature removal is a unifying principle for model explanation methods,” CoRR, vol. abs/2011.03623, 2020.
  • [48] J. Leskovec, J. M. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in KDD, 2005, pp. 177–187.
  • [49] B. Bevilacqua, Y. Zhou, and B. Ribeiro, “Size-invariant graph representations for graph classification extrapolations,” in ICML, ser. Proceedings of Machine Learning Research, vol. 139. PMLR, 2021, pp. 837–851.
  • [50] J. Kazius, R. McGuire, and R. Bursi, “Derivation and validation of toxicophores for mutagenicity prediction,” Journal of medicinal chemistry, vol. 48, no. 1, pp. 312–320, 2005.
  • [51] K. Riesen and H. Bunke, “Iam graph database repository for graph based pattern recognition and machine learning,” in SPR and SSPR, 2008, pp. 287–297.
  • [52] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. S. Pande, and J. Leskovec, “Strategies for pre-training graph neural networks,” in ICLR, 2020.
  • [53] P. Yanardag and S. V. N. Vishwanathan, “Deep graph kernels,” in KDD, 2015, pp. 1365–1374.
  • [54] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe, “Weisfeiler and leman go neural: Higher-order graph neural networks,” in AAAI, 2019, pp. 4602–4609.
  • [55] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei, “Visual genome: Connecting language and vision using crowdsourced dense image annotations,” IJCV, vol. 123, no. 1, pp. 32–73, 2017.
  • [56] J. Klicpera, A. Bojchevski, and S. Günnemann, “Predict then propagate: Graph neural networks meet personalized pagerank,” in ICLR, 2019.
  • [57] J. Liang, B. Bai, Y. Cao, K. Bai, and F. Wang, “Adversarial infidelity learning for model interpretation,” in KDD, 2020, pp. 286–296.
  • [58] J. Adebayo, J. Gilmer, M. Muelly, I. J. Goodfellow, M. Hardt, and B. Kim, “Sanity checks for saliency maps,” in NeurIPS, 2018, pp. 9525–9536.
  • [59] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, “Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity,” Journal of medicinal chemistry, vol. 34, no. 2, pp. 786–797, 1991.
  • [60] E. Ranjan, S. Sanyal, and P. P. Talukdar, “ASAP: adaptive structure aware pooling for learning hierarchical graph representations,” in AAAI, 2020, pp. 5470–5477.
  • [61] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017.
Xiang Wang is now a professor at the University of Science and Technology of China (USTC). He received his Ph.D. degree from the National University of Singapore in 2019. His research interests include recommender systems, graph learning, and explainable deep learning techniques. He has published academic papers at international conferences such as NeurIPS, ICLR, KDD, WWW, and SIGIR. He serves as a program committee member for several top conferences, such as KDD, SIGIR, WWW, and IJCAI, and as an invited reviewer for prestigious journals such as TKDE, TOIS, and TNNLS.
Yingxin Wu is currently a senior student at the School of Data Science, University of Science and Technology of China (USTC). Her research interests include causal inference and explainable AI techniques. She has been awarded the 2020 Chen Linyi Scholarship and the 2021 China National Scholarship.
An Zhang is now a research fellow at the National University of Singapore. She received her Ph.D. degree from the Department of Statistics and Applied Probability, National University of Singapore, in 2021. Her research interests include explainable artificial intelligence, security, causal effects, and graph neural networks. Her publications have appeared in several top conferences such as NeurIPS, ICLR, and SIGIR.
Fuli Feng is a professor at the University of Science and Technology of China (USTC). He received his Ph.D. in Computer Science from NUS in 2019. His research interests include information retrieval, data mining, and multimedia processing. He has over 30 publications in top conferences such as SIGIR, WWW, and MM, and in journals including TKDE and TOIS. His work on Bayesian Personalized Ranking received the Best Poster Award at WWW 2018. Moreover, he has served as a PC member for several top conferences including SIGIR, WWW, WSDM, NeurIPS, AAAI, ACL, and MM, and as an invited reviewer for prestigious journals such as TOIS, TKDE, TNNLS, TPAMI, and TMM.
Xiangnan He is a professor at the University of Science and Technology of China (USTC). His research interests span information retrieval, data mining, and multimedia analytics. He has over 90 publications in top conferences such as SIGIR, WWW, MM, and KDD, and in journals including TKDE, TOIS, and TMM. His work has received the Best Paper Award Honorable Mention at WWW 2018 and SIGIR 2016. He is on the editorial board of the AI Open journal, served as the PC chair of CCIS 2019 and an area chair of MM 2019 and ECML-PKDD 2020, and is a (senior) PC member for top conferences including SIGIR, WWW, KDD, and WSDM.
Tat-Seng Chua is the KITHCT Chair Professor at the School of Computing, National University of Singapore. He was the Acting and Founding Dean of the School during 1998-2000. Dr Chua's main research interest is in multimedia information retrieval and social media analytics. In particular, his research focuses on the extraction, retrieval, and question answering (QA) of text and rich media arising from the Web and multiple social networks. He is the co-Director of NExT, a joint center between NUS and Tsinghua University to develop technologies for live social media search. Dr Chua is the 2015 winner of the prestigious ACM SIGMM award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications. He is the Chair of the steering committee of the ACM International Conference on Multimedia Retrieval (ICMR) and the Multimedia Modeling (MMM) conference series. Dr Chua is also the General Co-Chair of ACM Multimedia 2005, ACM ICMR 2005, ACM SIGIR 2008, and ACM Web Science 2015. He serves on the editorial boards of four international journals. Dr Chua is the co-Founder of two technology startup companies in Singapore. He holds a PhD from the University of Leeds, UK.
TABLE IV: Model architectures of the target models and RC-Explainer
Datasets | Mutag | BA3-motif | Reddit-5k
Target Model: Type of GNN | PTGNN [52] | ASAP [60] | GCN [61]
Target Model: #Neurons of GNN | [75,32,75,32,16,2] | [64,64,64,128,3] | [32,32,32,64,5]
RC-Explainer: Type of GNN g | PTGNN [52] | ASAP [60] | GCN [61]
RC-Explainer: #Neurons of GNN g | [75,32,75,32,16,2] | [64,64,64,128,3] | [32,32,32,64,5]
RC-Explainer: #Neurons of MLP_1 | [64,128,64,32] | [128,256,128,64] | [64,128,64,32]
RC-Explainer: #Neurons of MLP_{2,c} | [32,32,2] | [64,64,3] | [32,32,5]
TABLE V: Hyperparameter settings of RC-Explainer
Datasets | Mutag | BA3-motif | Reddit-5k
Learning Rate lr | {0.00001, 0.0001, 0.001}, final 0.0001 | {0.00001, 0.0001, 0.001}, final 0.001 | {0.00001, 0.0001, 0.001}, final 0.0001
Weight Decay l_2 | {0.00001, 0.0001, 0.001}, final 0.00001 | {0.00001, 0.0001, 0.001}, final 0.0001 | {0.00001, 0.0001, 0.001}, final 0.0001
Reward Mode | {MI, Binary, CE} | {MI, Binary, CE} | {MI, Binary, CE}
Beam Search Width | {2, 4, 8, 16}, final 8 | {2, 4, 8, 16}, final 8 | {2, 4, 8, 16}, final 8

Appendix A Reproducibility

We have released our codes and datasets at https://github.com/xiangwang1223/reinforced_causal_explainer to ensure reproducibility.

We summarize the model architectures in Table IV, where the target model is the GNN being explained by RC-Explainer. Within RC-Explainer, g is the GNN model that performs representation learning, MLP_1 is the multilayer perceptron that generates the representations of action candidates, and MLP_{2,c} is the class-specific multilayer perceptron that yields the importance score of each action candidate.
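For illustration, the snippet below sketches how the two MLP components could be wired into a scoring head in PyTorch, assuming the encoder g has already produced a representation of the currently selected subgraph and of each candidate edge. The layer widths, the state-candidate concatenation, and the per-class heads are assumptions rather than a faithful copy of the released code.

```python
import torch
import torch.nn as nn

class ScoringHeadSketch(nn.Module):
    """Illustrative wiring of the MLP components listed in Table IV: MLP_1
    maps a (subgraph state, candidate edge) pair to a hidden representation,
    and a class-specific MLP_{2,c} turns it into one importance score. The
    GNN encoder g is assumed to run upstream and produce the input vectors."""

    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.mlp1 = nn.Sequential(                    # candidate-edge representation
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        self.mlp2 = nn.ModuleList([                   # one scoring head per class c
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, 1))
            for _ in range(num_classes)])

    def forward(self, subgraph_repr, candidate_reprs, target_class):
        # Condition every candidate edge on the currently selected subgraph.
        state = subgraph_repr.expand(candidate_reprs.size(0), -1)
        h = self.mlp1(torch.cat([state, candidate_reprs], dim=-1))
        return self.mlp2[target_class](h).squeeze(-1)  # one score per candidate
```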

Moreover, we present the hyperparameter settings of RC-Explainer in Table V, where {·} denotes the tuning range of each hyperparameter and "final" marks the setting we adopt.
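As a usage illustration, the final Mutag settings in Table V could be collected into a configuration dictionary like the one below. The key names are hypothetical, and no final reward mode is marked in Table V, so it is left unspecified.

```python
# Final settings for Mutag read from Table V; BA3-motif and Reddit-5k differ
# only in the learning rate and weight decay.
rc_explainer_config_mutag = {
    "lr": 1e-4,            # final learning rate
    "weight_decay": 1e-5,  # final weight decay
    "beam_width": 8,       # final beam-search width
    "reward_mode": None,   # one of {"MI", "Binary", "CE"}; final choice not marked
}
```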