
Towards More Practical Adversarial Attacks on Graph Neural Networks

Jiaqi Ma
[email protected]
Shuangrui Ding
[email protected]
Qiaozhu Mei
[email protected]
School of Information, University of Michigan, Ann Arbor, Michigan, USA. Equal contribution. Department of EECS, University of Michigan, Ann Arbor, Michigan, USA.
Abstract

We study black-box attacks on graph neural networks (GNNs) under a novel and realistic constraint: attackers have access to only a subset of nodes in the network, and they can only attack a small number of them. A node selection step is essential under this setup. We demonstrate that the structural inductive biases of GNN models can be an effective source for this type of attack. Specifically, by exploiting the connection between the backward propagation of GNNs and random walks, we show that the common gradient-based white-box attacks can be generalized to the black-box setting via the connection between the gradient and an importance score similar to PageRank. In practice, we find attacks based on this importance score indeed increase the classification loss by a large margin, but they fail to significantly increase the mis-classification rate. Our theoretical and empirical analyses suggest that there is a discrepancy between the loss and the mis-classification rate, as the latter presents a diminishing-return pattern when the number of attacked nodes increases. Therefore, we propose a greedy procedure to correct the importance score by taking the diminishing-return pattern into account. Experimental results show that the proposed procedure can significantly increase the mis-classification rate of common GNNs on real-world data without access to model parameters or predictions. (The code for the experiments in this paper is available at https://github.com/Mark12Ding/GNN-Practical-Attack. A follow-up project with an improved experiment setup is available at https://github.com/TheaperDeng/GNN-Attack-InfMax.)

1 Introduction

Graph neural networks (GNNs) [23], the family of deep learning models on graphs, have shown promising empirical performance on various applications of machine learning to graph data, such as recommender systems [28], social network analysis [12], and drug discovery [17]. Like other deep learning models, GNNs have been shown to be vulnerable to adversarial attacks [31], which has recently attracted increasing research interest [9]. Indeed, adversarial attacks have been an efficient tool to analyze both the theoretical properties and the practical accountability of graph neural networks. As graph data have more complex structures than image or text data, researchers have come up with diverse adversarial attack setups. For example, there are different tasks (node classification and graph classification), assumptions about the attacker's knowledge (white-box, grey-box, and black-box), strategies (node feature modification and graph structure modification), and corresponding budgets or other constraints (norm of feature changes or number of edge changes).

Despite these research efforts, there is still a considerable gap between the existing attack setups and reality. It is unreasonable to assume that an attacker can alter the input of a large proportion of nodes, and even with a budget limit, it is unreasonable to assume that they can attack any node they wish. For example, in a real-world social network, the attackers usually only have access to a few bot accounts, which are unlikely to be among the top nodes in the network, and it is difficult for the attackers to hack and alter the properties of celebrity accounts. Moreover, an attacker usually has limited knowledge about the underlying machine learning model used by the platform (e.g., they may roughly know what types of models are used but have no access to the model parameters or training labels). Motivated by this real-world scenario of attacks, in this paper we study a new type of black-box adversarial attack for node classification tasks, which is more restricted and more practical, assuming that the attacker has no access to the model parameters or predictions. Our setup differs from existing work with a novel constraint on node access: attackers only have access to a subset of nodes in the graph, and they can only manipulate a small number of them.

The proposed black-box adversarial attack requires a two-step procedure: 1) selecting a small subset of nodes to attack under the limits of node access; 2) altering the node attributes or edges under a per-node budget. In this paper, we focus on the first step and study the node selection strategy. The key insight of the proposed strategy lies in the observation that, with no access to the GNN parameters or predictions, the strong structural inductive biases of the GNN models can be exploited as an effective information source of attacks. The structural inductive biases encoded by various neural architectures (e.g., the convolution kernel in convolutional neural networks) play important roles in the success of deep learning models. GNNs have even more explicit structural inductive biases due to the graph structure and their heavy weight sharing design. Theoretical analyses have shown that the understanding of structural inductive biases could lead to better designs of GNN models [26, 11]. From a new perspective, our work demonstrates that such structural inductive biases can turn into security concerns in a black-box attack, as the graph structure is usually exposed to the attackers.

Following this insight, we derive a node selection strategy with a formal analysis of the proposed black-box attack setup. By exploiting the connection between the backward propagation of GNNs and random walks, we first generalize the gradient norm in a white-box attack into a model-independent importance score similar to PageRank. In practice, attacking the nodes with high importance scores increases the classification loss significantly but does not generate the same effect on the mis-classification rate. Our theoretical and empirical analyses suggest that such discrepancy is due to the diminishing-return effect of the mis-classification rate. We further propose a greedy correction procedure for calculating the importance scores. Experiments on three real-world benchmark datasets and popular GNN models show that the proposed attack strategy significantly outperforms baseline methods. We summarize our main contributions as follows:

  1. We propose a novel setup of black-box attacks for GNNs with a constraint of limited node access, which is by far the most restricted and practical compared to existing work.

  2. We demonstrate that the structural inductive biases of GNNs can be exploited as an effective information source of black-box adversarial attacks.

  3. We analyze the discrepancy between classification loss and mis-classification rate and propose a practical greedy method of adversarial attacks for node classification tasks.

  4. We empirically verify the effectiveness of the proposed method on three benchmark datasets with popular GNN models.

2 Related Work

2.1 Adversarial Attack on GNNs

The study of adversarial attacks on graph neural networks has surged recently. A taxonomy of existing work has been summarized by Jin et al. [9], and we give a brief introduction here. First, there are two types of machine learning tasks on graphs that are commonly studied, node-level classification and graph-level classification. We focus on node-level classification in this paper. Next, there are a couple of choices of the attack form. For example, the attack can happen either during model training (poisoning) or during model testing (evasion); the attacker may aim to mislead the prediction on specific nodes (targeted attack) [31] or damage the overall task performance (untargeted attack) [30]; the adversarial perturbation can be done by modifying node features, adding or deleting edges, or injecting new nodes [18]. Our work belongs to untargeted evasion attacks. For the adversarial perturbation, most existing works on untargeted attacks apply global constraints on the proportion of node features or the number of edges to be altered. Our work sets a novel local constraint on node access, which is more realistic in practice: perturbation of top (e.g., celebrity) nodes is prohibited and only a small number of nodes can be perturbed. Finally, depending on the attacker's knowledge about the GNN model, existing work can be split into three categories: white-box attacks [24, 5, 22] have access to full information about the model, including model parameters, input data, and labels; grey-box attacks [30, 31, 18] have partial information about the model, and the exact setups vary in a range; in the most challenging setting, black-box attacks [6, 2, 4] can only access the input data and sometimes the black-box predictions of the model. In this work, we consider an even stricter black-box attack setup, where model predictions are invisible to the attackers. As far as we know, the only existing works that conduct untargeted black-box attacks without access to model predictions are those by Bojchevski and Günnemann [2] and Chang et al. [4]. However, both of them require access to node embeddings, which is also prohibited in our setup.

2.2 Structural Inductive Bias of GNNs

Despite the extremely restricted black-box setup, we demonstrate that effective adversarial attacks are still possible due to the strong and explicit structural inductive biases of GNNs.

Structural inductive biases refer to the structures encoded by various neural architectures, such as the weight sharing mechanisms in convolution kernels of convolutional neural networks, or the gating mechanisms in recurrent neural networks. Such neural architectures have been recognized as a key factor for the success of deep learning models [29], which (partially) motivates some recent developments in neural architecture search [29], Bayesian deep learning [21], the Lottery Ticket Hypothesis [7], etc. The natural graph structure and the heavy weight sharing mechanism grant GNN models even more explicit structural inductive biases. Indeed, GNN models have been theoretically shown to share similar behaviors with Weisfeiler-Lehman tests [14, 25] or random walks [26]. On the positive side, such theoretical analyses have led to better GNN model designs [26, 11].

Our work instead studies the negative impact of the structural inductive biases in the context of adversarial attacks: when the graph structure is exposed to the attacker, such structural information can turn into the knowledge source for an attack. While most existing attack strategies more or less utilize some structural properties of GNNs, they do so in a data-driven manner that requires querying the GNN model, e.g., learning to edit the graph via trial-and-error interaction with the GNN model [6]. We formally establish connections between the structural properties and attack strategies without any queries to the GNN model.

3 Principled Black-Box Attack Strategies with Limited Node Access

In this section, we derive principled strategies to attack GNNs under the novel black-box setup with limited node access. We first analyze the corresponding white-box attack problem in Section 3.2, and then adapt the theoretical insights from the white-box setup to the black-box setup and propose a black-box attack strategy in Section 3.3. Finally, in Section 3.4, we correct the proposed strategy by taking into account the diminishing-return effect of the mis-classification rate.

3.1 Preliminary Notations

We first introduce the necessary notation. We denote a graph as $G=(V,E)$, where $V=\{1,2,\ldots,N\}$ is the set of $N$ nodes and $E\subseteq V\times V$ is the set of edges. For a node classification problem, the nodes of the graph are collectively associated with node features $X\in\mathbb{R}^{N\times D}$ and labels $y\in\{1,2,\ldots,K\}^{N}$, where $D$ is the dimensionality of the feature vectors and $K$ is the number of classes. Each node $i$'s local neighborhood including itself is denoted as $\mathcal{N}_{i}=\{j\in V\mid(i,j)\in E\}\cup\{i\}$, and its degree as $d_{i}=|\mathcal{N}_{i}|$. To ease the notation, for any matrix $A\in\mathbb{R}^{D_{1}\times D_{2}}$ in this paper, we use $A_{j}\in\mathbb{R}^{D_{2}}$ to denote the transpose of the $j$-th row of the matrix.

GNN models. Given the graph $G$, a GNN model is a function $f_{G}:\mathbb{R}^{N\times D}\rightarrow\mathbb{R}^{N\times K}$ that maps the node features $X$ to output logits of each node. We denote the output logits of all nodes as a matrix $H\in\mathbb{R}^{N\times K}$, with $H=f_{G}(X)$. A GNN $f_{G}$ is usually built by stacking a certain number ($L$) of layers, with the $l$-th layer, $1\leq l\leq L$, taking the following form:

$H_{i}^{(l)}=\sigma\left(\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}\bm{W}_{l}H_{j}^{(l-1)}\right),$ (1)

where $H^{(l)}\in\mathbb{R}^{N\times D_{l}}$ is the hidden representation of nodes with $D_{l}$ dimensions, output by the $l$-th layer; $\bm{W}_{l}$ is a learnable linear transformation matrix; $\sigma$ is an element-wise nonlinear activation function; and different GNNs have different normalization terms $\alpha_{ij}$. For instance, $\alpha_{ij}=1/\sqrt{d_{i}d_{j}}$ or $\alpha_{ij}=1/d_{i}$ in Graph Convolutional Networks (GCN) [10]. In addition, $H^{(0)}=X$ and $H=H^{(L)}$.
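To make Eq. (1) concrete, below is a minimal NumPy sketch of one such layer with the averaging normalization $\alpha_{ij}=1/d_{i}$; the function name, the dense-matrix representation, and the row-vector convention (so the code multiplies by the transpose of $\bm{W}_{l}$) are illustrative choices, not part of the original paper.

```python
import numpy as np

def gnn_layer(A, H_prev, W, activation=np.tanh):
    """One layer of Eq. (1) with averaging normalization alpha_ij = 1/d_i.

    A: (N, N) binary adjacency matrix (self-loops added below).
    H_prev: (N, D_prev) hidden representation H^{(l-1)} (rows are nodes).
    W: (D_prev, D_l) linear map, i.e., the transpose of W_l acting on row vectors.
    """
    A_hat = A + np.eye(A.shape[0])          # include each node itself in its neighborhood N_i
    d = A_hat.sum(axis=1, keepdims=True)    # degrees d_i = |N_i|
    H_agg = (A_hat / d) @ H_prev            # sum_{j in N_i} (1/d_i) H_j^{(l-1)}
    return activation(H_agg @ W)            # sigma( sum_j alpha_ij W_l H_j^{(l-1)} )
```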

Random walks. A random walk [13] on $G$ is specified by the matrix of transition probabilities, $M\in\mathbb{R}^{N\times N}$, where

$M_{ij}=\begin{cases}1/d_{i}, & \text{ if }(i,j)\in E\text{ or }j=i,\\ 0, & \text{ otherwise.}\end{cases}$

Each $M_{ij}$ represents the probability of transiting from $i$ to $j$ at any given step of the random walk, and powering the transition matrix by $t$ gives us the $t$-step transition matrix $M^{t}$.
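A minimal sketch of building $M$ and its $L$-step power from an adjacency matrix follows; the helper name is ours, and dense NumPy matrices are assumed for brevity (sparse matrices would be preferable on large graphs).

```python
import numpy as np

def transition_matrix(A, L):
    """Random-walk transition matrix M (with self-loops) and its L-step power M^L."""
    A_hat = A + np.eye(A.shape[0])                    # (i, j) in E or j = i
    M = A_hat / A_hat.sum(axis=1, keepdims=True)      # M_ij = 1/d_i over the neighborhood
    return np.linalg.matrix_power(M, L)               # t-step transitions via matrix power
```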

3.2 White-Box Adversarial Attacks with Limited Node Access

Problem formulation. Given a classification loss $\mathcal{L}:\mathbb{R}^{N\times K}\times\{1,\ldots,K\}^{N}\rightarrow\mathbb{R}$, the problem of white-box attack with limited node access can be formulated as the following optimization problem:

$\max_{S\subseteq V}\quad\mathcal{L}(H,y)$ (2)
subject to $|S|\leq r$, $d_{i}\leq m\ \forall i\in S$, and $H=f(\tau(X,S))$,

where $r,m\in\mathbb{Z}^{+}$ respectively specify the maximum number of nodes and the maximum degree of the nodes that can be attacked. Intuitively, we treat high-degree nodes as a proxy for celebrity accounts in a social network. For simplicity, we have omitted the subscript $G$ of the learned GNN classifier $f_{G}$. The function $\tau:\mathbb{R}^{N\times D}\times 2^{V}\rightarrow\mathbb{R}^{N\times D}$ perturbs the feature matrix $X$ based on the selected node set $S$ (i.e., the attack set). Under the white-box setup, $\tau$ could in theory also be optimized to maximize the loss. However, as our goal is to study the node selection strategy under the black-box setup, we set $\tau$ as a pre-determined function. In particular, we define the $j$-th row of the output of $\tau$ as $\tau(X,S)_{j}=X_{j}+\mathbbm{1}[j\in S]\epsilon$, where $\epsilon\in\mathbb{R}^{D}$ is a small constant noise vector constructed from the attackers' domain knowledge about the features. In other words, the same small noise vector is added to the features of every attacked node.

We use the Carlini-Wagner loss for our analysis, which is a close approximation of the cross-entropy loss and has been used in the analysis of adversarial attacks on image classifiers [3]:

$\mathcal{L}(H,y)\triangleq\sum_{j=1}^{N}\mathcal{L}_{j}(H_{j},y_{j})\triangleq\sum_{j=1}^{N}\left(\max_{k\in\{1,\ldots,K\}}H_{jk}-H_{jy_{j}}\right).$ (3)
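For reference, a small sketch of this loss (Eq. (3)) applied to a logit matrix is given below; the function name is ours.

```python
import numpy as np

def cw_loss(H, y):
    """Loss of Eq. (3): sum_j ( max_k H_jk - H_{j, y_j} )."""
    return float((H.max(axis=1) - H[np.arange(H.shape[0]), y]).sum())
```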

The change of loss under perturbation. Next we investigate how the overall loss changes when we select and perturb different nodes. We define the change of loss when perturbing node $i$ as a function of the perturbed feature vector $x$:

$\Delta_{i}(x)=\mathcal{L}(f(X^{\prime}),y)-\mathcal{L}(f(X),y),\text{ where }X^{\prime}_{i}=x\text{ and }X^{\prime}_{j}=X_{j},\ \forall j\neq i.$

To concretize the analysis, we consider the GCN model with $\alpha_{ij}=\frac{1}{d_{i}}$ in our following derivations. Suppose $f$ is an $L$-layer GCN. With the connection between GCN and random walks [26] and Assumption 1 on the label distribution, we can show that, in expectation, the first-order Taylor approximation $\widetilde{\Delta}_{i}(x)\triangleq\Delta_{i}(X_{i})+(\nabla_{x}\Delta_{i}(X_{i}))^{T}(x-X_{i})$ is related to the sum of the $i$-th column of the $L$-step random walk transition matrix $M^{L}$. We formally summarize this finding in Proposition 1.

Assumption 1 (Label Distribution).

Assume the distribution of the labels of all nodes follows the same constant categorical distribution, i.e.,

$\Pr[y_{j}=k]=q_{k},\quad\forall j=1,2,\ldots,N,$

where $0<q_{k}<1$ for $k=1,2,\ldots,K$ and $\sum_{k=1}^{K}q_{k}=1$. Moreover, since the classifier $f$ has been well-trained and fixed, the prediction of $f$ should capture certain relationships among the $K$ classes. Specifically, we assume the chance of $f$ predicting any node $j$ as any class $k\in\{1,\ldots,K\}$, conditioned on the node label $y_{j}=l\in\{1,\ldots,K\}$, follows a certain distribution $p(k\mid l)$, i.e.,

$\Pr\left[\left(\operatorname*{argmax}_{c\in\{1,\ldots,K\}}H_{jc}\right)=k\ \middle|\ y_{j}=l\right]=p(k\mid l).$
Proposition 1.

For an $L$-layer GCN model, if Assumption 1 and a technical assumption about the GCN (made by Xu et al. [26] and listed as Assumption 5 in Appendix A.1) hold, then

$\delta_{i}\triangleq\operatorname*{\mathbb{E}}\left[\widetilde{\Delta}_{i}\left(x\right)\big|_{x=\tau(X,\{i\})_{i}}\right]=C\sum_{j=1}^{N}[M^{L}]_{ji},$

where $C$ is a constant independent of $i$.

3.3 Adaptation from the White-Box Setup to the Black-Box Setup

Now we turn to the black-box setup, where we have no access to the model parameters or predictions. This means we are no longer able to evaluate the objective function $\mathcal{L}(H,y)$ of the optimization problem (2). Proposition 1 shows that the relative ratio $\delta_{i}/\delta_{j}$ between different nodes $i\neq j$ only depends on the random walk transition matrix, which we can easily calculate based on the graph $G$. This implies that we can still approximately optimize problem (2) in the black-box setup.

Node selection with importance scores. Consider the change of loss under the perturbation of a set of nodes $S$. If we write the change of loss as a function of the perturbed features and take the first-order Taylor expansion, which we denote as $\delta$, we have $\delta=\sum_{i\in S}\delta_{i}$. Therefore $\delta$ is maximized by the set of $r$ nodes with degrees less than $m$ and the largest possible $\delta_{i}$, where $m,r$ are the limits of node access defined in problem (2). We can thus define an importance score for each node $i$ as the sum of the $i$-th column of $M^{L}$, i.e., $I_{i}=\sum_{j=1}^{N}[M^{L}]_{ji}$, and simply select the nodes with the highest importance scores to attack. We denote this strategy as RWCS (Random Walk Column Sum). We note that RWCS is similar to PageRank; the difference is that PageRank uses the stationary transition matrix $M^{\infty}$ of a random walk with restart.
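A minimal sketch of the RWCS selection step under the degree constraint is shown below, reusing the hypothetical transition_matrix helper sketched in Section 3.1; all names are illustrative.

```python
import numpy as np

def rwcs_select(A, L, r, m):
    """Pick r nodes with the largest RWCS scores I_i = sum_j [M^L]_{ji}, subject to d_i <= m."""
    ML = transition_matrix(A, L)                     # L-step random walk transition matrix
    scores = ML.sum(axis=0)                          # column sums: importance scores I_i
    degrees = (A + np.eye(A.shape[0])).sum(axis=1)   # d_i = |N_i| (neighborhood includes i)
    candidates = np.where(degrees <= m)[0]           # respect the degree limit m
    ranked = candidates[np.argsort(-scores[candidates])]
    return ranked[:r].tolist()
```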

Empirically, RWCS indeed significantly increases the classification loss (as shown in Section 4.2). The nonlinear loss actually increases linearly w.r.t. the perturbation strength (the norm of the perturbation noise $\epsilon$) over a wide range, which indicates that $\widetilde{\Delta}_{i}$ is a good approximation of $\Delta_{i}$. Surprisingly, RWCS fails to continue to increase the mis-classification rate (which matters more in real applications) when the perturbation strength becomes larger. Details of this empirical finding are shown in Figure 1 in Section 4.2. We conduct additional formal analyses of the mis-classification rate in the following section and find a diminishing-return effect of adding more nodes to the attack set when the perturbation strength is adequate.

3.4 Diminishing-Return of Mis-classification Rate and its Correction

Analysis of the diminishing-return effect. Our analysis is based on investigating whether each target node $i\in V$ becomes mis-classified as we enlarge the attack set.

To assist the analysis, we first define the concepts of vulnerable function and vulnerable set below.

Definition 1 (Vulnerable Function).

We define the vulnerable function $g_{i}:2^{V}\rightarrow\{0,1\}$ of a target node $i\in V$ as, for a given attack set $S\subseteq V$,

$g_{i}(S)=\begin{cases}1, & \text{ if }i\text{ is mis-classified when attacking }S,\\ 0, & \text{ if }i\text{ is correctly classified when attacking }S.\end{cases}$
Definition 2 (Vulnerable Set).

We define the vulnerable set of a target node $i\in V$ as the set of all attack sets that could lead $i$ to be mis-classified:

$A_{i}\triangleq\{S\subseteq V\mid g_{i}(S)=1\}.$

We also make the following assumption about the vulnerable function.

Assumption 2.

$g_{i}$ is non-decreasing for all $i\in V$, i.e., if $T\subseteq S\subseteq V$, then $g_{i}(T)\leq g_{i}(S)$.

With the definitions above, the mis-classification rate can be written as the average of the vulnerable functions: $h(S)=\frac{1}{N}\sum_{i=1}^{N}g_{i}(S)$. By Assumption 2, $h$ is also clearly non-decreasing.

We further define the basic vulnerable set to characterize the minimal attack sets that can lead a target node to being mis-classified.

Definition 3 (Basic Vulnerable Set).

$\forall i\in V$, we call $B_{i}\subseteq A_{i}$ a basic vulnerable set of $i$ if:

  1) $\emptyset\notin B_{i}$; if $\emptyset\in A_{i}$, then $B_{i}=\emptyset$;

  2) if $\emptyset\notin A_{i}$, then for any nonempty $S\in A_{i}$, there exists a $T\in B_{i}$ s.t. $T\subseteq S$;

  3) for any distinct $S,T\in B_{i}$, $|S\cap T|<\min(|S|,|T|)$.

The existence and uniqueness of such a basic vulnerable set are guaranteed by Lemma 1.

Lemma 1.

For any $i\in V$, there exists a unique $B_{i}$.

The distribution of the sizes of the sets in $B_{i}$ is closely related to the perturbation strength on the features. When the perturbation is small, we may have to perturb multiple nodes before the target node is mis-classified, and thus the sets in $B_{i}$ will be large. When the perturbation is relatively large, we may be able to mis-classify a target node by perturbing a single node, if chosen wisely; in this case $B_{i}$ will contain many singleton sets.

Our following analysis (Proposition 2) shows that $h$ has a diminishing-return effect if the vulnerable sets of nodes on the graph present homophily (Assumption 3), which is common in real-world networks, and the perturbation on features becomes considerably large (Assumption 4).

Assumption 3 (Homophily).

$\forall S\in\cup_{i=1}^{N}A_{i}$ with $|S|>1$, there are $b(S)\geq 1$ nodes s.t., for any node $j$ among these nodes, $S\in A_{j}$.

Intuitively, the vulnerable sets present strong homophily if the $b(S)$'s are large.

Assumption 4 (Considerable Perturbation).

$\forall S\in\cup_{i=1}^{N}A_{i}$ with $|S|>1$, there are $\lceil p(S)\cdot b(S)\rceil$ nodes s.t., for any node $j$ among these nodes, there exists a set $T\subseteq S$ with $|T|=1$ and $T\in A_{j}$. Moreover, $\frac{r}{r+1}<p(S)\leq 1$.

Proposition 2.

If Assumptions 3 and 4 hold, $h$ is $\gamma$-approximately submodular for some $0<\gamma<\frac{1}{r}$, i.e., there exists a non-decreasing submodular function $\tilde{h}:2^{V}\rightarrow\mathbb{R}_{+}$ s.t. $\forall S\subseteq V$,

$(1-\gamma)\tilde{h}(S)\leq h(S)\leq(1+\gamma)\tilde{h}(S).$

As greedy methods are guaranteed to enjoy a constant approximation ratio for such approximately submodular functions [8], Proposition 2 motivates us to develop a greedy correction procedure to compensate for the diminishing-return effect when calculating the importance scores.

The greedy correction procedure. We propose an iterative node selection procedure and apply two greedy correction steps on top of the RWCS strategy, motivated by Assumptions 3 and 4.

To accommodate Assumption 3, after each node is selected into the attack set, we exclude its $k$-hop neighborhood from the next iteration, for a given constant integer $k$. The intuition is that nodes in a local neighborhood may contribute to similar target nodes due to homophily. To accommodate Assumption 4, we adopt an adaptive version of the RWCS scores. First, we binarize the $L$-step random walk transition matrix $M^{L}$ into $\widetilde{M}$, i.e.,

$\left[\widetilde{M}\right]_{ij}=\begin{cases}1, & \text{ if }[M^{L}]_{ij}\text{ is among the top-}l\text{ entries of }[M^{L}]_{i}\text{ and }[M^{L}]_{ij}\neq 0,\\ 0, & \text{ otherwise,}\end{cases}$ (4)

where $l$ is a given constant integer. Next, we define a new adaptive influence score as a function of a matrix $Q$: $\widetilde{I}_{i}(Q)=\sum_{j=1}^{N}[Q]_{ji}$. In the iterative node selection procedure, we initialize $Q$ as $\widetilde{M}$ and, at each iteration, select the node with the highest score $\widetilde{I}_{i}(Q)$. After each iteration, supposing node $i$ was selected, we update $Q$ by setting to zero all the rows whose entries in the $i$-th column are $1$. The underlying assumption of this operation is that adding $i$ to the selected set is likely to mis-classify all the target nodes corresponding to those rows, which complies with Assumption 4. We name this iterative procedure the GC-RWCS (Greedily Corrected RWCS) strategy and summarize it in Algorithm 1.

Input: number of nodes limit $r$; maximum degree limit $m$; neighbor hops $k$; binarized transition matrix $\widetilde{M}$; the adaptive influence score functions $\widetilde{I}_{i}$, $\forall i\in V$.
Output: the set $S$ to be attacked.
1: Initialize the candidate set $P=\{i\in V\mid d_{i}\leq m\}$ and the score matrix $Q=\widetilde{M}$;
2: Initialize $S=\emptyset$;
3: for $t=1,2,\ldots,r$ do
4:     $z\leftarrow\operatorname*{argmax}_{i\in P}\widetilde{I}_{i}(Q)$;
5:     $S\leftarrow S\cup\{z\}$;
6:     $P\leftarrow P\setminus\{i\in P\mid\text{shortest-path}(i,z)\leq k\}$;
7:     $q\leftarrow Q_{\cdot,z}$;
8:     for $i\in V$ do
9:         if $q_{i}=1$ then $Q_{i}\leftarrow\mathbf{0}$;
10: return $S$;
Algorithm 1: The GC-RWCS Strategy for Node Selection.
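A compact Python sketch of Algorithm 1 is given below. It assumes dense NumPy matrices and uses scipy's shortest_path for the $k$-hop exclusion; the function and variable names are ours and are not taken from the authors' released code.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def gc_rwcs_select(A, ML, r, m, k, l):
    """GC-RWCS node selection (Algorithm 1) given adjacency A and the L-step matrix M^L."""
    N = A.shape[0]
    # Binarize M^L as in Eq. (4): keep the top-l nonzero entries of each row.
    M_tilde = np.zeros_like(ML)
    for i in range(N):
        top = np.argsort(-ML[i])[:l]
        M_tilde[i, top] = (ML[i, top] > 0).astype(ML.dtype)
    degrees = (A + np.eye(N)).sum(axis=1)
    hops = shortest_path(A, unweighted=True)          # pairwise hop distances
    P = set(np.where(degrees <= m)[0])                # candidate set under the degree limit
    Q, S = M_tilde.copy(), []
    for _ in range(r):
        if not P:
            break
        cand = np.array(sorted(P))
        z = cand[np.argmax(Q[:, cand].sum(axis=0))]   # argmax_i of I_i(Q), the column sum of Q
        S.append(int(z))
        P -= set(np.where(hops[z] <= k)[0])           # exclude the k-hop neighborhood of z
        Q[Q[:, z] == 1, :] = 0                        # zero rows already "covered" by z
    return S
```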

Finally, we want to mention that, while the derivation of RWCS and GC-RWCS requires knowledge of the number of layers $L$ of the GCN, we find that the empirical performance of the proposed attack strategies is not sensitive to the choice of $L$. Therefore, the proposed methods are applicable to the black-box setup where we do not know the exact $L$ of the model.

4 Experiments

4.1 Experiment Setup

GNN models. We evaluate the proposed attack strategies on two common GNN models, GCN [10] and JK-Net [26]. For JK-Net, we test its two variants, JKNetConcat and JKNetMaxpool, which apply concatenation and element-wise max at the last layer, respectively. We set the number of layers to 2 for GCN and to 7 for both JKNetConcat and JKNetMaxpool. The hidden size of each layer is 32. For training, we closely follow the hyper-parameter setup in Xu et al. [26].

Datasets. We adopt three citation networks, Citeseer, Cora, and Pubmed, which are standard node classification benchmark datasets [16, 27]. Following the setup of JK-Net [26], we randomly split each dataset into 60%, 20%, and 20% for training, validation, and testing, and we draw 40 random splits.

Baseline methods for comparison. As summarized in Section 2.1, our proposed black-box adversarial attack setup is by far the most restricted, and none of the existing attack strategies for GNNs can be applied. We therefore compare the proposed attack strategies with baseline strategies that select nodes with the top centrality metrics. We consider three well-known network metrics capturing different aspects of node centrality, Degree, Betweenness, and PageRank, and name the attack strategies accordingly. In the classical network analysis literature [15], real-world networks are shown to be fragile under attacks on high-centrality nodes, so we believe these centrality metrics serve as reasonable baselines under our restricted black-box setup. For the purpose of a sanity check, we also include a trivial baseline, Random, which randomly selects the nodes to be attacked.
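For reproducibility, the baseline node selections can be computed with standard graph libraries; a sketch using networkx is shown below (the helper name and the degree-limit handling are ours).

```python
import networkx as nx

def centrality_baselines(G, r, m):
    """Top-r nodes for each centrality baseline, restricted to nodes with degree <= m."""
    candidates = [v for v in G.nodes if G.degree(v) <= m]
    metrics = {
        "Degree": dict(G.degree()),
        "Betweenness": nx.betweenness_centrality(G),
        "PageRank": nx.pagerank(G),
    }
    return {name: sorted(candidates, key=lambda v: -scores[v])[:r]
            for name, scores in metrics.items()}
```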

Hyper-parameters for GC-RWCS. For the proposed GC-RWCS strategy, we fix the number of steps $L=4$, the neighbor-hop parameter $k=1$, and the parameter $l=30$ for the binarized $\widetilde{M}$ in Eq. (4), for all models on all datasets. Note that $L=4$ differs from the number of layers of both GCN and the JK-Nets in our experiments, yet we still achieve effective attack performance. We also conduct a sensitivity analysis in Appendix B.2 and demonstrate that the proposed method is not sensitive to $L$.

Nuisance parameters of the attack procedure. For each dataset, we fix the limit on the number of nodes to attack, $r$, as 1% of the graph size. After the node selection step, we also need to specify how to perturb the node features, i.e., the design of $\epsilon$ in the $\tau$ function in the optimization problem (2). In a real-world scenario, $\epsilon$ should be designed with domain knowledge about the classification task, without access to the GNN models. In our experiments, we have to simulate the domain knowledge due to the lack of semantic meaning of the individual features in the benchmark datasets. Formally, we construct the constant perturbation $\epsilon\in\mathbb{R}^{D}$ as follows: for $j=1,2,\ldots,D$,

$\epsilon_{j}=\begin{cases}\lambda\cdot\operatorname{sign}\left(\sum_{i=1}^{N}\frac{\partial\mathcal{L}(H,y)}{\partial X_{ij}}\right), & \text{if }j\in\text{arg top-}J\left(\left[\left|\sum_{i=1}^{N}\frac{\partial\mathcal{L}(H,y)}{\partial X_{il}}\right|\right]_{l=1,2,\ldots,D}\right),\\ 0, & \text{otherwise,}\end{cases}$ (5)

where $\lambda$ is the magnitude of the modification. We fix $J=\lfloor 0.02D\rfloor$ for all datasets. While gradients of the model are involved, we emphasize that we only use extremely limited information from the gradients: determining a small number of important features and the binary direction in which to perturb each selected feature, only at the global level, by averaging gradients over all nodes. We believe such coarse information is usually available from domain knowledge about the classification task. The perturbation magnitude for each feature is fixed as a constant $\lambda$ and is irrelevant to the model. In addition, the same perturbation vector is added to the features of all the selected nodes, so the construction of the perturbation is entirely independent of the selected nodes.
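A sketch of this construction (Eq. (5)) is shown below; the averaged gradient grad_sum stands in for the coarse domain knowledge described above, and the function name is ours.

```python
import numpy as np

def build_perturbation(grad_sum, lam, frac=0.02):
    """Constant perturbation epsilon of Eq. (5) from the column-summed gradients (length D)."""
    D = grad_sum.shape[0]
    J = int(np.floor(frac * D))                  # number of features to perturb
    top = np.argsort(-np.abs(grad_sum))[:J]      # arg top-J of |sum_i dL/dX_ij|
    eps = np.zeros(D)
    eps[top] = lam * np.sign(grad_sum[top])      # +/- lambda on the selected features
    return eps
```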

4.2 Experiment Results

Verifying the discrepancy between the loss and the mis-classification rate. We first provide empirical evidence for the discrepancy between the classification loss (cross-entropy) and the mis-classification rate. We compare the RWCS strategy to baseline strategies with varying perturbation strength, as measured by $\lambda$ in Eq. (5). The results shown in Figure 1 are obtained by attacking GCN on Citeseer. First, we observe that RWCS increases the classification loss almost linearly as $\lambda$ increases, indicating that our first-order Taylor approximation of the loss works well in practice. Not surprisingly, RWCS performs very similarly to PageRank, and it performs much better than the other centrality metrics in increasing the classification loss, showing the effectiveness of Proposition 1. However, the decrease of classification accuracy under RWCS (and PageRank) quickly saturates as $\lambda$ increases. The GC-RWCS strategy, which is proposed to correct the importance scores, decreases the classification accuracy the most as $\lambda$ becomes larger, even though it increases the classification loss the least.

Figure 1: Experiments of attacking GCN on Citeseer with increasing perturbation strength $\lambda$. (a) Loss on the test set; (b) Accuracy on the test set. Results are averaged over 40 random trials and error bars indicate the standard error of the mean.

Full experiment results. We then provide the full experiment results of attacking GCN, JKNetConcat, and JKNetMaxpool on all three datasets in Table 1. The perturbation strength is set as $\lambda=1$. The thresholds 10% and 30% indicate that we set the limit on the maximum degree $m$ as the lowest degree of the top 10% and top 30% nodes, respectively.

The results clearly demonstrate the effectiveness of the proposed GC-RWCS strategy. GC-RWCS achieves the best attack performance in almost all experiment settings, and the difference from the second-best strategy is significant in almost all cases. It is also worth noting that the proposed GC-RWCS strategy is able to decrease the node classification accuracy by up to 33.5%, and GC-RWCS achieves a 70% larger decrease of the accuracy than the Random baseline in most cases (see Table 3 in Appendix B.2). This is achieved by merely adding the same constant perturbation vector to the features of 1% of the nodes in the graph, which verifies that the explicit structural inductive biases of GNN models make them vulnerable even in the extremely restricted black-box attack setup.

Table 1: Summary of the attack performance. The lower the accuracy (in %) the better the attack. The bold marker denotes the best performance. The asterisk (*) means the difference between the best strategy and the second-best strategy is statistically significant by a t-test at significance level 0.05. The error bar (±) denotes the standard error of the mean over 40 independent trials.
Cora Citeseer Pubmed
Method GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool
None 85.6 ± 0.3 86.2 ± 0.2 85.8 ± 0.3 75.1 ± 0.2 72.9 ± 0.3 73.2 ± 0.3 85.7 ± 0.1 85.8 ± 0.1 85.7 ± 0.1
Threshold 10%
Random 81.3 ± 0.3 68.8 ± 0.8 68.8 ± 1.3 71.3 ± 0.3 60.8 ± 0.8 61.7 ± 0.9 82.0 ± 0.3 75.9 ± 0.7 75.4 ± 0.7
Degree 78.2 ± 0.4 60.7 ± 1.0 59.9 ± 1.5 67.5 ± 0.4 52.5 ± 0.8 53.7 ± 1.0 78.9 ± 0.5 63.4 ± 1.0 63.3 ± 1.2
PageRank 79.4 ± 0.4 71.6 ± 0.6 70.0 ± 1.0 70.1 ± 0.3 61.5 ± 0.5 62.6 ± 0.6 80.3 ± 0.3 71.3 ± 0.8 71.2 ± 0.8
Betweenness 79.7 ± 0.4 60.5 ± 0.9 60.3 ± 1.6 68.9 ± 0.3 53.5 ± 0.8 55.1 ± 1.0 78.5 ± 0.6 67.1 ± 1.1 66.2 ± 1.1
RWCS 79.5 ± 0.3 71.2 ± 0.5 69.9 ± 1.0 69.9 ± 0.3 60.8 ± 0.6 62.2 ± 0.7 79.8 ± 0.3 70.7 ± 0.8 70.7 ± 0.8
GC-RWCS 78.5 ± 0.5 52.7 ± 1.0* 53.3 ± 1.9* 65.1 ± 0.5* 46.6 ± 0.8* 48.2 ± 1.1* 77.3 ± 0.7 62.1 ± 1.2 60.6 ± 1.4*
Threshold 30%
Random 82.6 ± 0.4 70.7 ± 1.1 71.8 ± 1.1 72.6 ± 0.3 62.7 ± 0.8 63.9 ± 0.8 82.6 ± 0.2 77.3 ± 0.4 77.4 ± 0.5
Degree 80.7 ± 0.4 64.9 ± 1.4 67.0 ± 1.5 70.4 ± 0.4 56.9 ± 0.8 58.7 ± 0.9 81.5 ± 0.4 72.4 ± 0.7 72.3 ± 0.7
PageRank 82.6 ± 0.3 79.6 ± 0.4 79.7 ± 0.4 72.9 ± 0.2 70.2 ± 0.3 70.3 ± 0.3 83.0 ± 0.2 79.3 ± 0.3 79.6 ± 0.3
Betweenness 81.8 ± 0.4 64.1 ± 1.3 65.9 ± 1.4 70.7 ± 0.3 56.3 ± 0.8 58.3 ± 0.9 81.3 ± 0.3 74.1 ± 0.5 74.6 ± 0.5
RWCS 82.8 ± 0.3 79.3 ± 0.5 79.5 ± 0.4 72.9 ± 0.2 69.8 ± 0.3 70.1 ± 0.3 82.1 ± 0.2 77.8 ± 0.3 78.4 ± 0.3
GC-RWCS 80.7 ± 0.5 59.1 ± 1.6* 61.1 ± 1.6* 67.8 ± 0.5* 49.0 ± 0.9* 50.7 ± 1.1* 80.3 ± 0.5* 69.2 ± 0.7* 70.0 ± 0.7*

5 Conclusion

In this paper, we propose a novel black-box adversarial attack setup for GNN models with a constraint of limited node access, which we believe is by far the most restricted and realistic black-box attack setup. Nonetheless, through both theoretical analyses and empirical experiments, we demonstrate that the strong and explicit structural inductive biases of GNN models make them still vulnerable to this type of adversarial attack. We also propose a principled attack strategy, GC-RWCS, based on our theoretical analyses of the connection between the GCN model and random walks, which corrects for the diminishing-return effect of the mis-classification rate. Our experimental results show that the proposed strategy significantly outperforms competing attack strategies under the same setup.

Acknowledgments

This work was in part supported by the National Science Foundation under grant numbers 1633370 and 1620319.

Broader Impact

For the potential positive impacts, we anticipate that this work may raise public awareness of the security and accountability issues of graph-based machine learning techniques, especially when they are applied to real-world social networks. Even without access to any information about the model training, the graph structure alone can be exploited to damage a deep learning system with a readily executable strategy.

On the potential negative side, as our work demonstrates that existing GNN models can be attacked effectively with no knowledge beyond the graph structure, this may raise a serious alert for technology companies that maintain such platforms and operate various applications based on graphs. However, we believe that making this security concern transparent can help practitioners detect potential attacks of this form and better defend machine-learning-driven applications.

References

  • Barabási and Albert [1999] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
  • Bojchevski and Günnemann [2018] Aleksandar Bojchevski and Stephan Günnemann. Adversarial attacks on node embeddings via graph poisoning. arXiv preprint arXiv:1809.01093, 2018.
  • Carlini and Wagner [2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • Chang et al. [2020] Heng Chang, Yu Rong, Tingyang Xu, Wenbing Huang, Honglei Zhang, Peng Cui, Wenwu Zhu, and Junzhou Huang. A restricted black-box adversarial framework towards attacking graph embedding models. In AAAI Conference on Artificial Intelligence, 2020.
  • Chen et al. [2018] Jinyin Chen, Yangyang Wu, Xuanheng Xu, Yixian Chen, Haibin Zheng, and Qi Xuan. Fast gradient attack on network embedding. arXiv preprint arXiv:1809.02797, 2018.
  • Dai et al. [2018] Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. arXiv preprint arXiv:1806.02371, 2018.
  • Frankle and Carbin [2018] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
  • Horel and Singer [2016] Thibaut Horel and Yaron Singer. Maximization of approximately submodular functions. In Advances in Neural Information Processing Systems, pages 3045–3053, 2016.
  • Jin et al. [2020] Wei Jin, Yaxin Li, Han Xu, Yiqi Wang, and Jiliang Tang. Adversarial attacks and defenses on graphs: A review and empirical study. arXiv preprint arXiv:2003.00653, 2020.
  • Kipf and Welling [2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Klicpera et al. [2018] Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.
  • Li et al. [2017] Cheng Li, Jiaqi Ma, Xiaoxiao Guo, and Qiaozhu Mei. Deepcas: An end-to-end predictor of information cascades. In Proceedings of the 26th international conference on World Wide Web, pages 577–586, 2017.
  • Lovász et al. [1993] László Lovász et al. Random walks on graphs: A survey. Combinatorics, Paul Erdős is Eighty, 2(1):1–46, 1993.
  • Morris et al. [2019] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.
  • Newman [2018] Mark Newman. Networks. Oxford University Press, 2018.
  • Sen et al. [2008] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–93, 2008.
  • Shi et al. [2020] Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang. Graphaf: a flow-based autoregressive model for molecular graph generation. arXiv preprint arXiv:2001.09382, 2020.
  • Sun et al. [2019] Yiwei Sun, Suhang Wang, Xianfeng Tang, Tsung-Yu Hsieh, and Vasant Honavar. Node injection attacks on graphs via reinforcement learning. arXiv preprint arXiv:1909.06543, 2019.
  • Veličković et al. [2017] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • Wang et al. [2019] Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315, 2019.
  • Wilson [2020] Andrew Gordon Wilson. The case for bayesian deep learning. arXiv preprint arXiv:2001.10995, 2020.
  • Wu et al. [2019] Huijun Wu, Chen Wang, Yuriy Tyshetskiy, Andrew Docherty, Kai Lu, and Liming Zhu. Adversarial examples for graph data: Deep insights into attack and defense. In IJCAI, 2019.
  • Wu et al. [2020] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • Xu et al. [2019] Kaidi Xu, Hongge Chen, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, and Xue Lin. Topology attack and defense for graph neural networks: An optimization perspective. arXiv preprint arXiv:1906.04214, 2019.
  • Xu et al. [2018a] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018a.
  • Xu et al. [2018b] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. arXiv preprint arXiv:1806.03536, 2018b.
  • Yang et al. [2016] Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. arXiv preprint arXiv:1603.08861, 2016.
  • Ying et al. [2018] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 974–983, 2018.
  • Zoph and Le [2016] Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
  • Zügner and Günnemann [2019] Daniel Zügner and Stephan Günnemann. Adversarial attacks on graph neural networks via meta learning. In International Conference on Learning Representations (ICLR), 2019.
  • Zügner et al. [2018] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2847–2856, 2018.

Appendix A Proofs

A.1 Proof of Proposition 1

We first remind the reader of some notation: a GCN model is denoted as a function $f$, the feature matrix is $X\in\mathbb{R}^{N\times D}$, and the output logits are $H=f(X)\in\mathbb{R}^{N\times K}$. The $L$-step random walk transition matrix is $M^{L}$. More details can be found in Section 3.1.

We give in Lemma 2 the connection between GCN models and random walks. Lemma 2 relies on a technical assumption about the GCN model (Assumption 5) and the proof can be found in Xu et al. [26].

Assumption 5 (Xu et al. [26]).

All paths in the computation graph of the given GCN model are independently activated with the same probability of success $\rho$.

Lemma 2.

(Xu et al. [26].) Given an $L$-layer GCN with averaging normalization $\alpha_{i,j}=1/d_{i}$ in Eq. (1), assume that all paths in the computation graph of the model are activated with the same probability of success $\rho$ (Assumption 5). Then, for any nodes $i,j\in V$,

$\mathbb{E}\left[\frac{\partial H_{j}}{\partial X_{i}}\right]=\rho\cdot\prod_{l=L}^{1}\bm{W}_{l}\,[M^{L}]_{ji},$ (6)

where $\bm{W}_{l}$ is the learnable parameter matrix at the $l$-th layer.

Then we are able to prove Proposition 1 below.

Proof.

First, we derive the gradient of the loss $\mathcal{L}(H,y)$ w.r.t. the feature $X_{i}$ of node $i$:

$\nabla_{X_{i}}\mathcal{L}(H,y)=\nabla_{X_{i}}\left(\sum_{j=1}^{N}\mathcal{L}_{j}(H_{j},y_{j})\right)=\sum_{j=1}^{N}\nabla_{X_{i}}\mathcal{L}_{j}(H_{j},y_{j})=\sum_{j=1}^{N}\left(\frac{\partial H_{j}}{\partial X_{i}}\right)^{T}\frac{\partial\mathcal{L}_{j}(H_{j},y_{j})}{\partial H_{j}},$ (7)

where $H_{j}$ is the $j$-th row of $H$ transposed into a column vector and $y_{j}$ is the true label of node $j$. Note that $\frac{\partial\mathcal{L}_{j}(H_{j},y_{j})}{\partial H_{j}}\in\mathbb{R}^{K}$ and $\frac{\partial H_{j}}{\partial X_{i}}\in\mathbb{R}^{K\times D}$.

Next, we plug Eq. (7) into $\widetilde{\Delta}_{i}(x)|_{x=\tau(X,\{i\})_{i}}$, which for simplicity we write as $\widetilde{\Delta}_{i}$ in the rest of the proof:

$\widetilde{\Delta}_{i}=\left(\nabla_{X_{i}}\mathcal{L}(H,y)\right)^{T}\epsilon=\sum_{j=1}^{N}\left(\frac{\partial\mathcal{L}_{j}(H_{j},y_{j})}{\partial H_{j}}\right)^{T}\frac{\partial H_{j}}{\partial X_{i}}\epsilon.$ (8)

Denote $a^{j}\triangleq\frac{\partial\mathcal{L}_{j}(H_{j},y_{j})}{\partial H_{j}}\in\mathbb{R}^{K}$. From the definition of the loss,

$\mathcal{L}_{j}(H_{j},y_{j})=\max_{k\in\{1,\ldots,K\}}H_{jk}-H_{jy_{j}},$

we have

$a^{j}_{k}=\begin{cases}-1, & \text{ if }k=y_{j}\text{ and }y_{j}\neq\operatorname*{argmax}_{c\in\{1,\ldots,K\}}H_{jc},\\ 1, & \text{ if }k\neq y_{j}\text{ and }k=\operatorname*{argmax}_{c\in\{1,\ldots,K\}}H_{jc},\\ 0, & \text{otherwise,}\end{cases}$

for $k=1,2,\ldots,K$. Under Assumption 1, the expectation of each element of $a^{j}$ is

$\operatorname*{\mathbb{E}}[a^{j}_{k}]=-q_{k}(1-p(k\mid k))+\sum_{w=1,w\neq k}^{K}p(k\mid w)q_{w},\quad k=1,2,\ldots,K,$

which is a constant independent of $H_{j}$ and $y_{j}$. Therefore, we can write

$\operatorname*{\mathbb{E}}[a^{j}]=c,\quad\forall j=1,2,\ldots,N,$

where $c\in\mathbb{R}^{K}$ is a constant vector independent of $j$.

Taking the expectation of Eq. (8) and plugging in the result of Lemma 2,

$\operatorname*{\mathbb{E}}\left[\widetilde{\Delta}_{i}\right]\approx\mathbb{E}\left[\sum_{j=1}^{N}\left(\frac{\partial\mathcal{L}_{j}(H_{j},y_{j})}{\partial H_{j}}\right)^{T}\frac{\partial H_{j}}{\partial X_{i}}\epsilon\right]=\sum_{j=1}^{N}\operatorname*{\mathbb{E}}[a^{j}]^{T}\left(\rho\prod_{l=L}^{1}W_{l}[M^{L}]_{ji}\right)\epsilon=\left(\rho c^{T}\prod_{l=L}^{1}W_{l}\epsilon\right)\sum_{j=1}^{N}[M^{L}]_{ji}=C\sum_{j=1}^{N}[M^{L}]_{ji},$

where $C=\rho c^{T}\prod_{l=L}^{1}W_{l}\epsilon$ is a constant scalar independent of $i$. ∎

A.2 Proofs for the Results in Section 3.4

Proof of Lemma 1.

Proof.

If $A_{i}=\emptyset$, then $B_{i}\subseteq A_{i}$ and so $B_{i}=\emptyset$. The three conditions of Definition 3 are also trivially true. Below we investigate the case $A_{i}\neq\emptyset$.

The existence is given by a constructive proof. We check the nonempty elements of $A_{i}$ one by one in an arbitrary order. If an element is a superset of any other element in $A_{i}$, we skip it; otherwise, we put it into $B_{i}$. We then verify that the resulting $B_{i}$ is a basic vulnerable set for $i$. Clearly $B_{i}\subseteq A_{i}$. For condition 1), clearly $\emptyset\notin B_{i}$, and if $\emptyset\in A_{i}$, all nonempty elements in $A_{i}$ are skipped, so $B_{i}=\emptyset$. For condition 2), given $\emptyset\notin A_{i}$, for any nonempty $S\in A_{i}$: if $S\in B_{i}$, the condition holds; if $S\notin B_{i}$, by construction there exists a nonempty strict subset $S_{1}\subset S$ with $S_{1}\in A_{i}$. If $S_{1}\in B_{i}$, the condition holds; if $S_{1}\notin B_{i}$, we can similarly find a nonempty strict subset $S_{2}\subset S_{1}$ with $S_{2}\in A_{i}$. Recursively, we obtain a chain $S\supset S_{1}\supset S_{2}\supset\cdots$. As $S$ is finite, we will reach a set $S_{k}$ that no longer has a strict subset in $A_{i}$, so $S_{k}\in B_{i}$, and the condition holds. Condition 3) means that no set in $B_{i}$ is a subset of another set in $B_{i}$, which holds by construction.

Now we prove the uniqueness. Suppose there are two distinct basic vulnerable sets $B_{i}\neq C_{i}$. Without loss of generality, assume $S\in B_{i}$ but $S\notin C_{i}$. Since $B_{i}\neq\emptyset$, we have $\emptyset\notin A_{i}$. Further, $S\in A_{i}$, hence $C_{i}\neq\emptyset$. As $S\in B_{i}\subseteq A_{i}$, $S\neq\emptyset$, and $C_{i}$ satisfies condition 2), there is a nonempty $T\in C_{i}$ s.t. $T\subset S$. If $T\in B_{i}$, then condition 3) is violated for $B_{i}$. If $T\notin B_{i}$, there is a nonempty $T^{\prime}\in B_{i}$ s.t. $T^{\prime}\subset T$; but then $T^{\prime}\subset S$ also violates condition 3). By contradiction, we prove the uniqueness. ∎

In order to prove Proposition 2, we first would like to construct a submodular function that is close to $h$, with the help of Lemma 3 below.

Lemma 3.

If, $\forall i\in V$, $B_{i}$ is either empty or only contains singleton sets, then $h$ is submodular.

Proof.

We first prove the case when $\forall i\in V$, $A_{i}\neq\emptyset$.

First, we show that $\forall i\in V$ with $A_{i}\neq\emptyset$, for any nonempty $S\subseteq V$, $g_{i}(S)=1$ if and only if $B_{i}=\emptyset$ or $\exists T\in B_{i},T\subseteq S$. On one hand, if $g_{i}(S)=1$, then $S\in A_{i}$. If $\emptyset\in A_{i}$, then $B_{i}=\emptyset$. If $\emptyset\notin A_{i}$, by condition 2) of the basic vulnerable set, $\exists T\in B_{i},T\subseteq S$. On the other hand, if $\exists T\in B_{i},T\subseteq S$, then $g_{i}(T)=1$, and by Assumption 2, $g_{i}(S)\geq g_{i}(T)$, so $g_{i}(S)=1$. If $B_{i}=\emptyset$: as $A_{i}\neq\emptyset$, if $\emptyset\notin A_{i}$, condition 2) of Definition 3 would be violated; therefore $\emptyset\in A_{i}$ and $g_{i}(\emptyset)=1$. Still by Assumption 2, $g_{i}(S)\geq g_{i}(\emptyset)$, so $g_{i}(S)=1$.

Define a function $e:V\rightarrow 2^{V}$ s.t. for any node $i\in V$,

$e(i)=\{j\in V\mid\{i\}\in B_{j}\}.$

Given that $B_{i}$ is either empty or only contains singleton sets for any $i\in V$, for any nonempty $S\subseteq V$,

$h(S)=\frac{1}{N}\sum_{i=1}^{N}g_{i}(S)=\frac{1}{N}\left|\{j\in V\mid B_{j}=\emptyset\text{ or }\exists T\in B_{j},T\subseteq S\}\right|=\frac{1}{N}\left|\{j\in V\mid B_{j}=\emptyset\text{ or }\exists\{i\}\in B_{j},i\in S\}\right|=\frac{1}{N}\left|\{j\in V\mid B_{j}=\emptyset\text{ or }\exists i\in S,\{i\}\in B_{j}\}\right|=\frac{1}{N}\left(\left|\cup_{i\in S}e(i)\right|+\left|\{j\in V\mid B_{j}=\emptyset\}\right|\right).$ (9)

$\left|\{j\in V\mid B_{j}=\emptyset\}\right|$ is a constant independent of $S$. Therefore, maximizing $h(S)$ over $S$ with $|S|\leq r$ is equivalent to maximizing $\left|\cup_{i\in S}e(i)\right|$ over $S$ with $|S|\leq r$, which is a maximum coverage problem. Therefore $h$ is submodular.

The case allowing some nodes to have empty vulnerable sets can be easily proved by removing such nodes in Eq. (9), as their corresponding vulnerable functions always equal zero. ∎

Proof of Proposition 2. For simplicity, we assume $A_{i}\neq\emptyset$ for any $i\in V$. The proof below can be easily adapted to the general case without this assumption, by removing the nodes with empty vulnerable sets, similarly to the proof of Lemma 3.

Proof.

$\forall i\in V$, define $\widetilde{B}_{i}\triangleq\{S\in B_{i}\mid|S|=1\}$. We can then define a new group of vulnerable sets $\widetilde{A}_{i}$ on $V$ for $i\in V$. Let

$\widetilde{A}_{i}=\begin{cases}2^{V}, & \text{ if }B_{i}=\emptyset,\\ \emptyset, & \text{ if }B_{i}\neq\emptyset\text{ but }\widetilde{B}_{i}=\emptyset,\\ \{S\subseteq V\mid\exists T\in\widetilde{B}_{i},T\subseteq S\}, & \text{otherwise.}\end{cases}$

Then it is clear that $\widetilde{B}_{i}$ is a valid basic vulnerable set corresponding to $\widetilde{A}_{i}$, for $i\in V$. If we define $\tilde{g}_{i}:2^{V}\rightarrow\{0,1\}$ as

$\tilde{g}_{i}(S)=\begin{cases}1, & \text{ if }B_{i}=\emptyset\text{ or }\exists T\in\widetilde{B}_{i},T\subseteq S,\\ 0, & \text{otherwise,}\end{cases}$

we can easily verify that $\tilde{g}_{i}$ is a valid vulnerable function corresponding to $\widetilde{A}_{i}$, for $i\in V$. Further, let $\tilde{h}:2^{V}\rightarrow\mathbb{R}_{+}$ be

$\tilde{h}(S)=\frac{1}{N}\sum_{i=1}^{N}\tilde{g}_{i}(S).$

By Lemma 3, as $\forall i\in V$, $\widetilde{B}_{i}$ is either empty or only contains singleton sets, we know $\tilde{h}$ is submodular.

Next we investigate the difference between $h$ and $\tilde{h}$. First, for any $S\subseteq V$, if $S\notin\cup_{i=1}^{N}A_{i}$, clearly $h(S)=\tilde{h}(S)=0$; if $|S|\leq 1$, it is easy to show $h(S)=\tilde{h}(S)$. Second, for any $S\in\cup_{i=1}^{N}A_{i}$ with $|S|>1$, by Assumption 3 there are exactly $b$ (omitting the $S$ in $b(S)$) nodes whose vulnerable set contains $S$. Without loss of generality, assume the indexes of these $b$ nodes are $1,2,\ldots,b$. Then, for any node $i>b$, $g_{i}(S)=0$ and $\tilde{g}_{i}(S)=0$. For nodes $i=1,2,\ldots,b$, $g_{i}(S)=1$, and

$\tilde{g}_{i}(S)=\begin{cases}1, & \text{ if }B_{i}=\emptyset\text{ or }\exists T\subseteq S,\ |T|=1\text{ and }T\in\widetilde{B}_{i},\\ 0, & \text{otherwise.}\end{cases}$

By Assumption 4, there are at least $\lceil pb\rceil$ (omitting the $S$ in $p(S)$) nodes $j$ s.t. $\tilde{g}_{j}(S)=1$. Therefore, $h(S)=\frac{b}{N}$ and $\frac{\lceil pb\rceil}{N}\leq\tilde{h}(S)\leq\frac{b}{N}$. Hence $1-\frac{1}{r}<1\leq\frac{h(S)}{\tilde{h}(S)}\leq\frac{1}{p}<1+\frac{1}{r}$. ∎

Appendix B Additional Experiment Details

B.1 Additional Dataset Details

Datasets. We adopt the Deep Graph Library [20] versions of Cora, Citeseer, and Pubmed in our experiments. Summary statistics of the datasets are provided in Table 2; the reported number of edges does not include self-loops.

Table 2: Summary statistics of datasets.
Dataset Nodes Edges Classes Features
Citeseer 3,327 4,552 6 3,703
Cora 2,708 5,278 7 1,433
Pubmed 19,717 44,324 3 500
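For reference, the three benchmarks can be loaded directly from DGL. The following minimal sketch assumes a recent DGL release (older releases expose a slightly different loading API); note that its raw edge counts are directed and therefore will not match the undirected, self-loop-free counts in Table 2.

```python
import dgl

# Load the three citation benchmarks (recent-release DGL API assumed).
for cls in (dgl.data.CoraGraphDataset,
            dgl.data.CiteseerGraphDataset,
            dgl.data.PubmedGraphDataset):
    g = cls()[0]
    # num_edges() counts directed edges (each undirected edge appears twice),
    # which differs from the undirected counts without self-loops in Table 2.
    print(cls.__name__, g.num_nodes(), g.num_edges(),
          g.ndata['feat'].shape[1], int(g.ndata['label'].max()) + 1)
```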

B.2 Additional Experiment Results

In this section, we provide results for additional experiment setups.

Comparison to the Random baseline. We first highlight, in Table 3, the relative decrease in accuracy achieved by the proposed GC-RWCS strategy ($L=4$) compared to the Random strategy. GC-RWCS decreases the node classification accuracy by up to 33.5%, and in most cases its accuracy decrease is more than 70% larger than that of the Random baseline. Since GC-RWCS and Random use exactly the same feature perturbation, and the node selection step of Random uses no information about the graph structure, this relative comparison can be roughly viewed as an indicator of the attack effectiveness attributable to the structural inductive biases of the GNN models.

Table 3: Accuracy decrease (in %) compared with the clean dataset.
Cora Citeseer Pubmed
Method GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool
Threshold 10%
Random 4.3 17.4 17.0 3.8 12.1 11.5 3.7 9.9 10.3
GC-RWCS 7.1 33.5 32.5 10.0 26.3 25.0 8.4 23.7 25.1
GC-RWCS/Random 165.12% 192.53% 191.18% 263.16% 217.36% 217.39% 227.03% 239.39% 243.69%
Threshold 30%
Random 3.0 15.5 14.0 2.5 10.2 9.3 3.1 8.5 8.3
GC-RWCS 4.9 27.1 24.7 7.3 23.9 22.5 5.4 16.6 15.7
GC-RWCS/Random 163.33% 174.84% 176.43% 292.00% 234.31% 241.94% 174.19% 195.29% 189.16%
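Each GC-RWCS/Random entry is the ratio of the two accuracy decreases above it; for example, for GCN on Cora at the 10% threshold, $\frac{7.1}{4.3}\approx 165.1\%$, matching the first entry of the ratio row.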

More thresholds and sensitivity analysis with respect to $L$. We provide a setup with a 20% threshold, in addition to the 10% and 30% thresholds shown in Section 4.2, to give a better resolution of the results; the results at the 20% threshold are consistent with the other setups. We also conduct a sensitivity analysis of the hyper-parameter $L$ in GC-RWCS in Table 4, showing the results of GC-RWCS with $L=3,4,5,6,7$. Note that GCN has 2 layers and the JK-Nets have 7 layers. The variation of the GC-RWCS results over this range of $L$ is typically within 2%, indicating that the proposed GC-RWCS strategy does not rely on exact knowledge of the number of layers in the attacked GNN models to be effective.

Table 4: Summary of the accuracy (in %) for $L\in\{3,4,5,6,7\}$. Bold numbers and the asterisk (*) carry the same meanings as in Table 1. The underline marker denotes values for which GC-RWCS outperforms all the baselines.
Cora Citeseer Pubmed
Method GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool GCN JKNetConcat JKNetMaxpool
None 85.6 ± 0.3 86.2 ± 0.2 85.8 ± 0.3 75.1 ± 0.2 72.9 ± 0.3 73.2 ± 0.3 85.7 ± 0.1 85.8 ± 0.1 85.7 ± 0.1
Threshold 10%
Random 81.3 ± 0.3 68.8 ± 0.8 68.8 ± 1.3 71.3 ± 0.3 60.8 ± 0.8 61.7 ± 0.9 82.0 ± 0.3 75.9 ± 0.7 75.2 ± 0.7
Degree 78.2 ± 0.4 60.7 ± 1.0 59.9 ± 1.5 67.5 ± 0.4 52.5 ± 0.8 53.7 ± 1.0 78.9 ± 0.5 63.4 ± 1.0 63.2 ± 1.2
Pagerank 79.4 ± 0.4 71.6 ± 0.6 70.0 ± 1.0 70.1 ± 0.3 61.5 ± 0.5 62.6 ± 0.6 80.3 ± 0.3 71.3 ± 0.8 71.2 ± 0.8
Betweenness 79.7 ± 0.4 60.5 ± 0.9 60.3 ± 1.6 68.9 ± 0.3 53.5 ± 0.8 55.1 ± 1.0 78.5 ± 0.6 67.1 ± 1.1 66.1 ± 1.1
RWCS 79.4 ± 0.4 71.7 ± 0.5 70.3 ± 0.9 69.9 ± 0.3 62.4 ± 0.4 63.1 ± 0.6 79.8 ± 0.3 70.7 ± 0.8 70.7 ± 0.8
GC-RWCS-3 78.6 ± 0.5 52.1 ± 1.1* 53.0 ± 1.9* 64.8 ± 0.5* 46.4 ± 0.8* 48.2 ± 1.0* 78.1 ± 0.6 62.3 ± 1.2 61.6 ± 1.5
GC-RWCS-4 78.5 ± 0.5 52.7 ± 1.0* 53.3 ± 1.9* 65.1 ± 0.5* 46.6 ± 0.8* 48.2 ± 1.1* 77.3 ± 0.7 62.1 ± 1.2 60.6 ± 1.4*
GC-RWCS-5 78.9 ± 0.5 53.5 ± 1.1* 54.2 ± 1.9* 65.3 ± 0.5* 46.6 ± 0.8* 48.4 ± 1.0* 78.4 ± 0.5 64.2 ± 1.2 62.5 ± 1.4
GC-RWCS-6 78.5 ± 0.5 54.3 ± 1.1* 54.9 ± 1.9* 65.5 ± 0.5* 47.1 ± 0.8 48.9 ± 1.1* 78.0 ± 0.6 63.7 ± 1.1 62.6 ± 1.4
GC-RWCS-7 78.1 ± 0.5 54.2 ± 1.1* 54.8 ± 1.9* 66.1 ± 0.4* 47.5 ± 0.8 49.3 ± 1.1* 78.7 ± 0.5 64.9 ± 1.2 63.3 ± 1.3
Threshold 20%
Random 82.3 ± 0.3 71.7 ± 1.1 69.8 ± 1.1 72.1 ± 0.3 62.1 ± 0.7 62.6 ± 0.9 82.6 ± 0.2 77.9 ± 0.5 77.5 ± 0.5
Degree 79.3 ± 0.4 64.2 ± 1.2 61.6 ± 1.3 69.2 ± 0.4 56.0 ± 0.8 56.4 ± 1.0 80.6 ± 0.4 69.5 ± 0.8 69.4 ± 1.0
Pagerank 80.8 ± 0.3 74.5 ± 0.8 73.0 ± 0.8 72.1 ± 0.3 68.3 ± 0.3 68.2 ± 0.4 82.2 ± 0.2 77.7 ± 0.4 77.8 ± 0.4
Betweenness 80.7 ± 0.4 62.2 ± 1.4 60.1 ± 1.4 70.1 ± 0.4 54.8 ± 0.8 55.8 ± 1.1 80.2 ± 0.4 72.4 ± 0.8 72.0 ± 0.7
RWCS 81.4 ± 0.3 76.8 ± 0.6 76.0 ± 0.6 72.4 ± 0.3 68.9 ± 0.3 69.0 ± 0.4 81.3 ± 0.2 76.0 ± 0.4 76.5 ± 0.4
GC-RWCS-3 79.4 ± 0.5 57.5 ± 1.6* 53.1 ± 1.5* 67.1 ± 0.4* 48.4 ± 0.9* 49.3 ± 1.2* 79.0 ± 0.5* 67.4 ± 0.9* 66.3 ± 1.0*
GC-RWCS-4 79.4 ± 0.5 57.5 ± 1.7* 53.2 ± 1.4* 67.3 ± 0.5* 47.9 ± 0.9* 48.8 ± 1.3* 79.0 ± 0.5* 67.4 ± 1.0* 66.3 ± 1.0*
GC-RWCS-5 79.4 ± 0.5 59.0 ± 1.7* 54.5 ± 1.4* 67.3 ± 0.4* 48.4 ± 0.9* 49.4 ± 1.3* 79.2 ± 0.5* 68.5 ± 0.9 68.1 ± 0.9
GC-RWCS-6 79.5 ± 0.5 59.3 ± 1.7 54.9 ± 1.5* 68.1 ± 0.4* 49.2 ± 0.9* 50.2 ± 1.3* 79.1 ± 0.5* 68.4 ± 0.9 68.5 ± 1.0
GC-RWCS-7 79.4 ± 0.5 59.3 ± 1.6 55.3 ± 1.5* 68.1 ± 0.4* 50.0 ± 0.9* 50.8 ± 1.3* 79.2 ± 0.5* 68.7 ± 0.9 68.2 ± 0.8
Threshold 30%
Random 82.6 ± 0.4 70.7 ± 1.1 71.8 ± 1.1 72.6 ± 0.3 62.7 ± 0.8 63.9 ± 0.8 82.6 ± 0.2 77.3 ± 0.4 77.3 ± 0.5
Degree 80.7 ± 0.4 64.9 ± 1.4 67.0 ± 1.5 70.4 ± 0.4 56.9 ± 0.8 58.7 ± 0.9 81.5 ± 0.4 72.4 ± 0.7 72.1 ± 0.8
Pagerank 82.6 ± 0.3 79.6 ± 0.4 79.7 ± 0.4 72.9 ± 0.2 70.2 ± 0.3 70.3 ± 0.3 83.0 ± 0.2 79.3 ± 0.3 79.5 ± 0.3
Betweenness 81.8 ± 0.4 64.1 ± 1.3 65.9 ± 1.4 70.7 ± 0.3 56.3 ± 0.8 58.3 ± 0.9 81.3 ± 0.3 74.1 ± 0.5 74.5 ± 0.5
RWCS 82.9 ± 0.3 79.7 ± 0.4 80.0 ± 0.4 72.9 ± 0.2 70.2 ± 0.3 70.4 ± 0.3 82.1 ± 0.2 77.8 ± 0.3 78.4 ± 0.3
GC-RWCS-3 80.2 ± 0.6 57.3 ± 1.7* 59.0 ± 1.6* 67.9 ± 0.5* 49.1 ± 0.9* 50.8 ± 1.1* 80.3 ± 0.5* 69.0 ± 0.7* 69.8 ± 0.7*
GC-RWCS-4 80.7 ± 0.5 59.1 ± 1.6* 61.1 ± 1.6* 67.8 ± 0.5* 49.0 ± 0.9* 50.7 ± 1.1* 80.3 ± 0.5* 69.2 ± 0.7* 70.0 ± 0.7*
GC-RWCS-5 80.8 ± 0.5 59.8 ± 1.6* 61.5 ± 1.6* 68.4 ± 0.5* 49.2 ± 0.9* 51.2 ± 1.1* 80.2 ± 0.5* 70.4 ± 0.6* 71.5 ± 0.6
GC-RWCS-6 80.7 ± 0.5 59.8 ± 1.5* 61.4 ± 1.5* 68.5 ± 0.5* 50.5 ± 0.9* 52.2 ± 1.1* 80.2 ± 0.5* 70.5 ± 0.5* 71.6 ± 0.6
GC-RWCS-7 80.7 ± 0.5 60.2 ± 1.5* 61.9 ± 1.5* 68.7 ± 0.5* 50.7 ± 0.9* 52.6 ± 1.1* 80.3 ± 0.4* 70.9 ± 0.5* 71.9 ± 0.6

Extra experiments on Graph Attention Networks (GAT). We further show that the proposed attack strategy GC-RWCS, while derived from the GCN model, also generalizes well to GAT [19]. The results on GAT are shown in Table 5.

Table 5: Accuracy (in %) on the GAT model.
Dataset Cora Citeseer Pubmed
Threshold 10% 20% 30% 10% 20% 30% 10% 20% 30%
None 87.8 ± 0.2 76.9 ± 0.3 85.2 ± 0.1
Random 72.9 ± 0.5 73.8 ± 0.6 73.9 ± 0.6 70.0 ± 0.5 71.2 ± 0.4 71.7 ± 0.4 73.9 ± 0.4 75.4 ± 0.3 76.2 ± 0.3
Degree 66.5 ± 0.7 67.3 ± 0.7 69.8 ± 0.7 63.3 ± 0.5 65.9 ± 0.4 67.9 ± 0.3 66.7 ± 0.7 69.0 ± 0.5 71.2 ± 0.4
Pagerank 74.3 ± 0.5 74.8 ± 0.3 82.4 ± 0.2 69.5 ± 0.3 72.9 ± 0.3 74.2 ± 0.3 71.6 ± 0.4 78.1 ± 0.2 79.1 ± 0.2
Betweenness 64.8 ± 0.5 66.0 ± 0.5 67.3 ± 0.6 65.2 ± 0.5 66.5 ± 0.4 67.6 ± 0.3 63.4 ± 0.7 68.4 ± 0.6 72.0 ± 0.4
RWCS 71.1 ± 0.5 74.6 ± 0.3 82.5 ± 0.2 69.2 ± 0.3 72.9 ± 0.3 73.9 ± 0.3 69.4 ± 0.5 74.9 ± 0.3 77.9 ± 0.2
GC-RWCS 58.1 ± 0.6* 57.9 ± 0.6* 63.0 ± 0.5* 58.3 ± 0.6* 61.9 ± 0.6* 61.9 ± 0.4* 58.9 ± 0.9* 63.8 ± 0.7* 68.9 ± 0.5*

Extra experiments on a synthetic dataset. We also run experiments on a synthetic dataset to show that, when sufficient domain knowledge regarding the node features is available, the proposed attack strategy can be made effective in a purely black-box fashion. We generate the synthetic dataset as follows. First, we generate a graph using the Barabási-Albert random graph model [1] with $N$ nodes, and denote its adjacency matrix as $A$. Then we sample $D$-dimensional node features, $X\in\mathbb{R}^{N\times D}$, from a multivariate normal distribution and take the element-wise absolute value. Finally, we generate the node labels as follows. We first calculate $\tilde{Y}=\mathrm{sigmoid}((A+I)XW)$ for some $W\in\mathbb{R}^{D}$. Considering a binary classification problem, we set the label $Y_{i}=1$ if $\tilde{Y}_{i}>0.5$ and $Y_{i}=0$ otherwise. When attacking a GNN model trained on such a synthetic dataset, we assume the attackers know a few “important features” with large weights in $W$ but have no access to any information about the trained model. We randomly generate two datasets with $N=3000$ and $D=10$; the experiment results are shown in Table 6. The proposed GC-RWCS strategy performs well on both synthetic datasets.
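A minimal sketch of this generation process is shown below; it is for illustration only (not the exact script used), and the Barabási-Albert attachment parameter `m`, the distribution of `W`, and the random seed are not specified above and are assumptions here.

```python
import networkx as nx
import numpy as np

def make_synthetic(n=3000, d=10, m=2, seed=0):
    """Generate one synthetic dataset following the description above."""
    rng = np.random.default_rng(seed)
    g = nx.barabasi_albert_graph(n, m, seed=seed)   # BA graph; m is an assumption
    A = nx.to_numpy_array(g)                        # adjacency matrix
    X = np.abs(rng.standard_normal((n, d)))         # |N(0, I)| node features
    W = rng.standard_normal(d)                      # weight vector (distribution assumed)
    logits = (A + np.eye(n)) @ X @ W                # (A + I) X W
    y_tilde = 1.0 / (1.0 + np.exp(-logits))         # sigmoid
    y = (y_tilde > 0.5).astype(int)                 # binary labels
    return A, X, y, W

A, X, y, W = make_synthetic()
```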

Table 6: Accuracy (in %) on GCN model with synthetic data
Dataset synthetic_0 synthetic_1
Threshold 10% 20% 30% 10% 20% 30%
None 83.6 ± 0.3 85.2 ± 0.4
Degree 77.9 ± 0.4 77.5 ± 0.4 79.2 ± 0.4 77.4 ± 0.4 79.7 ± 0.4 79.9 ± 0.2
Pagerank 76.7 ± 0.4 78.3 ± 0.3 79.2 ± 0.4 78.5 ± 0.5 79.2 ± 0.3 80.1 ± 0.3
Betweenness 76.3 ± 0.4 80.4 ± 0.3 79.1 ± 0.4 78.7 ± 0.3 80.6 ± 0.3 80.3 ± 0.3
Random 79.3 ± 0.4 80.3 ± 0.4 81.2 ± 0.4 82.4 ± 0.4 79.6 ± 0.3 82.2 ± 0.2
RWCS 74.9 ± 0.5 78.8 ± 0.4 78.5 ± 0.3 79.0 ± 0.4 79.9 ± 0.4 80.0 ± 0.3
GC-RWCS 74.0 ± 0.3* 77.9 ± 0.5 77.4 ± 0.4* 77.5 ± 0.4 78.9 ± 0.3* 78.4 ± 0.3*

Diminishing-return effect with respect to $J$. In Section 4.2, we empirically showed the diminishing-return effect on the mis-classification rate when strengthening the adversarial perturbation by increasing the perturbation strength $\lambda$. Here we further validate this observation by showing that a similar effect appears when we strengthen the adversarial perturbation by increasing the number of features to be perturbed, $J$. The results are shown in Figure 2.
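To make explicit how $J$ and $\lambda$ parameterize the perturbation strength, the sketch below illustrates the kind of constant feature perturbation applied to the selected nodes; the particular features and signs used in the experiments follow the setup described in Section 4 and are passed in here as hypothetical inputs.

```python
import numpy as np

def perturb_features(X, target_nodes, feat_idx, signs, lam):
    """Add a constant perturbation of magnitude lam to the J = len(feat_idx)
    chosen features (with per-feature signs) of the attacked nodes."""
    X_adv = X.copy()
    X_adv[np.ix_(target_nodes, feat_idx)] += lam * np.asarray(signs)
    return X_adv

# Hypothetical usage: perturb J = 2 features of nodes 5 and 7 with strength lam = 10.
X = np.random.rand(10, 4)
X_adv = perturb_features(X, target_nodes=[5, 7], feat_idx=[0, 3],
                         signs=[+1.0, -1.0], lam=10.0)
```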

Figure 2: Experiments of attacking GCN on Citeseer with an increasing number of features to be perturbed, $J$. (a) Loss on the test set. (b) Accuracy on the test set. Results are averaged over 40 random trials and error bars indicate the standard error of the mean.