Spectral Adversarial Training for Robust
Graph Neural Network
Abstract
Recent studies demonstrate that Graph Neural Networks (GNNs) are vulnerable to slight but adversarially designed perturbations, known as adversarial examples. To address this issue, robust training methods against adversarial examples have received considerable attention in the literature. Adversarial Training (AT) is a successful approach to learning a robust model using adversarially perturbed training samples. Existing AT methods on GNNs typically construct adversarial perturbations in terms of graph structures or node features. However, they are less effective and fraught with challenges on graph data due to the discreteness of graph structure and the relationships between connected examples. In this work, we seek to address these challenges and propose Spectral Adversarial Training (SAT), a simple yet effective adversarial training approach for GNNs. SAT first adopts a low-rank approximation of the graph structure based on spectral decomposition, and then constructs adversarial perturbations in the spectral domain rather than directly manipulating the original graph structure. To investigate its effectiveness, we employ SAT on three widely used GNNs. Experimental results on four public graph datasets demonstrate that SAT significantly improves the robustness of GNNs against adversarial attacks without sacrificing classification accuracy and training efficiency.
Index Terms:
Adversarial training, Graph neural networks, Node classification, Network robustness
1 Introduction
Graph Neural Networks (GNNs) [1], as powerful architectures for modeling graph-structured data, have attracted enormous attention in both academia and industry in recent years. Unlike Convolutional Neural Networks (CNNs) that operate on Euclidean data, GNNs provide an effective and efficient framework, namely message passing and aggregation, that generalizes CNN architectures to non-Euclidean domains (graphs and manifolds) and learns the relationships (edges) between a set of objects (nodes). Due to their strong representation ability, GNNs have achieved significant success in a variety of graph applications, such as social networks [2], knowledge graphs [3], recommender systems [4], and even life science [5].
Nevertheless, recent efforts show that neural networks are vulnerable to adversarially designed inputs, i.e., adversarial examples [6], and naturally, GNNs inherit such defects. Despite the promising advancement of GNNs, they still suffer from potential security threats in many graph-related tasks. Specifically, a well-trained GNN can be easily fooled by slight but carefully designed perturbations on the graph structure or node features, which try to mislead the model by malicious input instances. For example, attackers may modify (add or remove) a small number of connections (edges), such that the target node could be misclassified [7, 8] or the overall model performance would have a significant degradation [9]. How to resist such adversarial behaviors has so far become a crucial open problem, which motivates researchers to develop more effective methods to enhance the robustness of GNNs.

Adversarial Training (AT) has recently emerged as a general approach to improve the resistance of GNNs against adversarial attacks. As shown in Figure 1, AT simulates adversarial attacks during the training phase and encourages the smoothness of prediction outputs. It has been proven an effective data augmentation and regularization technique to enhance robustness against adversarial examples [10, 11, 12]. Current AT approaches on graph data are typically motivated by vision research that adds imperceptible perturbations to image data [6]. These existing methods can be summarized into two generic categories: (i) structure-based AT. The graph topology itself is subject to attacks that adversarially inject or remove edges or nodes, and hence adversarial examples can likewise be constructed by such modifications. In this way, Xu et al. [13] present the first optimization-based AT, where the adversarial input graph is formed by a PGD attack [13] that greedily leverages the gradients and a projection operation to sequentially modify the nodes' connections. (ii) feature-based AT. Tackling graph data directly is difficult since one has to maintain its properties, e.g., discreteness (generally, if we use an adjacency matrix to denote a simple unweighted graph, any values other than 0 and 1 would be meaningless). Therefore, existing works mainly manipulate the node input features or hidden representations with continuous perturbations to avoid directly manipulating graph structures. As such, Feng et al. [14] dynamically apply graph perturbations on input node features and encourage the smoothness of prediction outputs via virtual adversarial training. Jin et al. [15] propose latent adversarial training, which constructs adversarial examples by perturbing the latent representations instead of directly perturbing the graph.
Although these works attempt to enhance the robustness of GNNs by employing adversarial training on the graph domain, several problems remain to be addressed: (i) For structure-based AT, an immediate obstacle arises from the computational overhead. Existing works on generating adversarial examples via structural attacks, e.g., [7] and [16], usually require substantially more cost than standard attacks. Besides, a model jointly trained with a specific attack may not easily generalize to different attacks that have never been seen before. In short, directly applying AT to discrete graph data is inefficient and fraught with challenges. (ii) For feature-based AT, node input features or latent representations are treated as independent of each other, even for connected instances. The perturbations are performed on the feature space separately while the relationships between instances are discarded, which fails to capture the structural information of a graph and to generalize to structural attacks. However, it has been proven that structural attacks are more powerful than feature attacks [17]. Hence, standard adversarial training on the feature space is not sufficient to resist stronger structural adversarial attacks (we provide detailed experimental evidence in Section 5.3).
In light of these challenges, in this work we propose a novel Spectral Adversarial Training (SAT) method to learn robust GNNs against adversarial attacks. SAT enjoys the advantages of both structure- and feature-based AT, while overcoming their obstacles based on the matrix perturbation theory. Specifically, SAT first forms a low-rank approximation of the graph structure by spectral decomposition. Since the spectrum characterizes the topological property of the graph, SAT approximates structure perturbations in terms of changes of the spectrum to facilitate adversarial training. Then, we perform perturbations on the spectrum by treating the eigenvalues and eigenvectors (eigenpairs) as node structural features. Standard AT is applied on the eigenpairs to simulate structure-based AT in a continuous space. In this regard, SAT additionally incorporates two spectral adversarial training regularizers into the objective function during training, enforcing the model to yield a smooth output even when adversarial attacks are present.
The main contributions of this paper are summarized as follows:
• We thoroughly investigate related works and subsequently give a unified formulation of AT on graph data. We systematically categorize existing works related to graph-based AT into structure-based and feature-based methods and shed light on the obstacles to overcome.
• We propose a novel approach, Spectral Adversarial Training (SAT), which approximates the low-rank graph by spectral decomposition and employs adversarial training on the graph spectrum to regularize GNN models and improve their robustness against adversarial attacks.
• We demonstrate the robustness and generalization ability of SAT employed on a line of widely used GNNs. Comprehensive experiments on four node classification datasets validate that SAT is capable of improving model robustness without sacrificing classification power or training efficiency.
The rest of this article is organized as follows. In Section 2, we summarize the related works of GNNs and adversarial defense on graph data. In Section 3, we give notations and preliminaries. In Section 4 and Section 5, we illustrate our proposed method and present the experimental results with sufficient analysis, respectively. Finally, we draw our conclusion and highlight future works in Section 6.
2 Related Works
In this section, we will briefly review the recent advances regarding the GNN model family, and then describe the literature on graph adversarial defense research closely related to this work.
2.1 Graph Neural Networks
GNNs [18, 1] are a class of deep learning models designed to learn representations of data described by graphs. They extend the traditional neural networks to directly operate on the graph-structured data, and provide an elegant way to implement node-level, edge-level, and graph-level prediction tasks [19].
The key to the success of GNNs is the message passing scheme between the nodes and their neighborhoods (a.k.a. neighborhood aggregation), which recursively aggregates the messages (embeddings) based on the adjacency structure of the graph. Following this scheme, Graph Convolutional Networks (GCNs) [20] are proposed to apply convolutional operations on the local neighborhoods of the nodes. GCNs capture the graph structure and also leverage feature information to learn an embedding for each node that contains neighboring information. Wu et al. [21] propose Simple Graph Convolution (SGC), which simplifies GCNs by removing nonlinearities and applying the $K$-th power of the graph convolution in a single layer. In a following work, Zhu et al. [22] propose Simple Spectral Graph Convolution (S2GC), which generalizes SGC to deeper architectures while limiting over-smoothing issues. For a comprehensive overview we refer readers to a survey [19] on graph neural networks.
Due to the excellent representation ability, GNN-based models have achieved advanced performance on several tasks and play a vital role in machine learning fields. However, recent studies show that GNNs are vulnerable to adversarial attacks [7, 16, 22], particularly deliberate perturbations on the graph structures. The robustness of GNNs has drawn significant attention ever since, which motivates us to design a robust training approach in this work.
2.2 Adversarial Defense on Graph Data
Extensive efforts so far have been dedicated to the design of defense strategies that improve the robustness of GNNs and resist adversarial attacks. They can be generally categorized into three classes [23]: adversarial purification, robust architectures, and adversarial training. As the focus of our work is on adversarial training, we will draw more attention to it in the following. We refer readers to [23, 24] and the references therein for more thorough reviews of adversarial defenses on graphs.
2.2.1 Adversarial Purification
To reduce the effect of adversarial behaviors, a straightforward way of defense is to detect and remove potential perturbations in a graph. Based on this insight, Wu et al. [25] propose to inspect the graph and recover the suspicious connections via Jaccard similarity scores of features between connected nodes. Xu et al. [26] propose a different approach to detect malicious edges with a filter-and-sample framework. Beyond these edge-based methods, Entezari et al. [27] propose to use Singular Value Decomposition (SVD) to form a low-rank approximation of graph adjacency matrix, and thus reduce the adversarial effects.
2.2.2 Robust Architectures
Rather than directly inspecting poisoned graph structures, such as edges or nodes, before inputting them to the models, several works attempt to design more robust architectures for GNNs. Specifically, Zhu et al. [22] adopt Gaussian distributions as node representations and employ attention mechanisms to dynamically adjust the propagation of adversarial attacks in GNNs. As the vulnerability of GNNs is mostly attributed to the non-robust aggregation scheme [28], Jin et al. [29] propose Similarity Preserving GCN (SimPGCN), which preserves node similarity and adaptively captures the graph structure during the aggregation process. Chen et al. [28] adopt robust aggregation schemes, median and trimmed mean, instead of the weighted mean to enhance the robustness of GNNs.
2.2.3 Adversarial Training
While adversarial perturbations are initially used to construct adversarial examples and fool the neural networks, the same sword can be utilized by defenders to improve the robustness of their models against adversarial attacks [23]. Adversarial training (AT) on graph data is inspired by research in computer vision, where the perturbations are constructed in continuous input domains. So far, a variety of works [14, 30, 31] employ virtual adversarial training on node features, to enforce the smoothness of the models’ output distributions against adversarial inputs. In particular, Deng et al. [30] propose batch virtual AT to smooth the outputs of GNNs, Feng et al. [14] devise a dynamic regularizer encouraging GNN models to learn representations that are robust to perturbations. In the ensuing work, Hu et al. [32] design neighbor perturbations to restrict the perturbation directions on node features and prevent the propagation of perturbed neighbors. Jin et al. [31] apply AT on the latent representations of the nodes to improve the robustness and generalization ability of GNNs.
The aforementioned works are typically feature-based AT. However, the structural information has not been fully exploited during training. The structure-based AT perturbs the graph by adding or dropping edges or nodes, which is also an effective way to defend against attacks. Dai et al. [8] propose to randomly drop edges during each training step. Such a simple strategy has shown its effectiveness in improving the GNNs’ robustness. One step further, Xu et al. [13] design an optimization-based AT, where adversarial edges are generated by a topology attack. The presented work has demonstrated the effectiveness of employing structural AT on graph data. To facilitate adversarial training via practical adversarial examples, a growing number of researchers [33, 13] seek to employ attack-oriented perturbations, which are generated based on existing network adversarial attacks of PGD [13] and Metattack [9].
Although these works attempt to handle similar problems, i.e., improving the robustness of GNNs via practical adversarial training, they either require substantial costs to construct adversarial examples in the discrete graph domain (structure-based AT) or discard the graph topology that is crucial for GNNs (feature-based AT). In light of these challenges, our work attempts to employ AT on the graph spectrum. Specifically, it is able to approximate the structure perturbations in terms of the changes of the eigenpairs. Thus, the discrete optimization problem is transformed into a continuous one and the computational overhead is greatly alleviated.
3 Preliminary
In this section, we first give some notations and review the family of GNN models, and then formulate the problem of AT on graph data. For convenience, our frequently used notations and formulas are summarized in Table I.
TABLE I: Frequently used notations and descriptions.
3.1 Notations
Consider an undirected attributed graph $G = (A, X)$ with $n$ nodes, where $A \in \{0, 1\}^{n \times n}$ denotes the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ represents the node feature matrix, with $d$ the feature dimensionality. Without loss of generality, we focus on an attributed, undirected graph in this paper. Formally, we denote the set of nodes as $V$ and the set of edges as $E$, respectively. In the context of semi-supervised node classification, each node $v_i$ is associated with a corresponding node label $y_i \in \{1, \dots, C\}$, where $C$ is the number of class labels.
3.2 Graph Neural Network Family
3.2.1 Graph Convolutional Network (GCN)
Graph Convolutional Network (GCN) [20] is one of the classical models in graph-based learning and has been broadly used in many tasks. The purpose of GCN is to incorporate the geometric structure of the graph into the neural network through an iterative message aggregation scheme. Under the setting of semi-supervised node classification, the $(l+1)$-th hidden output of GCN can be formulated as follows:

$H^{(l+1)} = \sigma\big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\big)$ (1)

Here, $\tilde{A} = A + I$ is the adjacency matrix of the graph with self-connections, $I$ is the identity matrix, and the $i$-th diagonal element of $\tilde{D}$, i.e., $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, is the degree of node $v_i$ plus 1. $W^{(l)}$ is the trainable weight matrix of layer $l$, which learns useful features and transforms the embedding size from $d_l$ to $d_{l+1}$. Here we omit the bias terms in each layer for the sake of simplicity. $\sigma(\cdot)$ denotes the activation function, such as ReLU [34] in this paper. Initially, $H^{(0)} = X$. In particular, a two-layer GCN with ReLU as the hidden activation and softmax as the output is widely adopted. Thus the semi-supervised classification model can be described as:

$Z = \mathrm{softmax}\big(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big)$ (2)

where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized Laplacian matrix of graph $G$.
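To make the formulation concrete, the following minimal NumPy sketch implements the two-layer GCN forward pass of Equation (2); the weight matrices W0 and W1 are assumed to be given (e.g., randomly initialized), and all names are illustrative rather than the reference implementation.

```python
import numpy as np

def normalize_adj(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the symmetrically normalized matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN of Equation (2): softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)  # hidden layer with ReLU
    return softmax(A_hat @ H @ W1)
```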
3.2.2 Simple Graph Convolution (SGC)
The vanilla GCN recursively and repeatedly computes the aggregated messages from the nodes' neighborhoods, resulting in excessive computation during training. To reduce the computation overhead, Wu et al. [21] theoretically analyze the architecture of GCN and propose to linearize it. Specifically, the resulting linear model, SGC, removes the nonlinear activation functions in the hidden layers and collapses the weight matrices between consecutive layers. Therefore, the output of GCN is simplified as:

$Z = \mathrm{softmax}\big(\hat{A}^{K} X W\big)$ (3)

where $W$ is the redefined (collapsed) weight matrix and $K$ is the number of hidden layers, often set as 2. SGC captures structural information from the $K$-hop neighborhoods of the nodes in the graph by simply applying the $K$-th power of the transition matrix in a single neural network layer. It is worth mentioning that the term $\hat{A}^{K} X$ is a constant and can be pre-computed, which reduces the computation overhead during training.
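As a minimal illustration of this simplification, the sketch below pre-computes the propagated features once (reusing the normalize_adj and softmax helpers from the previous sketch), after which training reduces to fitting a single linear layer; the default K is the one mentioned above.

```python
def sgc_precompute(A, X, K=2):
    """Pre-compute S = A_hat^K X (Equation (3)); constant w.r.t. the model weights."""
    A_hat = normalize_adj(A)
    S = X
    for _ in range(K):
        S = A_hat @ S
    return S

def sgc_forward(S, W):
    """SGC output: softmax(S W), i.e., a multinomial logistic regression on S."""
    return softmax(S @ W)
```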
3.2.3 Simple Spectral Graph Convolution (S2GC)
Inspired by the previous works [21, 35], S2GC [22] has been recently proposed as a state-of-the-art model. Similar to SGC, S2GC is a linear learner that seeks to address the issues of over-smoothing and high computational cost encountered in the literature. S2GC adopts a simple but effective graph filter that aggregates neighborhoods of various sizes, which is beneficial for capturing the global and local information of each node in the graph. Typically, S2GC generalizes the output of SGC as:

$Z = \mathrm{softmax}\Big(\frac{1}{K}\sum_{l=1}^{K}\big((1-\alpha)\hat{A}^{l} X + \alpha X\big)\, W\Big)$ (4)

where $\alpha$ is a scalar that balances the self-information of each node against the aggregation of its neighborhoods. Similar to SGC, the graph filter can also be pre-computed, and the resulting model represents a good trade-off between classification accuracy and computation overhead.
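A corresponding sketch of the S2GC filter in Equation (4) follows, again reusing the normalize_adj and softmax helpers from the GCN sketch; the default values of K and alpha are illustrative only.

```python
def s2gc_precompute(A, X, K=5, alpha=0.05):
    """S2GC filter of Equation (4): average over hops 1..K of
    (1 - alpha) * A_hat^l X + alpha * X."""
    A_hat = normalize_adj(A)
    S, out = X, np.zeros_like(X, dtype=float)
    for _ in range(K):
        S = A_hat @ S
        out += (1.0 - alpha) * S + alpha * X
    return out / K

def s2gc_forward(S, W):
    return softmax(S @ W)
```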
3.3 Adversarial Training Formulation
AT [6] is a regularization technique for standard training and has been demonstrated effective to improve the model robustness against adversarial attacks. Generally, AT can be viewed as a min-max game: a network defender seeks to minimize the training loss on the graph, while the adversary constructs adversarial examples to maximize the training loss. Through alternating the two opposite (minimization and maximization) processes iteratively, the learned model is expected to have better robustness and generalization ability, not only against benign examples that have not yet been seen in the training data, but also against adversarial examples. We extend the formulation of AT in [36], to make it a unified one in the graph setting:
$\min_{\theta} \max_{(\hat{G},\, \hat{H}^{(l)}) \in \Delta(G,\, H^{(l)})} \sum_{(v_i, y_i) \in \mathcal{V}_L} \mathcal{L}\big(f_{\theta}(\hat{G}, \hat{H}^{(l)})_{v_i},\, y_i\big)$ (5)

Here $\hat{G}$ is the input graph, which can be either an adversarially crafted graph designed to achieve the goal of AT or a clean graph without perturbations. $\hat{H}^{(l)}$ is the perturbed or unperturbed feature representation of the $l$-th layer. In most cases, the perturbations are added on the input features or the first hidden output [31], i.e., $l = 0$ or $l = 1$. $\Delta(\cdot)$ denotes the perturbation function that yields admissible adversarial examples w.r.t. the given inputs. $f_{\theta}$ is a GNN model parameterized by $\theta$ that takes a graph as input. $\mathcal{V}_L$ is the set of labeled nodes for training, while $v_i$ and $y_i$ are a labeled node in $\mathcal{V}_L$ and its corresponding class label, respectively. $\mathcal{L}$ is the training loss function, which can be formulated as follows:

$\mathcal{L} = \mathcal{L}_{c} + \gamma\, \mathcal{L}_{reg}$ (6)

where $\mathcal{L}_{c}$ is the classification loss that measures the divergence between the prediction and the ground truth, $\mathcal{L}_{reg}$ is an additional regularization term, e.g., $L_2$ regularization on $\theta$, and $\gamma$ is the parameter balancing the two terms.
According to Equation (5), the adversarial perturbations can be added on the input graph, including the adjacency structure $A$ and the node features $X$, or on the hidden representations $H^{(l)}$. Since the hidden representations can be treated as transformed node features, we group AT with perturbations on $X$ and $H^{(l)}$ as feature-based AT, and the remaining as structure-based AT.
3.3.1 Structure-based AT Formulation
The structure-based AT mainly focuses on the graph structure denoted by the adjacency matrix $A$. Unlike on non-graph-structured data, e.g., image or text data, AT cannot be directly employed on the graph structure due to its discreteness. Naturally, there must be constraints on the adversarial perturbations to ensure the discreteness of the perturbed graph structure. Specifically, we define structure-based AT on top of Equation (5) as:

$\min_{\theta} \max_{\delta_{A}} \sum_{(v_i, y_i) \in \mathcal{V}_L} \mathcal{L}\big(f_{\theta}(A + \delta_{A}, X)_{v_i},\, y_i\big), \quad \text{s.t.}\; A + \delta_{A} \in \{0, 1\}^{n \times n}$ (7)

where $\delta_{A}$ denotes the structural perturbations.
Although generating adversarial examples by randomly dropping edges is quite cheap to implement, such a strategy yields little performance gain against adversarial attacks [36]. Currently, the mainstream structure-based AT methods are attack-oriented, that is, the adversarial perturbations are generated by powerful attackers. However, adversarial attacks on graph data are often inefficient, and generating such adversarial examples comes at a high cost, which limits their usability.
3.3.2 Feature-based AT Formulation
The feature-based AT generates adversarial examples in a different way: the perturbations are added on the node features, either the original input features or the transformed ones in the latent space. Similar to structure-based AT, we define feature-based AT as:

$\min_{\theta} \max_{\|\delta_{X}\| \le \epsilon_{X},\; \|\delta_{H}\| \le \epsilon_{H}} \sum_{(v_i, y_i) \in \mathcal{V}_L} \mathcal{L}\big(f_{\theta}(A,\, X + \delta_{X},\, H^{(l)} + \delta_{H})_{v_i},\, y_i\big)$ (8)

where $\delta_{X}$ and $\delta_{H}$ are the perturbations added on the node input features and the latent representations, respectively. Because the features typically lie in a continuous domain, the generated perturbations can take arbitrary values and are only confined in magnitude by tiny constants $\epsilon_{X}$ and $\epsilon_{H}$.
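For illustration, the following PyTorch sketch generates the kind of one-step gradient-based perturbation on continuous node features that underlies feature-based AT; the model signature model(A_hat, X) (returning log-probabilities) and the budget eps are assumptions made for the example, not the exact procedure of any specific baseline.

```python
import torch
import torch.nn.functional as F

def feature_perturbation(model, A_hat, X, y, train_mask, eps=0.05):
    """One-step (FGSM-style) perturbation on node features X for feature-based AT."""
    X_adv = X.clone().detach().requires_grad_(True)
    loss = F.nll_loss(model(A_hat, X_adv)[train_mask], y[train_mask])
    grad = torch.autograd.grad(loss, X_adv)[0]
    return (eps * grad.sign()).detach()  # added to X when forming the adversarial example
```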
Existing feature-based AT methods, however, assume that the nodes are independently distributed across the graph and the relationships between the samples are less considered or even disregarded. As a result, they often fail to facilitate AT on the graph structure and are insufficient to withstand more powerful adversarial attacks, particularly structure-based ones (See Section 5.3).
4 Methodology
In this section, we first introduce the matrix perturbation theory that lays as the foundation of our work, followed by the low-rank approximation of the graph structure. We then outline how we construct adversarial examples based on spectral perturbations, and present the overall objective function and the algorithm for Spectral Adversarial Training (SAT). Finally, we analyze the time and memory complexity of SAT.
4.1 Matrix Perturbation Theory
In the area of complex network analysis, one of the most important topics is to study the robustness of a graph [37, 38, 39]. The robustness of a graph is evaluated by certain metrics under perturbations, in terms of both structural and spectral changes. The structural perturbations include the deletion or addition of edges or nodes and the modification of edge weights; the spectral changes consist of perturbations on the eigenvalues and eigenvectors of the adjacency or Laplacian matrix. There is an intriguing problem to investigate: how does a slight change in the input affect the output?
Formally, given a symmetric matrix $M$ and some arbitrary function $f$ that operates on $M$, we are interested in understanding how a slight perturbation added to the matrix affects the behavior of $f(M)$. Noticing that a low-dimensional representation of $M$ can be computed using spectral decomposition, we are able to estimate the changes from the spectrum perspective, i.e., the eigenpairs of $M$. Specifically, we are given a symmetric matrix $M$ with its dominant eigenvalues $\lambda_1, \dots, \lambda_k$ and the corresponding eigenvectors $u_1, \dots, u_k$, respectively. If the matrix is modified from $M$ to $M + \Delta M$, where the perturbation $\Delta M$ is sufficiently small, we look for $\Delta\lambda_i$ and $\Delta u_i$ that satisfy the equation:

$(M + \Delta M)(u_i + \Delta u_i) = (\lambda_i + \Delta\lambda_i)(u_i + \Delta u_i)$ (9)

for $i = 1, \dots, k$.
Intuitively, it is straightforward to recompute the perturbed eigenpairs of $M + \Delta M$ directly based on spectral decomposition. However, finding the eigenpairs results in high computation overhead when $M$ is high-dimensional. Given that the perturbation on $M$ is sufficiently small, we wish to estimate the perturbed eigenpairs of $M + \Delta M$ based on $M$ and its eigenpairs calculated beforehand. According to the matrix perturbation theory [40], the changes of eigenvalues and eigenvectors can be estimated as follows:

$\lambda_i + \Delta\lambda_i \approx \lambda_i + u_i^{\top} \Delta M\, u_i, \qquad u_i + \Delta u_i \approx u_i + \sum_{j \neq i} \frac{u_j^{\top} \Delta M\, u_i}{\lambda_i - \lambda_j + \epsilon}\, u_j$ (10)

where $u_i + \Delta u_i$ and $\lambda_i + \Delta\lambda_i$ are the $i$-th perturbed eigenvector and eigenvalue of $M + \Delta M$, respectively, and $\epsilon$ is a small constant to avoid numerical instability.
The matrix perturbation theory shows that the perturbations on $M$ are directly related to the eigenpairs, and we are able to estimate the perturbed eigenpairs of $M + \Delta M$ based on $M$ and its eigenpairs if the changes are sufficiently small. This is the primary motivation that drives us to go deeper into AT from the spectrum perspective. We concentrate on studying direct perturbation training in the continuous domain with spectral decomposition and propose an adversarial regularization method to improve the model robustness. We construct adversarial perturbations on the spectrum of the graph during the training phase, rather than simply modifying the discrete graph structure or empirically employing other continuous components to generate adversarial examples.
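A small NumPy sketch of the first-order update in Equation (10) is given below; it estimates the eigenpairs of $M + \Delta M$ from the precomputed top-$k$ eigenpairs of $M$ without re-running the decomposition. Variable names are illustrative, and the sketch assumes the retained eigenvalues are well separated, as discussed above.

```python
import numpy as np

def perturbed_eigenpairs(dM, lam, U, eps=1e-8):
    """First-order estimate of the eigenpairs of M + dM (Equation (10)),
    given the top-k eigenvalues `lam` and eigenvectors `U` (columns) of M."""
    k = lam.shape[0]
    # lam_i + u_i^T dM u_i for every retained eigenvalue
    lam_new = lam + np.einsum('ni,nm,mi->i', U, dM, U)
    U_new = U.copy()
    for i in range(k):
        coef = (U.T @ (dM @ U[:, i])) / (lam[i] - lam + eps)  # (u_j^T dM u_i)/(lam_i - lam_j)
        coef[i] = 0.0                                          # skip the j = i term
        U_new[:, i] = U[:, i] + U @ coef
    return lam_new, U_new
```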

4.2 Low-Rank Approximation of Graph Structure
Several works [6, 11, 12] on AT have developed methods to add perturbations on image raw features or embeddings that are continuous. Since the graph structure is discrete, previous methods cannot be directly extended to generate structure perturbations for graph neural networks. Inspired by the matrix perturbation theory, one possible way is to transform the discrete optimization problem into a continuous one that can be solved by well-established AT approaches. As illustrated in Section 4.1, the structural perturbations can be viewed as the changes in the eigenvalues and eigenvectors. This motivates us to approximate the structural perturbations in terms of spectral changes.
Firstly, we observe that the normalized Laplacian matrix $\hat{A}$ encodes the topological information of the graph $G$. As $\hat{A}$ is a real symmetric semi-definite matrix, it can be decomposed into a set of orthogonal eigenvectors and associated nonnegative eigenvalues. Thus, we are able to modify the obtained eigenvalues and eigenvectors to generate adversarial examples in the continuous domain.
Suppose that $\Lambda_k = \mathrm{diag}(\lambda_1, \dots, \lambda_k)$ represents the set of $k$ dominant eigenvalues and $U_k = [u_1, \dots, u_k]$ denotes the set of associated eigenvectors. Spectral decomposition is an elegant tool to compute the best rank-$k$ approximation of the matrix $\hat{A}$. The rank-$k$ approximation of $\hat{A}$ is calculated as follows:

$\hat{A}_k = \sum_{i=1}^{k} \lambda_i u_i u_i^{\top} = U_k \Lambda_k U_k^{\top}$ (11)

where $\hat{A}_k$ is the rank-$k$ approximation of $\hat{A}$ derived from spectral decomposition. As stated by Entezari et al. [27], adversarial attacks, such as NETTACK, tend to affect the high-rank (low-valued) components of the graph. Following this insight, we can use a much smaller $k$ to reconstruct a low-rank approximation of the graph structure, where we only choose the $k$ largest eigenvalues and the associated eigenvectors to project the graph into a low-rank subspace. In addition, using the top-$k$ largest eigenvalues can ensure that they are non-zero and not close to each other, thus avoiding the numerical instability in Equation (10).
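In practice, the top-$k$ eigenpairs can be obtained with a sparse eigensolver; the sketch below uses SciPy's eigsh on the symmetric normalized matrix and also returns the rank-$k$ reconstruction of Equation (11). The choice k=50 is only a placeholder for the rank discussed in Section 5.5.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_approximation(A_hat, k=50):
    """Top-k eigenpairs of the symmetric matrix A_hat and its rank-k
    reconstruction A_k = U_k diag(lam_k) U_k^T (Equation (11))."""
    lam, U = eigsh(sp.csr_matrix(A_hat), k=k, which='LA')  # k largest eigenvalues
    A_k = (U * lam) @ U.T
    return lam, U, A_k
```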
Based on the rank-$k$ approximation of $\hat{A}$, we can reformulate the output of GNNs by replacing the normalized Laplacian matrix $\hat{A}$ with $\hat{A}_k$. In particular, the outputs of SGC and S2GC are respectively rewritten as:

$Z_{\mathrm{SGC}} = \mathrm{softmax}\big(U_k \Lambda_k^{K} U_k^{\top} X W\big), \qquad Z_{\mathrm{S^2GC}} = \mathrm{softmax}\Big(\frac{1}{K}\sum_{l=1}^{K}\big((1-\alpha)\, U_k \Lambda_k^{l} U_k^{\top} X + \alpha X\big)\, W\Big)$ (12)

where we simplify the computation of $\hat{A}_k^{K}$ into $U_k \Lambda_k^{K} U_k^{\top}$ using the properties of the eigenpairs of a symmetric matrix.
Observe that the outputs of SGC and S2GC are now substantially easier to compute: one can easily compute $\Lambda_k^{K}$ even when $K$ is large. It is worth noting that since attackers tend to perturb the high-rank components of the graph spectrum, this simplification does not negatively affect the performance. Instead, it helps improve the robustness of GNNs and effectively reduces the computation overhead.
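The simplified propagation can be expressed directly in terms of the eigenpairs; the sketch below computes the features of Equation (12) for both variants (a sketch only, with illustrative defaults for K and alpha).

```python
import numpy as np

def sgc_low_rank_features(lam, U, X, K=2):
    """U_k diag(lam)^K U_k^T X from Equation (12): powers of the graph filter
    reduce to elementwise powers of the eigenvalues."""
    return (U * lam**K) @ (U.T @ X)

def s2gc_low_rank_features(lam, U, X, K=5, alpha=0.05):
    """S2GC variant: average of (1-alpha) U diag(lam)^l U^T X + alpha X over l = 1..K."""
    UtX = U.T @ X
    out = sum((1.0 - alpha) * (U * lam**l) @ UtX + alpha * X for l in range(1, K + 1))
    return out / K
```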
4.3 Generating Adversarial Examples
A straightforward way to generate adversarial examples is to apply gradient ascent on the inputs. Specifically, we treat the eigenvalues and eigenvectors as perturbable inputs, compute their gradients, and construct the perturbations in terms of $U_k$ and $\Lambda_k$ independently (we omit the subscript $k$ in the rest of the paper for brevity). Let $\delta_u$ and $\delta_{\lambda}$ denote the two types of perturbations, representing the changes on $U$ and $\Lambda$, respectively. Based on Equation (5), the objective function of spectral AT can be defined as:

$\min_{\theta} \max_{\|\delta_u\| \le \epsilon_u,\; \|\delta_{\lambda}\| \le \epsilon_{\lambda}} \sum_{(v_i, y_i) \in \mathcal{V}_L} \mathcal{L}\big(f_{\theta}(U + \delta_u,\, \Lambda + \delta_{\lambda},\, X)_{v_i},\, y_i\big)$ (13)

Although our proposed SAT focuses on the graph structure, it generates adversarial examples from a continuous perspective and thus avoids directly tackling the discrete graph structure. We devise two types of spectral adversarial training, employed on the eigenvectors and the eigenvalues, respectively. The resulting adversarial losses $\mathcal{L}_u$ and $\mathcal{L}_{\lambda}$ can be formulated as follows:

$\mathcal{L}_u = D\big(Y,\, f_{\theta}(U + \delta_u, \Lambda, X)\big), \qquad \mathcal{L}_{\lambda} = D\big(Y,\, f_{\theta}(U, \Lambda + \delta_{\lambda}, X)\big)$ (14)

Here $Y$ is the ground-truth distribution of output labels, $f_{\theta}(\cdot)$ denotes the predicted distribution by GNNs, and $X$ represents the node features. In this paper, we choose a popular loss function, i.e., the cross-entropy (CE) loss, as $D$ for demonstration. Inspired by the linear approximation method [6] for standard AT, the continuous perturbations $\delta_u$ and $\delta_{\lambda}$ are approximated as follows:

$\delta_u = \epsilon_u \frac{g_u}{\|g_u\|_2}, \quad g_u = \nabla_{U}\, D\big(Y, f_{\theta}(U, \Lambda, X)\big); \qquad \delta_{\lambda} = \epsilon_{\lambda} \frac{g_{\lambda}}{\|g_{\lambda}\|_2}, \quad g_{\lambda} = \nabla_{\Lambda}\, D\big(Y, f_{\theta}(U, \Lambda, X)\big)$ (15)

Note that $\epsilon_u$ and $\epsilon_{\lambda}$ are hyperparameters controlling the magnitudes of the eigenvector and eigenvalue perturbations, respectively. Typically, $\epsilon_u$ and $\epsilon_{\lambda}$ are small values to ensure the generated adversarial examples are “unnoticeable”. Figure 2 presents the framework of AT for various types of perturbations, including traditional AT and our proposed SAT. SAT focuses on the graph structure input but allows continuous adversarial perturbations on the eigenpairs, incorporating the structural information of the graph and further improving the effectiveness of AT.
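The one-step approximation of Equation (15) can be computed with automatic differentiation. In the PyTorch sketch below, model(U, lam, X) is assumed to be a GNN that consumes the eigenpairs directly (as in Equation (12)) and returns log-probabilities; all names and default magnitudes are illustrative.

```python
import torch
import torch.nn.functional as F

def spectral_perturbations(model, U, lam, X, y, train_mask, eps_u=0.1, eps_lam=0.1):
    """One-step perturbations on the eigenvectors U and eigenvalues lam (Equation (15)):
    delta = eps * g / ||g||_2, with g the gradient of the clean training loss."""
    U_adv = U.clone().detach().requires_grad_(True)
    lam_adv = lam.clone().detach().requires_grad_(True)
    loss = F.nll_loss(model(U_adv, lam_adv, X)[train_mask], y[train_mask])
    g_u, g_lam = torch.autograd.grad(loss, (U_adv, lam_adv))
    delta_u = eps_u * g_u / (g_u.norm() + 1e-12)
    delta_lam = eps_lam * g_lam / (g_lam.norm() + 1e-12)
    return delta_u.detach(), delta_lam.detach()
```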
4.4 Objective Function Definition
For AT of GNNs, two adversarial regularization terms are added to the original loss to help improve the robustness against the adversarial perturbations. Then the final objective function is defined as follows:
$\mathcal{L}_{\mathrm{SAT}} = \mathcal{L} + \beta_1\, \mathcal{L}_u + \beta_2\, \mathcal{L}_{\lambda}$ (16)

where $\beta_1$ and $\beta_2$ are two trade-off hyperparameters adopted to balance the regularization terms, and $\mathcal{L}$ is the standard training loss defined in Equation (6), in which $\mathcal{L}_c$ is the classification loss and $\mathcal{L}_{reg}$ is the regularization loss on $\theta$. By jointly optimizing the classification loss and the regularization terms, we enforce the smoothness between the original graph and the adversarial graph, thereby improving the model's robustness.
Algorithm 1 summarizes the details of our proposed SAT method. Technically, the proposed approach generalizes to most GNNs by replacing the adjacency matrix with its rank-$k$ approximation based on spectral decomposition. The obtained eigenpairs, which lie in a continuous space, can be treated as low-dimensional embeddings of the graph. For each training loop, the perturbations on the eigenvalues and eigenvectors are approximated based on Equation (15) to facilitate AT. Finally, the resulting model is expected to have high robustness against adversarial attacks while maintaining high classification accuracy.
Input: graph $G = (A, X)$, adversarial regularization parameters $\beta_1$ and $\beta_2$, norms of perturbations $\epsilon_u$ and $\epsilon_{\lambda}$, approximation rank $k$;
Output: model parameters $\theta$;
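A condensed sketch of the training loop in Algorithm 1 is shown below. It reuses the spectral_perturbations helper sketched in Section 4.3, assumes the same eigenpair-consuming model, and relies on the optimizer's weight decay for the $\mathcal{L}_{reg}$ term; it is an illustrative realization under these assumptions rather than the exact reference implementation.

```python
import torch.nn.functional as F

def train_sat(model, optimizer, U, lam, X, y, train_mask,
              beta1=0.1, beta2=0.1, eps_u=0.1, eps_lam=0.1, epochs=100):
    """SAT training: classification loss on the rank-k graph plus the two
    spectral adversarial regularizers of Equation (16)."""
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        delta_u, delta_lam = spectral_perturbations(
            model, U, lam, X, y, train_mask, eps_u, eps_lam)
        loss = F.nll_loss(model(U, lam, X)[train_mask], y[train_mask])                  # L (Eq. 6)
        loss_u = F.nll_loss(model(U + delta_u, lam, X)[train_mask], y[train_mask])      # L_u
        loss_lam = F.nll_loss(model(U, lam + delta_lam, X)[train_mask], y[train_mask])  # L_lambda
        (loss + beta1 * loss_u + beta2 * loss_lam).backward()
        optimizer.step()
    return model
```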
4.5 Time and Memory Complexity
Time complexity. Compared to standard training of GNNs, SAT additionally introduces two adversarial regularizers that promote the smoothness of GNNs over the given graph and make GNNs more robust against adversarial perturbations. The additional computation of SAT is twofold: (i) the adversarial perturbations $\delta_u$ and $\delta_{\lambda}$ calculated with Equation (15), and (ii) the adversarial regularizers $\mathcal{L}_u$ and $\mathcal{L}_{\lambda}$ calculated with Equation (14). Firstly, calculating $\delta_u$ and $\delta_{\lambda}$ requires one set of forward-propagation and back-propagation based on the linear approximation, where the forward-propagation has already been done and its result can be reused from the model. Secondly, SAT computes the two adversarial regularizers as additional smoothness constraints, which requires two extra forward propagations. Considering that we use the rank-$k$ approximation of the graph where $k \ll n$, the overall computation is greatly reduced and the overhead of SAT is acceptable.
Memory complexity. SAT requires additional memory to store the adversarial perturbations $\delta_u$ and $\delta_{\lambda}$. However, the additional memory is insignificant since $k \ll n$. Furthermore, the original graph structure denoted by $A$ is projected into a low-dimensional space using the rank-$k$ approximation, which also reduces the time and memory required for optimization. Overall, the memory complexity of our algorithm is still dominated by the scale of the dataset (i.e., the numbers of nodes, edges, and features). As a result, the memory usage of SAT is entirely acceptable in most cases.
5 Experiments
5.1 Experimental Settings
5.1.1 Datasets
To evaluate the model performance and robustness on the semi-supervised node classification task, we conduct comprehensive experiments on four widely used datasets: Cora, Cora-ML, Citeseer, and Pubmed [41, 42]. We follow the previous work [7, 43] and randomly split each dataset into training (10%), validation (10%), and testing (80%) sets. The statistics of datasets are summarized in Table II. Here only the largest connected component of each graph is considered, same as [7, 43].
Cora | Cora-ML | Citeseer | Pubmed | |
---|---|---|---|---|
#Nodes | 2,485 | 2,810 | 2,100 | 19,717 |
#Edges | 5,069 | 7,981 | 3,668 | 44,324 |
#Features | 1,433 | 2,879 | 3,703 | 500 |
#Classes | 7 | 7 | 6 | 3 |
Avg. Degree | 4.07 | 5.68 | 3.48 | 4.50 |
5.1.2 Baselines
To comprehensively compare our proposed approach with several state-of-the-art methods, we consider the following four categories of methods: (i) Random and GCN-PGD from structure-based AT methods; (ii) BVAT, GraphVAT, and LAT from feature-based AT methods; (iii) Jaccard and SVD from adversarial purification methods; and (iv) SimPGCN and MedianGCN from methods with robust architectures.
• Random [8]: it is the simplest structure-based AT that randomly drops edges during each training step to generate adversarial examples.
• GCN-PGD [13]: it is an attack-oriented AT that generates adversarial examples on the discrete graph structure based on the projected gradient descent (PGD) attack.
• BVAT [30]: it is a batch virtual AT that promotes the smoothness of GNNs by generating virtual adversarial perturbations.
• GraphVAT [14]: it proposes a dynamic regularization technique based on virtual AT, to smooth the output and learn a robust GNN over adversarial examples.
• LAT [31]: it proposes to employ AT on the first latent representation of GNNs and make GNNs more robust against adversarial perturbations.
• Jaccard [25]: since attackers tend to connect nodes with large differences to achieve a better attack effect, it uses Jaccard similarity to preprocess the graph and filter out edges between nodes with low feature similarity.
• SVD [27]: it adopts a truncated SVD to form a low-rank approximation of the graph adjacency matrix to alleviate the adversarial behaviors. SVD is similar to our method, but it does not incorporate adversarial training into the learning phase.
• SimPGCN [44]: it is an improved GNN with a feature similarity preserving aggregation, which effectively preserves node similarity while exploiting graph structure.
• MedianGCN [28]: it is a state-of-the-art defense method, adopting the median as the aggregation function of GNNs to enhance their robustness.
5.1.3 SAT Setup
For the proposed SAT approach, we have five important hyperparameters to tune: (i) $\epsilon_u$ and $\epsilon_{\lambda}$: the scales of adversarial perturbations on eigenvectors and eigenvalues, respectively; (ii) $\beta_1$ and $\beta_2$: the weights of the two graph adversarial regularizers; (iii) $k$: the rank of the approximated graph. Particularly, we perform a grid search within a range of values and carefully tune the hyperparameters to balance classification power and model robustness. In this experiment, we use the same perturbation scales and regularizer weights for all datasets, while the approximation rank $k$ is set smaller for the small-scale datasets Cora, Cora-ML, and Citeseer and larger for the large-scale dataset Pubmed (see detailed discussion in Section 5.5). We employ SAT on three well-established GNNs, i.e., GCN, SGC, and S2GC, detailed in Section 3.2. The resulting models are named GCN-SAT, SGC-SAT, and S2GC-SAT, respectively. For SGC and S2GC, $K$ is set as 2 and 5 across all datasets, respectively; additionally, we fix $\alpha$ for S2GC. The number of training epochs is set to 100 and the Adam optimizer [45] is adopted to learn the model, with an initial learning rate of 0.01, 0.2, and 0.01 for GCN-SAT, SGC-SAT, and S2GC-SAT, respectively. It is worth noting that SAT is not limited to the above-mentioned three GNNs and the technique can be straightforwardly generalized to other GNNs.
5.1.4 Baseline Setup
For Random, we set the edge dropping ratio as 0.5 during training; For GCN-PGD, we use PGD attack [13] to generate adversarial examples, with perturbations rate set as 5% of the edges and the optimization step set as 10 in each epoch. For BVAT, GraphVAT and LAT, we tune the scale of adversarial perturbations on the (hidden) features in the range {0.01, 0.03, 0.05, 0.07, 0.1} to achieve the best performance. For Jaccard and SVD, we empirically tune the threshold of dropping dissimilar edges from {0, 0.01, 0.05, 0.08, 0.1} and the approximation rank from 30 to 500. We finally set the threshold as 0.01 and the approximation rank the same as SAT across all datasets. All the baselines are trained with Adam optimizer with an initial learning rate of 0.01. We train them for 200 epochs with early stopping, where we stop training if there is no further improvement on the validation accuracy during 50 epochs. All these baselines use GCN as the backbone model according to the authors’ implementations. For other hyperparameters, we empirically follow the default parameters in the original literature and fine-tune the model to achieve the best performance.
5.1.5 Evaluation Protocol
To evaluate the model robustness, we report the accuracy on the test set under the adversarial targeted attacks NETTACK [7] and PGD [13] (note that PGD is an untargeted attack method; we adapt the code provided by the authors to fit the targeted attack setting), for which 1,000 nodes are randomly selected as targeted nodes from the testing set. For the PGD attack, the iteration step is set as 100 to achieve the best performance. Both methods take GCN as the surrogate model to conduct attacks. We employ direct attacks where the perturbations are directly applied to the connections of targeted nodes. We exactly follow the experimental settings in [43], which focus on the evasion attack scenario. Specifically, the perturbation budget is set as the node degrees. For all methods, we report classification accuracy averaged over 10 runs with different initial weights. We implement SAT with the open-sourced software GraphGallery [46]. Code to reproduce the experiments is publicly available at https://github.com/EdisonLeeeee/SAT.
Cora | Cora-ML | Citeseer | Pubmed | |
---|---|---|---|---|
GCN | 86.6±0.3 | 86.0±0.2 | 71.2±0.4 | 85.6±0.2 |
w / SAT | 85.2±0.2 | 86.1±0.2 | 72.4±0.4 | 85.8±0.1 |
w / SVD | 78.0±1.1 | 77.1±0.8 | 68.6±1.2 | 80.4±0.6 |
SGC | 84.7±0.1 | 85.4±0.2 | 72.5±0.5 | 84.7±0.4 |
w / SAT | 84.5±0.3 | 85.8±0.6 | 72.9±0.3 | 85.4±0.3 |
w / SVD | 76.9±1.6 | 77.8±0.8 | 68.1±1.4 | 79.7±0.9 |
S2GC | 87.0±0.3 | 86.4±0.5 | 73.6±1.0 | 85.2±0.1 |
w / SAT | 85.7±0.5 | 85.5±0.3 | 73.4±0.6 | 85.4±0.2 |
w / SVD | 78.7±0.9 | 78.5±0.7 | 68.9±1.6 | 81.2±0.9 |

5.2 Standard GNNs with SAT
5.2.1 Performance on Clean Data
To empirically validate the effectiveness of our proposed SAT, we first investigate if the proposed algorithm will hinder the performance of standard GNNs on the clean datasets, i.e., the unperturbed datasets. In Table III, we report the classification results of three GNNs, GCN, SGC, and S2GC, on which standard training, SAT and SVD defense are employed, respectively. From the results, we have the following observations:
• Although SAT adopts the rank-$k$ approximation of the graph structure and jointly optimizes the model with adversarial perturbations on the graph spectrum, the classification performance is not negatively affected in most cases. Note that SAT approximates the graph structure in a low-dimensional space and thus incurs certain information loss. In general, the highest degradation is less than 2%, which is acceptable in practice.
• Compared to SVD, a similar defense strategy that simply employs SVD as a low-rank preprocessing on the input graph, SAT achieves better performance on the clean graphs. This suggests that although utilizing low-rank reconstruction alone would sacrifice the performance of the model on the graph, combining it with adversarial training can largely reduce such negative effects.
• SAT is also beneficial for promoting the generalization performance of GNNs. For instance, the results of SGC with SAT demonstrate this effectiveness, in which the classification accuracy shows a slight improvement on all datasets. This indicates that slight perturbations indeed help to improve the generalization ability, which is consistent with previous studies [14, 30, 32].
5.2.2 Robustness against Adversarial Attacks
We further investigate the robustness of GNNs with SAT against adversarial attacks. We plot the results in Figure 3. The experimental results are obtained by conducting NETTACK and PGD, two strong adversarial structural attacks, on the targeted nodes with direct attack setting. According to the results, we have the following observations:
• Different GNNs show various degrees of robustness against adversarial attacks. Specifically, SGC slightly underperforms GCN since it drops the nonlinearity between hidden layers, and a linear model is more vulnerable to attacks [6]. We can also find that S2GC outperforms GCN and SGC across the four datasets. The main reason for this might be that NETTACK and PGD both take GCN as a surrogate model, leading to better attack performance on GCN than its linear variant SGC.
• A high level of robustness against adversarial attacks is achieved for GNNs with SAT. As can be seen, SAT remarkably improves the robustness of standard GNNs across all cases. We can find that the improvement of SAT is significant under both attacks, particularly PGD. Overall, the results affirmatively validate the claim that SAT does contribute to improving the robustness of GNNs. More specifically, the highlight of our approach is that SAT can greatly improve the robustness of GNNs while still maintaining classification accuracy.
NETTACK | PGD | ||||||||
Method | Cora | Cora-ML | Citeseer | Pubmed | Cora | Cora-ML | Citeseer | Pubmed | |
Standard GNNs | GCN | 5.3±0.9 | 6.8±0.8 | 6.3±1.2 | 3.4±0.3 | 7.0±0.8 | 8.2±0.9 | 7.4±1.2 | 4.6±0.5 |
Structure-based AT | Random | 6.5±1.7 | 7.3±1.5 | 7.0±1.9 | 4.2±1.4 | 10.2±1.6 | 9.5±1.6 | 11.1±1.7 | 10.0±0.3 |
GCN-PGD | 10.1±1.6 | 9.8±1.5 | 11.0±1.5 | 8.6±1.4 | 45.2±1.7 | 50.4±1.7 | 53.5±1.9 | 62.3±1.5 | |
Feature-based AT | BVAT | 12.3±1.4 | 11.5±1.5 | 10.9±1.5 | 9.6±1.3 | 23.6±1.5 | 19.4±1.6 | 28.3±1.6 | 15.7±1.4 |
GraphVAT | 10.4±1.5 | 11.1±1.6 | 10.7±1.7 | 9.3±1.3 | 20.7±1.6 | 22.4±1.6 | 30.3±1.5 | 16.3±1.3 | |
LAT | 6.7±1.7 | 7.4±1.7 | 6.0±1.9 | 4.4±1.5 | 18.8±1.8 | 17.0±1.7 | 25.4±1.8 | 14.7±1.6 | |
Other Defenses | Jaccard | 39.2±1.2 | 31.1±1.3 | 23.5±1.3 | 54.2±1.1 | 37.3±1.2 | 28.8±1.4 | 22.7±1.4 | 40.8±1.2 |
SVD | 67.1±1.4 | 68.3±1.4 | 64.7±1.5 | 77.5±1.3 | 55.4±1.4 | 58.7±1.5 | 60.2±1.4 | 70.9±1.3 | |
SimPGCN | 22.3±1.2 | 34.5±1.0 | 32.4±1.3 | 45.6±1.2 | 30.4±1.5 | 39.3±1.3 | 41.6±1.2 | 47.9±0.9 | |
MedianGCN | 31.7±1.5 | 43.3±1.4 | 42.9±1.5 | 59.8±1.2 | 42.6±1.3 | 50.2±1.3 | 55.4±1.6 | 59.2±1.5 | |
Ours | GCN-SAT | 69.5±1.3 | 70.3±1.5 | 67.1±1.5 | 82.0±1.4 | 69.6±1.6 | 68.3±1.5 | 67.0±1.5 | 75.8±1.5 |
SGC-SAT | 68.7±1.6 | 69.0±1.5 | 66.9±1.7 | 80.8±1.3 | 69.0±1.5 | 66.3±1.7 | 66.7±1.7 | 74.7±1.4 | |
GC-SAT | 72.5±1.5 | 71.6±1.6 | 68.5±1.6 | 82.6±1.5 | 70.2±1.4 | 71.9±1.7 | 65.1±1.8 | 77.0±1.5 |
5.3 Robustness Comparison with Baselines
In Section 5.2, we empirically show the performance of different GNNs employed with SAT on four datasets. To further validate the robustness of our proposed approach, we compare GCN-SAT, SGC-SAT, and S2GC-SAT with multiple baselines for semi-supervised node classification on four datasets. The experimental results are detailed in Table IV. It should be noted that, as the majority of the defense methods use GCN as the base model, we also report the result of GCN as a comparison baseline. In what follows, we summarize our major observations from the results.
GCN performs worst in all cases. The result is expected since the vulnerability of GNNs has been demonstrated in the literature. This points to an urgent need to improve the robustness of GNNs or design a more practical defense strategy. Random, as a simple structure-based AT method, slightly improves GCN’s robustness by randomly dropping a subset of edges during training. The key to making this simple strategy work is that attackers prefer to add malicious edges rather than remove them, according to [36]. GCN-PGD is an attack-oriented AT method that outperforms Random in terms of robustness. For PGD attack, GCN-PGD shows better robustness since it generates adversarial perturbations based on the attacks that are essentially similar. However, it is less sufficient to resist a targeted attack, NETTACK. The experimental results highlight the issue that the attack-oriented method has poor generalization ability to resist an unseen attack.
Compared to structure-based AT, the feature-based AT methods, especially BVAT and GraphVAT, achieve better robustness against NETTACK. The results show that applying virtual AT to node features improves the robustness of GNNs. Surprisingly, LAT performs the worst in most cases, and its robustness even suffers a slight drop in Citeseer when compared to the original GCN. We believe that AT on the hidden representation is ineffective for defending against structural perturbations on inputs. Furthermore, the results support our claim that feature-based AT is insufficient to resist structural attacks because it ignores structural information and treats feature perturbations independently during training.
Beyond AT-based methods, different heuristics seek to improve the robustness of GNNs from various perspectives. Among these methods, MedianGCN and SimPGCN show fairly robust performance, demonstrating the efficacy of improving the robustness of the aggregation mechanism of GNNs. In addition, Jaccard and SVD share a similar motivation of preprocessing the input graph before training, which achieves better performance than most AT-based methods. More specifically, SVD is the most effective baseline method and outperforms the other baselines in all cases. However, SVD is less effective when an adaptive attack, PGD, is performed.
For our proposed SAT approach, the three models outperform other baselines in most cases. As can be seen, joint training with SAT can significantly improve GNNs’ robustness by providing a simple way to resist adversarial attacks. As the only difference between GCN-SAT and GCN is applying the proposed SAT, the improvement is attributed to the proposed SAT, which greatly improves GCN’s robustness. In particular, S2GC-SAT boosts the performance most significantly and establishes state-of-the-art results across four datasets. The results verify that SAT is beneficial for GNNs’ robustness.
Overall, we can conclude that the existing methods for improving GNNs’ robustness through graph AT may be insufficient to resist powerful attacks like NETTACK. Our work suggests a way to improve the AT scheme from the spectral domain, which is beneficial and meaningful for future research.

5.4 Training Time Comparison with AT-based Methods
In general, a good defense method has three characteristics: (i) less compromise on clean performance, (ii) higher robustness against adversarial attack, and (iii) acceptable training time. Clearly, SAT can help GNNs improve their robustness without sacrificing classification accuracy. In Section 4.5, we briefly and theoretically analyze the time complexity of SAT. We now consider the practical running time of the algorithm. In Figure 4, we empirically estimate and compare the time consumption of GCN-SAT and other AT-based methods. The results are obtained on a machine with one NVIDIA RTX 2080Ti GPU and an Intel Core i9-9900X CPU (3.50GHz).
For these baselines, we have the following observations. (i) Random has little impact on the training efficiency of GCN since it is a random drop trick. (ii) BVAT and GraphVAT adopt a similar scheme, i.e., virtual AT, and therefore have similar performance in running time. (iii) LAT and GCN-PGD require a set of inner iterations to optimize the adversarial perturbations during each training step, resulting in dramatically high overheads compared with other methods.
According to Figure 4, the running time of GCN-SAT is slightly higher than Random, but much lower than other methods on Cora, Cora-ML, and Citeseer. As there are only two additional forward-propagations and one additional back-propagation, the training algorithm efficiency is not a problem in practical applications. In addition, the low-rank approximation benefits the scalability of SAT, which makes it capable of tackling the challenge of AT with large-scale datasets. In particular, GCN-SAT still shows acceptable running time in Pubmed. Generally, the results are consistent with our theoretical analysis, which demonstrates the efficiency of our proposed approach. Overall, the computation overhead for SAT is acceptable in most cases, and the training efficiency has not been negatively affected even when dealing with larger datasets such as Pubmed.


5.5 Ablation Study
5.5.1 Parameter Sensitivity w.r.t. $\epsilon_u$, $\epsilon_{\lambda}$, $\beta_1$, and $\beta_2$
Recall that SAT has a set of hyperparameters: $\epsilon_u$, $\epsilon_{\lambda}$, $\beta_1$, $\beta_2$, and $k$. We next study the effects of different values of these hyperparameters on the performance of the proposed SAT, taking GCN employed with SAT as our base model in the ablation study.
We first investigate the effects of $\epsilon_u$, $\epsilon_{\lambda}$, $\beta_1$, and $\beta_2$. Note that we only use GCN to tune the hyperparameters as SGC and S2GC yield similar trends, and their results are omitted for brevity. Figure 5 shows the model performance on Cora and Cora-ML, including the accuracy on the clean graph and the robustness against NETTACK on the perturbed graph. We use clean accuracy (clean Acc.) to refer to standard test accuracy (no adversarial perturbations), while adversarial accuracy denotes accuracy under attacks. For each hyperparameter, we vary its value in a range while the other hyperparameters are fixed at their optimal values.
Firstly, $\epsilon_u$ and $\epsilon_{\lambda}$ control the magnitudes of the eigenvector and eigenvalue perturbations, respectively. In Figure 5, the clean accuracy and adversarial accuracy of GCN-SAT first increase and then decrease as $\epsilon_u$ and $\epsilon_{\lambda}$ are increased from 0 to 0.5. This is reasonable since small perturbations may be beneficial for the model's generalization ability, while larger perturbations have the opposite effect. For the model robustness, enlarging the perturbation magnitudes consistently leads to better robustness, but overly large values degrade the performance. Considering both clean accuracy and model robustness together, we choose moderate values of $\epsilon_u$ and $\epsilon_{\lambda}$ as the optimal setting.
Secondly, $\beta_1$ and $\beta_2$ are two trade-off hyperparameters that balance the regularization terms. As observed from Figure 5, both clean accuracy and model robustness improve when $\beta_1$ and $\beta_2$ take small nonzero values, which means that the two regularizers are important for training. However, as $\beta_1$ and $\beta_2$ grow larger, the overall performance of the model is negatively affected. Taking both clean accuracy and model robustness into account jointly, we choose moderate values of $\beta_1$ and $\beta_2$ as the optimal setting.
5.5.2 Parameter Sensitivity w.r.t. Approximation Rank
In addition to the above hyperparameters, the approximation rank $k$ is another important one to adjust for the model. The graph is approximated by the top-$k$ eigenpairs, where $k$ determines the richness of the preserved structural information. As demonstrated in [27], the effect of high-rank changes to the graph can be greatly alleviated by discarding the high-rank components of the graph. Following this insight, we adjust $k$ to achieve better classification performance as well as high robustness.
To demonstrate the importance of different $k$, we run an ablation study by varying $k$ on the four datasets w.r.t. NETTACK, and the results are displayed in Figure 6. As can be seen, with different values of $k$, GCN-SAT shows a highly different sensitivity depending on the scale of the graph. In Cora, Cora-ML, and Citeseer, where the graphs are relatively small, a small $k$ is sufficient to capture the structural information and model the relationships between the nodes, and a relatively good clean performance is already achieved with small values of $k$ on these three datasets. For the large-scale graph Pubmed, a larger $k$ is required to approximate the graph structure and avoid underfitting; the clean accuracy increases significantly when $k$ is increased from 50 to 350.
As a coin has two sides, increasing $k$ has limitations as well as advantages. A large $k$ leads to lower robustness of the model since the perturbations are often high-rank [27]. As shown in Figure 6, the robustness of the model decreases even though the classification accuracy increases with larger $k$. The results on Cora demonstrate a clear trend, especially when $k$ is greater than 90. For the Pubmed dataset, the robustness of the model benefits from a relatively large $k$ as the classification performance improves simultaneously; but when $k$ is greater than 250, the robustness is also slightly degraded due to the influence of the high-rank perturbations.
In addition to the model robustness, training efficiency is also important. A larger $k$ indicates more computational overhead in the two additional forward propagations. Therefore, a trade-off needs to be made for our proposed approach: we choose a smaller $k$ for Cora, Cora-ML, and Citeseer, and a larger $k$ for Pubmed.
5.5.3 Method Ablation
Cora | Cora-ML | Citeseer | Pubmed | |
---|---|---|---|---|
Clean | ||||
GCN | 86.6±0.3 | 86.0±0.2 | 71.2±0.4 | 85.6±0.2 |
GCN-SAT | 84.7±0.2 | 86.2±0.2 | 73.2±0.4 | 85.5±0.1 |
GCN-SAT (low-rank only) | 78.3±1.5 | 77.4±0.9 | 67.9±1.5 | 79.8±0.9 |
SimPGCN-SAT | 86.9±0.7 | 87.1±0.5 | 72.4±0.6 | 86.4±0.5 |
NETTACK | ||||
GCN | 5.3±0.9 | 6.8±0.8 | 6.3±1.2 | 3.4±0.3 |
GCN-SAT | 69.5±1.3 | 70.3±1.5 | 67.1±1.5 | 82.0±1.4 |
GCN-SAT (low-rank only) | 66.5±1.3 | 67.8±1.5 | 65.1±1.5 | 77.0±1.4 |
SimPGCN-SAT | 68.9±1.1 | 70.1±1.5 | 73.1±1.5 | 79.9±1.2 |
PGD | ||||
GCN | 7.0±0.8 | 8.2±0.9 | 7.4±1.2 | 4.6±0.5 |
GCN-SAT | 69.6±1.6 | 68.3±1.5 | 67.0±1.5 | 75.8±1.5 |
GCN-SAT (low-rank only) | 56.1±1.2 | 58.0±1.0 | 61.4±1.5 | 69.6±1.7 |
SimPGCN-SAT | 71.6±1.2 | 70.5±1.0 | 69.3±1.1 | 75.7±1.7 |
To evaluate the contributions of the low-rank approximation and the spectral adversarial training, we first conduct ablation studies on SAT built upon GCN with the adversarial regularizers disabled, i.e., performing SAT with only the low-rank approximation. As shown in Table V, the clean accuracy of SAT decreases without adversarial training. When adversarial attacks are performed, the robustness of GCN is still improved by SAT even without incorporating spectral adversarial training; in this regard, it achieves performance similar to SVD. Indeed, the low-rank approximation contributes significantly to the robustness of GNNs, as suggested in [27]. When incorporating the spectral adversarial training, the performance of GCN-SAT is further boosted against attacks, especially the adaptive attack PGD. Overall, the results reveal that although the low-rank approximation is important for improving the adversarial robustness of GNNs, the adversarial training improves the generalization performance of GNNs and is beneficial for defending against adaptive attacks such as PGD.
In addition, we observe from Table V that SAT also benefits SimPGCN, an improved GNN with a robust architecture. The results demonstrate that combining SAT with other defense strategies further improves model performance. Therefore, we believe that SAT can serve as a general defense strategy when applied to different GNNs.
6 Conclusion
We present SAT, a method for improving the resilience of GNNs to adversarial perturbations, with a focus on the task of node classification. Drawing on matrix perturbation theory, we approximate the graph structure with a low-rank spectral decomposition and perform adversarial training on the resulting eigenvalues and eigenvectors. Building on this, we devise two regularizers to smooth the output distribution of the model and improve its performance. Extensive experiments on four datasets demonstrate the effectiveness of our approach, showing that SAT significantly improves the robustness of GNNs and outperforms state-of-the-art competitors by a large margin. Furthermore, SAT is generic enough to be employed on most GNN models without sacrificing classification accuracy or training efficiency.
Studying the robustness of GNNs is an important problem, and this work provides useful insights for further research. In the future, we plan to study tasks beyond node classification and extend our research to other graph structures such as multiple and heterogeneous information graphs. Moreover, we would like to explore additional attack settings, including untargeted attacks, poisoning attacks, and backdoor attacks, and investigate whether SAT improves the resistance of GNNs in such scenarios. We believe these directions will be valuable for future research on robust graph learning.
Acknowledgments
The research is supported by the Key-Area Research and Development Program of Guangdong Province (2020B010165003), the Guangdong Basic and Applied Basic Research Foundation (2020A1515010831), the Guangzhou Basic and Applied Basic Research Foundation (202102020881), and CCF-AFSG Research Fund (20210002).
References
- [1] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” AI Open, vol. 1, pp. 57–81, 2020.
- [2] W. Fan, Y. Ma, Q. Li, Y. He, Y. E. Zhao, J. Tang, and D. Yin, “Graph neural networks for social recommendation,” in WWW. ACM, 2019, pp. 417–426.
- [3] Z. Zhang, F. Zhuang, H. Zhu, Z. Shi, H. Xiong, and Q. He, “Relational graph neural network with hierarchical attention for knowledge graph completion,” in AAAI, vol. 34, no. 05, 2020, pp. 9612–9619.
- [4] Y. Liu, C. Liang, X. He, J. Peng, Z. Zheng, and J. Tang, “Modelling high-order social relations for item recommendation,” IEEE Transactions on Knowledge and Data Engineering (TKDE), pp. 1–1, 2020.
- [5] M. Li, J. Zhou, J. Hu, W. Fan, Y. Zhang, Y. Gu, and G. Karypis, “Dgl-lifesci: An open-source toolkit for deep learning on graphs in life science,” arXiv preprint arXiv:2106.14232, 2021.
- [6] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR (Poster), 2015.
- [7] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in KDD. ACM, 2018, pp. 2847–2856.
- [8] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” in ICML. PMLR, 2018, pp. 1123–1132.
- [9] D. Zügner and S. Günnemann, “Adversarial attacks on graph neural networks via meta learning,” in ICLR, 2019.
- [10] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in ICLR, 2017.
- [11] T. Miyato, A. M. Dai, and I. J. Goodfellow, “Adversarial training methods for semi-supervised text classification,” in ICLR (Poster), 2017.
- [12] X. He, Z. He, X. Du, and T.-S. Chua, “Adversarial personalized ranking for recommendation,” in SIGIR. ACM, 2018, pp. 355–364.
- [13] K. Xu, H. Chen, S. Liu, P. Chen, T. Weng, M. Hong, and X. Lin, “Topology attack and defense for graph neural networks: An optimization perspective,” in IJCAI, 2019, pp. 3961–3967.
- [14] F. Feng, X. He, J. Tang, and T.-S. Chua, “Graph adversarial training: Dynamically regularizing based on graph structure,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 6, pp. 2493–2504, 2021.
- [15] H. Jin and X. Zhang, “Latent adversarial training of graph convolution networks,” in ICML Workshop on Learning and Reasoning with Graph-Structured Representations, 2019.
- [16] D. Zügner and S. Günnemann, “Certifiable robustness and robust training for graph convolutional networks,” in KDD. ACM, 2019, pp. 246–256.
- [17] H. Tang, G. Ma, Y. Chen, L. Guo, W. Wang, B. Zeng, and L. Zhan, “Adversarial attack on hierarchical graph pooling neural networks,” CoRR, vol. abs/2005.11560, 2020.
- [18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
- [19] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Networks Learn. Syst., vol. 32, no. 1, pp. 4–24, 2021.
- [20] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR (Poster), 2017.
- [21] F. Wu, A. H. S. Jr., T. Zhang, C. Fifty, T. Yu, and K. Q. Weinberger, “Simplifying graph convolutional networks,” in ICML, vol. 97. PMLR, 2019, pp. 6861–6871.
- [22] D. Zhu, Z. Zhang, P. Cui, and W. Zhu, “Robust graph convolutional networks against adversarial attacks,” in KDD, 2019, pp. 1399–1407.
- [23] L. Sun, J. Wang, P. S. Yu, and B. Li, “Adversarial attack and defense on graph data: A survey,” CoRR, vol. abs/1812.10528, 2018.
- [24] L. Chen, J. Li, J. Peng, T. Xie, Z. Cao, K. Xu, X. He, and Z. Zheng, “A survey of adversarial learning on graph,” arXiv preprint arXiv:2003.05730, 2020.
- [25] H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu, “Adversarial examples for graph data: deep insights into attack and defense,” in IJCAI, 2019, pp. 4816–4823.
- [26] X. Xu, Y. Yu, B. Li, L. Song, C. Liu, and C. Gunter, “Characterizing malicious edges targeting on graph neural networks,” 2019.
- [27] N. Entezari, S. A. Al-Sayouri, A. Darvishzadeh, and E. E. Papalexakis, “All you need is low (rank): Defending against adversarial attacks on graphs,” in WSDM. ACM, 2020, pp. 169–177.
- [28] L. Chen, J. Li, Q. Peng, Y. Liu, Z. Zheng, and C. Yang, “Understanding structural vulnerability in graph convolutional networks,” in IJCAI, 2021.
- [29] W. Jin, T. Derr, Y. Wang, Y. Ma, Z. Liu, and J. Tang, “Node similarity preserving graph convolutional networks,” in WSDM, 2021, pp. 148–156.
- [30] Z. Deng, Y. Dong, and J. Zhu, “Batch virtual adversarial training for graph convolutional networks,” CoRR, vol. abs/1902.09192, 2019.
- [31] H. Jin and X. Zhang, “Robust training of graph convolutional networks via latent perturbation,” in ECML/PKDD, 2020.
- [32] W. Hu, C. Chen, Y. Chang, Z. Zheng, and Y. Du, “Robust graph convolutional networks with directional graph adversarial training,” Applied Intelligence, pp. 1–15, 2021.
- [33] J. Chen, Y. Wu, X. Lin, and Q. Xuan, “Can adversarial network attack be defended?” CoRR, vol. abs/1903.05994, 2019.
- [34] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in AISTATS, 2011, pp. 315–323.
- [35] J. Klicpera, A. Bojchevski, and S. Günnemann, “Predict then propagate: Graph neural networks meet personalized pagerank,” in ICLR (Poster). OpenReview.net, 2019.
- [36] W. Jin, Y. Li, H. Xu, Y. Wang, and J. Tang, “Adversarial attacks and defenses on graphs: A review and empirical study,” CoRR, vol. abs/2003.00653, 2020.
- [37] A. Milanese, J. Sun, and T. Nishikawa, “Approximating spectral impact of structural perturbations in large networks,” Physical Review E, vol. 81, no. 4, p. 046112, 2010.
- [38] X. Yan, Y. Wu, X. Li, C. Li, and Y. Hu, “Eigenvector perturbations of complex networks,” PHYSICA, vol. 408, pp. 106–118, 2014.
- [39] Y. Liu, J. Peng, L. Chen, and Z. Zheng, “Abstract interpretation based robustness certification for graph convolutional networks,” in ECAI, ser. Frontiers in Artificial Intelligence and Applications, vol. 325. IOS Press, 2020, pp. 1309–1315.
- [40] G. W. Stewart and J. Sun, Matrix Perturbation Theory. Academic Press, 1990.
- [41] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, “Automating the construction of internet portals with machine learning,” Information Retrieval, vol. 3, no. 2, pp. 127–163, 2000.
- [42] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, pp. 93–106, 2008.
- [43] J. Li, T. Xie, C. Liang, F. Xie, X. He, and Z. Zheng, “Adversarial attack on large scale graph,” IEEE Transactions on Knowledge and Data Engineering, 2021.
- [44] W. Jin, T. Derr, Y. Wang, Y. Ma, Z. Liu, and J. Tang, “Node similarity preserving graph convolutional networks,” in WSDM. ACM, 2021, pp. 148–156.
- [45] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR (Poster), Y. Bengio and Y. LeCun, Eds., 2015.
- [46] J. Li, K. Xu, L. Chen, Z. Zheng, and X. Liu, “Graphgallery: A platform for fast benchmarking and easy development of graph neural networks based intelligent software,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). Los Alamitos, CA, USA: IEEE Computer Society, 2021, pp. 13–16.
Jintang Li received the master’s degree from Sun Yat-sen University (SYSU) in 2021. He is currently pursuing the Ph.D. degree with the School of Software Engineering, Sun Yat-sen University, Guangzhou, China. His main research interests include graph representation learning, adversarial machine learning, and data mining techniques.
Jiaying Peng received the bachelor’s and the master’s degrees from South China Normal University and Sun Yat-sen University in 2019 and 2021, respectively. He is now working as a programmer and his interests include social networks, recommendation systems, machine learning, and data mining techniques.
Liang Chen received the bachelor’s and Ph.D. degrees from Zhejiang University (ZJU) in 2009 and 2015, respectively. He is currently an associate professor with the School of Computer Science and Engineering, Sun Yat-Sen University (SYSU), China. His research areas include data mining, graph neural networks, adversarial learning, and services computing. In the recent five years, he has published over 70 papers in several top conferences/journals, including SIGIR, KDD, ICDE, WWW, ICML, IJCAI, ICSOC, WSDM, TKDE, TSC, TOIT, and TII. His work on service recommendation received the Best Paper Award Nomination at ICSOC 2016. Moreover, he has served as a PC member of several top conferences including SIGIR, WWW, IJCAI, and WSDM, and as a regular reviewer for journals including TKDE, TNNLS, and TSC.
Zibin Zheng received the Ph.D. degree from the Chinese University of Hong Kong in 2011. He is currently a Professor with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. His research interests include services computing, software engineering, and blockchain. He received the ACM SIGSOFT Distinguished Paper Award at ICSE’10, the Best Student Paper Award at ICWS’10, and the IBM Ph.D. Fellowship Award.
Tingting Liang received the Ph.D. degree from the College of Computer Science, Zhejiang University, in 2019. She is currently working as an Associate Professor in the School of Computer Science and Technology, Hangzhou Dianzi University. Her research interests include data mining, recommender systems, multi-view learning, deep learning, and service-oriented computing. Her papers have been published in well-known conference proceedings and international journals such as ICDM, SIGIR, ICSOC, TITS, TSC, and KBS.
Qing Ling received the B.E. degree in automation and the Ph.D. degree in control theory and control engineering from the University of Science and Technology of China, Hefei, China, in 2001 and 2006, respectively. He was a Post-Doctoral Research Fellow with the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, MI, USA, from 2006 to 2009, and an Associate Professor with the Department of Automation, University of Science and Technology of China, from 2009 to 2017. He is currently a Professor with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. His current research interests include distributed and decentralized optimization and its application in machine learning. His work received the 2017 IEEE Signal Processing Society Young Author Best Paper Award. Dr. Ling is a Senior Area Editor of IEEE SIGNAL PROCESSING LETTERS.