
Towards Inductive Robustness: Distilling and Fostering Wave-induced Resonance in Transductive GCNs Against Graph Adversarial Attacks

Ao Liu1, Wenshan Li2, Tao Li1, Beibei Li1, Hanyuan Huang1, Pan Zhou3 Corresponding author
Abstract

Graph neural networks (GNNs) have recently been shown to be vulnerable to adversarial attacks, where slight perturbations in the graph structure can lead to erroneous predictions. However, current robust models for defending against such attacks inherit the transductive limitations of graph convolutional networks (GCNs). As a result, they are constrained by fixed structures and do not naturally generalize to unseen nodes. Here, we discover that transductive GCNs inherently possess a distillable robustness, achieved through a wave-induced resonance process. Based on this, we foster this resonance to facilitate inductive and robust learning. Specifically, we first prove that the signal formed by GCN-driven message passing (MP) is equivalent to the edge-based Laplacian wave, where, within a wave system, resonance can naturally emerge between the signal and its transmitting medium. This resonance provides inherent resistance to malicious perturbations inflicted on the signal system. We then prove that merely three MP iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph. Consequently, we present Graph Resonance-fostering Network (GRN) to foster this resonance via learning node representations from their distilled resonating subgraphs. By capturing the edge-transmitted signals within this subgraph and integrating them with the node signal, GRN embeds these combined signals into the central node’s representation. This node-wise embedding approach allows for generalization to unseen nodes. We validate our theoretical findings with experiments, and demonstrate that GRN generalizes robustness to unseen nodes, whilst maintaining state-of-the-art classification accuracy on perturbed graphs.

Introduction

In recent years, graph neural networks (GNNs), through the capabilities afforded by inductive learning, have emerged as the most potent instruments for node classification tasks. Nevertheless, earlier transductive models, such as graph convolutional networks (GCNs) (Kipf and Welling 2017), have inadvertently introduced vulnerabilities to adversarial attacks within the GNN framework. It has been observed that perturbed graphs derived from GCNs serving as surrogate models have the potential to compromise the outputs of inductive GNNs when transferred. In real-world applications, where trust and accuracy are non-negotiable (Chen et al. 2021; Nadal et al. 2021; Zhao et al. 2022; Berberidis and Giannakis 2019; Xiao, Chen, and Shi 2019), such vulnerabilities can significantly jeopardize public trust (Kreps and Kriner 2020), distort human decision-making (Walt, Jack, and Christof 2019), and threaten human well-being (Samuel et al. 2019). Addressing the vulnerabilities introduced by transductive GCNs into the GNNs’ community is of paramount importance.

Distinct from discrete feature data like images or text, graph data comprises a connected set of features through its topological structure. This interconnectedness naturally encourages the adoption of a global input-output mechanism to establish a learning channel from features to labels, a paradigm referred to as transductive learning (Kipf and Welling 2017; Defferrard, Bresson, and Vandergheynst 2016; Bruna et al. 2013), with GCNs epitomizing this approach. This very transductive nature of GCNs offers adversaries an ideal environment for launching attacks (Liu et al. 2022). Leveraging this global input-output pattern, given sufficient computation, adversaries can invariably devise perturbations that are both concealed and effective (Sun et al. 2022). Given that adversaries exploit vulnerabilities inherent to transductive models to compromise the GNN community, the formulation of a more robust transductive model has ascended as the prevailing defensive approach.

To defend against adversarial attacks, early research predominantly sought to fortify the GCN's tolerance to perturbations by adversarial training through random edge drops (Dai et al. 2018). Recently, a shift towards self-supervised training methods has been observed. These techniques sidestep the trap set by adversaries, which baits the model into misclassifying specific inputs. Instead of singularly focusing on enhancing the model's robustness to a given label space, they aim to expand the GCN's overall robustness to potential perturbed graphs. Key representatives of these research endeavors include: (1) In RGCN (Zhu et al. 2019), Gaussian distributions are employed to replace the node hidden representations across each GCN layer, aiming to mitigate the impact of adversarial modifications. (2) By introducing a singular value decomposition (SVD) filter before the GCN processing, GCN-SVD (Entezari et al. 2020) is designed to discard adversarial edges from the training dataset. (3) STABLE (Li et al. 2022) introduces enhancements in the GCN's forward propagation by incorporating functions that sporadically reinstate edges that were approximately removed. (4) EGNN (Liu et al. 2021) leverages graph smoothing techniques to confine the perturbation setting space, effectively excluding the majority of non-smooth perturbations.

However, current research, aiming to improve GCN-based models into robust transductive variants against attacks, inadvertently carries over the weaknesses introduced by transduction (Hamilton, Ying, and Leskovec 2017). Specifically, these models cannot handle unseen nodes and are limited to fixed structures, lacking generalization. This restricts their applicability: if adversaries slightly adjust their tactics, defenders must retrain their models to stay safe. The root cause is that GCNs' vulnerabilities are inherent; enhancing their robustness requires solutions that target these vulnerabilities directly. Deviating from the context of GCNs could hinder a thorough analysis of attack mechanisms, which in turn would obstruct the transition from transductive robust models to inductive ones. Until we harness the GCN's inherent robustness for inductive models, we will be stuck in a cycle of constantly refining transductive ones to address vulnerabilities.

In addressing this conundrum, our exploration unveiled an intriguing intrinsic source of robustness within the GCN itself. Without resorting to additional designs, merely deepening the standard 2-layer GCN to a 3-layer structure endows it with an innate (albeit partial) robustness. Importantly, the mechanism underpinning this robustness can be distilled. By purposefully fostering this intrinsic mechanism, it has paved the way for us to architect a robust inductive model. Employing this approach serves a dual purpose: On one hand, it facilitates a precision-oriented confrontation against the perturbations devised specifically by adversaries for transductive structures, ensuring the efficacy of our defensive strategies. On the other hand, it enables us to integrate this robustness mechanism into inductive frameworks, thereby achieving a seamless melding of inductiveness and robustness.

Specifically, we demonstrate that the vibrations of node signals within GCN-driven message passing (MP) are equivalent to edge-based waves, formulated by wave equations (Friedman and Tillich 2004; Shatah and Struwe 1993). Given this equivalence, it follows that GCNs inherently possess the potential for resonance (Kovalyov 1989), allowing them to harness the natural advantages of waves in defending against perturbations (Blas and Jenkins 2022). Then, we introduce a mathematical definition for the intensity of such resonance in GCNs. This definition, which outlines the scope and weights of a node's connections to its neighbors, concurrently adheres to four principles: universality, adaptation via MP, node-independence, and topological correlation. Subsequently, we demonstrate that for GCNs with three or more layers, an invariant mapping exists that translates the GCN's outputs into resonance intensity, manifesting as nodes capturing their surrounding local weighted structure.

Informed by these insights, we introduce the Graph Resonance-fostering Network (GRN) for inductive learning. The core of GRN is that it distills the structure resonating with nodes as local resonance subgraphs. Then, within this subgraph, GRN fosters the resonance by embedding both the node's signals and the signals transmitted through edges as the central node's representation. This embedding approach is generalizable across graph structures. If the surrounding topology of a node (including unseen ones) can be clearly determined to distill the local resonance subgraph, robust and inductive graph learning is achieved. Our contributions are:

  • We propose the first inductive and robust GNN.

  • We prove that a 3-layer GCN inherently possesses a distillable robustness.

  • We prove the equivalence between GCN-driven signal vibrations and edge-based waves.

Preliminaries

Notations

We consider connected graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ consisting of $N=|\mathcal{V}|$ nodes. Let $\mathbf{A}\in\{0,1\}^{N\times N}$ be the adjacency matrix. Let the generic symbol $\mathbf{L}$ denote the Laplacian in its broadest sense. The feature and one-hot label matrices are $\mathbf{Z}\in\mathbb{R}^{N\times d_{0}}$ and $\mathbf{Y}\in\mathbb{R}^{N\times d_{L}}$, respectively. The edge connecting nodes $v_{i}$ and $v_{j}$ is written as $(v_{i},v_{j})$ or $(v_{j},v_{i})$. The neighborhood $\mathcal{N}_{i}$ of a node $v_{i}$ consists of all nodes $v_{j}$ for which $(v_{i},v_{j})\in\mathcal{E}$. Let $\mathrm{deg}_{i}$ be the degree of node $v_{i}$. The feature vector and one-hot label of node $v_{i}$ are $\mathbf{z}_{i}$ and $\mathbf{y}_{i}$.

GCN

Under the topology $\mathbf{L}$, with $\mathbf{Z}$ as the input, the output at the $k$-th layer of a GCN is denoted as $\mathcal{M}(\mathbf{Z},k;\mathbf{L})$. The $k$-th parameter matrix of $\mathcal{M}$ is $\mathbf{W}^{(k)}\in\mathbb{R}^{d_{k}\times d_{k+1}}$. $\mathbf{Z}^{(k)}=[\mathbf{z}_{1}^{(k)},\ldots,\mathbf{z}_{N}^{(k)}]$ denotes the features in the $k$-th MP.

Wave Equation

The edge-based wave equation introduces a relationship between a graph-based signal $g=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})$ and its topological structure. Letting $c$ be a constant, it is defined as $\frac{\partial^{2}g}{\partial k^{2}}=-\mathbf{L}^{k}g\cdot c$ (Friedman and Tillich 2004). Herein $g$ can be instantiated as any discernible signal.
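For intuition, the following minimal numpy sketch discretizes the wave equation with a unit step in $k$; the central-difference update and the toy path graph are illustrative assumptions rather than the solver used in this paper.

```python
import numpy as np

# A minimal sketch of the edge-based wave equation d^2 g / dk^2 = -L^k g * c,
# discretized with unit step in k (an illustrative assumption, not the paper's solver).
def simulate_wave(L, Z0, c=0.1, steps=5):
    """Iterate a discrete second-order update of the graph signal."""
    g_prev, g_curr = Z0.copy(), Z0.copy()     # zero initial velocity
    history = [g_curr]
    Lk = np.eye(L.shape[0])                   # running power L^k
    for _ in range(steps):
        accel = -Lk @ g_curr * c              # right-hand side of the wave equation
        g_next = 2 * g_curr - g_prev + accel  # central-difference update in k
        g_prev, g_curr = g_curr, g_next
        Lk = Lk @ L
        history.append(g_curr)
    return history

# Toy usage: a 4-node path graph with random features.
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
L = np.diag(A.sum(1)) - A
signals = simulate_wave(L, np.random.rand(4, 3))
```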

3-layer GCN Possesses Adversarial Robustness via Wave-induced Resonance

Equivalence of GCN-driven MP and Wave Equation

Here we demonstrate that the signal vibrations driven by GCNs are equivalent to waves on graph topologies and can be characterized by nonlinear wave equations.

Theorem 1.

Let $\mathcal{M}(\mathbf{Z},k;\mathbf{L})$ and $\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})$ denote the signals of the $k$-th MP and the $k$-th wave, respectively, under the topological structure represented by $\mathbf{L}$. It is established that for the given $k$ and $\mathcal{M}(\cdot)$, there exists $\mathrm{W{\scriptstyle AVE}}(\cdot)$ satisfying $\mathcal{M}(\mathbf{Z},k;\mathbf{L})=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L}),\ \forall\mathbf{L}\in\widehat{\mathbf{L}}$, where $\widehat{\mathbf{L}}$ is the set of Laplacian matrices of all attributed graphs.

This study draws an analogy between the node signals in GCN-driven MP and waves, considering edges as the transmission medium. Research indicates that in systems producing waves, resonance can arise between waves and their medium (Ahmad et al. 2023; Bykov, Bezus, and Doskolovich 2019). Building on this understanding, we can reaffirm our empirical observations about the GCN training pattern: in non-adversarial contexts, GCNs converge during training to the most natural signal MP paradigm, i.e., the one congruent with the ground truth. Under this premise, the messages transmitted by nodes and edges in the graph manifest a natural coupling state, maintaining a benign mapping relationship $\mathcal{M}:\mathcal{G}\to\mathbf{Y}$. The key to optimizing $\mathcal{M}$ is the resonance between node signals $\mathbf{Z}^{(\ell)}$ and edge signals $\mathbf{E}^{(\ell)}$. In adversarial situations, adversaries manipulate node signals by rewiring edges, which induces unnatural (i.e., noncongruent with the ground truth) MP patterns. Under this scheme, the benign resonance is disrupted, resulting in a malignant mapping relationship $\mathcal{M}:\mathcal{G}^{\prime}\to\mathbf{Y}^{\prime}$, where $\mathcal{G}^{\prime}$ and $\mathbf{Y}^{\prime}$ are the perturbed graph and labels, respectively.

Mathematical Definition of Resonance in GCN

Maintaining benign resonance becomes an intuitive defensive mechanism as it intrinsically resists unnatural perturbations. To actualize control over this resonance, thereby purposefully fostering resonance within the GCN, we subsequently delineate a detailed definition of this resonance. Thus, this definition should comply with the following conditions: 1) Every node within a graph should possess a computable resonance intensity, 2) the resonance intensity of all nodes should evolve in accordance with MP, 3) each node should maintain an independent resonance intensity, and 4) the stronger a node’s connection to its surrounding topology, the greater its perceived resonance intensity.

To devise a methodology compliant with the desired conditions, we consider node $v_{i}$ and utilize its latent representation (Kula 2015) $\bar{z}_{i}^{(k)}=\sum_{j}\mathbf{z}^{(k)}_{i,j}$ to quantify the intensity of the node signals. Furthermore, we use $T_{i}$, the count of edges among nodes in $\mathcal{N}_{i}$, to measure the connectivity strength specific to the edges at the given node. Then, we use the total number $p_{i}=\sum_{j}\mathbf{A}^{2}_{i,j}$ of walks of length 2 originating from $v_{i}$ to any node in $\mathcal{G}$ to quantify the connectivity density that $v_{i}$ exhibits in the structure.

Accordingly, we propose the following definition to quantify the resonance intensity at node viv_{i}:

Definition 1.

The resonance intensity of $v_{i}$ at the $k$-th MP is

$$R(v_{i};k)\overset{\text{def.}}{=}\bar{z}_{i}^{(k)}T_{i}+2p_{i}+8\,\mathrm{deg}_{i}.\qquad(1)$$

The uniqueness of this definition of resonance intensity can be articulated as follows: it not only allows for an interpretable quantification of the resonance at different nodes, but it is also directly observable within MP. This implies that under such a definition, the resonance intensity of any node at any given MP epoch on a graph can be independently calculated, obviating the need for the GCN computational paradigm.
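As an illustration of this observability, a minimal numpy sketch of Eq. (1) is given below; the dense-matrix computation and the helper name are our own choices, not part of the original implementation.

```python
import numpy as np

# A minimal sketch of Definition 1 (resonance intensity), assuming a dense
# adjacency matrix A and per-MP node features Z_k; names here are illustrative.
def resonance_intensity(A, Z_k, i):
    """R(v_i; k) = z_bar_i^(k) * T_i + 2 * p_i + 8 * deg_i."""
    z_bar = Z_k[i].sum()                      # latent representation of v_i
    deg = A[i].sum()                          # degree of v_i
    p = (A @ A)[i].sum()                      # number of length-2 walks from v_i
    neigh = np.flatnonzero(A[i])              # first-order neighborhood N_i
    T = A[np.ix_(neigh, neigh)].sum() / 2     # edges among neighbors of v_i
    return z_bar * T + 2 * p + 8 * deg

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
Z_k = np.random.rand(4, 3)
print(resonance_intensity(A, Z_k, i=1))
```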

Resonance arises in 3rd MP

Definition 1 facilitates the quantification of resonance for any signal function on any graph, irrespective of whether or not it is driven by a GCN. Nonetheless, an intriguing finding has been proven: the wave system constructed by the GCN inherently and involuntarily gives rise to resonance, as outlined in the following theorem:

Theorem 2.

Let $\bar{z}_{i}^{(k)}$ be the latent signal formed by GCN-driven MP; we have:

$$R(v_{i};k)\propto 64\bar{z}_{i}^{(k+3)}-32.\qquad(2)$$

Theorem 2 unveils an intriguing phenomenon: for $k\geq 3$, there subsists an invariant mapping that transposes the GCN-driven signal into a resonance intensity bearing no correlation with the GCN paradigm. Given that Definition 1 establishes the resonance intensity as a measure of the coupling strength between nodes and structure within the graph, we can characterize it as the degree of coupling. Consequently, prior to the 3rd MP iteration, the GCN appears not yet to have delved into the coupling paradigm between nodes and structure within the graph. Subsequent to the 3rd MP, however, due to the persistent presence of the invariant mapping, the GCN can be construed as having fortuitously assimilated the coupling paradigm within the graph during the 3rd MP, and it perpetuates this paradigm into subsequent MPs.

Vast Perturbation Search Space of 3-layer GCN

In light of the current absence of an effective method for quantifying the combined adversarial robustness of a specific graph and a GCN learning from said graph, we propose an intuitive approach. For a graph $\mathcal{G}$, comprised of $|\mathcal{V}|$ edges and represented by the adjacency matrix $\mathbf{A}$, and a GCN $\mathcal{M}$ with $K$ layers, where the perturbation budget is denoted as $r$, the number of matrix-multiplication-based forward propagations required by the attack model can be construed as the highest attack cost. In this context, the number of subgraphs is independent of node features, hence we employ $\mathbf{A}$ as the independent variable of the attack cost function, denoted as $\mathrm{Cost}(\mathbf{A},r,K)$. We then present the following theorem:

Theorem 3.

For any specified graph with a node set $\mathcal{V}$ and an adjacency matrix $\mathbf{A}$, in conjunction with a $K$-layer GCN and a maximum perturbation $r$, the following holds:

$$\mathrm{Cost}(\mathbf{A},r,K)\leq\begin{cases}C(|\mathcal{V}|,r),&\text{if }K<3\\ (K-1)\,C\!\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right),&\text{otherwise},\end{cases}\qquad(3)$$

where $C(\cdot,\cdot)$ denotes the number of combinations.

It is revealed that adversaries face the same computational cost for matrix-multiplication-based forward propagations when $K=1$ or $2$. However, for $K\geq 3$, the cost dramatically increases, largely due to $C\!\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right)$. As an example, with the Cora dataset ($|\mathcal{V}|=5429$) and a 1% perturbation rate ($r=54$), the cost for $K=3$ becomes exponentially larger. Thus, attacking a 3-layer GCN presents a vast search space for adversaries. This insight extends Theorem 2's real-world applications and our previous findings: a 3-layer GCN can naturally create resonance-based robustness. With our defined resonance, we can further boost this robustness purposefully.
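For a concrete feel of the bound in Eq. (3), the following sketch evaluates it with Python's math.comb; the toy sizes are chosen only to keep the numbers printable and are not drawn from the paper's experiments.

```python
from math import comb

# A small sketch of the attack-cost bound in Theorem 3 (Eq. 3); the numbers
# below only illustrate the scale of the bound, not an actual attack.
def cost_upper_bound(num_v, r, K):
    if K < 3:
        return comb(num_v, r)
    return (K - 1) * comb(int(num_v ** (K - 1) / 2), r)

# Compare a 2-layer and a 3-layer GCN on a small toy size to keep the
# combinatorial numbers printable (Cora-scale values overflow the console).
print(cost_upper_bound(100, 3, K=2))   # C(100, 3)
print(cost_upper_bound(100, 3, K=3))   # 2 * C(100^2 / 2, 3)
```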

Graph Resonance-fostering Network

Principle Overview

We employ GRN to enhance the resonance of the GCN. The underlying concept of the GRN is articulated as follows. Definition 1 shows that for a node $v_{i}$, there exists a local graph structure that resonates, known as the local resonance subgraph (LRS) of node $v_{i}$ and denoted as $G_{i}=(\mathcal{V}_{G_{i}},\mathcal{E}_{G_{i}})$, which represents the maximal subgraph structure that node $v_{i}$ can capture. During end-to-end training, both the node signals $\mathbf{Z}^{(\ell)}$ and the signals transmitted through edges $\mathbf{E}^{(\ell)}$ concurrently vibrate within the LRS. Consequently, for $v_{i}$, if a learnable parameter $\mathbf{W}^{(\ell)}$ is capable of jointly embedding the MP result $\mathbf{A}_{G_{i}}\mathbf{Z}^{(\ell)}_{G_{i}}$ and $\mathbf{E}^{(\ell)}_{G_{i}}$ into $v_{i}$'s output representation, this aggregation intentionally accomplishes a learnable resonance, generating a local-level embedding. This identical aggregation pattern is applied across all nodes to facilitate a mapping, thereby achieving global-level forward propagation within the GRN.

In summary, a single forward propagation of the GRN is:

$$\mathbf{z}^{(\ell+1)}_{i}=\sigma\!\left(\mathrm{\scriptstyle MEAN}\!\left(\mathrm{\scriptstyle CONCAT}\!\left(\mathbf{A}_{G_{i}}\mathbf{Z}^{(\ell)}_{G_{i}},\mathbf{E}^{(\ell)}_{G_{i}}\right)\mathbf{W}^{(\ell)}\right)\right).\qquad(4)$$
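For illustration, a minimal numpy sketch of Eq. (4) for a single node is given below; reading CONCAT as row-wise stacking and MEAN as a mean over rows is our assumption, and all names and shapes are illustrative.

```python
import numpy as np

# A minimal sketch of one GRN forward step (Eq. 4) for a single node, assuming
# the LRS adjacency A_Gi, its node signals Z_Gi, and its edge-transmitted
# signals E_Gi are already extracted.
def grn_node_forward(A_Gi, Z_Gi, E_Gi, W, sigma=np.tanh):
    msg = A_Gi @ Z_Gi                          # MP result inside the LRS
    stacked = np.vstack([msg, E_Gi])           # CONCAT of node and edge signals
    return sigma((stacked @ W).mean(axis=0))   # linear map W, MEAN, nonlinearity

# Toy usage with random shapes: 3 LRS nodes, 2 LRS edges, d_in=4, d_out=2.
A_Gi = np.array([[0,1,1],[1,0,0],[1,0,0]], dtype=float)
Z_Gi, E_Gi = np.random.rand(3, 4), np.random.rand(2, 4)
W = np.random.rand(4, 2)
print(grn_node_forward(A_Gi, Z_Gi, E_Gi, W))
```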

Next, we provide explicit definitions for $G_{i}$ and $\mathbf{E}^{(\ell)}_{G_{i}}$.

$G_{i}$: Local Resonance Subgraph

As per Definition 1, the LRS comprises three components: 1) edges formed amongst all first-order neighbors, as counted by $T_{i}$, with these edges' weights equal to 1; 2) edges formed between $v_{i}$ and all 2nd-order (which inevitably include 1st-order) neighbors, as counted by $p_{i}$, with these edges' weights equal to 2; 3) edges between $v_{i}$ and all first-order neighbors, as counted by $\mathrm{deg}_{i}$, with these edges' weights equal to 8. Consequently, the LRS can be viewed as a weighted graph, in which the edge weights serve as attention for the joint combination of $\mathbf{Z}^{(\ell)}$ and $\mathbf{E}^{(\ell)}$. An illustrative example of the LRS is presented in Figure 1, and a construction sketch is given after the figure.

Figure 1: An illustration of the local resonance subgraph.
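A minimal numpy sketch of assembling an LRS as a weighted edge list follows; the loop structure and helper name are our own, and the weight increments (1, 2, 8) mirror the coefficients of Eq. (1).

```python
import numpy as np

# A minimal sketch of constructing the LRS of node i as a weighted edge list,
# following the three components described above; purely illustrative.
def local_resonance_subgraph(A, i):
    neigh = np.flatnonzero(A[i])
    edges = {}
    # 1) edges among first-order neighbors (counted by T_i), weight 1
    for a in neigh:
        for b in neigh:
            if a < b and A[a, b]:
                edges[(a, b)] = edges.get((a, b), 0) + 1
    # 2) edges from v_i to nodes reached by length-2 walks (counted by p_i), weight 2
    for j in neigh:
        for k in np.flatnonzero(A[j]):
            if k != i:
                e = (min(i, k), max(i, k))
                edges[e] = edges.get(e, 0) + 2
    # 3) edges from v_i to its first-order neighbors (counted by deg_i), weight 8
    for j in neigh:
        e = (min(i, j), max(i, j))
        edges[e] = edges.get(e, 0) + 8
    return edges

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
print(local_resonance_subgraph(A, i=0))
```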

$\mathbf{E}^{(\ell)}_{G_{i}}$: Edge-transmitted Signals

In the MP driven by adjacency matrices, only the signals at the nodes are observable, while the signals transmitted across each edge remain imperceptible. To ascertain the quantified signal on every specific edge within $G_{i}$, we first obtain the global edge-transmitted signals $\mathbf{E}^{(\ell)}_{\mathcal{G}}$. Then, $\mathbf{E}^{(\ell)}_{G_{i}}$ is derived through a sampling procedure on $\mathbf{E}^{(\ell)}_{\mathcal{G}}$ using the edge indices within $G_{i}$.

Specifically, within $\mathbf{E}^{(\ell)}_{\mathcal{G}}$, the edge-transmitted signal on $(v_{j},v_{k})$ is denoted as $\mathbf{e}^{(\ell)}_{j,k}$. For $\ell>0$, $\mathbf{e}^{(\ell)}_{j,k}$ is defined via a sequential procedure: 1) The edge $(v_{j},v_{k})$ in $\mathcal{G}$ is deleted, producing a new graph $\mathcal{G}^{j,k}$ with adjacency matrix $\mathbf{A}_{\mathcal{G}^{j,k}}$. 2) A new forward propagation is executed in the same layer on $\mathcal{G}^{j,k}$, yielding a feature matrix $\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}$. This matrix does not contain any messages transmitted through the edge $(v_{j},v_{k})$. Consequently,

$$\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}_{\mathcal{G}^{j,k}}\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)}.\qquad(5)$$

The feature of node $v_{j}$ in $\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}$, denoted as $\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}$, is thus obtained. 3) In $\mathcal{G}^{j,k}$, there is no edge between the nodes $v_{j}$ and $v_{k}$. Hence, the feature transmitted from node $v_{k}$ to $v_{j}$ (i.e., $\mathbf{e}^{(\ell)}_{j,k}$) is calculated by subtracting the feature obtained through the re-propagation on $\mathcal{G}^{j,k}$ (i.e., $\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}$) from the original feature (i.e., $\mathbf{z}^{(\ell)}$). Similarly, the signal transmitted through the pair $(v_{j},v_{k})$ can be interpreted as the average of the mutually transmitted signals. At $\ell=0$, since MP has not been initiated, $\mathbf{e}^{(\ell)}_{j,k}$ would ideally be $0$; for end-to-end training, it is defined as a random infinitesimal value. In conclusion, $\mathbf{E}^{(\ell)}_{G_{i}}$ is determined as

$$\mathbf{E}^{(\ell)}_{G_{i}}=\mathrm{\scriptstyle CONCAT}\!\left(\left\{\mathbf{e}^{(\ell)}_{j,k}:v_{j},v_{k}\in G_{i}\right\}\right),\quad\text{s.t. }\ \mathbf{e}^{(\ell)}_{j,k}=\begin{cases}\mathbf{z}^{(\ell)}-\dfrac{\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}+\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},k}}{2},&\text{if }\ell>0\\[6pt]\epsilon,\ \epsilon\sim U(0,1\times 10^{-7}),&\text{if }\ell=0.\end{cases}\qquad(6)$$

Figure 2 illustrates the computation of $\mathbf{e}^{(\ell)}_{j,k}$.

Figure 2: Schematic diagram illustrating the computation of the edge-transmitted signal between nodes $v_{j}$ and $v_{k}$.

Simplifying the Computational Overhead of $\mathbf{E}^{(\ell)}_{G_{i}}$

Equation (5) explicates the method of re-propagation on $\mathcal{G}^{j,k}$. Given that there are $|\mathcal{V}|$ ways to choose $(v_{j},v_{k})$, it necessitates the computation of $|\mathcal{V}|$ matrix multiplications (where $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}=\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)}$ remains the same for all $(v_{j},v_{k})$ selections and can be considered a constant matrix), thereby constituting the primary computational cost of $\mathbf{E}^{(\ell)}_{G_{i}}$. Here, we provide a computational method equivalent to Equation (5), reducing the $|\mathcal{V}|$ repetitions to a single one.

Proposition 1.

By indexing and rearranging $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$ by rows $j$ and $k$ to obtain a matrix $Q(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k)\in\mathbb{R}^{N\times d_{\ell-1}}$,

$$\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}-Q\!\left(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k\right).\qquad(7)$$

Evidently, a single matrix multiplication, i.e., $\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$, is sufficient to iterate over all $(v_{j},v_{k})$ and yield the results.
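As a quick numerical check of Proposition 1, the sketch below compares Eq. (5) with Eq. (7); the concrete form of $Q(\cdot;j,k)$ used here (rows $j$ and $k$ holding the swapped rows of $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$, zeros elsewhere) is our reading of "indexing and rearranging" and is stated as an assumption.

```python
import numpy as np

# A small numerical check of Proposition 1 (Eq. 7), under our assumed form of Q.
def Q(Z_W, j, k):
    out = np.zeros_like(Z_W)
    out[j], out[k] = Z_W[k], Z_W[j]
    return out

rng = np.random.default_rng(0)
A = np.array([[0,1,1],[1,0,1],[1,1,0]], dtype=float)
Z_W = rng.random((3, 4))                     # Z^{(l-1)} W^{(l-1)}, precomputed once
j, k = 0, 1
A_del = A.copy(); A_del[j, k] = A_del[k, j] = 0
lhs = A_del @ Z_W                            # Eq. 5: re-propagation after deletion
rhs = A @ Z_W - Q(Z_W, j, k)                 # Eq. 7: single multiplication + indexing
print(np.allclose(lhs, rhs))                 # True
```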

Figure 3: The general workflow of GRN.

Learning the Parameters

Each layer of GRN contains only the trainable parameters $\mathbf{W}^{(\ell)}$, and each has a distinct output representation $\mathbf{Z}^{(\ell)}$. Thus, in accordance with the requirements of the downstream task, GRN can accommodate either supervised or unsupervised loss functions to tune its weight matrices. Specifically, we denote the discrepancy function as $D(\cdot,\cdot)$. In semi-supervised scenarios, the loss function is $J_{s}(\mathbf{z}^{(K)}_{i})=D(\mathbf{z}^{(K)}_{i},\mathbf{y}_{i})$; in unsupervised scenarios (Müller 2023), $J_{u}(\mathbf{z}^{(K)}_{i})=D(\mathbf{z}^{(K)}_{i},\{\mathbf{y}_{j}:v_{j}\in\mathcal{N}_{i}\})$. Depending on the downstream application, $D(\cdot,\cdot)$ can take various forms, such as cross-entropy. The general workflow of GRN is illustrated in Figure 3.
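For illustration, a minimal sketch of the two loss choices is given below; taking $D$ to be cross-entropy and averaging the neighbors' one-hot labels in the unsupervised case are assumptions on our part, not the authors' exact objectives.

```python
import numpy as np

# Minimal sketches of J_s and J_u with cross-entropy as the discrepancy D.
def cross_entropy(pred_logits, target):
    p = np.exp(pred_logits - pred_logits.max())
    p /= p.sum()
    return -(target * np.log(p + 1e-12)).sum()

def J_s(z_i, y_i):                       # semi-supervised: compare with v_i's label
    return cross_entropy(z_i, y_i)

def J_u(z_i, neighbor_labels):           # unsupervised: compare with neighbors' labels
    return cross_entropy(z_i, np.mean(neighbor_labels, axis=0))

z_i = np.array([2.0, 0.5, -1.0])
y_i = np.array([1.0, 0.0, 0.0])
print(J_s(z_i, y_i), J_u(z_i, np.eye(3)[[0, 0, 1]]))
```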

Experiments

Datasets

Our findings are evaluated on five real-world datasets widely used for studying graph adversarial attacks (Sun et al. 2020; Liu et al. 2022; Zhu et al. 2019; Entezari et al. 2020; Li et al. 2022), including Cora, Citeseer, Polblogs, and Pubmed.

Baselines

Comparison defense models. We compare GRN with other defense models, including: 1) RGCN, which leverages Gaussian distributions for node representations to amortize the effects of adversarial attacks; 2) GNN-SVD, which applies a low-rank approximation of the adjacency matrix obtained by truncated SVD; 3) Pro-GNN (Jin et al. 2020), which learns a robust GNN from the intrinsic properties of nodes; 4) Jaccard (Wu et al. 2019), which defends against attacks based on the measured Jaccard similarity score; 5) EGNN (Liu et al. 2021), which filters out perturbations by $l_{1}$- and $l_{2}$-based graph smoothing. Attack methods. The experiments are designed under the following attack strategies: 1) Metattack (Zügner and Günnemann 2018), a meta-learning based attack; 2) CLGA (Sixiao et al. 2022), an unsupervised attack; 3) RL-S2V (Dai et al. 2018), a reinforcement learning based attack.

Pinpointing the Layer of Resonance

In Theorem 2, we establish an equivalence relation between the $k$-th and the $(k+3)$-th layer's output latent representations, as derived from Equations (1) and (2). This elucidates that when $k=0$, the 3rd layer involuntarily captures local structures, thereby inducing resonance. To facilitate experimental variable control, we first examine the equivalence relation under varying "gap layer numbers" (denoted as $k_{gap}$). If the equivalence between Equations (1) and (2) only holds when $k_{gap}\geq 3$, it substantiates the validity of Theorem 2. Specifically, we first train a 5-layer GCN, then obtain the resonance intensity, denoted as $R_{def}(k)=\bar{z}_{i}^{(k)}T_{i}+2p_{i}+8\,\mathrm{deg}_{i}$, and the actually observed signal, denoted as $R_{real}(k+k_{gap})=64\bar{z}_{i}^{(k+k_{gap})}-32$, for each epoch. Given these observational variables, we delineate their transformations over the learning process using lists $\{R_{def}(k)\}$ and $\{R_{real}(k+k_{gap})\}$, respectively. Each list chronicles its corresponding variable's fluctuations across all epochs. Subsequently, we standardize (using the standardization function $\mathrm{{\scriptstyle STD}}(\cdot)$) the sequences under different $k_{gap}$ and calculate the absolute difference to obtain a difference sequence:

$$\mathbf{d}_{k,k_{gap}}=\left|\mathrm{\scriptstyle STD}(\{R_{def}(k)\})-\mathrm{\scriptstyle STD}(\{R_{real}(k+k_{gap})\})\right|.\qquad(8)$$

The quantity $\mathbf{d}_{k,k_{gap}}$, serving as an indicator variable, accurately encapsulates the discrepancy between $R_{def}(k)$ and $R_{real}(k+k_{gap})$. The experimental results are illustrated in Figure 4. Owing to the large number of nodes, we display the mean value (central line) and standard deviation (shaded areas) over all nodes. As epochs progress, $\mathbf{d}_{0,3}$ gradually converges to zero. After the initial several epochs, it significantly diverges from $\mathbf{d}_{0,1}$ and $\mathbf{d}_{0,2}$. This validates the intriguing phenomenon stated in Theorem 2: a correlation has been established between the signal at the 3rd layer and the graph's initial signal and structure. Subsequent experimental results echo these findings, thereby affirming the correctness of Theorem 2 for $k>0$.
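A minimal sketch of the indicator in Eq. (8) is given below; the z-score standardization used for $\mathrm{STD}(\cdot)$ and the synthetic epoch sequences are assumptions for illustration only.

```python
import numpy as np

# A minimal sketch of Eq. (8): standardize the two per-epoch sequences and take
# their absolute difference.
def std(seq):
    seq = np.asarray(seq, dtype=float)
    return (seq - seq.mean()) / (seq.std() + 1e-12)

def indicator(R_def_seq, R_real_seq):
    return np.abs(std(R_def_seq) - std(R_real_seq))

epochs = np.arange(50)
R_def_seq = np.sin(epochs / 5.0)                 # stand-in for {R_def(k)}
R_real_seq = 64 * np.sin(epochs / 5.0) - 32      # stand-in for {R_real(k + k_gap)}
print(indicator(R_def_seq, R_real_seq).max())    # near zero after standardization
```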

Figure 4: Values of $\mathbf{d}_{k,k_{gap}}$ under different $k$ and $k_{gap}$ settings. $\mathbf{d}_{k,k_{gap}}$ encapsulates the resonance between the $k$-th layer and the $(k+k_{gap})$-th layer; a smaller value indicates stronger resonance.
Figure 5: Experimental results of the LRS-constructed graph $\mathcal{G}_{LRS}$ in relation to $\mathcal{G}$. (a) Visualizations of the embeddings of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$. (b) Strength distributions of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$. (c) Classification accuracy of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$ using $\mathcal{M}_{\mathcal{G}}$ under different perturbation rates.

Cliff-like Decline of the Attack Success Rate on the 3-layer GCN

Intuitively, the complexity of an attack tends to increase with the number of GCN layers. Observing how the attack success rate (ASR) declines as the number of GCN layers increases helps validate our claim that the 3-layer GCN, by virtue of resonance, significantly enhances robustness. Specifically, we start by initializing 10 GCNs with the number of layers ranging from 1 to 10. Next, we conduct experiments on 4 datasets using 3 typical attacks, setting the perturbation rate uniformly at 20%. We then train a surrogate model for each GCN separately, place perturbations in the dataset, and clear these perturbations after each attack. We repeat each attack five times and report the average ASR (depicted by the lines) and variation range (represented by the shaded areas). The results, shown in Figure 6, clearly reveal a steep drop in ASR at the 3-layer GCN. However, adding further layers seems unable to significantly reduce the ASR, as the additional layers maintain the same resonance pattern as the 3-layer GCN to achieve adversarial robustness. These findings motivate the concept of distilling the resonance from GCNs and fostering this resonance to design an inductive approach.

Figure 6: Relation between GCN layer count and ASR.

How Robust Are LRS-constructed Graphs?

Derived from a transductive model, the LRS captures distinct resonance regions and transforms a localized, unweighted graph (also perceivable as a graph with unitary weights) into a weighted one. The implementation of the LRS within the GRN enables the demarcation of a learnable resonance scope for inductive learning. Consequently, it becomes essential to validate the efficacy of the LRS through its embedding precision within the transductive model.

We begin by presenting results obtained under non-adversarial conditions. We sum the LRS of all nodes in $\mathcal{G}$ and apply min-max scaling to all weights, thus creating a global weighted graph $\mathcal{G}_{LRS}$. Then, we train a standard 2-layer GCN $\mathcal{M}_{\mathcal{G}}$ on the Cora dataset (denoted as $\mathcal{G}$) and visualize its well-trained representations. Then, we feed $\mathcal{G}_{LRS}$ into $\mathcal{M}_{\mathcal{G}}$ to generate its visualization. Lastly, for comparison, we create a random-weighted graph $\mathcal{G}_{random}$ whose edge distribution is the same as $\mathcal{G}$'s, and input it into $\mathcal{M}_{\mathcal{G}}$ to obtain the corresponding visualization. As Figure 5(a) shows, under identical weights, the representations of different categories in $\mathcal{G}_{LRS}$ are tighter than those in $\mathcal{G}$. This suggests that introducing the LRS brings beneficial global weights, which enhance the model's performance in non-adversarial scenarios.

Then, we explored the similarity between $\mathcal{G}$ and $\mathcal{G}_{LRS}$. Using the strength distribution, akin to the degree distribution for unweighted graphs (Zügner, Akbarnejad, and Günnemann 2018), we found the two weighted graphs to be notably similar. Figure 5(b) confirms this, showing a stark difference from $\mathcal{G}_{random}$. Therefore, $\mathcal{G}_{LRS}$ maintains the traits of $\mathcal{G}$.

We next assessed the classification accuracy on $\mathcal{G}_{LRS}$ under adversarial attacks using varying Metattack perturbation rates $p_{r}=\frac{r}{|\mathcal{E}|}$. Figure 5(c) shows that as $p_{r}$ increases, $\mathcal{G}_{LRS}$'s accuracy consistently edges out $\mathcal{G}$'s. This suggests that the LRS introduces a resonance in the transductive model, marginally boosting its adversarial robustness.

Generalizable Robustness of GRN

Dataset  $p_r$ (%)  RGCN  GNN-SVD  Pro-GNN  Jaccard  EGNN  GRN ($s_r$=20%)  GRN ($s_r$=40%)  GRN ($s_r$=60%)
Cora 0 83.49±0.57 81.14±0.79 85.01±0.40 81.74±0.36 85.00±0.40 83.74±1.68 86.79±2.27 87.75±0.93
5 77.20±0.47 78.29±0.63 80.10±0.22 80.56±1.30 82.24±0.49 81.48±0.83 86.04±3.15 86.24±1.54
10 72.65±0.40 70.81±1.77 74.45±0.28 75.07±1.28 76.38±0.35 79.51±1.62 81.38±2.58 82.03±1.48
20 59.31±0.27 56.67±1.22 64.68±0.75 73.54±0.94 69.82±0.71 73.79±1.91 74.14±1.93 74.81±1.52
Citeseer 0 71.81±0.71 70.42±0.39 74.94±0.40 73.82±0.56 74.92±0.66 75.69±0.69 78.19±1.30 83.26±0.73
5 71.22±0.61 68.86±0.47 72.45±0.88 71.41±0.65 73.60±0.45 75.40±0.95 77.54±1.04 81.66±0.70
10 67.53±0.60 68.70±0.89 70.16±1.05 70.09±0.48 73.66±0.37 73.81±1.10 77.04±1.51 80.32±0.49
20 63.20±1.70 57.95±1.48 55.84±1.28 67.22±1.32 65.91±1.20 71.06±1.20 72.72±1.28 77.21±0.64
Pubmed 0 84.57±0.39 83.25±0.35 84.96±0.08 84.87±0.10 85.94±0.10 80.26±0.43 85.51±0.66 87.27±0.67
5 81.25±0.50 82.90±0.26 83.00±0.10 82.32±0.11 83.89±0.09 77.86±0.35 81.92±0.59 83.36±0.62
10 78.96±0.43 80.35±0.21 80.82±0.20 80.77±0.11 82.13±0.15 76.62±0.55 79.29±0.60 80.99±0.58
20 71.33±0.40 73.57±0.15 74.16±0.16 73.41±0.12 76.01±0.19 74.98±0.64 76.87±0.60 77.15±0.56
Polblogs 0 94.87±0.19 95.08±0.22 95.45±0.12 95.03±0.57 95.70±0.34 95.42±0.56 94.88±0.43 94.97±0.31
5 73.28±0.18 88.86±0.58 90.98±0.69 90.97±0.61 89.97±1.25 90.18±0.43 91.22±0.38 89.37±0.46
10 70.91±0.37 80.38±0.85 85.60±1.08 85.93±1.39 83.66±1.81 86.30±0.70 85.43±0.68 85.07±0.54
20 57.97±0.41 55.33±2.07 73.52±0.53 70.47±1.27 75.87±0.88 82.03±0.79 81.96±0.72 81.56±0.18
Table 1: Classification accuracy (%) on the perturbed graph. $p_r$ is the perturbation rate and $s_r$ is the "seen" rate.

We assess the adversarial robustness of the 3-layer GRN under generalization demands by comparing its accuracy against other baselines on perturbed graphs. We partition a subset of the dataset as the training set with proportions (also named the "seen" rate) $s_r$ of 20%, 40%, and 60%. The data within these proportions are deemed "seen" by the GRN, while the remaining data are categorized as "unseen". Utilizing Metattack as our attack approach, we adopt the standard 2-layer GCN as the surrogate model. By adjusting $p_r$, we derive the corresponding perturbed graphs. Then, we evaluate the classification performance of the baselines on these graphs, placing an emphasis on the accuracy upon model convergence. For each setting, we execute 10 iterations and tabulate both the average outcome and its variability. Table 1 reports the results.

From the data, on both clean and perturbed graphs, GRN with $s_r$ = 60% generally surpasses the baselines in accuracy. There are three exceptions: 1) For the Pubmed dataset at $p_r$ = 10%, EGNN leads because it uses graph smoothing to enhance adversarial robustness: perturbations may be more pronounced in a certain area, and EGNN can leverage this by smoothing concentrated perturbation patterns. However, such incidents are rare, and as $p_r$ increases, GRN's accuracy returns to its peak. 2) With the Polblogs dataset at $p_r$ = 0%, GRN is slightly behind EGNN by 0.28%. Yet, as $p_r$ rises, the decline in GRN's accuracy is the least noticeable among all baselines, ensuring its top position. 3) An intriguing pattern emerging from the Polblogs dataset is the non-proportional relationship between the GRN's $s_r$ and its accuracy. The peculiarity of the Polblogs dataset is that its nodes lack intrinsic features; scholars have typically used node degrees as proxies for these absent attributes. This substitution results in the inherent attributes of Polblogs leaning towards uniformity. Expanding the training set's scale exacerbates the oversmoothing phenomenon, culminating in diminished accuracy.

Ablation Studies

GRN combines the edge-transmitted signal $\mathbf{E}^{(\ell)}_{G_{i}}$ and the node signal $\mathbf{Z}^{(\ell)}_{G_{i}}$ to form node $v_{i}$'s representation. We conduct an ablation study to delve into this process. First, we embed only $\mathbf{Z}^{(\ell)}_{G_{i}}$, naming the resulting model GRN$_{Z}$. This model appears similar to a 2-depth GraphSAGE with mean aggregation, indicating potential vulnerability to adversarial attacks. We then examine the combination order of $\mathbf{E}^{(\ell)}_{G_{i}}$ and $\mathbf{Z}^{(\ell)}_{G_{i}}$. The default GRN order is GRN$_{E,Z}$. We test GRN$_{Z,E}$ (reversed order) and GRN$_{shuf}$ (shuffled rows). Results (Table 2) show that GRN$_{Z}$ underperforms, especially in adversarial settings, emphasizing the importance of co-embedding both signals. The precisions of GRN$_{Z,E}$, GRN$_{E,Z}$, and GRN$_{shuf}$ are comparable because the edge-transmitted signal, combined with the node signal through shared parameters, yields consistent performance regardless of order. This suggests that GRN has the capability to recognize a latent graph structure, wherein edge-transmitted signals function as latent node signals, contributing to adversarial robustness and insensitivity to signal order.

Dataset  $p_r$ (%)  GRN$_{E,Z}$ (standard)  GRN$_{Z,E}$  GRN$_{shuf}$  GRN$_{Z}$
Cora 0 87.75 87.14±0.81 87.28±0.92 87.74±0.72
5 86.24 86.18±0.75 86.34±0.98 85.56±0.87
10 82.03 82.81±0.84 82.54±1.05 79.07±0.90
20 74.81 74.53±0.80 74.42±1.36 63.54±1.30
Citeseer 0 83.26 83.07±0.64 83.52±0.92 82.23±0.97
5 81.66 81.49±0.71 81.45±0.95 79.47±1.10
10 80.32 80.10±0.65 80.46±1.17 74.39±1.44
20 77.21 76.95±1.26 76.84±1.53 69.05±1.86
Table 2: Classification accuracy (%) of ablated models.

Conclusions

We addressed critical concerns surrounding the transductive nature of existing robust graph learning models. We began by establishing the equivalence between GCN-driven MP and edge-based waves. Subsequently, we demonstrated that a 3-layer GCN capitalizes on the unique resonance intrinsic to waves to achieve robustness. Delving deeper, we formalized this resonance as a coupling between a node and its surrounding local structure. We then introduced an inductive graph learning model, GRN. Experimental results not only corroborated our theoretical insights but also highlighted the exemplary robustness of the proposed GRN model.

Acknowledgments

This work is supported in part by the National Key Research and Development Program of China (No. 2020YFB1805400), the National Natural Science Foundation of China (No. U19A2068, No. 62372313), and the Sichuan Science and Technology Program (No. 2023YFG0113).

References

  • Ahmad et al. (2023) Ahmad, S.; Saifullah, S.; Khan, A.; and Wazwaz, A. M. 2023. Resonance, fusion and fission dynamics of bifurcation solitons and hybrid rogue wave structures of Sawada–Kotera equation. Commun. Nonlinear. Sci. Numer. Simul., 119: 107117.
  • Berberidis and Giannakis (2019) Berberidis, D.; and Giannakis, G. B. 2019. Node embedding with adaptive similarities for scalable learning over graphs. IEEE Trans. Knowl. Data Eng., 33(2): 637–650.
  • Blas and Jenkins (2022) Blas, D.; and Jenkins, A. C. 2022. Detecting stochastic gravitational waves with binary resonance. Phys. Rev. D, 105(6): 064021.
  • Bruna et al. (2013) Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
  • Bykov, Bezus, and Doskolovich (2019) Bykov, D. A.; Bezus, E. A.; and Doskolovich, L. L. 2019. Coupled-wave formalism for bound states in the continuum in guided-mode resonant gratings. Phys. Rev. A, 99(6): 063805.
  • Chen et al. (2021) Chen, W.; Feng, F.; Wang, Q.; He, X.; Song, C.; Ling, G.; and Zhang, Y. 2021. Catgcn: Graph convolutional networks with categorical node features. IEEE Trans. Knowl. Data Eng.
  • Dai et al. (2018) Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018. Adversarial attack on graph structured data. In Proc. 35th Int. Conf. Mach. Learn., 1115–1124.
  • Defferrard, Bresson, and Vandergheynst (2016) Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Adv. Neural Inf. Proces. Syst., volume 29.
  • Entezari et al. (2020) Entezari, N.; Al-Sayouri, S. A.; Darvishzadeh, A.; and Papalexakis, E. E. 2020. All you need is low (rank) defending against adversarial attacks on graphs. In Proc. 13th Int. Conf. Web Search Data Min., 169–177.
  • Friedman and Tillich (2004) Friedman, J.; and Tillich, J.-P. 2004. Wave equations for graphs and the edge-based Laplacian. Pac. J. Math., 216(2): 229–266.
  • Hamilton, Ying, and Leskovec (2017) Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In Proc. 31st Adv. Neural Inf. Proces. Syst.
  • Jin et al. (2020) Jin, W.; Ma, Y.; Liu, X.; Tang, X.; Wang, S.; and Tang, J. 2020. Graph structure learning for robust graph neural networks. In 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 66–74.
  • Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In Proc. 5th Int. Conf. Learn. Represent.
  • Kovalyov (1989) Kovalyov, M. 1989. Resonance-type behaviour in a system of nonlinear wave equations. J. Differ. Equ., 77(1): 73–83.
  • Kreps and Kriner (2020) Kreps, S.; and Kriner, D. 2020. Model uncertainty, political contestation, and public trust in science: Evidence from the COVID-19 pandemic. Sci. Adv., 6(43): eabd4563.
  • Kula (2015) Kula, M. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. arXiv preprint arXiv:1507.08439.
  • Li et al. (2022) Li, K.; Liu, Y.; Ao, X.; Chi, J.; Feng, J.; Yang, H.; and He, Q. 2022. Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN. In Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min.
  • Liu et al. (2022) Liu, A.; Li, B.; Li, T.; Zhou, P.; and Wang, R. 2022. AN-GCN: An Anonymous Graph Convolutional Network Against Edge-Perturbing Attacks. IEEE Trans. Neural Netw. Learn. Syst.
  • Liu et al. (2021) Liu, X.; Jin, W.; Ma, Y.; Li, Y.; Liu, H.; Wang, Y.; Yan, M.; and Tang, J. 2021. Elastic graph neural networks. In Proc. 38th Int. Conf. Mach. Learn., 6837–6849.
  • Müller (2023) Müller, E. 2023. Graph clustering with graph neural networks. J. Mach. Learn. Res., 24: 1–21.
  • Nadal et al. (2021) Nadal, S.; Abelló, A.; Romero, O.; Vansummeren, S.; and Vassiliadis, P. 2021. Graph-driven federated data management. IEEE Trans. Knowl. Data Eng., 35(1): 509–520.
  • Samuel et al. (2019) Samuel, F., G; John, B., D; Joichi, I.; Jonathan, Z., L; Andrew, B., L; and Isaac, K., S. 2019. Adversarial attacks on medical machine learning. Science, 363(6433): 1287–1289.
  • Shatah and Struwe (1993) Shatah, J.; and Struwe, M. 1993. Regularity results for nonlinear wave equations. Ann. Math., 138(3): 503–518.
  • Sixiao et al. (2022) Sixiao, Z.; Hongxu, C.; Xiangguo, S.; Yicong, L.; and Xu, G. 2022. Unsupervised Graph Poisoning Attack via Contrastive Loss Back-propagation. In Proc. 31st Int. Conf. World Wide Web.
  • Sun et al. (2022) Sun, L.; Dou, Y.; Yang, C.; Zhang, K.; Wang, J.; Philip, S. Y.; He, L.; and Li, B. 2022. Adversarial attack and defense on graph data: A survey. IEEE Trans. Knowl. Data Eng.
  • Sun et al. (2020) Sun, Y.; Wang, S.; Tang, X.; Hsieh, T.-Y.; and Honavar, V. 2020. Non-target-specific node injection attacks on graph neural networks: A hierarchical reinforcement learning approach. In Proc. 29th Int. Conf. World Wide Web, volume 3.
  • Walt, Jack, and Christof (2019) Walt, W.; Jack, C.; and Christof, T. 2019. Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat. Mach. Intell., 1: 508–516.
  • Wu et al. (2019) Wu, H.; Wang, C.; Tyshetskiy, Y.; Docherty, A.; Lu, K.; and Zhu, L. 2019. Adversarial examples for graph data: deep insights into attack and defense. In Proc. 28th Int. Joint Conf. Artif. Intel.
  • Xiao, Chen, and Shi (2019) Xiao, H.; Chen, Y.; and Shi, X. 2019. Knowledge graph embedding based on multi-view clustering framework. IEEE Trans. Knowl. Data Eng., 33(2): 585–596.
  • Zhao et al. (2022) Zhao, Y.; Zhou, H.; Zhang, A.; Xie, R.; Li, Q.; and Zhuang, F. 2022. Connecting embeddings based on multiplex relational graph attention networks for knowledge graph entity typing. IEEE Trans. Knowl. Data Eng.
  • Zhu et al. (2019) Zhu, D.; Zhang, Z.; Cui, P.; and Zhu, W. 2019. Robust graph convolutional networks against adversarial attacks. In Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 1399–1407.
  • Zügner, Akbarnejad, and Günnemann (2018) Zügner, D.; Akbarnejad, A.; and Günnemann, S. 2018. Adversarial attacks on neural networks for graph data. In Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2847–2856.
  • Zügner and Günnemann (2018) Zügner, D.; and Günnemann, S. 2018. Adversarial Attacks on Graph Neural Networks via Meta Learning. In Proc. 6th Int. Conf. Learn. Represent.

Proofs

Proof of Theorem 1

Node signals vibrate from high to low dimensions during MP, which is implemented through embedding. To ensure the uniformity of node signal dimensions during the MP process, we introduce an invertible transfer function, denoted as $\mathcal{T}_{d_{0}}:\mathbb{R}^{d_{k}}\to\mathbb{R}^{d_{0}},\ \forall d_{k}\leq d_{0}$, to standardize the dimensions of $\mathbf{Z}^{(\ell)}$ and $\mathbf{Z}^{(0)}$. This function extends the process of the $k$-th MP to $\mathbf{Z}^{(k-1)}\stackrel{\text{MP}}{\longrightarrow}\mathbf{Z}^{(k)}\stackrel{\mathcal{T}_{d_{0}}}{\longrightarrow}\bar{\mathbf{Z}}^{(k)}$, where $\bar{\mathbf{Z}}^{(k)}=[\mathbf{Z}^{(k)}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]$. This process can be interpreted as all embeddings induced by MP being completed in $\mathbb{R}^{d_{0}}$, i.e., embedding into $d_{k}$ dimensions at the $k$-th MP, with the remaining $d_{0}-d_{k}$ dimensions collapsing to zero. Indeed, with this approach, MP can be viewed as signal propagation in a space of equal dimension.

The edge-based Laplacian wave equation is (also introduced in main text):

$$\frac{\partial^{2}g}{\partial\ell^{2}}=-\mathbf{L}^{\ell}g\cdot c,\qquad(9)$$

where $c$ is a constant. In this context, we instantiate $g$ as the signal $\bar{\mathbf{Z}}^{(\ell)}$. Then we define $c$ as a signal amplitude matrix:

Definition 2.

The amplitude at the $k$-th MP, $\mathbf{S}_{k}\in\mathbb{R}^{d_{0}\times d_{0}}$, represents the transmission amplitude of signals in different dimensions. It is a variable matrix that defines the adjustable parameters of the graph-based function at different times. It performs an equal-dimensional transformation of the signal through right multiplication.

The initial speed of the first-hop MP can thus be denoted as $\mathbf{S}_{0}$. Consequently, we derive that $\frac{\partial^{2}\bar{\mathbf{Z}}^{(\ell)}}{\partial\ell^{2}}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{\ell}$. Hence, the element-wise fluctuation speed [1] of the signal $\bar{\mathbf{Z}}^{(\ell)}$, denoted as $\mathbf{V}_{\ell}\in\mathbb{R}^{N\times d_{0}}$, can be described, and it satisfies

$$\mathbf{V}_{\ell}=\frac{\partial\bar{\mathbf{Z}}^{(\ell)}}{\partial\ell}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\int\mathbf{S}_{\ell}\,d\ell,\qquad\frac{\partial\mathbf{V}_{\ell}}{\partial\ell}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{\ell}.\qquad(10)$$

We regard $\bar{\mathbf{Z}}^{(\ell)}$ as a sampling of the continuous signal with a minimal step $h$; the wave described by $\bar{\mathbf{Z}}^{(\ell)}$ can thus be formalized as

$$\bar{\mathbf{Z}}^{(k+h)}=\bar{\mathbf{Z}}^{(k)}+h\mathbf{V}_{k},\qquad\mathbf{V}_{k+h}=\mathbf{V}_{k}+h\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k}.\qquad(11)$$

One may write

$$\begin{aligned}\bar{\mathbf{Z}}^{(k+h)}&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{k-h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-h}\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{k-2h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-2h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-h},\end{aligned}\qquad(12)$$

and this discrete representation iterating with a step size of hh can be written in a continuous expression:

$$\begin{aligned}\lim_{h\to 0}\bar{\mathbf{Z}}^{(k+h)}&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{0}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{1}^{k-1}\mathbf{S}_{\ell}\,d\ell\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\left(\int_{0}^{1}\mathbf{S}_{\ell}\,d\ell+\int_{1}^{k-1}\mathbf{S}_{\ell}\,d\ell\right)\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{0}^{k-1}\mathbf{S}_{\ell}\,d\ell.\end{aligned}\qquad(13)$$

In order to establish a direct correlation between $\bar{\mathbf{Z}}^{(k+h)}$ and $\bar{\mathbf{Z}}^{(0)}$, instead of via a chain-like propagation through $\bar{\mathbf{Z}}^{(k)},\bar{\mathbf{Z}}^{(k-h)},\ldots$, we introduce the Moore-Penrose (MP) pseudoinverse [2] to simplify Equation (13). Specifically, we introduce a dimension matrix $\mathbf{C}_{k}\in\mathbb{R}^{d_{0}\times d_{0}}$, associated with $\mathbf{S}_{k}$ and calculated by

$$\mathbf{C}_{k}=\left(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\right)^{\dagger}\left(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{0}^{k-1}\mathbf{S}_{\ell}\,d\ell\right).\qquad(14)$$

Then, a direct connection between $\bar{\mathbf{Z}}^{(k+1)}$ and $\bar{\mathbf{Z}}^{(0)}$ can be obtained as

$$\bar{\mathbf{Z}}^{(k)}=\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k},\qquad(15)$$

which delineates the general term formula for the equal-dimensional waves $\bar{\mathbf{Z}}^{(\ell)}$. Given that $\mathbf{Z}^{(\ell)}=\mathcal{T}_{d_{0}}^{-1}(\bar{\mathbf{Z}}^{(\ell)})$, the proposition henceforth is that, if one establishes the validity of $\mathcal{T}_{d_{0}}^{-1}(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k})$, i.e., that a $\mathbf{C}_{k}$ exists which embeds the trailing $d_{0}-d_{k}$ columns of $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}$ into minimal-value matrices (denoted as $\mathbf{T}_{d_{k}|0}=[\mathbf{T}_{d_{k}}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]$), then an inverse mapping can be constructed that allows the transition from the equal-dimensional waves $\bar{\mathbf{Z}}^{(\ell)}$ to the unequal-dimensional waves $\mathbf{Z}^{(\ell)}$. To address this problem, we split $\mathbf{C}_{k}$ into two segments: a left $d_{0}\times d_{k}$ matrix $\mathbf{C}_{k,L}$ and a right $d_{0}\times(d_{0}-d_{k})$ matrix $\mathbf{C}_{k,R}$. Given that, we have $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k}=[\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,L}\,|\,\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,R}]$. Since $\bar{\mathbf{Z}}^{(0)}=\mathbf{Z}^{(0)}$, it holds that $\bar{\mathbf{Z}}^{(0)}$ is of full column rank. Consequently, we may conclude that for any arbitrary real matrix $\mathbf{T}_{d_{k}}$, there exists $\mathbf{C}_{k,L}$ such that $\mathbf{L}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,L}=\mathbf{T}_{d_{k}}$, while $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{0}_{N\times(d_{0}-d_{k})}=\mathbf{0}_{N\times(d_{0}-d_{k})}$ also holds. Therefore, there exists a matrix

$$\mathbf{C}_{k}=[\mathbf{C}_{k,L}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}],\qquad(16)$$

such that $\mathbf{L}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k}=[\mathbf{T}_{d_{k}}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]=\mathbf{T}_{d_{k}|0}$, thereby confirming the validity of $\mathcal{T}_{d_{0}}^{-1}$. Hence, Equation (15) can be expressed inversely as

$$\mathbf{Z}^{(k+1)}=\mathcal{T}_{d_{0}}^{-1}(\bar{\mathbf{Z}}^{(k+1)})=\mathbf{L}^{k}\mathbf{Z}^{(0)}\mathcal{T}_{d_{0}}^{-1}(\mathbf{C}_{k})=\mathbf{L}^{k}\mathbf{Z}^{(0)}\mathbf{C}_{k,L}.\qquad(17)$$

Taking into account the dimensional variation under the wave, and the independent amplitude correlation when the signal fluctuates once, we apply rank-$\eta$ truncated singular value decomposition ($\eta$-TSVD) [3] to $\mathbf{C}_{k,L}$ to approximately obtain $k$ amplitude-correlated matrices. Given $\forall d_{0}\leq\eta\leq d_{k}$, $\eta$-TSVD decomposes $\mathbf{C}_{k,L}$ as $\mathrm{SVD_{T}^{(\eta)}}(\mathbf{C}_{k,L})=\mathbf{U}_{k}\mathbf{\Lambda}_{k}\mathbf{V}_{k}$, where $\mathbf{U}_{k}\in\mathbb{R}^{d_{0}\times\eta}$, $\mathbf{\Lambda}_{k}\in\mathbb{R}^{\eta\times\eta}$, and $\mathbf{V}_{k}\in\mathbb{R}^{\eta\times d_{k}}$ are the matrices holding the first $\eta$ left singular vectors, the first $\eta$ singular values, and the first $\eta$ right singular vectors of $\mathbf{C}_{k,L}$, respectively. We denote: 1) $\mathbf{U}_{k}\mathbf{\Lambda}_{k}=\widehat{\mathbf{C}}_{k-1}\in\mathbb{R}^{d_{0}\times\eta}$; 2) $\widehat{\mathbf{U}}_{k-1}$, $\widehat{\mathbf{\Lambda}}_{k-1}$ and $\widehat{\mathbf{V}}_{k-1}$ analogously to the above definitions; 3) $\widehat{\mathbf{C}}_{k-2}=\widehat{\mathbf{U}}_{k-1}\widehat{\mathbf{\Lambda}}_{k-1}$. Since TSVD can decompose any real-valued matrix for any valid $\eta$, we apply $\eta$-TSVD repeatedly to $\mathbf{C}_{k,L}$, yielding:

$$\begin{aligned}\mathrm{SVD_{T}^{(\eta_{k})}}(\mathbf{C}_{k,L})&=\widehat{\mathbf{C}}_{k-1}\mathbf{V}_{k}\\\mathrm{SVD_{T}^{(\eta_{k-1})}}(\widehat{\mathbf{C}}_{k-1})&=\widehat{\mathbf{C}}_{k-2}\mathbf{V}_{k-1}\\&\;\;\vdots\\\mathrm{SVD_{T}^{(\eta_{2})}}(\widehat{\mathbf{C}}_{2})&=\widehat{\mathbf{C}}_{1}\mathbf{V}_{2}.\end{aligned}\qquad(18)$$

Since TSVD provides an approximate equality, introducing a minimal error term

$$O(\Gamma)=O\!\left(\sum_{i=2}^{k-1}\left\|\widehat{\mathbf{C}}_{i}-\widehat{\mathbf{C}}_{i-1}\mathbf{V}_{i}\right\|_{F}+\left\|\mathbf{C}_{k,L}-\widehat{\mathbf{C}}_{k-1}\mathbf{V}_{k}\right\|_{F}\right),\qquad(19)$$

we can summarize Equation (18), decomposing $\mathbf{C}_{k,L}$ into a form of $k$-fold continuous multiplication, to formalize the $k$ amplitude matrices introduced by $k$ iterations of MP:

$$\mathbf{C}_{k,L}=\widehat{\mathbf{C}}_{1}\prod_{i=2}^{k}\mathbf{V}_{i}+O(\Gamma).\qquad(20)$$

Substituting Equation (20) into Equation (17), we obtain:

\mathbf{Z}^{(k+1)}=\mathbf{L}^{k}{\mathbf{Z}}^{(0)}\widehat{\mathbf{C}}_{1}\prod_{i=2}^{k}\mathbf{V}_{i} (21)

Now, to endow the fluctuation equation represented by Equation (21) with the capability of backpropagation, we introduce k nonlinear activation functions \sigma_{1},\ldots,\sigma_{k}. Subsequently, we instantiate \widehat{\mathbf{C}}_{1},\mathbf{V}_{2},\mathbf{V}_{3},\ldots,\mathbf{V}_{k} as k trainable matrices. As expounded in the preceding text, the dimensions of these matrices merely need to satisfy the condition of input-output decrement, and thus can be set arbitrarily. Denoting \mathbf{W}_{i} as the i-th trainable matrix and given a decreasing dimension sequence d_{0},\eta_{k},\eta_{k-1},\ldots,\eta_{3},\eta_{2},d_{k}, the aforementioned matrices can be instantiated as \mathbf{W}_{1}=\widehat{\mathbf{C}}_{1}\in\mathbb{R}^{d_{0}\times\eta_{k}},\mathbf{W}_{2}=\mathbf{V}_{k}\in\mathbb{R}^{\eta_{k}\times\eta_{k-1}},\ldots,\mathbf{W}_{k}=\mathbf{V}_{2}\in\mathbb{R}^{\eta_{2}\times d_{k}}. In summary, by making Equation (21) trainable, we obtain the following form of a trainable wave equation:

𝐙(k)=WAVE(𝐙,k;𝐋)=σk(𝐋σ2(𝐋σ1(𝐋𝐙(0)𝐖1)layer 1𝐖2)layer 2𝐖k)layer k=(𝐙,k;𝐋).\mathbf{Z}^{(k)}=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})\\ =\sigma_{k}\underbrace{\Big{(}\mathbf{L}\ldots\overbrace{\sigma_{2}(\mathbf{L}\underbrace{\sigma_{1}(\mathbf{L}{\mathbf{Z}}^{(0)}\mathbf{W}_{1})}_{\begin{subarray}{c}\text{layer }1\\ \ldots\end{subarray}}\mathbf{W}_{2})}^{\text{layer }2}\ldots\mathbf{W}_{k}\Big{)}}_{\text{layer }k}\\ =\mathcal{M}(\mathbf{Z},k;\mathbf{L}). (22)

We note that Equation (22) is the trainable form of the edge-based Laplacian wave equation (Equation (9)), and it is also the forward propagation form of the GCN. Through the derivation process we have provided, the equivalence between the GCN and the wave equation is thus substantiated.
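As a concrete reading of Equation (22), the following minimal sketch carries out the k nested propagate-then-transform steps of the trainable wave (i.e., GCN forward) pass; the propagation matrix, layer widths, weights and activations below are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

def wave_forward(L, Z0, weights, activations):
    """Equation (22): repeatedly apply Z <- sigma_i(L @ Z @ W_i) for i = 1..k."""
    Z = Z0
    for W, sigma in zip(weights, activations):
        Z = sigma(L @ Z @ W)
    return Z

relu = lambda X: np.maximum(X, 0.0)
sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

rng = np.random.default_rng(0)
N, d0 = 5, 8
dims = [8, 6, 4]                            # hypothetical decreasing widths eta_k, eta_{k-1}, d_k
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T              # random undirected adjacency
L = A + np.eye(N)                           # a simple propagation matrix standing in for L
Z0 = rng.normal(size=(N, d0))

weights = [0.1 * rng.normal(size=(din, dout))
           for din, dout in zip([d0] + dims[:-1], dims)]
Zk = wave_forward(L, Z0, weights, [relu, relu, sigmoid])
print(Zk.shape)                             # (N, d_k) = (5, 4)
```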

Proof of Theorem 2

We first give the vectorized representation of the resonance intensity. For each pair of nodes (v_{j},v_{k}) in \mathcal{N}_{i}, if there is an edge between v_{j} and v_{k}, then this edge is counted in T_{i}; this is indicated by \mathbf{A}_{j,k}=1. Therefore, T_{i} equals the sum of \mathbf{A}_{j,k} over all node pairs (v_{j},v_{k}) in \mathcal{N}_{i}. T_{i} can then be obtained in three steps:

  • First, the number of edges over all node pairs (v_{j},v_{k}) of \mathcal{G} (not just pairs of neighbors of node v_{i}) is \sum_{v_{j},v_{k}}\mathbf{A}_{j,k}.

  • Then, to ensure that v_{j},v_{k}\in\mathcal{N}_{i}, we need to include the factors \mathbf{A}_{i,j} and \mathbf{A}_{k,i} in the above summation. This is because \mathbf{A}_{i,j}=\mathbf{A}_{k,i}=1 if and only if v_{j} and v_{k} are both neighbors of node v_{i}, i.e., \mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}=1 if and only if v_{i}, v_{j} and v_{k} form a triangle.

  • Finally, \sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i} counts the triangles that include node v_{i}. This gives T_{i}=\sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}; a minimal numerical check of this identity is sketched below.
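The following sketch checks the identity on a random toy graph: T_{i} computed by summing \mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i} coincides with the i-th diagonal entry of \mathbf{A}^{3} (the graph and its size are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T             # symmetric 0/1 adjacency without self-loops

# T_i = sum_{j,k} A_{i,j} A_{j,k} A_{k,i}: edges among the neighbours of v_i
# (each triangle through v_i is counted once per ordering of (j, k)).
T_direct = np.einsum('ij,jk,ki->i', A, A, A)

# The same quantity is the i-th diagonal entry of A^3.
T_matrix = np.diag(A @ A @ A)
print(np.array_equal(T_direct, T_matrix))  # True
```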

Then, we have

z¯i(0)Ti=z¯i(0)vj,vk𝒩i𝐀jk=j,k𝐀i,j𝐀j,k𝐀k,il𝐙i,l=j𝐀i,j(𝐀𝐀)j,il𝐙i,l.\bar{{z}}_{i}^{(0)}T_{i}=\bar{{z}}_{i}^{(0)}\sum_{v_{j},v_{k}\in\mathcal{N}_{i}}\mathbf{A}_{jk}=\sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}\sum_{l}\mathbf{Z}_{i,l}\\ =\sum_{j}\mathbf{A}_{i,j}\left(\mathbf{A}\mathbf{A}^{\top}\right)_{j,i}\sum_{l}\mathbf{Z}_{i,l}. (23)

Hence, according to Definition 1, one may write

R(vi;0)=z¯i(0)Ti+2pi+8degi=j𝐀i,j(𝐀2)j,il𝐙i,l+2j(𝐀2)i,j+8j𝐀i,j.R(v_{i};0)=\bar{{z}}_{i}^{(0)}T_{i}+2p_{i}+8\mathrm{deg}_{i}\\ =\sum_{j}\mathbf{A}_{i,j}\left(\mathbf{A}^{2}\right)_{j,i}\sum_{l}\mathbf{Z}_{i,l}+2\sum_{j}(\mathbf{A}^{2})_{i,j}+8\sum_{j}\mathbf{A}_{i,j}. (24)

We denote by [1]_{N\times 1}\in\mathbb{R}^{N\times 1} the vector whose entries are all equal to 1, and we use right multiplication by [1]_{N\times 1} for row-wise summation of a matrix. Subsequently, we lift R(\cdot) to its vectorized representation:

𝐑(vi;0)=(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)i\mathbf{R}^{\prime}(v_{i};0)=(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1})_{i} (25)

Here, \mathbf{Z_{0}}=[\mathbf{Z}|\mathbf{0}]\in\mathbb{R}^{N\times N} represents the expansion of \mathbf{Z} into a square matrix \mathbf{Z_{0}} by padding zeros on the right side of \mathbf{Z}. Moving forward, we incorporate the sigmoid function S(\cdot) and introduce the matrix notation

\mathbf{B}=S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})),\quad\mathbf{C}=S(\mathbf{A}\mathbf{Z_{0}}). (26)
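As a quick sanity check of the vectorized form in Equation (25), the sketch below pads a hypothetical signal matrix \mathbf{Z} into \mathbf{Z_{0}} as just described and evaluates \mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1} on a toy graph.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d0 = 6, 3
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T              # toy adjacency matrix
Z = rng.normal(size=(N, d0))                # hypothetical node signals

Z0 = np.hstack([Z, np.zeros((N, N - d0))])  # pad Z to the square matrix Z_0 = [Z | 0]
ones = np.ones((N, 1))

# Equation (25): vectorized resonance intensities R'(v_i; 0) for all nodes at once.
A2 = A @ A
A3 = A2 @ A
R_vec = A3 @ Z0 @ ones + 2 * A2 @ ones + 8 * A @ ones
print(R_vec.ravel())
```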

Then we define the error function as

E(\mathbf{X})=-\frac{1}{48}\mathbf{X}^{3}+O(\mathbf{X}^{5}), (27)

and denote the following error terms: \mathbf{o}_{1}=E(\mathbf{A}\mathbf{Z_{0}}), \mathbf{o}_{2}=E(\mathbf{A}\mathbf{C}), \mathbf{o}_{3}=E(\mathbf{A}\mathbf{B}) and \mathbf{o}=\frac{1}{16}\mathbf{A}^{2}\mathbf{o}_{1}[1]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{o}_{2}[1]_{N\times 1}+\mathbf{o}_{3}[1]_{N\times 1}. Utilizing the aforementioned notation, we can deduce

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=[12]N×1+164𝐀3𝐙𝟎[1]N×1+116𝐀2[12]N×1+14𝐀[12]N×1+𝐨\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{64}\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+\frac{1}{16}\mathbf{A}^{2}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}+\mathbf{o}
=[12]N×1+14(14)2𝐀3𝐙𝟎[1]N×1+1414𝐀2[12]N×1+14𝐀[12]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}(\frac{1}{4})^{2}\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+\frac{1}{4}\frac{1}{4}\mathbf{A}^{2}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}
+1414𝐀2𝐨1[1]N×1+14𝐀𝐨2[1]N×1+𝐨3[1]N×1\displaystyle\quad+\frac{1}{4}\frac{1}{4}\mathbf{A}^{2}\mathbf{o}_{1}[1]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{o}_{2}[1]_{N\times 1}+\mathbf{o}_{3}[1]_{N\times 1}
=[12]N×1+14𝐀([12]N×1+14𝐀[12]N×1+(14)2𝐀2𝐙𝟎[1]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}+(\frac{1}{4})^{2}\mathbf{A}^{2}\mathbf{Z_{0}}[1]_{N\times 1}
+14𝐀𝐨1[1]N×1+𝐨2[1]N×1)+𝐨3[1]N×1\displaystyle\quad+\frac{1}{4}\mathbf{A}\mathbf{o}_{1}[1]_{N\times 1}+\mathbf{o}_{2}[1]_{N\times 1}\Big{)}+\mathbf{o}_{3}[1]_{N\times 1}
=[12]N×1+14𝐀([12]N×1+14𝐀([12]N×1+14𝐀𝐙𝟎[1]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}[1]_{N\times 1}
+𝐨1[1]N×1)+𝐨2[1]N×1)+𝐨3[1]N×1.\displaystyle\quad+\mathbf{o}_{1}[1]_{N\times 1}\Big{)}+\mathbf{o}_{2}[1]_{N\times 1}\bigg{)}+\mathbf{o}_{3}[1]_{N\times 1}. (28)

For Equation (28), we factor out [1]_{N\times 1}. (The factorization is not unique: right multiplication by [1]_{N\times 1} yielding [\frac{1}{2}]_{N\times 1} only requires that each row of the N\times N matrix sums to \frac{1}{2}; here we consider the case where all elements of this matrix take the average value \frac{1}{2N}.) This rewrites Equation (28) as:

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=([12N]N×N+14𝐀([12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎\displaystyle=\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}
+𝐨1)+𝐨2)+𝐨3)[1]N×1.\displaystyle\quad+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}\bigg{)}+\mathbf{o}_{3}\bigg{)}[1]_{N\times 1}. (29)

The matrix-based Taylor series expansion [4] for 𝐗\mathbf{X} is

S(\mathbf{X})=[\frac{S(0)}{N}]_{N\times N}+S^{\prime}(0)\mathbf{X}+\frac{1}{2}\mathbf{X}^{2}S^{\prime\prime}(0)+\frac{1}{6}\mathbf{X}^{3}S^{\prime\prime\prime}(0)+\frac{1}{24}\mathbf{X}^{4}S^{\prime\prime\prime\prime}(0)+O(\mathbf{X}^{5}). (30)

Since S(0)=\frac{1}{2}, S^{\prime}(0)=\frac{1}{4}, S^{\prime\prime}(0)=0, S^{\prime\prime\prime}(0)=-\frac{1}{8} and S^{\prime\prime\prime\prime}(0)=0, by substituting them into Equation (30), we have

\displaystyle S(\mathbf{X}) =[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}+0\cdot\frac{1}{2}\mathbf{X}^{2}-\frac{1}{8}\cdot\frac{1}{6}\mathbf{X}^{3}+0\cdot\frac{1}{24}\mathbf{X}^{4}+O(\mathbf{X}^{5})
\displaystyle=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}-\frac{1}{48}\mathbf{X}^{3}+O(\mathbf{X}^{5})
\displaystyle=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}+E(\mathbf{X}). (31)
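A minimal scalar check of the expansion used above, on a hypothetical evaluation interval: the sigmoid agrees with its cubic Taylor polynomial \frac{1}{2}+\frac{x}{4}-\frac{x^{3}}{48} up to an O(x^{5}) error.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Cubic Taylor polynomial of the sigmoid around 0: 1/2 + x/4 - x^3/48.
x = np.linspace(-0.5, 0.5, 101)
approx = 0.5 + x / 4 - x**3 / 48
print(np.max(np.abs(sigmoid(x) - approx)))  # error is O(x^5), below 1e-4 on this range
```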

Hence, Equation (29) can be recast into the form of the Taylor series expansion given by Equation (31). In the ensuing steps, we transform the nested terms of Equation (29) accordingly:

\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+\mathbf{o}_{1}
=[12N]N×N+14𝐀𝐙𝟎+E(𝐀𝐙𝟎)=S(𝐀𝐙𝟎)=𝐂,\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+E(\mathbf{A}\mathbf{Z_{0}})=S(\mathbf{A}\mathbf{Z_{0}})=\mathbf{C},
[12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎+𝐨1)+𝐨2\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}
=[12N]N×N+14𝐀𝐂+𝐨2\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{C}+\mathbf{o}_{2}
=[12N]N×N+14𝐀𝐂+E(𝐀𝐂)=S(𝐀𝐂)=𝐁,\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{C}+E(\mathbf{A}\mathbf{C})=S(\mathbf{A}\mathbf{C})=\mathbf{B},
[12N]N×N+14𝐀([12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}
+𝐨1)+𝐨2)+𝐨3=[12N]N×N+14𝐀𝐁+𝐨3\displaystyle+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}\bigg{)}+\mathbf{o}_{3}=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{B}+\mathbf{o}_{3}
=S(𝐀𝐁)=S(𝐀S(𝐀S(𝐀𝐙𝟎))).\displaystyle\qquad=S(\mathbf{A}\mathbf{B})=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}}))). (32)

Substituting Equation (32) into Equation (29), we can ultimately rewrite Equation (28) as

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=S(𝐀S(𝐀S(𝐀𝐙𝟎)))[1]N×1\displaystyle\qquad=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})))[1]_{N\times 1}
=SUMrow(S(𝐀S(𝐀S(𝐀𝐙𝟎)))i,:),\displaystyle\qquad=\mathrm{\scriptstyle SUM_{row}}(S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})))_{i,:}), (33)

where \mathrm{\scriptstyle SUM_{row}}(\cdot) denotes the operation of summing the elements of each row of the input matrix. For \mathbf{Z_{0}}, irrespective of left multiplication by any matrix or application of an element-wise activation function, its rightmost N-d_{0} columns remain zero. As a result, when right-multiplied by [1]_{N\times 1} (which effectively sums each row), the outcomes for \mathbf{Z_{0}} and \mathbf{Z} are identical, because the zero columns of \mathbf{Z_{0}} do not contribute to the row sums. Consequently, even when \mathbf{Z_{0}} is reverted back to \mathbf{Z}, Equation (33) still holds. It can then be seen that the right-hand side of Equation (33) is exactly a 3-step MP scheme driven by the GCN (the GCN variant deployed omits self-loops, using the matrix \mathbf{A} in lieu of \mathbf{L}), with the sigmoid function as its activation. Therefore, we can write \mathcal{M}(\mathbf{Z},3;\mathbf{A})=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z}))). Consequently, by incorporating Equation (33) into Equations (24) and (25), and neglecting the error term \mathbf{o}, we derive the following expression

𝐑(vi;0)\displaystyle\mathbf{R}^{\prime}(v_{i};0) 64SUMrow((𝐙,3;𝐀)i,:)[32]N×1.\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},3;\mathbf{A})_{i,:})-[32]_{N\times 1}. (34)

Similarly, we have

𝐑(vi;1)\displaystyle\mathbf{R}^{\prime}(v_{i};1) 64SUMrow((𝐙(1),3;𝐀))i,:[32]N×1\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z}^{(1)},3;\mathbf{A}))_{i,:}-[32]_{N\times 1}
\displaystyle=64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},4;\mathbf{A}))_{i,:}-[32]_{N\times 1}. (35)

Hence, we can write

𝐑(vi;k)\displaystyle\mathbf{R}^{\prime}(v_{i};k) 64SUMrow((𝐙,k+3;𝐀))i,:[32]N×1\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},k+3;\mathbf{A}))_{i,:}-[32]_{N\times 1}
=64[z¯i(k+3)]N×1[32]N×1.\displaystyle=64[\bar{{z}}_{i}^{(k+3)}]_{N\times 1}-[32]_{N\times 1}. (36)

By transposing Equation (36) into its scalar representation (i.e., R(v_{i};k)), we complete the derivation of Equation (2).
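As a concrete reading of Equations (34)-(36), the sketch below evaluates the right-hand-side quantity 64\,\mathrm{SUM_{row}}(\mathcal{M}(\mathbf{Z},3;\mathbf{A}))-32 on a hypothetical toy graph, with \mathcal{M} instantiated as three sigmoid-activated, self-loop-free MP steps; the error term \mathbf{o} and the averaging assumption are neglected exactly as in the statement.

```python
import numpy as np

sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

def mp(A, Z, k):
    """M(Z, k; A): k sigmoid-activated message-passing steps without self-loops."""
    for _ in range(k):
        Z = sigmoid(A @ Z)
    return Z

rng = np.random.default_rng(0)
N, d0 = 6, 3
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T
Z = rng.normal(size=(N, d0))

# Right-hand side of Equation (34): the quantity R'(v_i; 0) is proportional to,
# with the error term o and the averaging assumption neglected.
rhs = 64 * mp(A, Z, 3).sum(axis=1) - 32
print(rhs)
```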

Proof of Theorem 3

Adversaries employ the following methodology to attack node v_{i}: Initially, a target label \mathbf{y}^{\prime}_{i} is defined, wherein \mathbf{y}^{\prime}_{i} and \mathbf{y}_{i} differ only in the label values of the target nodes (whether a single target or multiple targets), which are altered to the target labels, while the labels of all other nodes remain unchanged. Subsequently, by modifying the graph structure, specifically \mathbf{A}, under a budget constraint r, a new graph with adjacency matrix \mathbf{A}^{\prime} is obtained. The objective is to maximize the similarity between the output and \mathbf{y}^{\prime}_{i} (i.e., minimize the loss between them) after forward propagation through \mathcal{M}.

Ultimately, as the primary goal is to manipulate the one-hot output [5], i.e., the magnitude relationship in the ii-th row of 𝐙(K)\mathbf{Z}^{(K)}, an activation function is added only in the final layer to compute the loss function with the label matrix [6]. Consequently, the optimization target of the adversaries can be expressed as:

argmin𝐀MSE(σ(𝐀K𝐙i=1K𝐖(i)),𝐘).\arg\min_{\mathbf{A}^{\prime}}\mathrm{MSE}(\sigma(\mathbf{A}^{\prime K}\mathbf{Z}\prod_{i=1}^{K}\mathbf{W}^{(i)}),\mathbf{Y}^{\prime}). (37)

For r, a perturbation affecting less than 5% of the total edges is commonly considered inconspicuous. Hence, given the necessity to minimize the perceptibility of the perturbation, r<5\%|\mathcal{V}|. Therefore, by introducing a sparse matrix \mathbf{P_{1}} containing only 0, 1, and -1 entries, the perturbation process can be represented as \mathbf{A}^{\prime}=\mathbf{A}+\mathbf{P_{1}}. Simultaneously, during training, as the activation functions between layers have been disregarded (as aforementioned, this does not affect the magnitude relationship of the output), the parameter matrices of different layers can be unified into a single parameter matrix. We denote \mathbf{W}_{K}=\prod_{i=1}^{K}\mathbf{W}^{(i)}; hence, Equation (37) can be rewritten as:

argmin𝐏𝟏MSE(σ((𝐀+𝐏𝟏)K𝐙𝐖K),𝐘).\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma((\mathbf{A}+\mathbf{P_{1}})^{K}\mathbf{Z}\mathbf{W}_{K}),\mathbf{Y}^{\prime}). (38)
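A minimal sketch of the adversary's objective in Equation (38) under hypothetical shapes: given a candidate perturbation \mathbf{P_{1}} with entries in \{-1,0,1\}, the attacker evaluates the MSE between the perturbed K-step propagation (with the collapsed weight matrix \mathbf{W}_{K}) and the target labels \mathbf{Y}^{\prime}.

```python
import numpy as np

sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

def attack_loss(A, P, Z, W_K, Y_target, K):
    """Equation (38): MSE(sigma((A + P)^K Z W_K), Y'), activation on the last layer only."""
    out = sigmoid(np.linalg.matrix_power(A + P, K) @ Z @ W_K)
    return np.mean((out - Y_target) ** 2)

rng = np.random.default_rng(0)
N, d0, c, K = 6, 4, 3, 2                          # hypothetical sizes; c label classes
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T
Z = rng.normal(size=(N, d0))
W_K = 0.1 * rng.normal(size=(d0, c))              # collapsed product of the K weight matrices
Y_target = np.eye(c)[rng.integers(0, c, size=N)]  # adversary-chosen one-hot labels Y'

P = np.zeros((N, N)); P[0, 1] = P[1, 0] = 1.0     # a candidate perturbation: insert one edge
print(attack_loss(A, P, Z, W_K, Y_target, K))
```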

Considering when K=1K=1, Equation (38) becomes:

argmin𝐏𝟏MSE(σ(𝐀𝐙𝐖(1)+𝐏𝟏𝐙𝐖(1)),𝐲).\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma(\mathbf{A}\mathbf{Z}\mathbf{W}^{(1)}+\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}),\mathbf{y^{\prime}}). (39)

To quantify the cost of the equation above, we consider that in non-adversarial scenarios the GCN can reach the computational universality of the ground-truth labels, i.e., there always exist \mathbf{W} and a minimal value \epsilon such that \mathrm{MSE}(\sigma(\mathbf{A}\mathbf{Z}\mathbf{W}),\mathbf{Y})\leq\epsilon. Since \mathbf{P_{1}} is perturbable, to maintain the concealment of the perturbations and minimize the modification of the GCN parameters, i.e., \mathbf{W}^{(1)}=\mathbf{W}, it is only necessary to ensure that, with an identity activation \sigma_{I}(\cdot), \sigma_{I}(\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}) is as close to a zero matrix as possible. Thus, Equation (39) can be further simplified to:

\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma_{I}(\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}),[0]_{N}). (40)

Upon observation, \mathbf{P_{1}} can be regarded as a distinct graph \mathcal{G}_{\mathbf{P_{1}}}, which has the same number of nodes as \mathcal{G}, with most nodes being isolated. Perturbations can be viewed as edges, where inserted edges (i.e., when an edge is inserted between nodes v_{i} and v_{j}, \mathbf{P_{1}}_{i,j}=1) have a weight of 1, and deleted edges (i.e., when an edge is deleted between nodes v_{i} and v_{j}, \mathbf{P_{1}}_{i,j}=-1) have a weight of -1. Therefore, the objective can be approximated as another GCN training paradigm, the difference being that the training parameters have changed from \mathbf{W} to \mathbf{P_{1}}. At this point, since \mathcal{G}_{\mathbf{P_{1}}} is a disconnected subgraph of \mathcal{G} with r edges, when K=1, the maximum number of disconnected r-edge subgraphs of \mathcal{G} is the maximum number of forward propagations that the attack model needs to compute. That is, \mathrm{Cost}(\mathbf{A},r,1)\leq C(|\mathcal{V}|,r).

Proceeding to the case where K=2, with \mathbf{P_{2}} playing the same role as \mathbf{P_{1}}, Equation (38) can be reformulated as follows:

argmin𝐏𝟐MSE(σ(𝐀2𝐙𝐖2+𝐀𝐏𝟐𝐙𝐖2+𝐏𝟐2𝐙𝐖2),𝐘).\arg\min_{\mathbf{P_{2}}}\mathrm{MSE}(\sigma(\mathbf{A}^{2}\mathbf{Z}\mathbf{W}_{2}+\mathbf{A}\mathbf{P_{2}}\mathbf{Z}\mathbf{W}_{2}+\mathbf{P_{2}}^{2}\mathbf{Z}\mathbf{W}_{2}),\mathbf{Y}^{\prime}). (41)

Analogous to the scenario where K=1K=1, the objective of the adversary for K=2K=2 can be simplified to:

argmin𝐏𝟐MSE(σI((2𝐀+𝐏𝟐)𝐏𝟐𝐙𝐖2),[0]N).\arg\min_{\mathbf{P_{2}}}\mathrm{MSE}(\sigma_{I}((2\mathbf{A}+\mathbf{P_{2}})\mathbf{P_{2}}\mathbf{Z}\mathbf{W}_{2}),[0]_{N}). (42)

This expression can be interpreted as an extension of the K=1 case: an attack on a graph \mathcal{G}_{2\mathbf{A}} with adjacency matrix 2\mathbf{A}. The maximum number of forward propagations for the attacker equals the number of largest disconnected subgraphs in \mathcal{G}_{2\mathbf{A}}, which is the same as in the K=1 case. We present illustrative examples of \mathcal{G}, \mathcal{G}_{\mathbf{P_{1}}}, \mathcal{G}_{\mathbf{P_{2}}} and \mathcal{G}_{2\mathbf{A}} in Figure 7.

Figure 7: For K=1 and K=2, the primary number of forward propagations for the attacker is determined by \mathcal{G}_{\mathbf{P_{1}}} and \mathcal{G}_{\mathbf{P_{2}}}, irrespective of the graphs \mathcal{G} and \mathcal{G}_{2\mathbf{A}} that constitute \mathcal{G}_{\mathbf{P_{1}}} and \mathcal{G}_{\mathbf{P_{2}}}.

Figure 8: Additional experimental results. (a) Values of \mathbf{d}_{k,k_{gap}} under different k and k_{gap} settings. (b) Strength distribution of \mathcal{G}, \mathcal{G}_{LRS} and \mathcal{G}_{random}.

The situation differs when K=3K=3. Specifically, Equation (38) for K=3K=3 is given by:

argmin𝐏𝟑MSE(σ(𝐀3𝐙𝐖3+𝐀2𝐏𝟑𝐙𝐖3+𝐀𝐏𝟑2𝐙𝐖3+𝐏𝟑3𝐙𝐖3),𝐘).\arg\min_{\mathbf{P_{3}}}\mathrm{MSE}(\sigma(\mathbf{A}^{3}\mathbf{Z}\mathbf{W}_{3}+\mathbf{A}^{2}\mathbf{P_{3}}\mathbf{Z}\mathbf{W}_{3}\\ +\mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3}+\mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}),\mathbf{Y}^{\prime}). (43)

In the above expression, it is required to ensure:

\arg\min_{\mathbf{P_{3}}}\mathrm{MSE}(\sigma_{I}(\mathbf{A}^{2}\mathbf{P_{3}}\mathbf{Z}\mathbf{W}_{3}+\mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3}+\mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}),[0]_{N}). (44)

It can be observed that for K=3 the maximum number of disconnected subgraphs \mathcal{G}_{\mathbf{P_{3}}} depends on the number of edges in the graph \mathcal{G}_{\mathbf{A}^{2}} represented by \mathbf{A}^{2}, rather than on the number of edges in \mathcal{G} (which is the case for K=1 and K=2). In \mathbf{A}^{2}, each element \mathbf{A}^{2}_{i,j} represents the number of paths of length 2 from node v_{i} to v_{j}. Therefore, the number of non-zero elements in \mathbf{A}^{2} can be at most |\mathcal{V}|^{2}, as each node can be connected to each of the other |\mathcal{V}|-1 nodes by a path of length 2. Hence, the upper bound on the number of edges contained in \mathcal{G}_{\mathbf{A}^{2}} is \frac{|\mathcal{V}|^{2}}{2}. The maximum number of disconnected r-edge subgraphs (i.e., \mathcal{G}_{\mathbf{P_{3}}}) is C(\frac{|\mathcal{V}|^{2}}{2},r).

Upon obtaining \mathcal{G}_{\mathbf{P_{3}}}, it must be passed to the last two terms, namely \mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3} and \mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}, for computation. Considering that \mathbf{P_{3}} is a sparse matrix, \mathbf{P_{3}}^{3} is generally a zero matrix, implying that all nodes in the graph \mathcal{G}_{\mathbf{P_{3}}^{3}} formed by it are isolated. Consequently, the last two terms reduce to one term, meaning that after obtaining \mathcal{G}_{\mathbf{P_{3}}}, one more matrix-multiplication-based forward propagation is needed. In summary, \mathrm{Cost}(\mathbf{A},r,3)\leq 2C\left(\frac{|\mathcal{V}|^{2}}{2},r\right).

Utilizing the same methodology, for K>3, each decomposition introduces a graph \mathcal{G}_{\mathbf{A}^{K-1}} with \mathbf{A}^{K-1} as its adjacency matrix, together with a linearly increasing number of summation terms. Combining the above analysis, we conclude that the maximum number of edges in \mathcal{G}_{\mathbf{A}^{K-1}} grows exponentially, while the number of matrix-multiplication-based forward-propagation summation terms, determined by the maximum number of disconnected subgraphs \mathcal{G}_{\mathbf{P_{r}}} in \mathcal{G}_{\mathbf{A}^{K-1}}, grows linearly. That is, when K\geq 3, \mathrm{Cost}(\mathbf{A},r,K)\leq(K-1)C\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right), which completes the proof.
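To illustrate how the bound scales, the following arithmetic sketch (with a hypothetical |\mathcal{V}| and budget r) tabulates C(|\mathcal{V}|,r) for K=1,2 and (K-1)\,C(|\mathcal{V}|^{K-1}/2,r) for K\geq 3.

```python
from math import comb

V, r = 100, 5                           # hypothetical graph size |V| and budget r

print("K = 1, 2:", comb(V, r))          # Cost <= C(|V|, r)
for K in range(3, 6):
    bound = (K - 1) * comb(V ** (K - 1) // 2, r)
    print(f"K = {K}:", bound)           # Cost <= (K - 1) * C(|V|^(K-1)/2, r)
```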

Proof of Proposition 1

We adopt an intuitive approach to simplify this computation. Given that \mathcal{G}^{j,k} contains one fewer edge than \mathcal{G} (namely (v_{j},v_{k})), \mathbf{A}_{\mathcal{G}^{j,k}} and \mathbf{A} differ in only two elements; hence the subtraction \mathbf{A}-\mathbf{A}_{\mathcal{G}^{j,k}} results in a sparse matrix \mathbf{D}^{j,k}, where all elements are zero except those at the j-th row, k-th column and the k-th row, j-th column, which are 1. Given that the parameters of \mathbf{W}^{(\ell-1)} have been trained during the calculation of the \ell-th layer's edge-transmitted signals, we use \mathbf{Z}_{\mathbf{W}}^{(\ell-1)}=\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)} to represent the matrix that is constant from the perspective of the \ell-th layer. Then, Equation (5) simplifies to

𝐙𝒢j,k()=𝐀𝐙𝐖(1)𝐃j,k𝐙𝐖(1).\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}-\mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}. (45)

Given that 𝐃j,k\mathbf{D}^{j,k} is a symmetric matrix with only two non-zero values, the result of 𝐃j,k𝐙𝐖(1)\mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}, defined as

Q(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k)=[\mathbf{0},\ldots,\underbrace{[\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}]_{k,:}}_{j\text{-th row}},\ldots,\underbrace{[\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}]_{j,:}}_{k\text{-th row}},\ldots,\mathbf{0}]^{\top}, (46)

does not need to be recalculated for each choice of v_{j}, v_{k}; it can instead be reassembled by indexing the rows of \mathbf{Z}_{\mathbf{W}}^{(\ell-1)} according to the chosen v_{j} and v_{k}.

Through the method stated above, Equation (5) is eventually simplified to the conclusion of Proposition 1 (Equation (7)). It can be observed that Equation (7) encompasses all arbitrary selections of v_{j} and v_{k}, yet only the \mathbf{Z}_{\mathbf{W}}^{(\ell-1)} and \mathbf{A}_{G}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)} terms need to be computed. Thus, during the computation of edge-transmitted signals in GRN, we can reduce the initial |\mathcal{V}| computations down to a single computation. Thereafter, when transitioning among different edges, it is only necessary to conduct a column-wise concatenation and subtraction of the matrices.
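A minimal sketch of the shortcut described above, with hypothetical sizes: the correction \mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)} is assembled by indexing two rows of the precomputed \mathbf{Z}_{\mathbf{W}}^{(\ell-1)}, and the result coincides with the explicit product (\mathbf{A}-\mathbf{D}^{j,k})\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 4
A = rng.integers(0, 2, size=(N, N)).astype(float)
A = np.triu(A, 1); A = A + A.T
j, k = 0, 2
A[j, k] = A[k, j] = 1.0                        # make sure the edge (v_j, v_k) exists

Z_W = rng.normal(size=(N, d))                  # precomputed Z^{(l-1)} W^{(l-1)}
AZ_W = A @ Z_W                                 # computed once, reused for every edge

D = np.zeros((N, N)); D[j, k] = D[k, j] = 1.0  # the sparse difference matrix D^{j,k}

# Q(Z_W; j, k): row j holds [Z_W]_{k,:}, row k holds [Z_W]_{j,:}, all other rows zero.
Q = np.zeros((N, d)); Q[j] = Z_W[k]; Q[k] = Z_W[j]

Z_removed = AZ_W - Q                           # Equation (45) via row indexing only
print(np.allclose(Z_removed, (A - D) @ Z_W))   # True
```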

Additional Experiments

In this section, we examine the behavior of layers arising from resonance across an expanded range of datasets, and we report the strength distribution of LRS-constructed graphs drawn from additional datasets. The experimental setup is consistent with the main body of the text. Note that, in the interest of fairness, these experiments were conducted with re-initialized random seeds; consequently, under identical experimental conditions, there is a slight discrepancy between these results and those presented in the main text. Figures 8(a) and 8(b), which can be viewed as extensions of Figures 4 and 5(b) respectively, depict these outcomes. The conclusions drawn from the illustrated results are consistent with the central claims of the main text.

References

  • [1] Breuer, Heinz-Peter; Huber, Wolfgang; and Petruccione, Francesco. 1994. Fluctuation effects on wave propagation in a reaction-diffusion process. Physica D, 73(3): 259–273.

  • [2] Golub, Gene H. and Reinsch, Christian. 1971. Singular value decomposition and least squares solutions. Linear Algebra.

  • [3] Vannieuwenhoven, Nick; Vandebril, Raf; and Meerbergen, Karl. 2012. A new truncation strategy for the higher-order singular value decomposition. SIAM J. Sci. Comput., 34(2): 1027–1052.

  • [4] Al-Mohy, Awad H. and Higham, Nicholas J. 2011. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J. Sci. Comput., 33(2): 488–511.

  • [5] Sun, Lichao; Dou, Yingtong; Yang, Carl; et al. 2022. Adversarial attack and defense on graph data: A survey. IEEE Trans. Knowl. Data Eng., 35(8): 7693–7711.

  • [6] Zügner, Daniel; Akbarnejad, Amir; and Günnemann, Stephan. 2018. Adversarial attacks on neural networks for graph data. Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2847–2856.