
Towards Inductive Robustness: Distilling and Fostering Wave-induced Resonance in Transductive GCNs Against Graph Adversarial Attacks

Ao Liu1, Wenshan Li2, Tao Li1, Beibei Li1, Hanyuan Huang1, Pan Zhou3 Corresponding author
Abstract

Graph neural networks (GNNs) have recently been shown to be vulnerable to adversarial attacks, where slight perturbations in the graph structure can lead to erroneous predictions. However, current robust models for defending against such attacks inherit the transductive limitations of graph convolutional networks (GCNs). As a result, they are constrained by fixed structures and do not naturally generalize to unseen nodes. Here, we discover that transductive GCNs inherently possess a distillable robustness, achieved through a wave-induced resonance process. Based on this, we foster this resonance to facilitate inductive and robust learning. Specifically, we first prove that the signal formed by GCN-driven message passing (MP) is equivalent to the edge-based Laplacian wave, where, within a wave system, resonance can naturally emerge between the signal and its transmitting medium. This resonance provides inherent resistance to malicious perturbations inflicted on the signal system. We then prove that merely three MP iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph. Consequently, we present Graph Resonance-fostering Network (GRN) to foster this resonance via learning node representations from their distilled resonating subgraphs. By capturing the edge-transmitted signals within this subgraph and integrating them with the node signal, GRN embeds these combined signals into the central node’s representation. This node-wise embedding approach allows for generalization to unseen nodes. We validate our theoretical findings with experiments, and demonstrate that GRN generalizes robustness to unseen nodes, whilst maintaining state-of-the-art classification accuracy on perturbed graphs.

Introduction

In recent years, graph neural networks (GNNs), through the capabilities afforded by inductive learning, have emerged as the most potent instruments for node classification tasks. Nevertheless, earlier transductive models, such as graph convolutional networks (GCNs) (Kipf and Welling 2017), have inadvertently introduced vulnerabilities to adversarial attacks within the GNN framework. It has been observed that perturbed graphs derived from GCNs serving as surrogate models have the potential to compromise the outputs of inductive GNNs when transferred. In real-world applications, where trust and accuracy are non-negotiable (Chen et al. 2021; Nadal et al. 2021; Zhao et al. 2022; Berberidis and Giannakis 2019; Xiao, Chen, and Shi 2019), such vulnerabilities can significantly jeopardize public trust (Kreps and Kriner 2020), distort human decision-making (Walt, Jack, and Christof 2019), and threaten human well-being (Samuel et al. 2019). Addressing the vulnerabilities introduced by transductive GCNs into the GNNs’ community is of paramount importance.

Distinct from discrete feature data like images or text, graph data comprises a connected set of features through its topological structure. This interconnectedness naturally encourages the adoption of a global input-output mechanism to establish a learning channel from features to labels, a paradigm referred to as transductive learning (Kipf and Welling 2017; Defferrard, Bresson, and Vandergheynst 2016; Bruna et al. 2013), with GCNs epitomizing this approach. This very transductive nature of GCNs offers adversaries an ideal environment for launching attacks (Liu et al. 2022). Leveraging this global input-output pattern, given sufficient computation, adversaries can invariably devise perturbations that are both concealed and effective (Sun et al. 2022). Given that adversaries exploit vulnerabilities inherent to transductive models to compromise the GNN community, the formulation of a more robust transductive model has ascended as the prevailing defensive approach.

To defend against adversarial attacks, early research predominantly sought to fortify the GCN's tolerance to perturbations by adversarial training through random edge drops (Dai et al. 2018). Recently, a shift towards self-supervised training methods has been observed. These techniques sidestep the trap set by adversaries, which baits the model into misclassifying specific inputs. Instead of singularly focusing on enhancing the model's robustness to a given label space, they aim to expand the GCN's overall robustness to potential perturbed graphs. Key representatives of these research endeavors include: (1) In RGCN (Zhu et al. 2019), Gaussian distributions are employed to replace the node hidden representations across each GCN layer, aiming to mitigate the impact of adversarial modifications. (2) By introducing a singular value decomposition (SVD) filter before the GCN processing, GCN-SVD (Entezari et al. 2020) is designed to discard adversarial edges from the training dataset. (3) STABLE (Li et al. 2022) introduces enhancements in the GCN's forward propagation by incorporating functions that sporadically reinstate edges that were approximately removed. (4) EGNN (Liu et al. 2021) leverages graph smoothing techniques to confine the perturbation setting space, effectively excluding the majority of non-smooth perturbations.

However, current research, aiming to improve GCN-based models into robust transductive variants against attacks, inadvertently carries over the weaknesses introduced by transduction (Hamilton, Ying, and Leskovec 2017). Specifically, these models cannot handle unseen nodes and are limited to fixed structures, lacking generalization. This restricts their applicability: if adversaries slightly adjust their tactics, defenders must retrain their models to stay safe. The root cause is that GCNs' vulnerabilities are inherent; enhancing their robustness requires solutions that target these vulnerabilities directly. Deviating from the context of GCNs could hinder a thorough analysis of attack mechanisms, which in turn would obstruct the transition from transductive robust models to inductive ones. Until we harness the GCN's inherent robustness for inductive models, we will be stuck in a cycle of constantly refining transductive ones to address vulnerabilities.

In addressing this conundrum, our exploration unveiled an intriguing intrinsic source of robustness within the GCN itself. Without resorting to additional designs, merely deepening the standard 2-layer GCN to a 3-layer structure endows it with an innate (albeit partial) robustness. Importantly, the mechanism underpinning this robustness can be distilled. By purposefully fostering this intrinsic mechanism, it has paved the way for us to architect a robust inductive model. Employing this approach serves a dual purpose: On one hand, it facilitates a precision-oriented confrontation against the perturbations devised specifically by adversaries for transductive structures, ensuring the efficacy of our defensive strategies. On the other hand, it enables us to integrate this robustness mechanism into inductive frameworks, thereby achieving a seamless melding of inductiveness and robustness.

Specifically, we demonstrate that the vibrations of node signals within GCN-driven message passing (MP) are equivalent to edge-based waves, formulated by wave equations (Friedman and Tillich 2004; Shatah and Struwe 1993). Given this equivalence, it follows that GCNs inherently possess the potential for resonance (Kovalyov 1989), allowing them to harness the natural advantages of waves in defending against perturbations (Blas and Jenkins 2022). Then, we introduce a mathematical definition for the intensity of such resonance in GCNs. This definition, which outlines the scope and weights of a node's connections to its neighbors, concurrently adheres to four principles: universality, adaptation via MP, node-independence, and topological correlation. Subsequently, we demonstrate that for GCNs with three or more layers, an invariant mapping exists that translates the GCN's outputs into resonance intensity, manifesting as nodes capturing their surrounding local weighted structure.

Informed by these insights, we introduce the Graph Resonance-fostering Network (GRN) for inductive learning. The core of GRN is that it distills the structure resonating with nodes as local resonance subgraphs. Then, within this subgraph, GRN fosters the resonance by embedding both the node's signals and the signals transmitted through edges as the central node's representation. This embedding approach is generalizable across graph structures. If the surrounding topology of a node (including unseen ones) can be clearly determined to distill the local resonance subgraph, robust and inductive graph learning is achieved. Our contributions are:

  • We propose the first inductive and robust GNN.

  • We prove that a 3-layer GCN inherently possesses a distillable robustness.

  • We prove the equivalence between GCN-driven signal vibrations and edge-based waves.

Preliminaries

Notations

We consider connected graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ consisting of $N=|\mathcal{V}|$ nodes. Let $\mathbf{A}\in\{0,1\}^{N\times N}$ be the adjacency matrix. Let the generic symbol $\mathbf{L}$ denote the Laplacian in its broadest sense. The feature and one-hot label matrices are $\mathbf{Z}\in\mathbb{R}^{N\times d_{0}}$ and $\mathbf{Y}\in\mathbb{R}^{N\times d_{L}}$, respectively. The edge connecting nodes $v_{i}$ and $v_{j}$ is written as $(v_{i},v_{j})$ or $(v_{j},v_{i})$. The neighborhood $\mathcal{N}_{i}$ of a node $v_{i}$ consists of all nodes $v_{j}$ for which $(v_{i},v_{j})\in\mathcal{E}$. Let $\mathrm{deg}_{i}$ be the degree of node $v_{i}$. The feature vector and one-hot label of node $v_{i}$ are $\mathbf{z}_{i}$ and $\mathbf{y}_{i}$.

GCN

Under the topology $\mathbf{L}$, with $\mathbf{Z}$ as the input, the output at the $k$-th layer of a GCN is denoted as $\mathcal{M}(\mathbf{Z},k;\mathbf{L})$. The $k$-th parameter matrix of $\mathcal{M}$ is $\mathbf{W}^{(k)}\in\mathbb{R}^{d_{k}\times d_{k+1}}$. $\mathbf{Z}^{(k)}=[\mathbf{z}_{1}^{(k)},\ldots,\mathbf{z}_{N}^{(k)}]$ denotes the features in the $k$-th MP.

Wave Equation

The edge-based wave equation introduces a relationship between a graph-based signal $g=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})$ and its topological structure. Letting $c$ be a constant, it is defined as $\frac{\partial^{2}g}{\partial k^{2}}=-\mathbf{L}^{k}g\cdot c$ (Friedman and Tillich 2004). Herein $g$ can be instantiated as any discernible signal.
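For intuition, the following minimal numpy sketch discretizes the wave equation with a unit step in $k$; the central-difference update and the toy path graph are illustrative assumptions rather than the solver used in this paper.

```python
import numpy as np

# A minimal sketch of the edge-based wave equation d^2 g / dk^2 = -L^k g * c,
# discretized with unit step in k (an illustrative assumption, not the paper's solver).
def simulate_wave(L, Z0, c=0.1, steps=5):
    """Iterate a discrete second-order update of the graph signal."""
    g_prev, g_curr = Z0.copy(), Z0.copy()     # zero initial velocity
    history = [g_curr]
    Lk = np.eye(L.shape[0])                   # running power L^k
    for _ in range(steps):
        accel = -Lk @ g_curr * c              # right-hand side of the wave equation
        g_next = 2 * g_curr - g_prev + accel  # central-difference update in k
        g_prev, g_curr = g_curr, g_next
        Lk = Lk @ L
        history.append(g_curr)
    return history

# Toy usage: a 4-node path graph with random features.
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
L = np.diag(A.sum(1)) - A
signals = simulate_wave(L, np.random.rand(4, 3))
```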

3-layer GCN Possesses Adversarial Robustness via Wave-induced Resonance

Equivalence of GCN-driven MP and Wave Equation

Here we demonstrate that the signal vibrations driven by GCNs are equivalent to waves on graph topologies and can be characterized by nonlinear wave equations.

Theorem 1.

Let $\mathcal{M}(\mathbf{Z},k;\mathbf{L})$ and $\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})$ denote the signals of the $k$-th MP and the $k$-th wave, respectively, under the topological structure represented by $\mathbf{L}$. It is established that for the given $k$ and $\mathcal{M}(\cdot)$, there exists $\mathrm{W{\scriptstyle AVE}}(\cdot)$ satisfying $\mathcal{M}(\mathbf{Z},k;\mathbf{L})=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L}),\ \forall\mathbf{L}\in\widehat{\mathbf{L}}$, where $\widehat{\mathbf{L}}$ is the set of Laplacian matrices of all attributed graphs.

This study draws an analogy between the node signals in GCN-driven MP and waves, considering edges as the transmission medium. Research indicates that in systems producing waves, resonance can arise between waves and their medium (Ahmad et al. 2023; Bykov, Bezus, and Doskolovich 2019). Building on this understanding, we can reaffirm our empirical observations about the GCN training pattern: in non-adversarial contexts, GCNs converge during training to the most natural signal MP paradigm, i.e., the one congruent with the ground truth. Under this premise, the messages transmitted by nodes and edges in the graph manifest a natural coupling state, maintaining a benign mapping relationship $\mathcal{M}:\mathcal{G}\to\mathbf{Y}$. The key to optimizing $\mathcal{M}$ is the resonance between node signals $\mathbf{Z}^{(\ell)}$ and edge signals $\mathbf{E}^{(\ell)}$. In adversarial situations, adversaries manipulate node signals by rewiring edges, which induces unnatural (i.e., noncongruent with the ground truth) MP patterns. Under this scheme, the benign resonance is disrupted, resulting in a malignant mapping relationship $\mathcal{M}:\mathcal{G}^{\prime}\to\mathbf{Y}^{\prime}$, where $\mathcal{G}^{\prime}$ and $\mathbf{Y}^{\prime}$ are the perturbed graph and labels, respectively.

Mathematical Definition of Resonance in GCN

Maintaining benign resonance becomes an intuitive defensive mechanism as it intrinsically resists unnatural perturbations. To actualize control over this resonance, thereby purposefully fostering resonance within the GCN, we subsequently delineate a detailed definition of this resonance. Thus, this definition should comply with the following conditions: 1) Every node within a graph should possess a computable resonance intensity, 2) the resonance intensity of all nodes should evolve in accordance with MP, 3) each node should maintain an independent resonance intensity, and 4) the stronger a node’s connection to its surrounding topology, the greater its perceived resonance intensity.

To devise a methodology compliant with the desired conditions, we consider node $v_{i}$ and utilize its latent representation (Kula 2015) $\bar{z}_{i}^{(k)}=\sum_{j}\mathbf{z}^{(k)}_{i,j}$ to quantify the intensity of the node signals. Furthermore, we use $T_{i}$, the count of edges among nodes in $\mathcal{N}_{i}$, to measure the connectivity strength specific to the edges at the given node. Then, we use the total number $p_{i}=\sum_{j}\mathbf{A}^{2}_{i,j}$ of walks of length 2 originating from $v_{i}$ to any node in $\mathcal{G}$ to quantify the connectivity density that $v_{i}$ exhibits in the structure.

Accordingly, we propose the following definition to quantify the resonance intensity at node viv_{i}:

Definition 1.

The resonance intensity of $v_{i}$ at the $k$-th MP is

$$R(v_{i};k)\overset{\text{def.}}{=}\bar{z}_{i}^{(k)}T_{i}+2p_{i}+8\,\mathrm{deg}_{i}.\qquad(1)$$

The uniqueness of this definition of resonance intensity can be articulated as follows: it not only allows for an interpretable quantification of the resonance at different nodes, but it is also directly observable within MP. This implies that under such a definition, the resonance intensity of any node at any given MP epoch on a graph can be independently calculated, obviating the need for the GCN computational paradigm.
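As an illustration of this observability, a minimal numpy sketch of Eq. (1) is given below; the dense-matrix computation and the helper name are our own choices, not part of the original implementation.

```python
import numpy as np

# A minimal sketch of Definition 1 (resonance intensity), assuming a dense
# adjacency matrix A and per-MP node features Z_k; names here are illustrative.
def resonance_intensity(A, Z_k, i):
    """R(v_i; k) = z_bar_i^(k) * T_i + 2 * p_i + 8 * deg_i."""
    z_bar = Z_k[i].sum()                      # latent representation of v_i
    deg = A[i].sum()                          # degree of v_i
    p = (A @ A)[i].sum()                      # number of length-2 walks from v_i
    neigh = np.flatnonzero(A[i])              # first-order neighborhood N_i
    T = A[np.ix_(neigh, neigh)].sum() / 2     # edges among neighbors of v_i
    return z_bar * T + 2 * p + 8 * deg

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
Z_k = np.random.rand(4, 3)
print(resonance_intensity(A, Z_k, i=1))
```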

Resonance arises in 3rd MP

Definition 1 facilitates the quantification of resonance for any signal function on any graph, irrespective of whether or not it is driven by a GCN. Nonetheless, an intriguing finding has been proven: the wave system constructed by the GCN inherently and involuntarily gives rise to resonance, as outlined in the following theorem:

Theorem 2.

Let $\bar{z}_{i}^{(k)}$ be the latent signal formed by GCN-driven MP; we have:

$$R(v_{i};k)\propto 64\bar{z}_{i}^{(k+3)}-32.\qquad(2)$$

Theorem 2 unveils an intriguing phenomenon: for $k\geq 3$, there subsists an invariant mapping that transposes the GCN-driven signal into a resonance intensity bearing no correlation with the GCN paradigm. Given that Definition 1 establishes the resonance intensity as a measure of the coupling strength between nodes and structure within the graph, we can characterize it as the degree of coupling. Consequently, prior to the 3rd MP iteration, the GCN appears not yet to have delved into the coupling paradigm between nodes and structure within the graph. Subsequent to the 3rd MP, however, due to the persistent presence of the invariant mapping, the GCN can be construed as having fortuitously assimilated the coupling paradigm within the graph during the 3rd MP, and it perpetuates this paradigm into subsequent MPs.

Vast Perturbation Search Space of 3-layer GCN

In light of the current absence of an effective method for quantifying the combined adversarial robustness of a specific graph and a GCN learning from said graph, we propose an intuitive approach. For a graph $\mathcal{G}$, comprised of $|\mathcal{V}|$ edges and represented by the adjacency matrix $\mathbf{A}$, and a GCN $\mathcal{M}$ with $K$ layers, where the perturbation budget is denoted as $r$, the number of matrix-multiplication-based forward propagations required by the attack model can be construed as the highest attack cost. In this context, the number of subgraphs is independent of node features, hence we employ $\mathbf{A}$ as the independent variable of the attack cost function, denoted as $\mathrm{Cost}(\mathbf{A},r,K)$. We then present the following theorem:

Theorem 3.

For any specified graph with a node set $\mathcal{V}$ and an adjacency matrix $\mathbf{A}$, in conjunction with a $K$-layer GCN and a maximum perturbation $r$, the following holds:

$$\mathrm{Cost}(\mathbf{A},r,K)\leq\begin{cases}C(|\mathcal{V}|,r),&\text{if }K<3\\ (K-1)\,C\!\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right),&\text{otherwise},\end{cases}\qquad(3)$$

where $C(\cdot,\cdot)$ denotes the number of combinations.

It is revealed that adversaries face the same computational cost for matrix-multiplication-based forward propagations when $K=1$ or $2$. However, for $K\geq 3$, the cost dramatically increases, largely due to $C\!\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right)$. As an example, with the Cora dataset ($|\mathcal{V}|=5429$) and a 1% perturbation rate ($r=54$), the cost for $K=3$ becomes exponentially larger. Thus, attacking a 3-layer GCN presents a vast search space for adversaries. This insight extends Theorem 2's real-world applications and our previous findings: a 3-layer GCN can naturally create resonance-based robustness. With our defined resonance, we can further boost this robustness purposefully.
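For a concrete feel of the bound in Eq. (3), the following sketch evaluates it with Python's math.comb; the toy sizes are chosen only to keep the numbers printable and are not drawn from the paper's experiments.

```python
from math import comb

# A small sketch of the attack-cost bound in Theorem 3 (Eq. 3); the numbers
# below only illustrate the scale of the bound, not an actual attack.
def cost_upper_bound(num_v, r, K):
    if K < 3:
        return comb(num_v, r)
    return (K - 1) * comb(int(num_v ** (K - 1) / 2), r)

# Compare a 2-layer and a 3-layer GCN on a small toy size to keep the
# combinatorial numbers printable (Cora-scale values overflow the console).
print(cost_upper_bound(100, 3, K=2))   # C(100, 3)
print(cost_upper_bound(100, 3, K=3))   # 2 * C(100^2 / 2, 3)
```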

Graph Resonance-fostering Network

Principle Overview

We employ GRN to enhance the resonance of the GCN. The underlying concept of the GRN is articulated as follows. Definition 1 shows that for a node $v_{i}$, there exists a local graph structure that resonates, known as the local resonance subgraph (LRS) of node $v_{i}$ and denoted as $G_{i}=(\mathcal{V}_{G_{i}},\mathcal{E}_{G_{i}})$, which represents the maximal subgraph structure that node $v_{i}$ can capture. During end-to-end training, both the node signals $\mathbf{Z}^{(\ell)}$ and the signals transmitted through edges $\mathbf{E}^{(\ell)}$ concurrently vibrate within the LRS. Consequently, for $v_{i}$, if a learnable parameter $\mathbf{W}^{(\ell)}$ is capable of jointly embedding the MP result $\mathbf{A}_{G_{i}}\mathbf{Z}^{(\ell)}_{G_{i}}$ and $\mathbf{E}^{(\ell)}_{G_{i}}$ into $v_{i}$'s output representation, this aggregation intentionally accomplishes a learnable resonance, generating a local-level embedding. This identical aggregation pattern is applied across all nodes to facilitate a mapping, thereby achieving global-level forward propagation within the GRN.

In summary, a single forward propagation of the GRN is:

$$\mathbf{z}^{(\ell+1)}_{i}=\sigma\!\left(\mathrm{\scriptstyle MEAN}\!\left(\mathrm{\scriptstyle CONCAT}\!\left(\mathbf{A}_{G_{i}}\mathbf{Z}^{(\ell)}_{G_{i}},\mathbf{E}^{(\ell)}_{G_{i}}\right)\mathbf{W}^{(\ell)}\right)\right).\qquad(4)$$
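For illustration, a minimal numpy sketch of Eq. (4) for a single node is given below; reading CONCAT as row-wise stacking and MEAN as a mean over rows is our assumption, and all names and shapes are illustrative.

```python
import numpy as np

# A minimal sketch of one GRN forward step (Eq. 4) for a single node, assuming
# the LRS adjacency A_Gi, its node signals Z_Gi, and its edge-transmitted
# signals E_Gi are already extracted.
def grn_node_forward(A_Gi, Z_Gi, E_Gi, W, sigma=np.tanh):
    msg = A_Gi @ Z_Gi                          # MP result inside the LRS
    stacked = np.vstack([msg, E_Gi])           # CONCAT of node and edge signals
    return sigma((stacked @ W).mean(axis=0))   # linear map W, MEAN, nonlinearity

# Toy usage with random shapes: 3 LRS nodes, 2 LRS edges, d_in=4, d_out=2.
A_Gi = np.array([[0,1,1],[1,0,0],[1,0,0]], dtype=float)
Z_Gi, E_Gi = np.random.rand(3, 4), np.random.rand(2, 4)
W = np.random.rand(4, 2)
print(grn_node_forward(A_Gi, Z_Gi, E_Gi, W))
```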

Next, we provide explicit definitions for $G_{i}$ and $\mathbf{E}^{(\ell)}_{G_{i}}$.

$G_{i}$: Local Resonance Subgraph

As per Definition 1, the LRS comprises three components: 1) edges formed amongst all first-order neighbors, as counted by $T_{i}$, with these edges' weights equal to 1; 2) edges formed between $v_{i}$ and all 2nd-order (which inevitably include 1st-order) neighbors, as counted by $p_{i}$, with these edges' weights equal to 2; 3) edges between $v_{i}$ and all first-order neighbors, as counted by $\mathrm{deg}_{i}$, with these edges' weights equal to 8. Consequently, the LRS can be viewed as a weighted graph, in which the edge weights serve as attention for the joint combination of $\mathbf{Z}^{(\ell)}$ and $\mathbf{E}^{(\ell)}$. An illustrative example of the LRS is presented in Figure 1, and a construction sketch is given after the figure.

Figure 1: An illustration of the local resonance subgraph.
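A minimal numpy sketch of assembling an LRS as a weighted edge list follows; the loop structure and helper name are our own, and the weight increments (1, 2, 8) mirror the coefficients of Eq. (1).

```python
import numpy as np

# A minimal sketch of constructing the LRS of node i as a weighted edge list,
# following the three components described above; purely illustrative.
def local_resonance_subgraph(A, i):
    neigh = np.flatnonzero(A[i])
    edges = {}
    # 1) edges among first-order neighbors (counted by T_i), weight 1
    for a in neigh:
        for b in neigh:
            if a < b and A[a, b]:
                edges[(a, b)] = edges.get((a, b), 0) + 1
    # 2) edges from v_i to nodes reached by length-2 walks (counted by p_i), weight 2
    for j in neigh:
        for k in np.flatnonzero(A[j]):
            if k != i:
                e = (min(i, k), max(i, k))
                edges[e] = edges.get(e, 0) + 2
    # 3) edges from v_i to its first-order neighbors (counted by deg_i), weight 8
    for j in neigh:
        e = (min(i, j), max(i, j))
        edges[e] = edges.get(e, 0) + 8
    return edges

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
print(local_resonance_subgraph(A, i=0))
```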

$\mathbf{E}^{(\ell)}_{G_{i}}$: Edge-transmitted Signals

In the MP driven by adjacency matrices, only the signals at the nodes are observable, while the signals transmitted across each edge remain imperceptible. To ascertain the quantified signal on every specific edge within $G_{i}$, we first obtain the global edge-transmitted signals $\mathbf{E}^{(\ell)}_{\mathcal{G}}$. Then, $\mathbf{E}^{(\ell)}_{G_{i}}$ is derived through a sampling procedure on $\mathbf{E}^{(\ell)}_{\mathcal{G}}$ using the edge indices within $G_{i}$.

Specifically, within $\mathbf{E}^{(\ell)}_{\mathcal{G}}$, the edge-transmitted signal on $(v_{j},v_{k})$ is denoted as $\mathbf{e}^{(\ell)}_{j,k}$. For $\ell>0$, $\mathbf{e}^{(\ell)}_{j,k}$ is defined via a sequential procedure: 1) The edge $(v_{j},v_{k})$ in $\mathcal{G}$ is deleted, producing a new graph $\mathcal{G}^{j,k}$ with adjacency matrix $\mathbf{A}_{\mathcal{G}^{j,k}}$. 2) A new forward propagation is executed in the same layer on $\mathcal{G}^{j,k}$, yielding a feature matrix $\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}$. This matrix does not contain any messages transmitted through the edge $(v_{j},v_{k})$. Consequently,

$$\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}_{\mathcal{G}^{j,k}}\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)}.\qquad(5)$$

The feature of node $v_{j}$ in $\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}$, denoted as $\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}$, is thus obtained. 3) In $\mathcal{G}^{j,k}$, there is no edge between the nodes $v_{j}$ and $v_{k}$. Hence, the feature transmitted from node $v_{k}$ to $v_{j}$ (i.e., $\mathbf{e}^{(\ell)}_{j,k}$) is calculated by subtracting the feature obtained through the re-propagation on $\mathcal{G}^{j,k}$ (i.e., $\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}$) from the original feature (i.e., $\mathbf{z}^{(\ell)}$). Similarly, the signal transmitted through the pair $(v_{j},v_{k})$ can be interpreted as the average of the mutually transmitted signals. At $\ell=0$, since MP has not been initiated, $\mathbf{e}^{(\ell)}_{j,k}$ would ideally be $0$; for end-to-end training, it is defined as a random infinitesimal value. In conclusion, $\mathbf{E}^{(\ell)}_{G_{i}}$ is determined as

$$\mathbf{E}^{(\ell)}_{G_{i}}=\mathrm{\scriptstyle CONCAT}\!\left(\left\{\mathbf{e}^{(\ell)}_{j,k}:v_{j},v_{k}\in G_{i}\right\}\right),\quad\text{s.t. }\ \mathbf{e}^{(\ell)}_{j,k}=\begin{cases}\mathbf{z}^{(\ell)}-\dfrac{\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},j}+\mathbf{z}^{(\ell)}_{\mathcal{G}^{j,k},k}}{2},&\text{if }\ell>0\\[6pt]\epsilon,\ \epsilon\sim U(0,1\times 10^{-7}),&\text{if }\ell=0.\end{cases}\qquad(6)$$

Figure 2 illustrates the computation of $\mathbf{e}^{(\ell)}_{j,k}$.

Figure 2: Schematic diagram illustrating the computation of the edge-transmitted signal between nodes $v_{j}$ and $v_{k}$.

Simplifying the Computational Overhead of $\mathbf{E}^{(\ell)}_{G_{i}}$

Equation (5) explicates the method of re-propagation on $\mathcal{G}^{j,k}$. Given that there are $|\mathcal{V}|$ ways to choose $(v_{j},v_{k})$, it necessitates the computation of $|\mathcal{V}|$ matrix multiplications (where $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}=\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)}$ remains the same for all $(v_{j},v_{k})$ selections and can be considered a constant matrix), thereby constituting the primary computational cost of $\mathbf{E}^{(\ell)}_{G_{i}}$. Here, we provide a computational method equivalent to Equation (5), reducing the $|\mathcal{V}|$ repetitions to a single one.

Proposition 1.

By indexing and rearranging $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$ by rows $j$ and $k$ to obtain a matrix $Q(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k)\in\mathbb{R}^{N\times d_{\ell-1}}$,

$$\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}-Q\!\left(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k\right).\qquad(7)$$

Evidently, a single matrix multiplication, i.e., $\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$, is sufficient to iterate over all $(v_{j},v_{k})$ and yield the results.
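As a quick numerical check of Proposition 1, the sketch below compares Eq. (5) with Eq. (7); the concrete form of $Q(\cdot;j,k)$ used here (rows $j$ and $k$ holding the swapped rows of $\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}$, zeros elsewhere) is our reading of "indexing and rearranging" and is stated as an assumption.

```python
import numpy as np

# A small numerical check of Proposition 1 (Eq. 7), under our assumed form of Q.
def Q(Z_W, j, k):
    out = np.zeros_like(Z_W)
    out[j], out[k] = Z_W[k], Z_W[j]
    return out

rng = np.random.default_rng(0)
A = np.array([[0,1,1],[1,0,1],[1,1,0]], dtype=float)
Z_W = rng.random((3, 4))                     # Z^{(l-1)} W^{(l-1)}, precomputed once
j, k = 0, 1
A_del = A.copy(); A_del[j, k] = A_del[k, j] = 0
lhs = A_del @ Z_W                            # Eq. 5: re-propagation after deletion
rhs = A @ Z_W - Q(Z_W, j, k)                 # Eq. 7: single multiplication + indexing
print(np.allclose(lhs, rhs))                 # True
```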

Figure 3: The general workflow of GRN.

Learning the Parameters

Each layer of GRN contains only the trainable parameters $\mathbf{W}^{(\ell)}$, and each has a distinct output representation $\mathbf{Z}^{(\ell)}$. Thus, in accordance with the requirements of the downstream task, GRN can accommodate either supervised or unsupervised loss functions to tune its weight matrices. Specifically, we denote the discrepancy function as $D(\cdot,\cdot)$. In semi-supervised scenarios, the loss function is $J_{s}(\mathbf{z}^{(K)}_{i})=D(\mathbf{z}^{(K)}_{i},\mathbf{y}_{i})$; in unsupervised scenarios (Müller 2023), $J_{u}(\mathbf{z}^{(K)}_{i})=D(\mathbf{z}^{(K)}_{i},\{\mathbf{y}_{j}:v_{j}\in\mathcal{N}_{i}\})$. Depending on the downstream application, $D(\cdot,\cdot)$ can take various forms, such as cross-entropy. The general workflow of GRN is illustrated in Figure 3.
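For illustration, a minimal sketch of the two loss choices is given below; taking $D$ to be cross-entropy and averaging the neighbors' one-hot labels in the unsupervised case are assumptions on our part, not the authors' exact objectives.

```python
import numpy as np

# Minimal sketches of J_s and J_u with cross-entropy as the discrepancy D.
def cross_entropy(pred_logits, target):
    p = np.exp(pred_logits - pred_logits.max())
    p /= p.sum()
    return -(target * np.log(p + 1e-12)).sum()

def J_s(z_i, y_i):                       # semi-supervised: compare with v_i's label
    return cross_entropy(z_i, y_i)

def J_u(z_i, neighbor_labels):           # unsupervised: compare with neighbors' labels
    return cross_entropy(z_i, np.mean(neighbor_labels, axis=0))

z_i = np.array([2.0, 0.5, -1.0])
y_i = np.array([1.0, 0.0, 0.0])
print(J_s(z_i, y_i), J_u(z_i, np.eye(3)[[0, 0, 1]]))
```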

Experiments

Datasets

Our findings are evaluated on five real-world datasets widely used for studying graph adversarial attacks (Sun et al. 2020; Liu et al. 2022; Zhu et al. 2019; Entezari et al. 2020; Li et al. 2022), including Cora, Citeseer, Polblogs, and Pubmed.

Baselines

Comparison defense models. We compare GRN with other defense models, including: 1) RGCN, which leverages Gaussian distributions for node representations to amortize the effects of adversarial attacks; 2) GNN-SVD, which applies a low-rank approximation of the adjacency matrix obtained by truncated SVD; 3) Pro-GNN (Jin et al. 2020), which learns a robust GNN from the intrinsic properties of nodes; 4) Jaccard (Wu et al. 2019), which defends against attacks based on the measured Jaccard similarity score; 5) EGNN (Liu et al. 2021), which filters out perturbations by $l_{1}$- and $l_{2}$-based graph smoothing. Attack methods. The experiments are designed under the following attack strategies: 1) Metattack (Zügner and Günnemann 2018), a meta-learning based attack; 2) CLGA (Sixiao et al. 2022), an unsupervised attack; 3) RL-S2V (Dai et al. 2018), a reinforcement learning based attack.

Pinpointing the Layer of Resonance

In Theorem 2, we establish an equivalence relation between the $k$-th and the $(k+3)$-th layer's output latent representations, as derived from Equations (1) and (2). This elucidates that when $k=0$, the 3rd layer involuntarily captures local structures, thereby inducing resonance. To facilitate experimental variable control, we first examine the equivalence relation under varying "gap layer numbers" (denoted as $k_{gap}$). If the equivalence between Equations (1) and (2) only holds when $k_{gap}\geq 3$, it substantiates the validity of Theorem 2. Specifically, we first train a 5-layer GCN, then obtain the resonance intensity, denoted as $R_{def}(k)=\bar{z}_{i}^{(k)}T_{i}+2p_{i}+8\,\mathrm{deg}_{i}$, and the actually observed signal, denoted as $R_{real}(k+k_{gap})=64\bar{z}_{i}^{(k+k_{gap})}-32$, for each epoch. Given these observational variables, we delineate their transformations over the learning process using lists $\{R_{def}(k)\}$ and $\{R_{real}(k+k_{gap})\}$, respectively. Each list chronicles its corresponding variable's fluctuations across all epochs. Subsequently, we standardize (using the standardization function $\mathrm{{\scriptstyle STD}}(\cdot)$) the sequences under different $k_{gap}$ and calculate the absolute difference to obtain a difference sequence:

$$\mathbf{d}_{k,k_{gap}}=\left|\mathrm{\scriptstyle STD}(\{R_{def}(k)\})-\mathrm{\scriptstyle STD}(\{R_{real}(k+k_{gap})\})\right|.\qquad(8)$$

The quantity $\mathbf{d}_{k,k_{gap}}$, serving as an indicator variable, accurately encapsulates the discrepancy between $R_{def}(k)$ and $R_{real}(k+k_{gap})$. The experimental results are illustrated in Figure 4. Owing to the large number of nodes, we display the mean value (central line) and standard deviation (shaded areas) over all nodes. As epochs progress, $\mathbf{d}_{0,3}$ gradually converges to zero. After the initial several epochs, it significantly diverges from $\mathbf{d}_{0,1}$ and $\mathbf{d}_{0,2}$. This validates the intriguing phenomenon stated in Theorem 2: a correlation has been established between the signal at the 3rd layer and the graph's initial signal and structure. Subsequent experimental results echo these findings, thereby affirming the correctness of Theorem 2 for $k>0$.
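A minimal sketch of the indicator in Eq. (8) is given below; the z-score standardization used for $\mathrm{STD}(\cdot)$ and the synthetic epoch sequences are assumptions for illustration only.

```python
import numpy as np

# A minimal sketch of Eq. (8): standardize the two per-epoch sequences and take
# their absolute difference.
def std(seq):
    seq = np.asarray(seq, dtype=float)
    return (seq - seq.mean()) / (seq.std() + 1e-12)

def indicator(R_def_seq, R_real_seq):
    return np.abs(std(R_def_seq) - std(R_real_seq))

epochs = np.arange(50)
R_def_seq = np.sin(epochs / 5.0)                 # stand-in for {R_def(k)}
R_real_seq = 64 * np.sin(epochs / 5.0) - 32      # stand-in for {R_real(k + k_gap)}
print(indicator(R_def_seq, R_real_seq).max())    # near zero after standardization
```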

Figure 4: Values of $\mathbf{d}_{k,k_{gap}}$ under different $k$ and $k_{gap}$ settings. $\mathbf{d}_{k,k_{gap}}$ encapsulates the resonance between the $k$-th layer and the $(k+k_{gap})$-th layer; a smaller value indicates stronger resonance.
Figure 5: Experimental results of the LRS-constructed graph $\mathcal{G}_{LRS}$ in relation to $\mathcal{G}$. (a) Visualizations of the embeddings of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$. (b) Strength distributions of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$. (c) Classification accuracy of $\mathcal{G}$, $\mathcal{G}_{LRS}$ and $\mathcal{G}_{random}$ using $\mathcal{M}_{\mathcal{G}}$ under different perturbation rates.

Cliff-like Decline of the Attack Success Rate on the 3-layer GCN

Intuitively, the complexity of an attack tends to increase with the number of GCN layers. Observing how the attack success rate (ASR) declines as the number of GCN layers increases helps validate our claim that the 3-layer GCN, by virtue of resonance, significantly enhances robustness. Specifically, we start by initializing 10 GCNs with the number of layers ranging from 1 to 10. Next, we conduct experiments on 4 datasets using 3 typical attacks, setting the perturbation rate uniformly at 20%. We then train a surrogate model for each GCN separately, place perturbations in the dataset, and clear these perturbations after each attack. We repeat each attack five times and report the average ASR (depicted by the lines) and variation range (represented by the shaded areas). The results, shown in Figure 6, clearly reveal a steep drop in ASR at the 3-layer GCN. However, adding further layers seems unable to significantly reduce the ASR, as the additional layers maintain the same resonance pattern as the 3-layer GCN to achieve adversarial robustness. These findings motivate the concept of distilling the resonance from GCNs and fostering this resonance to design an inductive approach.

Figure 6: Relation between GCN layer count and ASR.

How Robust Are LRS-constructed Graphs?

Derived from a transductive model, the LRS captures distinct resonance regions and transforms a localized, unweighted graph (also perceivable as a graph with unitary weights) into a weighted one. The implementation of the LRS within the GRN enables the demarcation of a learnable resonance scope for inductive learning. Consequently, it becomes essential to validate the efficacy of the LRS through its embedding precision within the transductive model.

We begin by presenting results obtained under non-adversarial conditions. We sum the LRS of all nodes in $\mathcal{G}$ and apply min-max scaling to all weights, thus creating a global weighted graph $\mathcal{G}_{LRS}$. Then, we train a standard 2-layer GCN $\mathcal{M}_{\mathcal{G}}$ on the Cora dataset (denoted as $\mathcal{G}$) and visualize its well-trained representations. Then, we feed $\mathcal{G}_{LRS}$ into $\mathcal{M}_{\mathcal{G}}$ to generate its visualization. Lastly, for comparison, we create a random-weighted graph $\mathcal{G}_{random}$ whose edge distribution is the same as $\mathcal{G}$'s, and input it into $\mathcal{M}_{\mathcal{G}}$ to obtain the corresponding visualization. As Figure 5(a) shows, under identical weights, the representations of different categories in $\mathcal{G}_{LRS}$ are tighter than those in $\mathcal{G}$. This suggests that introducing the LRS brings beneficial global weights, which enhance the model's performance in non-adversarial scenarios.

Then, we explored the similarity between $\mathcal{G}$ and $\mathcal{G}_{LRS}$. Using the strength distribution, akin to the degree distribution for unweighted graphs (Zügner, Akbarnejad, and Günnemann 2018), we found the two weighted graphs to be notably similar. Figure 5(b) confirms this, showing a stark difference from $\mathcal{G}_{random}$. Therefore, $\mathcal{G}_{LRS}$ maintains the traits of $\mathcal{G}$.

We next assessed the classification accuracy on $\mathcal{G}_{LRS}$ under adversarial attacks using varying Metattack perturbation rates $p_{r}=\frac{r}{|\mathcal{E}|}$. Figure 5(c) shows that as $p_{r}$ increases, $\mathcal{G}_{LRS}$'s accuracy consistently edges out $\mathcal{G}$'s. This suggests that the LRS introduces a resonance in the transductive model, marginally boosting its adversarial robustness.

Generalizable Robustness of GRN

Dataset  $p_r$ (%)  RGCN  GNN-SVD  Pro-GNN  Jaccard  EGNN  GRN ($s_r$=20%)  GRN ($s_r$=40%)  GRN ($s_r$=60%)
Cora 0 83.49±0.57 81.14±0.79 85.01±0.40 81.74±0.36 85.00±0.40 83.74±1.68 86.79±2.27 87.75±0.93
5 77.20±0.47 78.29±0.63 80.10±0.22 80.56±1.30 82.24±0.49 81.48±0.83 86.04±3.15 86.24±1.54
10 72.65±0.40 70.81±1.77 74.45±0.28 75.07±1.28 76.38±0.35 79.51±1.62 81.38±2.58 82.03±1.48
20 59.31±0.27 56.67±1.22 64.68±0.75 73.54±0.94 69.82±0.71 73.79±1.91 74.14±1.93 74.81±1.52
Citeseer 0 71.81±0.71 70.42±0.39 74.94±0.40 73.82±0.56 74.92±0.66 75.69±0.69 78.19±1.30 83.26±0.73
5 71.22±0.61 68.86±0.47 72.45±0.88 71.41±0.65 73.60±0.45 75.40±0.95 77.54±1.04 81.66±0.70
10 67.53±0.60 68.70±0.89 70.16±1.05 70.09±0.48 73.66±0.37 73.81±1.10 77.04±1.51 80.32±0.49
20 63.20±1.70 57.95±1.48 55.84±1.28 67.22±1.32 65.91±1.20 71.06±1.20 72.72±1.28 77.21±0.64
Pubmed 0 84.57±0.39 83.25±0.35 84.96±0.08 84.87±0.10 85.94±0.10 80.26±0.43 85.51±0.66 87.27±0.67
5 81.25±0.50 82.90±0.26 83.00±0.10 82.32±0.11 83.89±0.09 77.86±0.35 81.92±0.59 83.36±0.62
10 78.96±0.43 80.35±0.21 80.82±0.20 80.77±0.11 82.13±0.15 76.62±0.55 79.29±0.60 80.99±0.58
20 71.33±0.40 73.57±0.15 74.16±0.16 73.41±0.12 76.01±0.19 74.98±0.64 76.87±0.60 77.15±0.56
Polblogs 0 94.87±0.19 95.08±0.22 95.45±0.12 95.03±0.57 95.70±0.34 95.42±0.56 94.88±0.43 94.97±0.31
5 73.28±0.18 88.86±0.58 90.98±0.69 90.97±0.61 89.97±1.25 90.18±0.43 91.22±0.38 89.37±0.46
10 70.91±0.37 80.38±0.85 85.60±1.08 85.93±1.39 83.66±1.81 86.30±0.70 85.43±0.68 85.07±0.54
20 57.97±0.41 55.33±2.07 73.52±0.53 70.47±1.27 75.87±0.88 82.03±0.79 81.96±0.72 81.56±0.18
Table 1: Classification accuracy (%) on the perturbed graph. $p_r$ is the perturbation rate and $s_r$ is the "seen" rate.

We assess the adversarial robustness of the 3-layer GRN under generalization demands by comparing its accuracy against other baselines on perturbed graphs. We partition a subset of the dataset as the training set with proportions (also named the "seen" rate) $s_r$ of 20%, 40%, and 60%. The data within these proportions are deemed "seen" by the GRN, while the remaining data are categorized as "unseen". Utilizing Metattack as our attack approach, we adopt the standard 2-layer GCN as the surrogate model. By adjusting $p_r$, we derive the corresponding perturbed graphs. Then, we evaluate the classification performance of the baselines on these graphs, placing an emphasis on the accuracy upon model convergence. For each setting, we execute 10 iterations and tabulate both the average outcome and its variability. Table 1 reports the results.

From the data, on both clean and perturbed graphs, GRN with $s_r$ = 60% generally surpasses the baselines in accuracy. There are three exceptions: 1) For the Pubmed dataset at $p_r$ = 10%, EGNN leads because it uses graph smoothing to enhance adversarial robustness: perturbations may be more pronounced in a certain area, and EGNN can leverage this by smoothing concentrated perturbation patterns. However, such incidents are rare, and as $p_r$ increases, GRN's accuracy returns to its peak. 2) With the Polblogs dataset at $p_r$ = 0%, GRN is slightly behind EGNN by 0.28%. Yet, as $p_r$ rises, the decline in GRN's accuracy is the least noticeable among all baselines, ensuring its top position. 3) An intriguing pattern emerging from the Polblogs dataset is the non-proportional relationship between the GRN's $s_r$ and its accuracy. The peculiarity of the Polblogs dataset is that its nodes lack intrinsic features; scholars have typically used node degrees as proxies for these absent attributes. This substitution results in the inherent attributes of Polblogs leaning towards uniformity. Expanding the training set's scale exacerbates the oversmoothing phenomenon, culminating in diminished accuracy.

Ablation Studies

GRN combines the edge-transmitted signal $\mathbf{E}^{(\ell)}_{G_{i}}$ and the node signal $\mathbf{Z}^{(\ell)}_{G_{i}}$ to form node $v_{i}$'s representation. We conduct an ablation study to delve into this process. First, we embed only $\mathbf{Z}^{(\ell)}_{G_{i}}$, naming the resulting model GRN$_{Z}$. This model appears similar to a 2-depth GraphSAGE with mean aggregation, indicating potential vulnerability to adversarial attacks. We then examine the combination order of $\mathbf{E}^{(\ell)}_{G_{i}}$ and $\mathbf{Z}^{(\ell)}_{G_{i}}$. The default GRN order is GRN$_{E,Z}$. We test GRN$_{Z,E}$ (reversed order) and GRN$_{shuf}$ (shuffled rows). Results (Table 2) show that GRN$_{Z}$ underperforms, especially in adversarial settings, emphasizing the importance of co-embedding both signals. The precisions of GRN$_{Z,E}$, GRN$_{E,Z}$, and GRN$_{shuf}$ are comparable because the edge-transmitted signal, combined with the node signal through shared parameters, yields consistent performance regardless of order. This suggests that GRN has the capability to recognize a latent graph structure, wherein edge-transmitted signals function as latent node signals, contributing to adversarial robustness and insensitivity to signal order.

Dataset  $p_r$ (%)  GRN$_{E,Z}$ (standard)  GRN$_{Z,E}$  GRN$_{shuf}$  GRN$_{Z}$
Cora 0 87.75 87.14±0.81 87.28±0.92 87.74±0.72
5 86.24 86.18±0.75 86.34±0.98 85.56±0.87
10 82.03 82.81±0.84 82.54±1.05 79.07±0.90
20 74.81 74.53±0.80 74.42±1.36 63.54±1.30
Citeseer 0 83.26 83.07±0.64 83.52±0.92 82.23±0.97
5 81.66 81.49±0.71 81.45±0.95 79.47±1.10
10 80.32 80.10±0.65 80.46±1.17 74.39±1.44
20 77.21 76.95±1.26 76.84±1.53 69.05±1.86
Table 2: Classification accuracy (%) of ablated models.

Conclusions

We addressed critical concerns surrounding the transductive nature of existing robust graph learning models. We began by establishing the equivalence between GCN-driven MP and edge-based waves. Subsequently, we demonstrated that a 3-layer GCN capitalizes on the unique resonance intrinsic to waves to achieve robustness. Delving deeper, we formalized this resonance as a coupling between a node and its surrounding local structure. We then introduced an inductive graph learning model, GRN. Experimental results not only corroborated our theoretical insights but also highlighted the exemplary robustness of the proposed GRN model.

Acknowledgments

This work is supported in part by the National Key Research and Development Program of China (No. 2020YFB1805400), the National Natural Science Foundation of China (No. U19A2068, No. 62372313), and the Sichuan Science and Technology Program (No. 2023YFG0113).

References

  • Ahmad et al. (2023) Ahmad, S.; Saifullah, S.; Khan, A.; and Wazwaz, A. M. 2023. Resonance, fusion and fission dynamics of bifurcation solitons and hybrid rogue wave structures of Sawada–Kotera equation. Commun. Nonlinear. Sci. Numer. Simul., 119: 107117.
  • Berberidis and Giannakis (2019) Berberidis, D.; and Giannakis, G. B. 2019. Node embedding with adaptive similarities for scalable learning over graphs. IEEE Trans. Knowl. Data Eng., 33(2): 637–650.
  • Blas and Jenkins (2022) Blas, D.; and Jenkins, A. C. 2022. Detecting stochastic gravitational waves with binary resonance. Phys. Rev. D, 105(6): 064021.
  • Bruna et al. (2013) Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
  • Bykov, Bezus, and Doskolovich (2019) Bykov, D. A.; Bezus, E. A.; and Doskolovich, L. L. 2019. Coupled-wave formalism for bound states in the continuum in guided-mode resonant gratings. Phys. Rev. A, 99(6): 063805.
  • Chen et al. (2021) Chen, W.; Feng, F.; Wang, Q.; He, X.; Song, C.; Ling, G.; and Zhang, Y. 2021. Catgcn: Graph convolutional networks with categorical node features. IEEE Trans. Knowl. Data Eng.
  • Dai et al. (2018) Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018. Adversarial attack on graph structured data. In Proc. 35th Int. Conf. Mach. Learn., 1115–1124.
  • Defferrard, Bresson, and Vandergheynst (2016) Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Adv. Neural Inf. Proces. Syst., volume 29.
  • Entezari et al. (2020) Entezari, N.; Al-Sayouri, S. A.; Darvishzadeh, A.; and Papalexakis, E. E. 2020. All you need is low (rank) defending against adversarial attacks on graphs. In Proc. 13th Int. Conf. Web Search Data Min., 169–177.
  • Friedman and Tillich (2004) Friedman, J.; and Tillich, J.-P. 2004. Wave equations for graphs and the edge-based Laplacian. Pac. J. Math., 216(2): 229–266.
  • Hamilton, Ying, and Leskovec (2017) Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In Proc. 31st Adv. Neural Inf. Proces. Syst.
  • Jin et al. (2020) Jin, W.; Ma, Y.; Liu, X.; Tang, X.; Wang, S.; and Tang, J. 2020. Graph structure learning for robust graph neural networks. In 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 66–74.
  • Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In Proc. 5th Int. Conf. Learn. Represent.
  • Kovalyov (1989) Kovalyov, M. 1989. Resonance-type behaviour in a system of nonlinear wave equations. J. Differ. Equ., 77(1): 73–83.
  • Kreps and Kriner (2020) Kreps, S.; and Kriner, D. 2020. Model uncertainty, political contestation, and public trust in science: Evidence from the COVID-19 pandemic. Sci. Adv., 6(43): eabd4563.
  • Kula (2015) Kula, M. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. arXiv preprint arXiv:1507.08439.
  • Li et al. (2022) Li, K.; Liu, Y.; Ao, X.; Chi, J.; Feng, J.; Yang, H.; and He, Q. 2022. Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN. In Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min.
  • Liu et al. (2022) Liu, A.; Li, B.; Li, T.; Zhou, P.; and Wang, R. 2022. AN-GCN: An Anonymous Graph Convolutional Network Against Edge-Perturbing Attacks. IEEE Trans. Neural Netw. Learn. Syst.
  • Liu et al. (2021) Liu, X.; Jin, W.; Ma, Y.; Li, Y.; Liu, H.; Wang, Y.; Yan, M.; and Tang, J. 2021. Elastic graph neural networks. In Proc. 38th Int. Conf. Mach. Learn., 6837–6849.
  • Müller (2023) Müller, E. 2023. Graph clustering with graph neural networks. J. Mach. Learn. Res., 24: 1–21.
  • Nadal et al. (2021) Nadal, S.; Abelló, A.; Romero, O.; Vansummeren, S.; and Vassiliadis, P. 2021. Graph-driven federated data management. IEEE Trans. Knowl. Data Eng., 35(1): 509–520.
  • Samuel et al. (2019) Samuel, F., G; John, B., D; Joichi, I.; Jonathan, Z., L; Andrew, B., L; and Isaac, K., S. 2019. Adversarial attacks on medical machine learning. Science, 363(6433): 1287–1289.
  • Shatah and Struwe (1993) Shatah, J.; and Struwe, M. 1993. Regularity results for nonlinear wave equations. Ann. Math., 138(3): 503–518.
  • Sixiao et al. (2022) Sixiao, Z.; Hongxu, C.; Xiangguo, S.; Yicong, L.; and Xu, G. 2022. Unsupervised Graph Poisoning Attack via Contrastive Loss Back-propagation. In Proc. 31st Int. Conf. World Wide Web.
  • Sun et al. (2022) Sun, L.; Dou, Y.; Yang, C.; Zhang, K.; Wang, J.; Philip, S. Y.; He, L.; and Li, B. 2022. Adversarial attack and defense on graph data: A survey. IEEE Trans. Knowl. Data Eng.
  • Sun et al. (2020) Sun, Y.; Wang, S.; Tang, X.; Hsieh, T.-Y.; and Honavar, V. 2020. Non-target-specific node injection attacks on graph neural networks: A hierarchical reinforcement learning approach. In Proc. 29th Int. Conf. World Wide Web, volume 3.
  • Walt, Jack, and Christof (2019) Walt, W.; Jack, C.; and Christof, T. 2019. Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat. Mach. Intell., 1: 508–516.
  • Wu et al. (2019) Wu, H.; Wang, C.; Tyshetskiy, Y.; Docherty, A.; Lu, K.; and Zhu, L. 2019. Adversarial examples for graph data: deep insights into attack and defense. In Proc. 28th Int. Joint Conf. Artif. Intel.
  • Xiao, Chen, and Shi (2019) Xiao, H.; Chen, Y.; and Shi, X. 2019. Knowledge graph embedding based on multi-view clustering framework. IEEE Trans. Knowl. Data Eng., 33(2): 585–596.
  • Zhao et al. (2022) Zhao, Y.; Zhou, H.; Zhang, A.; Xie, R.; Li, Q.; and Zhuang, F. 2022. Connecting embeddings based on multiplex relational graph attention networks for knowledge graph entity typing. IEEE Trans. Knowl. Data Eng.
  • Zhu et al. (2019) Zhu, D.; Zhang, Z.; Cui, P.; and Zhu, W. 2019. Robust graph convolutional networks against adversarial attacks. In Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 1399–1407.
  • Zügner, Akbarnejad, and Günnemann (2018) Zügner, D.; Akbarnejad, A.; and Günnemann, S. 2018. Adversarial attacks on neural networks for graph data. In Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2847–2856.
  • Zügner and Günnemann (2018) Zügner, D.; and Günnemann, S. 2018. Adversarial Attacks on Graph Neural Networks via Meta Learning. In Proc. 6th Int. Conf. Learn. Represent.

Proofs

Proof of Theorem 1

Node signals vibrate from high to low dimensions during MP, which is implemented through embedding. To ensure the uniformity of node signal dimensions during the MP process, we introduce an invertible transfer function, denoted as $\mathcal{T}_{d_{0}}:\mathbb{R}^{d_{k}}\to\mathbb{R}^{d_{0}},\ \forall d_{k}\leq d_{0}$, to standardize the dimensions of $\mathbf{Z}^{(\ell)}$ and $\mathbf{Z}^{(0)}$. This function extends the process of the $k$-th MP to $\mathbf{Z}^{(k-1)}\stackrel{\text{MP}}{\longrightarrow}\mathbf{Z}^{(k)}\stackrel{\mathcal{T}_{d_{0}}}{\longrightarrow}\bar{\mathbf{Z}}^{(k)}$, where $\bar{\mathbf{Z}}^{(k)}=[\mathbf{Z}^{(k)}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]$. This process can be interpreted as all embeddings induced by MP being completed in $\mathbb{R}^{d_{0}}$, i.e., embedding into $d_{k}$ dimensions at the $k$-th MP, with the remaining $d_{0}-d_{k}$ dimensions collapsing to zero. Indeed, with this approach, MP can be viewed as signal propagation in a space of equal dimension.

The edge-based Laplacian wave equation is (also introduced in main text):

$$\frac{\partial^{2}g}{\partial\ell^{2}}=-\mathbf{L}^{\ell}g\cdot c,\qquad(9)$$

where $c$ is a constant. In this context, we instantiate $g$ as the signal $\bar{\mathbf{Z}}^{(\ell)}$. Then we define $c$ as a signal amplitude matrix:

Definition 2.

The amplitude at the $k$-th MP, $\mathbf{S}_{k}\in\mathbb{R}^{d_{0}\times d_{0}}$, represents the transmission amplitude of signals in different dimensions. It is a variable matrix that defines the adjustable parameters of the graph-based function at different times. It performs an equal-dimensional transformation of the signal through right multiplication.

The initial speed of the first-hop MP can thus be denoted as $\mathbf{S}_{0}$. Consequently, we derive that $\frac{\partial^{2}\bar{\mathbf{Z}}^{(\ell)}}{\partial\ell^{2}}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{\ell}$. Hence, the element-wise fluctuation speed [1] of the signal $\bar{\mathbf{Z}}^{(\ell)}$, denoted as $\mathbf{V}_{\ell}\in\mathbb{R}^{N\times d_{0}}$, can be described, and it satisfies

$$\mathbf{V}_{\ell}=\frac{\partial\bar{\mathbf{Z}}^{(\ell)}}{\partial\ell}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\int\mathbf{S}_{\ell}\,d\ell,\qquad\frac{\partial\mathbf{V}_{\ell}}{\partial\ell}=\mathbf{L}^{\ell}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{\ell}.\qquad(10)$$

We regard $\bar{\mathbf{Z}}^{(\ell)}$ as a sampling of the continuous signal with a minimal step $h$; the wave described by $\bar{\mathbf{Z}}^{(\ell)}$ can thus be formalized as

$$\bar{\mathbf{Z}}^{(k+h)}=\bar{\mathbf{Z}}^{(k)}+h\mathbf{V}_{k},\qquad\mathbf{V}_{k+h}=\mathbf{V}_{k}+h\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k}.\qquad(11)$$

One may write

$$\begin{aligned}\bar{\mathbf{Z}}^{(k+h)}&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{k-h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-h}\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{k-2h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-2h}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{S}_{k-h},\end{aligned}\qquad(12)$$

and this discrete representation iterating with a step size of hh can be written in a continuous expression:

$$\begin{aligned}\lim_{h\to 0}\bar{\mathbf{Z}}^{(k+h)}&=\bar{\mathbf{Z}}^{(k)}+\mathbf{V}_{0}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{1}^{k-1}\mathbf{S}_{\ell}\,d\ell\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\left(\int_{0}^{1}\mathbf{S}_{\ell}\,d\ell+\int_{1}^{k-1}\mathbf{S}_{\ell}\,d\ell\right)\\&=\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{0}^{k-1}\mathbf{S}_{\ell}\,d\ell.\end{aligned}\qquad(13)$$

In order to establish a direct correlation between $\bar{\mathbf{Z}}^{(k+h)}$ and $\bar{\mathbf{Z}}^{(0)}$, instead of via a chain-like propagation through $\bar{\mathbf{Z}}^{(k)},\bar{\mathbf{Z}}^{(k-h)},\ldots$, we introduce the Moore-Penrose (MP) pseudoinverse [2] to simplify Equation (13). Specifically, we introduce a dimension matrix $\mathbf{C}_{k}\in\mathbb{R}^{d_{0}\times d_{0}}$, associated with $\mathbf{S}_{k}$ and calculated by

$$\mathbf{C}_{k}=\left(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\right)^{\dagger}\left(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(k)}+\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\int_{0}^{k-1}\mathbf{S}_{\ell}\,d\ell\right).\qquad(14)$$

Then, a direct connection between $\bar{\mathbf{Z}}^{(k+1)}$ and $\bar{\mathbf{Z}}^{(0)}$ can be obtained as

$$\bar{\mathbf{Z}}^{(k)}=\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k},\qquad(15)$$

which delineates the general term formula for the equal-dimensional waves $\bar{\mathbf{Z}}^{(\ell)}$. Given that $\mathbf{Z}^{(\ell)}=\mathcal{T}_{d_{0}}^{-1}(\bar{\mathbf{Z}}^{(\ell)})$, the proposition henceforth is that, if one establishes the validity of $\mathcal{T}_{d_{0}}^{-1}(\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k})$, i.e., that a $\mathbf{C}_{k}$ exists which embeds the trailing $d_{0}-d_{k}$ columns of $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}$ into minimal-value matrices (denoted as $\mathbf{T}_{d_{k}|0}=[\mathbf{T}_{d_{k}}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]$), then an inverse mapping can be constructed that allows the transition from the equal-dimensional waves $\bar{\mathbf{Z}}^{(\ell)}$ to the unequal-dimensional waves $\mathbf{Z}^{(\ell)}$. To address this problem, we split $\mathbf{C}_{k}$ into two segments: a left $d_{0}\times d_{k}$ matrix $\mathbf{C}_{k,L}$ and a right $d_{0}\times(d_{0}-d_{k})$ matrix $\mathbf{C}_{k,R}$. Given that, we have $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k}=[\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,L}\,|\,\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,R}]$. Since $\bar{\mathbf{Z}}^{(0)}=\mathbf{Z}^{(0)}$, it holds that $\bar{\mathbf{Z}}^{(0)}$ is of full column rank. Consequently, we may conclude that for any arbitrary real matrix $\mathbf{T}_{d_{k}}$, there exists $\mathbf{C}_{k,L}$ such that $\mathbf{L}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k,L}=\mathbf{T}_{d_{k}}$, while $\mathbf{L}^{k}\bar{\mathbf{Z}}^{(0)}\mathbf{0}_{N\times(d_{0}-d_{k})}=\mathbf{0}_{N\times(d_{0}-d_{k})}$ also holds. Therefore, there exists a matrix

$$\mathbf{C}_{k}=[\mathbf{C}_{k,L}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}],\qquad(16)$$

such that $\mathbf{L}\bar{\mathbf{Z}}^{(0)}\mathbf{C}_{k}=[\mathbf{T}_{d_{k}}\,|\,\mathbf{0}_{N\times(d_{0}-d_{k})}]=\mathbf{T}_{d_{k}|0}$, thereby confirming the validity of $\mathcal{T}_{d_{0}}^{-1}$. Hence, Equation (15) can be expressed inversely as

$$\mathbf{Z}^{(k+1)}=\mathcal{T}_{d_{0}}^{-1}(\bar{\mathbf{Z}}^{(k+1)})=\mathbf{L}^{k}\mathbf{Z}^{(0)}\mathcal{T}_{d_{0}}^{-1}(\mathbf{C}_{k})=\mathbf{L}^{k}\mathbf{Z}^{(0)}\mathbf{C}_{k,L}.\qquad(17)$$

Taking into account the dimensional variation under the wave, and the independent amplitude correlation when the signal fluctuates once, we apply rank-$\eta$ truncated singular value decomposition ($\eta$-TSVD) [3] to $\mathbf{C}_{k,L}$ to approximately obtain $k$ amplitude-correlated matrices. Given $\forall d_{0}\leq\eta\leq d_{k}$, $\eta$-TSVD decomposes $\mathbf{C}_{k,L}$ as $\mathrm{SVD_{T}^{(\eta)}}(\mathbf{C}_{k,L})=\mathbf{U}_{k}\mathbf{\Lambda}_{k}\mathbf{V}_{k}$, where $\mathbf{U}_{k}\in\mathbb{R}^{d_{0}\times\eta}$, $\mathbf{\Lambda}_{k}\in\mathbb{R}^{\eta\times\eta}$, and $\mathbf{V}_{k}\in\mathbb{R}^{\eta\times d_{k}}$ are the matrices holding the first $\eta$ left singular vectors, the first $\eta$ singular values, and the first $\eta$ right singular vectors of $\mathbf{C}_{k,L}$, respectively. We denote: 1) $\mathbf{U}_{k}\mathbf{\Lambda}_{k}=\widehat{\mathbf{C}}_{k-1}\in\mathbb{R}^{d_{0}\times\eta}$; 2) $\widehat{\mathbf{U}}_{k-1}$, $\widehat{\mathbf{\Lambda}}_{k-1}$ and $\widehat{\mathbf{V}}_{k-1}$ analogously to the above definitions; 3) $\widehat{\mathbf{C}}_{k-2}=\widehat{\mathbf{U}}_{k-1}\widehat{\mathbf{\Lambda}}_{k-1}$. Since TSVD can decompose any real-valued matrix for any valid $\eta$, we apply $\eta$-TSVD repeatedly to $\mathbf{C}_{k,L}$, yielding:

$$\begin{aligned}\mathrm{SVD_{T}^{(\eta_{k})}}(\mathbf{C}_{k,L})&=\widehat{\mathbf{C}}_{k-1}\mathbf{V}_{k}\\\mathrm{SVD_{T}^{(\eta_{k-1})}}(\widehat{\mathbf{C}}_{k-1})&=\widehat{\mathbf{C}}_{k-2}\mathbf{V}_{k-1}\\&\;\;\vdots\\\mathrm{SVD_{T}^{(\eta_{2})}}(\widehat{\mathbf{C}}_{2})&=\widehat{\mathbf{C}}_{1}\mathbf{V}_{2}.\end{aligned}\qquad(18)$$

Since TSVD provides an approximate equality, introducing a minimal error term

$$O(\Gamma)=O\!\left(\sum_{i=2}^{k-1}\left\|\widehat{\mathbf{C}}_{i}-\widehat{\mathbf{C}}_{i-1}\mathbf{V}_{i}\right\|_{F}+\left\|\mathbf{C}_{k,L}-\widehat{\mathbf{C}}_{k-1}\mathbf{V}_{k}\right\|_{F}\right),\qquad(19)$$

we can summarize Equation (18), decomposing $\mathbf{C}_{k,L}$ into a form of $k$-fold continuous multiplication, to formalize the $k$ amplitude matrices introduced by $k$ iterations of MP:

$$\mathbf{C}_{k,L}=\widehat{\mathbf{C}}_{1}\prod_{i=2}^{k}\mathbf{V}_{i}+O(\Gamma).\qquad(20)$$

Substituting Equation (20) into Equation (17), we obtain:

\mathbf{Z}^{(k+1)}=\mathbf{L}^{k}{\mathbf{Z}}^{(0)}\widehat{\mathbf{C}}_{1}\prod_{i=2}^{k}\mathbf{V}_{i} (21)

Now, to endow the fluctuation equation represented by Equation (21) with the capability of backpropagation, we introduce k nonlinear activation functions \sigma_{1},\ldots,\sigma_{k}. Subsequently, we instantiate \widehat{\mathbf{C}}_{1},\mathbf{V}_{2},\mathbf{V}_{3},\ldots,\mathbf{V}_{k} as k trainable matrices. As expounded in the preceding text, the dimensions of these matrices merely need to satisfy the condition of input-output decrement, and thus can be set arbitrarily. Denoting \mathbf{W}_{i} as the i-th trainable matrix and given a decreasing dimension sequence d_{0},\eta_{k},\eta_{k-1},\ldots,\eta_{3},\eta_{2},d_{k}, the aforementioned matrices can be instantiated as \mathbf{W}_{1}=\widehat{\mathbf{C}}_{1}\in\mathbb{R}^{d_{0}\times\eta_{k}},\mathbf{W}_{2}=\mathbf{V}_{k}\in\mathbb{R}^{\eta_{k}\times\eta_{k-1}},\ldots,\mathbf{W}_{k}=\mathbf{V}_{2}\in\mathbb{R}^{\eta_{2}\times d_{k}}. In summary, by making Equation (21) trainable, we obtain the following form of a trainable wave equation:

𝐙(k)=WAVE(𝐙,k;𝐋)=σk(𝐋σ2(𝐋σ1(𝐋𝐙(0)𝐖1)layer 1𝐖2)layer 2𝐖k)layer k=(𝐙,k;𝐋).\mathbf{Z}^{(k)}=\mathrm{W{\scriptstyle AVE}}(\mathbf{Z},k;\mathbf{L})\\ =\sigma_{k}\underbrace{\Big{(}\mathbf{L}\ldots\overbrace{\sigma_{2}(\mathbf{L}\underbrace{\sigma_{1}(\mathbf{L}{\mathbf{Z}}^{(0)}\mathbf{W}_{1})}_{\begin{subarray}{c}\text{layer }1\\ \ldots\end{subarray}}\mathbf{W}_{2})}^{\text{layer }2}\ldots\mathbf{W}_{k}\Big{)}}_{\text{layer }k}\\ =\mathcal{M}(\mathbf{Z},k;\mathbf{L}). (22)

We note that Equation (22) is the trainable form of the edge-based Laplacian wave equation (Equation (9)), and it is also the forward propagation form of the GCN. Through the derivation process we have provided, the equivalence between the GCN and the wave equation is thus substantiated.
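As a concrete reading of Equation (22), the following minimal sketch carries out the k nested propagate-then-transform steps of the trainable wave (i.e., GCN forward) pass; the propagation matrix, layer widths, weights and activations below are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

def wave_forward(L, Z0, weights, activations):
    """Equation (22): repeatedly apply Z <- sigma_i(L @ Z @ W_i) for i = 1..k."""
    Z = Z0
    for W, sigma in zip(weights, activations):
        Z = sigma(L @ Z @ W)
    return Z

relu = lambda X: np.maximum(X, 0.0)
sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

rng = np.random.default_rng(0)
N, d0 = 5, 8
dims = [8, 6, 4]                            # hypothetical decreasing widths eta_k, eta_{k-1}, d_k
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T              # random undirected adjacency
L = A + np.eye(N)                           # a simple propagation matrix standing in for L
Z0 = rng.normal(size=(N, d0))

weights = [0.1 * rng.normal(size=(din, dout))
           for din, dout in zip([d0] + dims[:-1], dims)]
Zk = wave_forward(L, Z0, weights, [relu, relu, sigmoid])
print(Zk.shape)                             # (N, d_k) = (5, 4)
```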

Proof of Theorem 2

We first give the vectorized representation of the resonance intensity. For each pair of nodes (v_{j},v_{k}) in \mathcal{N}_{i}, if there is an edge between v_{j} and v_{k}, then this edge is counted in T_{i}; this is indicated by \mathbf{A}_{j,k}=1. Therefore, T_{i} equals the sum of \mathbf{A}_{j,k} over all node pairs (v_{j},v_{k}) in \mathcal{N}_{i}. T_{i} can then be obtained in three steps:

  • First, the number of edges over all node pairs (v_{j},v_{k}) of \mathcal{G} (not just pairs of neighbors of node v_{i}) is \sum_{v_{j},v_{k}}\mathbf{A}_{j,k}.

  • Then, to ensure that v_{j},v_{k}\in\mathcal{N}_{i}, we need to include the factors \mathbf{A}_{i,j} and \mathbf{A}_{k,i} in the above summation. This is because \mathbf{A}_{i,j}=\mathbf{A}_{k,i}=1 if and only if v_{j} and v_{k} are both neighbors of node v_{i}, i.e., \mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}=1 if and only if v_{i}, v_{j} and v_{k} form a triangle.

  • Finally, \sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i} counts the triangles that include node v_{i}. This gives T_{i}=\sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}; a minimal numerical check of this identity is sketched below.
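The following sketch checks the identity on a random toy graph: T_{i} computed by summing \mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i} coincides with the i-th diagonal entry of \mathbf{A}^{3} (the graph and its size are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T             # symmetric 0/1 adjacency without self-loops

# T_i = sum_{j,k} A_{i,j} A_{j,k} A_{k,i}: edges among the neighbours of v_i
# (each triangle through v_i is counted once per ordering of (j, k)).
T_direct = np.einsum('ij,jk,ki->i', A, A, A)

# The same quantity is the i-th diagonal entry of A^3.
T_matrix = np.diag(A @ A @ A)
print(np.array_equal(T_direct, T_matrix))  # True
```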

Then, we have

z¯i(0)Ti=z¯i(0)vj,vk𝒩i𝐀jk=j,k𝐀i,j𝐀j,k𝐀k,il𝐙i,l=j𝐀i,j(𝐀𝐀)j,il𝐙i,l.\bar{{z}}_{i}^{(0)}T_{i}=\bar{{z}}_{i}^{(0)}\sum_{v_{j},v_{k}\in\mathcal{N}_{i}}\mathbf{A}_{jk}=\sum_{j,k}\mathbf{A}_{i,j}\mathbf{A}_{j,k}\mathbf{A}_{k,i}\sum_{l}\mathbf{Z}_{i,l}\\ =\sum_{j}\mathbf{A}_{i,j}\left(\mathbf{A}\mathbf{A}^{\top}\right)_{j,i}\sum_{l}\mathbf{Z}_{i,l}. (23)

Hence, according to Definition 1, one may write

R(vi;0)=z¯i(0)Ti+2pi+8degi=j𝐀i,j(𝐀2)j,il𝐙i,l+2j(𝐀2)i,j+8j𝐀i,j.R(v_{i};0)=\bar{{z}}_{i}^{(0)}T_{i}+2p_{i}+8\mathrm{deg}_{i}\\ =\sum_{j}\mathbf{A}_{i,j}\left(\mathbf{A}^{2}\right)_{j,i}\sum_{l}\mathbf{Z}_{i,l}+2\sum_{j}(\mathbf{A}^{2})_{i,j}+8\sum_{j}\mathbf{A}_{i,j}. (24)

We denote by [1]_{N\times 1}\in\mathbb{R}^{N\times 1} the vector whose entries are all equal to 1, and we use right multiplication by [1]_{N\times 1} for row-wise summation of a matrix. Subsequently, we lift R(\cdot) to its vectorized representation:

𝐑(vi;0)=(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)i\mathbf{R}^{\prime}(v_{i};0)=(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1})_{i} (25)

Here, \mathbf{Z_{0}}=[\mathbf{Z}|\mathbf{0}]\in\mathbb{R}^{N\times N} represents the expansion of \mathbf{Z} into a square matrix \mathbf{Z_{0}} by padding zeros on the right side of \mathbf{Z}. Moving forward, we incorporate the sigmoid function S(\cdot) and introduce the matrix notation

\mathbf{B}=S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})),\quad\mathbf{C}=S(\mathbf{A}\mathbf{Z_{0}}). (26)
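As a quick sanity check of the vectorized form in Equation (25), the sketch below pads a hypothetical signal matrix \mathbf{Z} into \mathbf{Z_{0}} as just described and evaluates \mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1} on a toy graph.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d0 = 6, 3
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T              # toy adjacency matrix
Z = rng.normal(size=(N, d0))                # hypothetical node signals

Z0 = np.hstack([Z, np.zeros((N, N - d0))])  # pad Z to the square matrix Z_0 = [Z | 0]
ones = np.ones((N, 1))

# Equation (25): vectorized resonance intensities R'(v_i; 0) for all nodes at once.
A2 = A @ A
A3 = A2 @ A
R_vec = A3 @ Z0 @ ones + 2 * A2 @ ones + 8 * A @ ones
print(R_vec.ravel())
```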

Then we define the error function as

E(\mathbf{X})=-\frac{1}{48}\mathbf{X}^{3}+O(\mathbf{X}^{5}), (27)

and denote the following error terms: \mathbf{o}_{1}=E(\mathbf{A}\mathbf{Z_{0}}), \mathbf{o}_{2}=E(\mathbf{A}\mathbf{C}), \mathbf{o}_{3}=E(\mathbf{A}\mathbf{B}) and \mathbf{o}=\frac{1}{16}\mathbf{A}^{2}\mathbf{o}_{1}[1]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{o}_{2}[1]_{N\times 1}+\mathbf{o}_{3}[1]_{N\times 1}. Utilizing the aforementioned notation, we can deduce

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=[12]N×1+164𝐀3𝐙𝟎[1]N×1+116𝐀2[12]N×1+14𝐀[12]N×1+𝐨\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{64}\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+\frac{1}{16}\mathbf{A}^{2}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}+\mathbf{o}
=[12]N×1+14(14)2𝐀3𝐙𝟎[1]N×1+1414𝐀2[12]N×1+14𝐀[12]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}(\frac{1}{4})^{2}\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+\frac{1}{4}\frac{1}{4}\mathbf{A}^{2}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}
+1414𝐀2𝐨1[1]N×1+14𝐀𝐨2[1]N×1+𝐨3[1]N×1\displaystyle\quad+\frac{1}{4}\frac{1}{4}\mathbf{A}^{2}\mathbf{o}_{1}[1]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{o}_{2}[1]_{N\times 1}+\mathbf{o}_{3}[1]_{N\times 1}
=[12]N×1+14𝐀([12]N×1+14𝐀[12]N×1+(14)2𝐀2𝐙𝟎[1]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}[\frac{1}{2}]_{N\times 1}+(\frac{1}{4})^{2}\mathbf{A}^{2}\mathbf{Z_{0}}[1]_{N\times 1}
+14𝐀𝐨1[1]N×1+𝐨2[1]N×1)+𝐨3[1]N×1\displaystyle\quad+\frac{1}{4}\mathbf{A}\mathbf{o}_{1}[1]_{N\times 1}+\mathbf{o}_{2}[1]_{N\times 1}\Big{)}+\mathbf{o}_{3}[1]_{N\times 1}
=[12]N×1+14𝐀([12]N×1+14𝐀([12]N×1+14𝐀𝐙𝟎[1]N×1\displaystyle=[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2}]_{N\times 1}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}[1]_{N\times 1}
+𝐨1[1]N×1)+𝐨2[1]N×1)+𝐨3[1]N×1.\displaystyle\quad+\mathbf{o}_{1}[1]_{N\times 1}\Big{)}+\mathbf{o}_{2}[1]_{N\times 1}\bigg{)}+\mathbf{o}_{3}[1]_{N\times 1}. (28)

For Equation (28), we factor out [1]_{N\times 1}. (The factorization is not unique: right multiplication by [1]_{N\times 1} yielding [\frac{1}{2}]_{N\times 1} only requires that each row of the N\times N matrix sums to \frac{1}{2}; here we consider the case where all elements of this matrix take the average value \frac{1}{2N}.) This rewrites Equation (28) as:

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=([12N]N×N+14𝐀([12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎\displaystyle=\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}
+𝐨1)+𝐨2)+𝐨3)[1]N×1.\displaystyle\quad+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}\bigg{)}+\mathbf{o}_{3}\bigg{)}[1]_{N\times 1}. (29)

The matrix-based Taylor series expansion [4] for 𝐗\mathbf{X} is

S(\mathbf{X})=[\frac{S(0)}{N}]_{N\times N}+S^{\prime}(0)\mathbf{X}+\frac{1}{2}\mathbf{X}^{2}S^{\prime\prime}(0)+\frac{1}{6}\mathbf{X}^{3}S^{\prime\prime\prime}(0)+\frac{1}{24}\mathbf{X}^{4}S^{\prime\prime\prime\prime}(0)+O(\mathbf{X}^{5}). (30)

Since S(0)=\frac{1}{2}, S^{\prime}(0)=\frac{1}{4}, S^{\prime\prime}(0)=0, S^{\prime\prime\prime}(0)=-\frac{1}{8} and S^{\prime\prime\prime\prime}(0)=0, by substituting them into Equation (30), we have

\displaystyle S(\mathbf{X}) =[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}+0\cdot\frac{1}{2}\mathbf{X}^{2}-\frac{1}{8}\cdot\frac{1}{6}\mathbf{X}^{3}+0\cdot\frac{1}{24}\mathbf{X}^{4}+O(\mathbf{X}^{5})
\displaystyle=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}-\frac{1}{48}\mathbf{X}^{3}+O(\mathbf{X}^{5})
\displaystyle=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{X}+E(\mathbf{X}). (31)
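A minimal scalar check of the expansion used above, on a hypothetical evaluation interval: the sigmoid agrees with its cubic Taylor polynomial \frac{1}{2}+\frac{x}{4}-\frac{x^{3}}{48} up to an O(x^{5}) error.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Cubic Taylor polynomial of the sigmoid around 0: 1/2 + x/4 - x^3/48.
x = np.linspace(-0.5, 0.5, 101)
approx = 0.5 + x / 4 - x**3 / 48
print(np.max(np.abs(sigmoid(x) - approx)))  # error is O(x^5), below 1e-4 on this range
```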

Hence, Equation (29) can be recast into the form of the Taylor series expansion given by Equation (31). In the ensuing steps, we transform the nested terms of Equation (29) accordingly:

\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+\mathbf{o}_{1}
=[12N]N×N+14𝐀𝐙𝟎+E(𝐀𝐙𝟎)=S(𝐀𝐙𝟎)=𝐂,\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+E(\mathbf{A}\mathbf{Z_{0}})=S(\mathbf{A}\mathbf{Z_{0}})=\mathbf{C},
[12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎+𝐨1)+𝐨2\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}
=[12N]N×N+14𝐀𝐂+𝐨2\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{C}+\mathbf{o}_{2}
=[12N]N×N+14𝐀𝐂+E(𝐀𝐂)=S(𝐀𝐂)=𝐁,\displaystyle\qquad=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{C}+E(\mathbf{A}\mathbf{C})=S(\mathbf{A}\mathbf{C})=\mathbf{B},
[12N]N×N+14𝐀([12N]N×N+14𝐀([12N]N×N+14𝐀𝐙𝟎\displaystyle[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\bigg{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\Big{(}[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{Z_{0}}
+𝐨1)+𝐨2)+𝐨3=[12N]N×N+14𝐀𝐁+𝐨3\displaystyle+\mathbf{o}_{1}\Big{)}+\mathbf{o}_{2}\bigg{)}+\mathbf{o}_{3}=[\frac{1}{2N}]_{N\times N}+\frac{1}{4}\mathbf{A}\mathbf{B}+\mathbf{o}_{3}
=S(𝐀𝐁)=S(𝐀S(𝐀S(𝐀𝐙𝟎))).\displaystyle\qquad=S(\mathbf{A}\mathbf{B})=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}}))). (32)

Substituting Equation (32) into Equation (29), we can ultimately rewrite Equation (28) as

164(𝐀3𝐙𝟎[1]N×1+2𝐀2[1]N×1+8𝐀[1]N×1)+𝐨+[12]N×1\displaystyle\frac{1}{64}\left(\mathbf{A}^{3}\mathbf{Z_{0}}[1]_{N\times 1}+2\mathbf{A}^{2}[1]_{N\times 1}+8\mathbf{A}[1]_{N\times 1}\right)+\mathbf{o}+[\frac{1}{2}]_{N\times 1}
=S(𝐀S(𝐀S(𝐀𝐙𝟎)))[1]N×1\displaystyle\qquad=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})))[1]_{N\times 1}
=SUMrow(S(𝐀S(𝐀S(𝐀𝐙𝟎)))i,:),\displaystyle\qquad=\mathrm{\scriptstyle SUM_{row}}(S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z_{0}})))_{i,:}), (33)

where \mathrm{\scriptstyle SUM_{row}}(\cdot) denotes the operation of summing the elements of each row of the input matrix. For \mathbf{Z_{0}}, irrespective of left multiplication by any matrix or application of an element-wise activation function, its rightmost N-d_{0} columns remain zero. As a result, when right-multiplied by [1]_{N\times 1} (which effectively sums each row), the outcomes for \mathbf{Z_{0}} and \mathbf{Z} are identical, because the zero columns of \mathbf{Z_{0}} do not contribute to the row sums. Consequently, even when \mathbf{Z_{0}} is reverted back to \mathbf{Z}, Equation (33) still holds. It can then be seen that the right-hand side of Equation (33) is exactly a 3-step MP scheme driven by the GCN (the GCN variant deployed omits self-loops, using the matrix \mathbf{A} in lieu of \mathbf{L}), with the sigmoid function as its activation. Therefore, we can write \mathcal{M}(\mathbf{Z},3;\mathbf{A})=S(\mathbf{A}S(\mathbf{A}S(\mathbf{A}\mathbf{Z}))). Consequently, by incorporating Equation (33) into Equations (24) and (25), and neglecting the error term \mathbf{o}, we derive the following expression

𝐑(vi;0)\displaystyle\mathbf{R}^{\prime}(v_{i};0) 64SUMrow((𝐙,3;𝐀)i,:)[32]N×1.\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},3;\mathbf{A})_{i,:})-[32]_{N\times 1}. (34)

Similarly, we have

𝐑(vi;1)\displaystyle\mathbf{R}^{\prime}(v_{i};1) 64SUMrow((𝐙(1),3;𝐀))i,:[32]N×1\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z}^{(1)},3;\mathbf{A}))_{i,:}-[32]_{N\times 1}
\displaystyle=64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},4;\mathbf{A}))_{i,:}-[32]_{N\times 1}. (35)

Hence, we can write

𝐑(vi;k)\displaystyle\mathbf{R}^{\prime}(v_{i};k) 64SUMrow((𝐙,k+3;𝐀))i,:[32]N×1\displaystyle\propto 64\mathrm{\scriptstyle SUM_{row}}(\mathcal{M}(\mathbf{Z},k+3;\mathbf{A}))_{i,:}-[32]_{N\times 1}
=64[z¯i(k+3)]N×1[32]N×1.\displaystyle=64[\bar{{z}}_{i}^{(k+3)}]_{N\times 1}-[32]_{N\times 1}. (36)

By transposing Equation (36) into its scalar representation (i.e., R(v_{i};k)), we complete the derivation of Equation (2).
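As a concrete reading of Equations (34)-(36), the sketch below evaluates the right-hand-side quantity 64\,\mathrm{SUM_{row}}(\mathcal{M}(\mathbf{Z},3;\mathbf{A}))-32 on a hypothetical toy graph, with \mathcal{M} instantiated as three sigmoid-activated, self-loop-free MP steps; the error term \mathbf{o} and the averaging assumption are neglected exactly as in the statement.

```python
import numpy as np

sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

def mp(A, Z, k):
    """M(Z, k; A): k sigmoid-activated message-passing steps without self-loops."""
    for _ in range(k):
        Z = sigmoid(A @ Z)
    return Z

rng = np.random.default_rng(0)
N, d0 = 6, 3
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T
Z = rng.normal(size=(N, d0))

# Right-hand side of Equation (34): the quantity R'(v_i; 0) is proportional to,
# with the error term o and the averaging assumption neglected.
rhs = 64 * mp(A, Z, 3).sum(axis=1) - 32
print(rhs)
```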

Proof of Theorem 3

Adversaries employ the following methodology to attack node v_{i}: Initially, a target label \mathbf{y}^{\prime}_{i} is defined, wherein \mathbf{y}^{\prime}_{i} and \mathbf{y}_{i} differ only in the label values of the target nodes (whether a single target or multiple targets), which are altered to the target labels, while the labels of all other nodes remain unchanged. Subsequently, by modifying the graph structure, specifically \mathbf{A}, under a budget constraint r, a new graph with adjacency matrix \mathbf{A}^{\prime} is obtained. The objective is to maximize the similarity between the output and \mathbf{y}^{\prime}_{i} (i.e., minimize the loss between them) after forward propagation through \mathcal{M}.

Ultimately, as the primary goal is to manipulate the one-hot output [5], i.e., the magnitude relationship in the ii-th row of 𝐙(K)\mathbf{Z}^{(K)}, an activation function is added only in the final layer to compute the loss function with the label matrix [6]. Consequently, the optimization target of the adversaries can be expressed as:

argmin𝐀MSE(σ(𝐀K𝐙i=1K𝐖(i)),𝐘).\arg\min_{\mathbf{A}^{\prime}}\mathrm{MSE}(\sigma(\mathbf{A}^{\prime K}\mathbf{Z}\prod_{i=1}^{K}\mathbf{W}^{(i)}),\mathbf{Y}^{\prime}). (37)

For r, a perturbation affecting less than 5% of the total edges is commonly considered inconspicuous. Hence, given the necessity to minimize the perceptibility of the perturbation, r<5\%|\mathcal{V}|. Therefore, by introducing a sparse matrix \mathbf{P_{1}} containing only 0, 1, and -1 entries, the perturbation process can be represented as \mathbf{A}^{\prime}=\mathbf{A}+\mathbf{P_{1}}. Simultaneously, during training, as the activation functions between layers have been disregarded (as aforementioned, this does not affect the magnitude relationship of the output), the parameter matrices of different layers can be unified into a single parameter matrix. We denote \mathbf{W}_{K}=\prod_{i=1}^{K}\mathbf{W}^{(i)}; hence, Equation (37) can be rewritten as:

argmin𝐏𝟏MSE(σ((𝐀+𝐏𝟏)K𝐙𝐖K),𝐘).\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma((\mathbf{A}+\mathbf{P_{1}})^{K}\mathbf{Z}\mathbf{W}_{K}),\mathbf{Y}^{\prime}). (38)
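A minimal sketch of the adversary's objective in Equation (38) under hypothetical shapes: given a candidate perturbation \mathbf{P_{1}} with entries in \{-1,0,1\}, the attacker evaluates the MSE between the perturbed K-step propagation (with the collapsed weight matrix \mathbf{W}_{K}) and the target labels \mathbf{Y}^{\prime}.

```python
import numpy as np

sigmoid = lambda X: 1.0 / (1.0 + np.exp(-X))

def attack_loss(A, P, Z, W_K, Y_target, K):
    """Equation (38): MSE(sigma((A + P)^K Z W_K), Y'), activation on the last layer only."""
    out = sigmoid(np.linalg.matrix_power(A + P, K) @ Z @ W_K)
    return np.mean((out - Y_target) ** 2)

rng = np.random.default_rng(0)
N, d0, c, K = 6, 4, 3, 2                          # hypothetical sizes; c label classes
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1); A = A + A.T
Z = rng.normal(size=(N, d0))
W_K = 0.1 * rng.normal(size=(d0, c))              # collapsed product of the K weight matrices
Y_target = np.eye(c)[rng.integers(0, c, size=N)]  # adversary-chosen one-hot labels Y'

P = np.zeros((N, N)); P[0, 1] = P[1, 0] = 1.0     # a candidate perturbation: insert one edge
print(attack_loss(A, P, Z, W_K, Y_target, K))
```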

Considering when K=1K=1, Equation (38) becomes:

argmin𝐏𝟏MSE(σ(𝐀𝐙𝐖(1)+𝐏𝟏𝐙𝐖(1)),𝐲).\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma(\mathbf{A}\mathbf{Z}\mathbf{W}^{(1)}+\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}),\mathbf{y^{\prime}}). (39)

To quantify the cost of the equation above, we consider that in non-adversarial scenarios the GCN can reach the computational universality of the ground-truth labels, i.e., there always exist \mathbf{W} and a minimal value \epsilon such that \mathrm{MSE}(\sigma(\mathbf{A}\mathbf{Z}\mathbf{W}),\mathbf{Y})\leq\epsilon. Since \mathbf{P_{1}} is perturbable, to maintain the concealment of the perturbations and minimize the modification of the GCN parameters, i.e., \mathbf{W}^{(1)}=\mathbf{W}, it is only necessary to ensure that, with an identity activation \sigma_{I}(\cdot), \sigma_{I}(\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}) is as close to a zero matrix as possible. Thus, Equation (39) can be further simplified to:

\arg\min_{\mathbf{P_{1}}}\mathrm{MSE}(\sigma_{I}(\mathbf{P_{1}}\mathbf{Z}\mathbf{W}^{(1)}),[0]_{N}). (40)

Upon observation, \mathbf{P_{1}} can be regarded as a distinct graph \mathcal{G}_{\mathbf{P_{1}}}, which has the same number of nodes as \mathcal{G}, with most nodes being isolated. Perturbations can be viewed as edges, where inserted edges (i.e., when an edge is inserted between nodes v_{i} and v_{j}, \mathbf{P_{1}}_{i,j}=1) have a weight of 1, and deleted edges (i.e., when an edge is deleted between nodes v_{i} and v_{j}, \mathbf{P_{1}}_{i,j}=-1) have a weight of -1. Therefore, the objective can be approximated as another GCN training paradigm, the difference being that the training parameters have changed from \mathbf{W} to \mathbf{P_{1}}. At this point, since \mathcal{G}_{\mathbf{P_{1}}} is a disconnected subgraph of \mathcal{G} with r edges, when K=1, the maximum number of disconnected r-edge subgraphs of \mathcal{G} is the maximum number of forward propagations that the attack model needs to compute. That is, \mathrm{Cost}(\mathbf{A},r,1)\leq C(|\mathcal{V}|,r).

Proceeding to the case where K=2, with \mathbf{P_{2}} playing the same role as \mathbf{P_{1}}, Equation (38) can be reformulated as follows:

argmin𝐏𝟐MSE(σ(𝐀2𝐙𝐖2+𝐀𝐏𝟐𝐙𝐖2+𝐏𝟐2𝐙𝐖2),𝐘).\arg\min_{\mathbf{P_{2}}}\mathrm{MSE}(\sigma(\mathbf{A}^{2}\mathbf{Z}\mathbf{W}_{2}+\mathbf{A}\mathbf{P_{2}}\mathbf{Z}\mathbf{W}_{2}+\mathbf{P_{2}}^{2}\mathbf{Z}\mathbf{W}_{2}),\mathbf{Y}^{\prime}). (41)

Analogous to the scenario where K=1K=1, the objective of the adversary for K=2K=2 can be simplified to:

argmin𝐏𝟐MSE(σI((2𝐀+𝐏𝟐)𝐏𝟐𝐙𝐖2),[0]N).\arg\min_{\mathbf{P_{2}}}\mathrm{MSE}(\sigma_{I}((2\mathbf{A}+\mathbf{P_{2}})\mathbf{P_{2}}\mathbf{Z}\mathbf{W}_{2}),[0]_{N}). (42)

This expression can be interpreted as an extension of the K=1 case: an attack on a graph \mathcal{G}_{2\mathbf{A}} with adjacency matrix 2\mathbf{A}. The maximum number of forward propagations for the attacker equals the number of largest disconnected subgraphs in \mathcal{G}_{2\mathbf{A}}, which is the same as in the K=1 case. We present illustrative examples of \mathcal{G}, \mathcal{G}_{\mathbf{P_{1}}}, \mathcal{G}_{\mathbf{P_{2}}} and \mathcal{G}_{2\mathbf{A}} in Figure 7.

Figure 7: For K=1 and K=2, the primary number of forward propagations for the attacker is determined by \mathcal{G}_{\mathbf{P_{1}}} and \mathcal{G}_{\mathbf{P_{2}}}, irrespective of the graphs \mathcal{G} and \mathcal{G}_{2\mathbf{A}} that constitute \mathcal{G}_{\mathbf{P_{1}}} and \mathcal{G}_{\mathbf{P_{2}}}.

Figure 8: Additional experimental results. (a) Values of \mathbf{d}_{k,k_{gap}} under different k and k_{gap} settings. (b) Strength distribution of \mathcal{G}, \mathcal{G}_{LRS} and \mathcal{G}_{random}.

The situation differs when K=3K=3. Specifically, Equation (38) for K=3K=3 is given by:

argmin𝐏𝟑MSE(σ(𝐀3𝐙𝐖3+𝐀2𝐏𝟑𝐙𝐖3+𝐀𝐏𝟑2𝐙𝐖3+𝐏𝟑3𝐙𝐖3),𝐘).\arg\min_{\mathbf{P_{3}}}\mathrm{MSE}(\sigma(\mathbf{A}^{3}\mathbf{Z}\mathbf{W}_{3}+\mathbf{A}^{2}\mathbf{P_{3}}\mathbf{Z}\mathbf{W}_{3}\\ +\mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3}+\mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}),\mathbf{Y}^{\prime}). (43)

In the above expression, it is required to ensure:

\arg\min_{\mathbf{P_{3}}}\mathrm{MSE}(\sigma_{I}(\mathbf{A}^{2}\mathbf{P_{3}}\mathbf{Z}\mathbf{W}_{3}+\mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3}+\mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}),[0]_{N}). (44)

It can be observed that for K=3 the maximum number of disconnected subgraphs \mathcal{G}_{\mathbf{P_{3}}} depends on the number of edges in the graph \mathcal{G}_{\mathbf{A}^{2}} represented by \mathbf{A}^{2}, rather than on the number of edges in \mathcal{G} (which is the case for K=1 and K=2). In \mathbf{A}^{2}, each element \mathbf{A}^{2}_{i,j} represents the number of paths of length 2 from node v_{i} to v_{j}. Therefore, the number of non-zero elements in \mathbf{A}^{2} can be at most |\mathcal{V}|^{2}, as each node can be connected to each of the other |\mathcal{V}|-1 nodes by a path of length 2. Hence, the upper bound on the number of edges contained in \mathcal{G}_{\mathbf{A}^{2}} is \frac{|\mathcal{V}|^{2}}{2}. The maximum number of disconnected r-edge subgraphs (i.e., \mathcal{G}_{\mathbf{P_{3}}}) is C(\frac{|\mathcal{V}|^{2}}{2},r).

Upon obtaining \mathcal{G}_{\mathbf{P_{3}}}, it must be passed to the last two terms, namely \mathbf{A}\mathbf{P_{3}}^{2}\mathbf{Z}\mathbf{W}_{3} and \mathbf{P_{3}}^{3}\mathbf{Z}\mathbf{W}_{3}, for computation. Considering that \mathbf{P_{3}} is a sparse matrix, \mathbf{P_{3}}^{3} is generally a zero matrix, implying that all nodes in the graph \mathcal{G}_{\mathbf{P_{3}}^{3}} formed by it are isolated. Consequently, the last two terms reduce to one term, meaning that after obtaining \mathcal{G}_{\mathbf{P_{3}}}, one more matrix-multiplication-based forward propagation is needed. In summary, \mathrm{Cost}(\mathbf{A},r,3)\leq 2C\left(\frac{|\mathcal{V}|^{2}}{2},r\right).

Utilizing the same methodology, for K>3, each decomposition introduces a graph \mathcal{G}_{\mathbf{A}^{K-1}} with \mathbf{A}^{K-1} as its adjacency matrix, together with a linearly increasing number of summation terms. Combining the above analysis, we conclude that the maximum number of edges in \mathcal{G}_{\mathbf{A}^{K-1}} grows exponentially, while the number of matrix-multiplication-based forward-propagation summation terms, determined by the maximum number of disconnected subgraphs \mathcal{G}_{\mathbf{P_{r}}} in \mathcal{G}_{\mathbf{A}^{K-1}}, grows linearly. That is, when K\geq 3, \mathrm{Cost}(\mathbf{A},r,K)\leq(K-1)C\left(\frac{|\mathcal{V}|^{K-1}}{2},r\right), which completes the proof.
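To illustrate how the bound scales, the following arithmetic sketch (with a hypothetical |\mathcal{V}| and budget r) tabulates C(|\mathcal{V}|,r) for K=1,2 and (K-1)\,C(|\mathcal{V}|^{K-1}/2,r) for K\geq 3.

```python
from math import comb

V, r = 100, 5                           # hypothetical graph size |V| and budget r

print("K = 1, 2:", comb(V, r))          # Cost <= C(|V|, r)
for K in range(3, 6):
    bound = (K - 1) * comb(V ** (K - 1) // 2, r)
    print(f"K = {K}:", bound)           # Cost <= (K - 1) * C(|V|^(K-1)/2, r)
```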

Proof of Proposition 1

We adopt an intuitive approach to simplify this computation. Given that \mathcal{G}^{j,k} contains one fewer edge than \mathcal{G} (namely (v_{j},v_{k})), \mathbf{A}_{\mathcal{G}^{j,k}} and \mathbf{A} differ in only two elements; hence the subtraction \mathbf{A}-\mathbf{A}_{\mathcal{G}^{j,k}} results in a sparse matrix \mathbf{D}^{j,k}, where all elements are zero except those at the j-th row, k-th column and the k-th row, j-th column, which are 1. Given that the parameters of \mathbf{W}^{(\ell-1)} have been trained during the calculation of the \ell-th layer's edge-transmitted signals, we use \mathbf{Z}_{\mathbf{W}}^{(\ell-1)}=\mathbf{Z}^{(\ell-1)}\mathbf{W}^{(\ell-1)} to represent the matrix that is constant from the perspective of the \ell-th layer. Then, Equation (5) simplifies to

𝐙𝒢j,k()=𝐀𝐙𝐖(1)𝐃j,k𝐙𝐖(1).\mathbf{Z}^{(\ell)}_{\mathcal{G}^{j,k}}=\mathbf{A}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}-\mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}. (45)

Given that 𝐃j,k\mathbf{D}^{j,k} is a symmetric matrix with only two non-zero values, the result of 𝐃j,k𝐙𝐖(1)\mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}, defined as

Q(\mathbf{Z}_{\mathbf{W}}^{(\ell-1)};j,k)=[\mathbf{0},\ldots,\underbrace{[\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}]_{k,:}}_{j\text{-th row}},\ldots,\underbrace{[\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}]_{j,:}}_{k\text{-th row}},\ldots,\mathbf{0}]^{\top}, (46)

does not need to be recalculated for each choice of v_{j}, v_{k}; it can instead be reassembled by indexing the rows of \mathbf{Z}_{\mathbf{W}}^{(\ell-1)} according to the chosen v_{j} and v_{k}.

Through the method stated above, Equation (5) is eventually simplified to the conclusion of Proposition 1 (Equation (7)). It can be observed that Equation (7) encompasses all arbitrary selections of v_{j} and v_{k}, yet only the \mathbf{Z}_{\mathbf{W}}^{(\ell-1)} and \mathbf{A}_{G}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)} terms need to be computed. Thus, during the computation of edge-transmitted signals in GRN, we can reduce the initial |\mathcal{V}| computations down to a single computation. Thereafter, when transitioning among different edges, it is only necessary to conduct a column-wise concatenation and subtraction of the matrices.
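A minimal sketch of the shortcut described above, with hypothetical sizes: the correction \mathbf{D}^{j,k}\mathbf{Z}_{\mathbf{W}}^{(\ell-1)} is assembled by indexing two rows of the precomputed \mathbf{Z}_{\mathbf{W}}^{(\ell-1)}, and the result coincides with the explicit product (\mathbf{A}-\mathbf{D}^{j,k})\mathbf{Z}_{\mathbf{W}}^{(\ell-1)}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 4
A = rng.integers(0, 2, size=(N, N)).astype(float)
A = np.triu(A, 1); A = A + A.T
j, k = 0, 2
A[j, k] = A[k, j] = 1.0                        # make sure the edge (v_j, v_k) exists

Z_W = rng.normal(size=(N, d))                  # precomputed Z^{(l-1)} W^{(l-1)}
AZ_W = A @ Z_W                                 # computed once, reused for every edge

D = np.zeros((N, N)); D[j, k] = D[k, j] = 1.0  # the sparse difference matrix D^{j,k}

# Q(Z_W; j, k): row j holds [Z_W]_{k,:}, row k holds [Z_W]_{j,:}, all other rows zero.
Q = np.zeros((N, d)); Q[j] = Z_W[k]; Q[k] = Z_W[j]

Z_removed = AZ_W - Q                           # Equation (45) via row indexing only
print(np.allclose(Z_removed, (A - D) @ Z_W))   # True
```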

Additional Experiments

In this section, we examine the behavior of layers arising from resonance across an expanded range of datasets, and we report the strength distribution of LRS-constructed graphs drawn from additional datasets. The experimental setup is consistent with the main body of the text. Note that, in the interest of fairness, these experiments were conducted with re-initialized random seeds; consequently, under identical experimental conditions, there is a slight discrepancy between these results and those presented in the main text. Figures 8(a) and 8(b), which can be viewed as extensions of Figures 4 and 5(b) respectively, depict these outcomes. The conclusions drawn from the illustrated results are consistent with the central claims of the main text.

References

  • [1] Breuer, Heinz-Peter; Huber, Wolfgang; and Petruccione, Francesco. 1994. Fluctuation effects on wave propagation in a reaction-diffusion process. Physica D, 73(3): 259–273.

  • [2] Golub, Gene H. and Reinsch, Christian. 1971. Singular value decomposition and least squares solutions. Linear Algebra.

  • [3] Vannieuwenhoven, Nick; Vandebril, Raf; and Meerbergen, Karl. 2012. A new truncation strategy for the higher-order singular value decomposition. SIAM J. Sci. Comput., 34(2): 1027–1052.

  • [4] Al-Mohy, Awad H. and Higham, Nicholas J. 2011. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J. Sci. Comput., 33(2): 488–511.

  • [5] Sun, Lichao; Dou, Yingtong; Yang, Carl; et al. 2022. Adversarial attack and defense on graph data: A survey. IEEE Trans. Knowl. Data Eng., 35(8): 7693–7711.

  • [6] Zügner, Daniel; Akbarnejad, Amir; and Günnemann, Stephan. 2018. Adversarial attacks on neural networks for graph data. Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2847–2856.