
Generalized Laplacian Regularized Framelet Graph Neural Networks

Zhiqi Shao¹, Andi Han¹, Dai Shi², Andrey Vasnev¹ and Junbin Gao¹
¹University of Sydney ([email protected], [email protected], [email protected], [email protected])
²Western Sydney University ([email protected])
Zhiqi Shao, Andi Han and Dai Shi contributed equally.
Abstract

Graph neural networks (GNNs) have achieved remarkable results on various graph learning tasks. However, one recent challenge for GNNs is to adapt to different types of graph inputs, such as heterophilic graph datasets in which linked nodes are more likely to have different class labels and features. Accordingly, an ideal GNN model should adaptively accommodate all types of graph datasets with different labeling distributions. In this paper, we tackle this challenge by proposing a regularization framework on graph framelets with the regularizer induced from the graph p-Laplacian. By adjusting the value of p, the p-Laplacian based regularizer restricts the solution space of the graph framelet to the desirable region determined by the graph's homophilic features. We propose an algorithm that effectively solves a more general regularization problem and prove that the algorithm imposes a (p-Laplacian based) spectral convolution and a diagonal scaling operation on the framelet-filtered node features. Furthermore, we analyze the denoising power of the proposed model and compare it with the predefined framelet denoising regularizer. Finally, we conduct empirical studies to show the prediction power of the proposed model on both homophilic undirected and heterophilic directed graphs, with and without noise. Our proposed model shows significant improvements over multiple baselines, which suggests the effectiveness of combining graph framelets and the p-Laplacian.

Highlight

  • We define two types of new GNN layers by introducing p-Laplacian regularizers to both decomposed and reconstructed framelets.

  • We provide an iterative algorithm to fit the proposed models and explore an alternative algorithm developed from the F-norm Locality Preserving Projection (LPP).

  • We prove that our algorithm provides a sequence of spectral graph convolutions and diagonal scalings over framelet-filtered graph signals, which further explains the interaction between the p-Laplacian regularizer and the framelet.

  • We link the proposed p-Laplacian regularizer with previous work on framelet regularizers to illustrate the denoising power of our model.

  • We test the model on homophilic (undirected) and heterophilic (directed) graphs. To the best of our knowledge, we are the first to explore the possibility of applying the framelet GCN to directed graphs.

1 Introduction

Graph neural networks (GNNs) have demonstrated remarkable ability on graph learning tasks (Bronstein et al, 2017; Wu et al, 2020; Zhang et al, 2022; Zhou et al, 2020). The input to GNNs is so-called graph data, which records useful features and structural information among data points. Such data are widely seen in many fields, such as biomedical science (Ahmedt-Aristizabal et al, 2021), social networks (Fan et al, 2019), and recommender systems (Wu et al, 2022). GNN models can be broadly categorized into spectral and spatial methods. Spatial methods such as MPNN (Gilmer et al, 2017), GAT (Veličković et al, 2018) and GIN (Xu et al, 2018a) utilize the message passing mechanism to propagate node feature information based on node neighbourhoods (Scarselli et al, 2009). On the other hand, spectral methods, including ChebyNet (Defferrard et al, 2016), GCN (Kipf and Welling, 2017) and BernNet (He et al, 2021), are derived from classic convolutional networks: they treat the input graph data as signals (i.e., functions defined on the graph nodes) (Ortega et al, 2018) and filter the signals in the Fourier domain (Bruna et al, 2014; Defferrard et al, 2016). Among various spectral-based GNNs, the graph framelet GCN (Zheng et al, 2022), built on a type of wavelet frame analogous to the same concept defined on manifolds (Dong, 2017), further separates the input signal via predefined low-pass and high-pass filtering functions, resulting in a convolution-type model as in the studies (Chen et al, 2022a; Zheng et al, 2021b, 2022). The graph framelet shows great flexibility in controlling both low- and high-frequency information, is robust to noise, and thus in general possesses high prediction power on multiple graph learning tasks (Chen et al, 2022a; Yang et al, 2022; Zheng et al, 2021b; Zhou et al, 2021a, b; Zou et al, 2022).

Along the path of developing more advanced models, one of the major challenges in GNNs arises from data consistency. For example, many GNN models (Xu et al, 2018a; Wu et al, 2020; Zhu et al, 2020b; Wu et al, 2021; Liu et al, 2022) are designed under the homophily assumption, i.e., nodes with similar features or labels are often linked with each other. Such a phenomenon is commonly observed in citation networks (Ciotti et al, 2016), which have been widely used as benchmarks in GNN empirical studies. However, the homophily assumption does not always hold, and its opposite, heterophily, is observed quite often in many real-world datasets in which linked nodes are more likely to have different class labels and features (Zheng et al, 2021a). For example, in online transaction networks, fraudsters are more likely to link to customers than to other fraudsters (Pandit et al, 2007). GNNs designed under the homophily assumption are unsuitable for heterophilic graphs, as is evident from their significant performance degradation (Zheng et al, 2021a). The reason is that heterophilic graphs contain heterogeneous instances, and hence the signals should be sharpened rather than smoothed out. An ideal framework for learning on graphs should be able to accommodate both homophilic and heterophilic scenarios.

One active line of work for resolving this adaptation challenge is to regularize the solution of GNNs from an optimization perspective. The work by Zhu et al (2021) unified most state-of-the-art GNNs in an optimization framework. Furthermore, a recent work (Fu et al, 2022) considered assigning an adjustable p-Laplacian regularizer to the (discrete) graph regularization problem that is conventionally treated as a way of producing GNN outcomes (i.e., Laplacian smoothing). Given that the classic graph Laplacian regularizer measures the graph signal energy along edges under the $L_2$ metric, it would be beneficial if the GNN could be adapted to heterophilic graphs under an $L_p$ metric ($1\leq p<2$): since the $L_1$ metric is more robust to high-frequency signals, higher model discriminative power is preserved. The early work (Hu et al, 2018) demonstrated the advantage of adopting the $L_1$ metric in the Locality Preserving Projection (LPP) model. In addition, the recently proposed p-Laplacian GNN (Fu et al, 2022) adaptively modifies aggregation weights by exploiting the variation of node embeddings on edges measured by the graph gradient. Investigating the application of the p-Laplacian further, Luo et al (2010) suggested an efficient gradient descent optimization strategy to construct the p-Laplacian embedding space that guarantees convergence to viable local solutions. Although the p-Laplacian based optimization scheme has been successfully lodged in a basic GNN model (Fu et al, 2022), it remains unclear whether it can be incorporated into more advanced GNN models (i.e., graph framelets) with higher prediction power and flexibility. Specifically, one may be interested in identifying whether the advantage of deploying the p-Laplacian in GNNs remains in graph framelets, and what the exact functionality of the p-Laplacian is when interacting with the feature representations generated in both the low- and high-pass framelet domains. In addition, as graph framelets have shown great potential in graph noise reduction (Zheng et al, 2022; Yang et al, 2022; Zou et al, 2022), it is also unclear whether the inclusion of a p-Laplacian regularizer enhances or dilutes such power. These research gaps inspire us to incorporate the p-Laplacian into graph framelets and further explore the underlying relationship between them.

Since the p-Laplacian regularized optimization problem lacks a closed-form solution except when $p=2$ (Zhou and Schölkopf, 2005), an iterative algorithm is suggested to estimate the solution, and each iteration can be viewed as a GNN layer (Zhou et al, 2021a). The solution of such an algorithm is defined by the first-order condition of the optimization problem. As a result, one can relate this to the implicit layer approach (Chen et al, 2022b; Zhang et al, 2003), which has the potential to avoid over-smoothing since the adjusted node features are re-injected at each iteration step. By adjusting the value of $p$, both the p-Laplacian GNN and the algorithm are capable of producing learning outcomes with different discriminative levels and are thus able to handle both homophilic and heterophilic graphs. Similar work has been done by Frecon et al (2022), which presents a framework based on Bregman layers to fulfill bi-level optimization formulations. To reap the benefit of p-Laplacian regularization in the framelet domain, in this paper we propose two p-Laplacian regularized framelet GCN models, named the p-Laplacian Undecimated Framelet GCN (pL-UFG) and the p-Laplacian Fourier Undecimated Framelet GCN (pL-fUFG). In these models, the p-Laplacian regularization can be applied either in the framelet domain or in the well-known framelet Fourier domain. The p-Laplacian based regularizer incurs a penalty on both the low- and high-frequency domains created by the framelet, producing flexible models that are capable of adapting to different types of graph inputs (i.e., homophilic and heterophilic, directed and undirected). We summarize our contributions as follows:

  • We define two types of new GNN layers by introducing p-Laplacian regularizers to both decomposed and reconstructed framelets. This paves the way for introducing more general Bregman divergence regularization into the graph framelet framework;

  • We propose an iterative algorithm to fit the proposed models and explore an alternative algorithm developed from the F-norm LPP;

  • We prove that the iteration of the proposed algorithm provides a sequence of spectral graph convolutions and diagonal scalings over framelet-filtered graph signals, which gives a deeper explanation of how the p-Laplacian regularizer interacts with the framelet.

  • We connect the proposed p-Laplacian regularizer to the previously studied framelet regularizer to illustrate the denoising power of the proposed model.

  • We investigate the performance of the new models on graph learning tasks for both homophilic (undirected) graphs and heterophilic (directed) graphs. To the best of our knowledge, we are the first to explore the possibility of applying framelet GCNs to directed graphs. The experimental results demonstrate the effectiveness of pL-UFG and pL-fUFG on real-world node classification tasks with strong robustness.

The rest of the paper is organized as follows. In Section 2, we introduce the p-Laplacian operator and regularized GNNs, followed by a review of recent studies on regularized graph neural networks, including implicit layers and graph homophily. In Section 3, we introduce the fundamental properties of graph framelets and propose the p-Laplacian regularized framelet models. Furthermore, we develop an algorithm to solve a regularization problem that is more general than p-Laplacian regularization, and we provide theoretical analysis to show how the p-Laplacian based regularizer interacts with the graph framelet. Section 4 discusses the denoising power and computational complexity of the proposed model, compares the p-Laplacian regularizer with other regularizers, and outlines limitations and future directions. We present the experimental results and analysis in Section 5. Finally, the paper is concluded in Section 6.

2 Preliminaries

This section provides an in-depth exploration of the fundamental concepts, encompassing graphs, graph framelets, and regularized graph neural networks. To aid the reader in following the intended ideas, each newly introduced model and definition is accompanied by a succinct review of its development.

2.1 Basic Notations

Let $\mathcal{G}=(\mathcal{V},\mathcal{E},W)$ denote a weighted graph, where $\mathcal{V}=\{v_{1},v_{2},\dots,v_{N}\}$ and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ represent the node set and the edge set, respectively. $\mathbf{X}\in\mathbb{R}^{N\times c}$ is the feature matrix for $\mathcal{G}$ with $\{\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{N}\}$ as its rows, and $W=[w_{i,j}]\in\mathbb{R}^{N\times N}$ is the weight matrix on edges with $w_{i,j}>0$ if $(v_{i},v_{j})\in\mathcal{E}$ and $w_{i,j}=0$ otherwise. For undirected graphs, $w_{i,j}=w_{j,i}$, i.e., $W$ is a symmetric matrix. For directed graphs, it may happen that $w_{i,j}\neq w_{j,i}$, i.e., $W$ need not be symmetric. In most cases, the weight matrix is the graph adjacency matrix, i.e., $w_{i,j}\in\{0,1\}$ with $w_{i,j}=1$ if $(v_{i},v_{j})\in\mathcal{E}$ and $w_{i,j}=0$ otherwise. Furthermore, let $\mathcal{N}_{i}=\{v_{j}:(v_{i},v_{j})\in\mathcal{E}\}$ denote the set of neighbours of node $v_{i}$ and $\mathbf{D}=\text{diag}(d_{1,1},\dots,d_{N,N})\in\mathbb{R}^{N\times N}$ denote the diagonal degree matrix with $d_{i,i}=\sum^{N}_{j=1}w_{i,j}$ for $i=1,\dots,N$. The normalized graph Laplacian is defined as $\widetilde{\mathbf{L}}=\mathbf{I}-\mathbf{D}^{-\frac{1}{2}}(\mathbf{W}+\mathbf{I})\mathbf{D}^{-\frac{1}{2}}$. Lastly, for any vector $\mathbf{x}=(x_{1},\dots,x_{c})\in\mathbb{R}^{c}$, we use $\|\mathbf{x}\|_{2}=(\sum^{c}_{i=1}x^{2}_{i})^{\frac{1}{2}}$ to denote its $L_2$-norm, and similarly for any matrix $\mathbf{M}=[m_{i,j}]$, $\|\mathbf{M}\|:=\|\mathbf{M}\|_{F}=(\sum_{i,j}m^{2}_{i,j})^{\frac{1}{2}}$ denotes its Frobenius norm.

2.2 Consistency in Graphs

Generally speaking, most GNN frameworks are designed under the homophily assumption, in which the labels of nodes and their neighbours in the graph are mostly identical. The recent work by Zhu et al (2020a) emphasises that general topology GNNs fail to obtain outstanding results on graphs whose connected nodes have different class labels and dissimilar features, which we call heterophilic or low-homophily graphs. The definitions of homophilic and heterophilic graphs are given as follows:

Definition 2.1 (Homophily and Heterophily (Fu et al, 2022)).

The homophily or heterophily of a network describes the relationship between labels of connected nodes. The level of homophily of a graph can be measured by $\mathcal{H}(\mathcal{G})=\mathbb{E}_{i\in\mathcal{V}}\left[|\{j: j\in\mathcal{N}_{i}, y_{j}=y_{i}\}|/|\mathcal{N}_{i}|\right]$, where $|\{j: j\in\mathcal{N}_{i}, y_{j}=y_{i}\}|$ denotes the number of neighbours of node $i\in\mathcal{V}$ that share the same label as $i$, i.e., $y_{i}=y_{j}$. $\mathcal{H}(\mathcal{G})\rightarrow 1$ corresponds to strong homophily, while $\mathcal{H}(\mathcal{G})\rightarrow 0$ indicates strong heterophily. We say that a graph is homophilic (heterophilic) if it has strong homophily (heterophily).
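To make the measure concrete, the following is a minimal sketch that computes $\mathcal{H}(\mathcal{G})$ from an edge list and node labels; the function name and the NumPy-based representation are illustrative choices rather than part of the original formulation.

```python
import numpy as np

def homophily_level(edges, labels):
    """Estimate H(G) = E_i[ |{j in N_i : y_j = y_i}| / |N_i| ].

    edges:  iterable of (i, j) pairs; for undirected graphs include both
            directions or symmetrize beforehand.
    labels: 1-D array of node labels, labels[i] = y_i.
    """
    labels = np.asarray(labels)
    n = len(labels)
    same = np.zeros(n)   # neighbours sharing the node's label
    deg = np.zeros(n)    # |N_i|
    for i, j in edges:
        deg[i] += 1
        same[i] += (labels[i] == labels[j])
    mask = deg > 0       # ignore isolated nodes
    return float(np.mean(same[mask] / deg[mask]))

# Toy example: a 4-cycle with labels [0, 0, 1, 1] has H(G) = 0.5
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)]
print(homophily_level(edges, [0, 0, 1, 1]))  # 0.5
```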

2.3 p-Laplacian Operator

The notion of the graph p-Laplacian, and its use in a regularization problem on graph-structured data, can be traced back to Zhou and Schölkopf (2005); this regularization scheme was further explored to build a flexible GNN learning model in Fu et al (2022). In this paper, we adopt notation similar to that used in Fu et al (2022) to define the graph p-Laplacian operator. We first define the graph gradient as follows:

Definition 2.2 (Graph Gradient).

Let $\mathcal{F}_{\mathcal{V}}:=\{\mathbf{F}\,|\,\mathbf{F}:\mathcal{V}\rightarrow\mathbb{R}^{d}\}$ and $\mathcal{F}_{\mathcal{E}}:=\{\mathbf{g}\,|\,\mathbf{g}:\mathcal{E}\rightarrow\mathbb{R}^{d}\}$ be the function spaces on nodes and edges, respectively. Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E},W)$ and a function $\mathbf{F}\in\mathcal{F}_{\mathcal{V}}$, the graph gradient is an operator $\nabla_{W}:\mathcal{F}_{\mathcal{V}}\rightarrow\mathcal{F}_{\mathcal{E}}$ defined, for all $[i,j]\in\mathcal{E}$, by

$(\nabla_{W}\mathbf{F})([i,j]):=\sqrt{\frac{w_{i,j}}{d_{j,j}}}\mathbf{f}_{j}-\sqrt{\frac{w_{i,j}}{d_{i,i}}}\mathbf{f}_{i},$

where $\mathbf{f}_{i}$ and $\mathbf{f}_{j}$ are the signal vectors on nodes $i$ and $j$, i.e., the rows of $\mathbf{F}$.

Without confusion, we simply write $\nabla$ for $\nabla_{W}$. For $[i,j]\notin\mathcal{E}$, $(\nabla\mathbf{F})([i,j]):=0$. The graph gradient of $\mathbf{F}$ at a vertex $i$, for all $i\in\{1,\dots,N\}$, is defined as $\nabla\mathbf{F}(i):=((\nabla\mathbf{F})([i,1]);\dots;(\nabla\mathbf{F})([i,N]))$, and its Frobenius norm is given by $\|\nabla\mathbf{F}(i)\|_{2}:=(\sum^{N}_{j=1}(\nabla\mathbf{F})^{2}([i,j]))^{\frac{1}{2}}$, which measures the variation of $\mathbf{F}$ around node $i$. Note that the notation $\nabla\mathbf{F}$ has two readings: as a graph gradient (over edges) and as a node gradient (over nodes); the intended meaning can be inferred from context in the rest of the paper. We also provide the definition of graph divergence, which is analogous to the continuous setting, where the Laplacian of a function is the divergence of its gradient.
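Before turning to the graph divergence, the following sketch computes the edge-wise graph gradient of Definition 2.2 and the node gradient p-norm used later in the p-Laplacian energy; the dense NumPy arrays and the small degree guard are illustrative simplifications.

```python
import numpy as np

def graph_gradient(W, F):
    """Edge-wise graph gradient from Definition 2.2.

    W: (N, N) weight matrix, F: (N, d) node signals.
    Returns G with G[i, j] = sqrt(w_ij/d_jj) * f_j - sqrt(w_ij/d_ii) * f_i,
    shape (N, N, d); entries for non-edges are zero because w_ij = 0.
    """
    d = W.sum(axis=1)                       # node degrees d_ii
    d = np.where(d > 0, d, 1.0)             # guard against isolated nodes
    S = np.sqrt(W)                          # sqrt(w_ij), elementwise
    G = (S / np.sqrt(d)[None, :])[:, :, None] * F[None, :, :] \
      - (S / np.sqrt(d)[:, None])[:, :, None] * F[:, None, :]
    return G

def node_gradient_norm(G, p=2):
    """||nabla F(v_i)||_p over the neighbourhood of each node."""
    edge_norm = np.linalg.norm(G, axis=2)   # ||nabla F([i, j])|| per edge slot
    return (edge_norm ** p).sum(axis=1) ** (1.0 / p)
```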

Definition 2.3 (Graph Divergence).

Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E},W)$ and functions $\mathbf{F}:\mathcal{V}\rightarrow\mathbb{R}^{d}$ and $\mathbf{g}:\mathcal{E}\rightarrow\mathbb{R}^{d}$, the graph divergence is an operator $\text{div}:\mathcal{F}_{\mathcal{E}}\rightarrow\mathcal{F}_{\mathcal{V}}$ which satisfies:

$\langle\nabla\mathbf{F},\mathbf{g}\rangle=\langle\mathbf{F},-\text{div}(\mathbf{g})\rangle.$ (1)

Furthermore, the graph divergence can be computed by:

$\text{div}(\mathbf{g})(i)=\sum_{j=1}^{N}\sqrt{\frac{w_{i,j}}{d_{i,i}}}(\mathbf{g}[i,j]-\mathbf{g}[j,i]).$ (2)

Given the above definitions of graph gradient and divergence, we arrive at the definition of the graph p-Laplacian.

Definition 2.4 (p-Laplacian operator (Fu et al, 2022)).

Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E},W)$ and a multi-channel signal function $\mathbf{F}:\mathcal{V}\rightarrow\mathbb{R}^{d}$, the graph p-Laplacian is an operator $\Delta_{p}:\mathcal{F}_{\mathcal{V}}\rightarrow\mathcal{F}_{\mathcal{V}}$ defined by:

$\Delta_{p}\mathbf{F}:=-\frac{1}{2}\text{div}(\|\nabla\mathbf{F}\|^{p-2}\nabla\mathbf{F}),\quad\text{for}\,\,\,p\geq 1,$ (3)

where $\|\cdot\|^{p-2}$ denotes the element-wise power over the node gradient $\nabla\mathbf{F}$.

Remark 1.

Clearly, when $p=2$, Eq. (3) recovers the classic graph Laplacian. When $p=1$, Eq. (3) can be viewed as a curvature operator defined on the nodes of the graph, because in that case $\Delta_{1}\mathbf{F}=-\frac{1}{2}\text{div}\left(\frac{\nabla\mathbf{F}}{\|\nabla\mathbf{F}\|}\right)$, which aligns with the mean curvature operator defined on a continuous domain. Furthermore, the p-Laplacian operator is linear when $p=2$, while in general for $p\neq 2$ it is a non-linear operator since $\Delta_{p}(a\mathbf{F})\neq a\Delta_{p}(\mathbf{F})$ for $a\in\mathbb{R}\backslash\{1\}$.
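As a concrete illustration of Definition 2.4, the sketch below applies the p-Laplacian to a multi-channel signal by composing the graph gradient and divergence of Definitions 2.2-2.3; the small $\varepsilon$ guard against vanishing gradients and the dense-matrix representation are assumptions made only for this example.

```python
import numpy as np

def p_laplacian_apply(W, F, p=2.0, eps=1e-8):
    """Apply Delta_p F = -0.5 * div( ||grad F||^{p-2} grad F ) to F.

    W: (N, N) weight matrix, F: (N, d) node signals.
    """
    d = np.maximum(W.sum(axis=1), eps)
    S = np.sqrt(W)
    # edge-wise gradient: grad[i, j] = sqrt(w_ij/d_jj) f_j - sqrt(w_ij/d_ii) f_i
    grad = (S / np.sqrt(d)[None, :])[:, :, None] * F[None, :, :] \
         - (S / np.sqrt(d)[:, None])[:, :, None] * F[:, None, :]
    # edge-wise norm raised to p - 2, applied element-wise to the gradient
    scale = (np.linalg.norm(grad, axis=2) + eps) ** (p - 2)
    g = scale[:, :, None] * grad
    # divergence: div(g)(i) = sum_j sqrt(w_ij/d_ii) (g[i, j] - g[j, i])
    coeff = S / np.sqrt(d)[:, None]
    div = (coeff[:, :, None] * (g - g.transpose(1, 0, 2))).sum(axis=1)
    return -0.5 * div

# For p = 2 and symmetric W this reduces to (I - D^{-1/2} W D^{-1/2}) F,
# up to the small eps guard; for p < 2, edges with large variation are
# down-weighted, sharpening rather than smoothing the signal.
```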

Definition 2.5 (p-eigenvector and p-eigenvalue).

Let $\psi_{p}(\mathbf{v})=(\|v_{1}\|^{p-2}v_{1},\dots,\|v_{N}\|^{p-2}v_{N})$ for $\mathbf{v}\in\mathbb{R}^{N}$. With some abuse of notation, we call $\mathbf{u}\in\mathcal{F}_{\mathcal{V}}$ (with $d=1$) a p-eigenvector of $\Delta_{p}$ if $\Delta_{p}\mathbf{u}=\lambda\psi_{p}(\mathbf{u})$, where $\lambda$ is the associated p-eigenvalue.

Note that in the above definition we use the fact that $\mathbf{u}$ acting on all nodes gives a vector in $\mathbb{R}^{N}$. Let $\psi_{p}(\mathbf{U})=(\psi_{p}(\mathbf{U}_{:,1}),\dots,\psi_{p}(\mathbf{U}_{:,N}))$ for $\mathbf{U}\in\mathbb{R}^{N\times N}$. It can be shown (Fu et al, 2022) that the p-Laplacian can be decomposed as $\Delta_{p}=\psi_{p}(\mathbf{U})\boldsymbol{\Lambda}\psi_{p}(\mathbf{U})^{T}$ for some diagonal matrix $\boldsymbol{\Lambda}$.

2.4 Graph Framelets

Framelets are a type of wavelet frame. The study by Sweldens (1998) was the first to present a wavelet with a lifting scheme, which provided a foundation for research on wavelet transforms on graphs. With the increase of computational power, Hammond et al (2011) proposed a framework for the wavelet transformation on graphs and employed Chebyshev polynomials to approximate the wavelets. Dong (2017) further developed tight framelets on graphs, and this design has been applied to graph learning tasks (Zheng et al, 2021b) with great performance enhancement over classic GNNs. Recent studies show that framelets can naturally decompose the graph signal and re-aggregate it effectively, achieving significant results in graph noise reduction (Zhou et al, 2021a) with a double-term regularizer on the framelet coefficients. Combining with singular value decomposition (SVD), framelets have also been made applicable to directed graphs (Zou et al, 2022).

A simple method for building more versatile and stable framelet families, known as Quasi-Framelets, was suggested by Yang et al (2022). In this study, we introduce graph framelets using the same architecture described in Yang et al (2022). We begin by defining the filtering functions for Quasi-Framelets:

Definition 2.6 (Filtering functions for Quasi-Framelets).

We call a set of $R+1$ positive filtering functions defined on $[0,\pi]$, $\mathcal{F}=\{g_{0}(\xi),g_{1}(\xi),\dots,g_{R}(\xi)\}$, Quasi-Framelet scaling functions if the following identity condition is satisfied:

$g_{0}(\xi)^{2}+g_{1}(\xi)^{2}+\cdots+g_{R}(\xi)^{2}\equiv 1,\quad\forall\xi\in[0,\pi],$ (4)

such that $g_{0}$ descends from 1 to 0 and $g_{R}$ ascends from 0 to 1 as the frequency increases over the spectral domain $[0,\pi]$.

In particular, $g_{0}$ aims to regulate the highest frequencies and $g_{R}$ the lowest frequencies, while the remaining functions regulate the frequencies in between.

Consider a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with normalized graph Laplacian $\widetilde{\mathbf{L}}$. Let $\widetilde{\mathbf{L}}$ have the eigen-decomposition $\widetilde{\mathbf{L}}=\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$, where $\mathbf{U}$ is the matrix of orthogonal spectral bases and $\boldsymbol{\Lambda}=\text{diag}(\lambda_{1},\lambda_{2},\dots,\lambda_{N})$ contains the spectrum in increasing order. For a given set of Quasi-Framelet functions $\mathcal{F}=\{g_{0}(\xi),g_{1}(\xi),\dots,g_{R}(\xi)\}$ defined on $[0,\pi]$ and a given level $J$ ($\geq 0$), define the following Quasi-Framelet signal transformation matrices

$\mathcal{W}_{0,J}=\mathbf{U}g_{0}(\frac{\boldsymbol{\Lambda}}{2^{m+J}})\cdots g_{0}(\frac{\boldsymbol{\Lambda}}{2^{m}})\mathbf{U}^{T},$ (5)
$\mathcal{W}_{r,0}=\mathbf{U}g_{r}(\frac{\boldsymbol{\Lambda}}{2^{m}})\mathbf{U}^{T},\quad\text{for }r=1,\dots,R,$ (6)
$\mathcal{W}_{r,\ell}=\mathbf{U}g_{r}(\frac{\boldsymbol{\Lambda}}{2^{m+\ell}})g_{0}(\frac{\boldsymbol{\Lambda}}{2^{m+\ell-1}})\cdots g_{0}(\frac{\boldsymbol{\Lambda}}{2^{m}})\mathbf{U}^{T},\quad\text{for }r=1,\dots,R,\ \ell=1,\dots,J.$ (7)

Note that in the above definition, $m$ is called the coarsest scale level, which is the smallest $m$ satisfying $2^{-m}\lambda_{N}\leq\pi$. Denote by $\mathcal{W}=[\mathcal{W}_{0,J};\mathcal{W}_{1,0};\dots;\mathcal{W}_{R,0};\mathcal{W}_{1,1};\dots;\mathcal{W}_{R,J}]$ the stacked matrix. It can be proved that $\mathcal{W}^{T}\mathcal{W}=\mathbf{I}$, thus providing a signal decomposition and reconstruction process based on $\mathcal{W}$. We call this the graph framelet transformation.
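As a concrete illustration of Eqs. (5)-(7), the following sketch builds the transformation matrices from the exact eigendecomposition using a simple Haar-type pair of scaling functions, $g_0(\xi)=\cos(\xi/2)$ and $g_1(\xi)=\sin(\xi/2)$, which satisfy the identity condition (4); the filter choice and function names are illustrative and not necessarily the filters used in our experiments.

```python
import numpy as np

# Illustrative Haar-type Quasi-Framelet pair: g0^2 + g1^2 = 1 on [0, pi]
g = [lambda x: np.cos(x / 2.0),   # g0: descends from 1 to 0
     lambda x: np.sin(x / 2.0)]   # gR: ascends from 0 to 1

def framelet_matrices(L_norm, J=1):
    """Build the transformation matrices of Eqs. (5)-(7) by eigendecomposition."""
    lam, U = np.linalg.eigh(L_norm)
    lam = np.clip(lam, 0.0, None)
    # coarsest scale: smallest m with 2^{-m} * lam_max <= pi
    m = int(np.ceil(np.log2(max(lam.max(), 1e-12) / np.pi)))
    spec = lambda f, s: U @ np.diag(f(lam / 2.0 ** s)) @ U.T
    R = len(g) - 1
    Ws = []
    # low-pass W_{0,J} = g0(Lam/2^{m+J}) ... g0(Lam/2^m)
    W0 = np.eye(L_norm.shape[0])
    for l in range(J + 1):
        W0 = spec(g[0], m + J - l) @ W0
    Ws.append(W0)
    # high-pass bands W_{r,0} and W_{r,l}
    for r in range(1, R + 1):
        Ws.append(spec(g[r], m))
        for l in range(1, J + 1):
            Wrl = spec(g[r], m + l)
            for k in range(l - 1, -1, -1):
                Wrl = Wrl @ spec(g[0], m + k)
            Ws.append(Wrl)
    return Ws

# Tightness check: sum_k W_k.T @ W_k should equal the identity matrix.
```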

To alleviate the computational cost of the eigendecomposition of the graph Laplacian, the framelet transformation matrices can be approximated by Chebyshev polynomials. Empirically, Chebyshev polynomials $\mathcal{T}^{n}_{r}(\xi)$ of a fixed degree $n=3$ are sufficient to approximate $g_{r}(\xi)$. In the sequel, we simplify the notation by writing $\mathcal{T}_{r}(\xi)$ instead of $\mathcal{T}^{n}_{r}(\xi)$. The Quasi-Framelet transformation matrices defined in Eqs. (5)-(7) can then be approximated by

$\mathcal{W}_{0,J}\approx\mathcal{T}_{0}(\frac{1}{2^{m+J}}\widetilde{\mathbf{L}})\cdots\mathcal{T}_{0}(\frac{1}{2^{m}}\widetilde{\mathbf{L}}),$ (8)
$\mathcal{W}_{r,0}\approx\mathcal{T}_{r}(\frac{1}{2^{m}}\widetilde{\mathbf{L}}),\quad\text{for }r=1,\dots,R,$ (9)
$\mathcal{W}_{r,\ell}\approx\mathcal{T}_{r}(\frac{1}{2^{m+\ell}}\widetilde{\mathbf{L}})\mathcal{T}_{0}(\frac{1}{2^{m+\ell-1}}\widetilde{\mathbf{L}})\cdots\mathcal{T}_{0}(\frac{1}{2^{m}}\widetilde{\mathbf{L}}),\quad\text{for }r=1,\dots,R,\ \ell=1,\dots,J.$ (10)

Remark 1: The approximated transformation matrices defined in Eqs. (8)-(10) depend only on the graph Laplacian. For directed graphs, we directly use the Laplacian normalized by the out-degrees in our experiments; we observe that this strategy leads to improved performance in general.
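To make Eqs. (8)-(10) concrete, the sketch below approximates a single spectral filter $g_r(2^{-s}\widetilde{\mathbf{L}})$ with a degree-3 Chebyshev expansion evaluated through the standard three-term recurrence, so no eigendecomposition is needed; the coefficient-fitting routine, the assumed spectral bound $\lambda_{\max}\leq 2$, and the choice $m=0$ in the usage comment are illustrative.

```python
import numpy as np

def cheb_coeffs(fn, degree, lam_max=2.0, n_pts=64):
    """Chebyshev expansion coefficients of fn on [0, lam_max]."""
    theta = np.pi * (np.arange(n_pts) + 0.5) / n_pts
    x = lam_max / 2.0 * (np.cos(theta) + 1.0)          # nodes mapped to [0, lam_max]
    return np.array([(2.0 / n_pts) * np.sum(fn(x) * np.cos(k * theta))
                     for k in range(degree + 1)])

def cheb_matrix_fn(L_norm, fn, degree=3, lam_max=2.0):
    """Approximate fn(L_norm) as a Chebyshev matrix polynomial."""
    c = cheb_coeffs(fn, degree, lam_max)
    N = L_norm.shape[0]
    L_hat = 2.0 / lam_max * L_norm - np.eye(N)          # spectrum mapped to [-1, 1]
    T_prev, T_curr = np.eye(N), L_hat
    out = 0.5 * c[0] * T_prev + c[1] * T_curr
    for k in range(2, degree + 1):
        T_prev, T_curr = T_curr, 2.0 * L_hat @ T_curr - T_prev
        out = out + c[k] * T_curr
    return out

# Example (assuming m = 0 and g1(x) = sin(x/2) as above):
# W_10 = cheb_matrix_fn(L_norm, lambda lam: np.sin(lam / 2.0), degree=3)
```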

For a graph signal $\mathbf{X}$, the framelet (graph) convolution, analogous to the spectral graph convolution, can be defined as

$\theta\star\mathbf{X}=\mathcal{W}^{T}(\textrm{diag}(\theta))(\mathcal{W}\mathbf{X}),$ (11)

where $\theta$ is the learnable network filter. We also call $\mathcal{W}\mathbf{X}$ the framelet coefficients of $\mathbf{X}$ in the Fourier domain. The signal is then filtered in the spectral domain according to the learnable filter $\textrm{diag}(\theta)$.
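A minimal sketch of the framelet convolution in Eq. (11) is given below; it assumes a list of transformation matrices such as the one produced by the framelet_matrices sketch above, and it uses per-band (scalar or per-node) filter coefficients as an illustrative simplification of the learnable filter $\theta$.

```python
import numpy as np

def framelet_conv(Ws, theta, X):
    """Framelet convolution of Eq. (11): theta * X = W^T diag(theta) (W X).

    Ws:    list of (N, N) framelet transformation matrices (the stacked W).
    theta: list of per-band filters; theta[k] is a scalar or a length-N vector
           applied to the k-th block of framelet coefficients.
    X:     (N, c) graph signal.
    """
    out = np.zeros_like(X, dtype=float)
    for Wk, th in zip(Ws, theta):
        coeff = Wk @ X                      # framelet coefficients in this band
        out += Wk.T @ (np.asarray(th).reshape(-1, 1) * coeff
                       if np.ndim(th) else th * coeff)
    return out
```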

2.5 Regularized Graph Neural Network

In reality, graph data normally have large, noisy, and complex structures (Leskovec and Faloutsos, 2006; He and Kempe, 2015). This brings the challenge of choosing a suitable neural network to fit the data. It is well known that a common computational issue for most classic GNNs is over-smoothing, which can be quantified by the Dirichlet energy converging to zero as the number of layers increases. This observation leads to an investigation into so-called implicit layers in regularized graph neural networks.

The first paper to treat the GNN layer as a regularized signal smoothing process is Zhu et al (2021), in which classic GNN layers are interpreted as the solution of a regularized optimization problem, with certain approximation strategies to avoid the matrix inversion in the closed-form solution. The regularized layer can also be linked to the implicit layer (Zhang et al, 2003); more specific examples are given by Zhu et al (2021). In general, given the node features, the output of the GNN layer can be written as the solution of the following optimization problem

$\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\left\{\mu\|\mathbf{X}-\mathbf{F}\|^{2}_{F}+\frac{1}{2}\text{tr}(\mathbf{F}^{T}\widetilde{\mathbf{L}}\mathbf{F})\right\},$ (12)

where $\mathbf{F}$ is the graph signal. Eq. (12) defines the layer output as the solution of the (layer) optimization problem with the regularization term $\text{tr}(\mathbf{F}^{T}\widetilde{\mathbf{L}}\mathbf{F})$, which enforces smoothness of the graph signal. The closed-form solution is $\mathbf{F}=(\mathbf{I}+\frac{1}{2\mu}\widetilde{\mathbf{L}})^{-1}\mathbf{X}$, which is computationally inefficient because of the matrix inversion. However, we can rearrange it as $(\mathbf{I}+\frac{1}{2\mu}\widetilde{\mathbf{L}})\mathbf{F}=\mathbf{X}$ and interpret it as a fixed-point solution of $\mathbf{F}^{(t+1)}=-\frac{1}{2\mu}\widetilde{\mathbf{L}}\mathbf{F}^{(t)}+\mathbf{X}$. The latter is then the implicit layer in the GNN at layer $t$. Such an iteration is motivated by recurrent GNNs, as shown in several recent works (Gu et al, 2020; Liu et al, 2021a; Park et al, 2021). Different from explicit GNNs, the output features $\mathbf{F}$ of a general implicit GNN are directly modeled as the solution of a well-defined implicit function, e.g., the first-order condition of an optimization problem.
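The fixed-point view can be written as a short routine; the sketch below iterates $\mathbf{F}^{(t+1)}=-\frac{1}{2\mu}\widetilde{\mathbf{L}}\mathbf{F}^{(t)}+\mathbf{X}$ for Eq. (12), with the stopping rule and parameter names chosen only for illustration.

```python
import numpy as np

def implicit_smoothing_layer(L_norm, X, mu=1.0, n_iter=50, tol=1e-6):
    """Fixed-point iteration F <- -1/(2 mu) * L~ F + X for Eq. (12).

    Converges to the closed-form solution (I + L~/(2 mu))^{-1} X whenever the
    spectral radius of L~/(2 mu) is below 1 (e.g., mu > 1 for a normalized
    Laplacian with eigenvalues in [0, 2]).
    """
    F = X.astype(float).copy()
    for _ in range(n_iter):
        F_new = -L_norm @ F / (2.0 * mu) + X
        if np.linalg.norm(F_new - F) <= tol * max(np.linalg.norm(F), 1.0):
            return F_new
        F = F_new
    return F
```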

Denote the rows of $\mathbf{F}$ by $\mathbf{f}_{i}$ ($i=1,2,\dots,N$), treated as column vectors. It is well known that the regularization term in (12) equals

$\frac{1}{2}\text{tr}(\mathbf{F}^{T}\widetilde{\mathbf{L}}\mathbf{F})=\frac{1}{2}\sum_{(v_{i},v_{j})\in\mathcal{E}}\left\|\sqrt{\frac{w_{i,j}}{d_{j,j}}}\mathbf{f}_{j}-\sqrt{\frac{w_{i,j}}{d_{i,i}}}\mathbf{f}_{i}\right\|^{2},$

which is the so-called graph Dirichlet energy (Zhou and Schölkopf, 2005). As proposed in Zhou and Schölkopf (2005), one can replace the graph Laplacian in the above equation with the pre-defined graph p-Laplacian; the p-Laplacian based energy, denoted $\mathcal{S}_{p}(\mathbf{F})$, is then defined as (for any $p\geq 1$):

$\mathcal{S}_{p}(\mathbf{F})=\frac{1}{2}\sum_{(v_{i},v_{j})\in\mathcal{E}}\left\|\sqrt{\frac{w_{i,j}}{d_{j,j}}}\mathbf{f}_{j}-\sqrt{\frac{w_{i,j}}{d_{i,i}}}\mathbf{f}_{i}\right\|^{p},$ (13)

where we adopt the definition of the element-wise p-norm as in Fu et al (2022). Finally, the regularization problem becomes

$\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\left\{\mu\|\mathbf{X}-\mathbf{F}\|^{2}_{F}+\mathcal{S}_{p}(\mathbf{F})\right\}.$ (14)

Given the strong generalizability of the regularization form in Eq. (14), it is natural to deploy it in multi-scale graph neural networks (i.e., graph framelets), to explore whether the benefits of a p-Laplacian based regularizer remain and how varying $p$ in the regularizer interacts with the feature propagation through each individual filtered domain. In the next section, we propose our models accordingly and provide a detailed discussion and analysis.

3 The Proposed Models

In this section, we present our proposed models: the p-Laplacian Undecimated Framelet GNN (pL-UFG) and the p-Laplacian Fourier Undecimated Framelet GNN (pL-fUFG). In addition, we introduce a more general regularization framework and describe the algorithms for our proposed models.

3.1 p-Laplacian Undecimated Framelet GNN (pL-UFG)

Instead of simply taking the convolution result in Eq. (11) as the framelet layer output, we apply p-Laplacian regularization to the framelet reconstructed signal by imposing the following optimization to define the new layer,

$\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\ \mathcal{S}_{p}(\mathbf{F})+\mu\|\mathbf{F}-\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}\|^{2}_{F},$ (15)

where $\mu\in(0,\infty)$. Recall that the first term $\mathcal{S}_{p}(\mathbf{F})$ is the p-Laplacian based energy defined in Eq. (13); it measures the total variation of the signal over the graph via the p-Laplacian. The second term dictates that the optimal $\mathbf{F}$ should not deviate excessively from the framelet reconstruction of the input signal $\mathbf{X}$. Each component of the network filter $\theta$ in the frequency domain is applied to the framelet coefficients $\mathcal{W}\mathbf{X}$. We call the model in Eq. (15) pL-UFG2.

One possible variant of model (15) is to apply the regularization individually to the reconstruction at each scale level, i.e., for all $r,\ell$, define

$\mathbf{F}_{r,\ell}=\operatorname*{arg\,min}_{\mathbf{F}_{r,\ell}}\ \mathcal{S}_{p}(\mathbf{F}_{r,\ell})+\mu\|\mathbf{F}_{r,\ell}-\mathcal{W}^{T}_{r,\ell}\textrm{diag}(\theta_{r,\ell})\mathcal{W}_{r,\ell}\mathbf{X}\|^{2}_{F},$
$\mathbf{F}=\mathbf{F}_{0,J}+\sum^{R}_{r=1}\sum^{J}_{\ell=0}\mathbf{F}_{r,\ell}.$ (16)

We name the model with the propagation rule in the above equation pL-UFG1. In our experiments, we observe that pL-UFG1 performs better than pL-UFG2.

3.2 p-Laplacian Fourier Undecimated Framelet GNN (pL-fUFG)

In pL-fUFG, we adopt the strategy of regularizing the framelet coefficients. Informed by the earlier development, we consider the following optimization problem for each framelet transformation in the Fourier domain. For each framelet transformation $\mathcal{W}_{r,\ell}$ defined in Eqs. (9)-(10), define

$\mathbf{F}_{r,\ell}=\operatorname*{arg\,min}_{\mathbf{F}_{r,\ell}}\ \mathcal{S}_{p}(\mathbf{F}_{r,\ell})+\mu\|\mathbf{F}_{r,\ell}-\textrm{diag}(\theta_{r,\ell})\mathcal{W}_{r,\ell}\mathbf{X}\|^{2}_{F}.$ (17)

The final layer output is then defined by the reconstruction

$\mathbf{F}=\mathcal{W}^{T}_{0,J}\mathbf{F}_{0,J}+\sum^{R}_{r=1}\sum^{J}_{\ell=0}\mathcal{W}^{T}_{r,\ell}\mathbf{F}_{r,\ell}.$ (18)

Alternatively, we can simply aggregate all the regularized and filtered framelet coefficients as

$\mathbf{F}=\mathbf{F}_{0,J}+\sum^{R}_{r=1}\sum^{J}_{\ell=0}\mathbf{F}_{r,\ell}.$ (19)

With the forms of both pL-UFG and pL-fUFG in place, in the next section we present an even more general regularization framework that incorporates the p-Laplacian with graph framelets.

3.3 More General Regularization

Figure 1: The workflow of the p-Laplacian regularized framelet. The input graph data is first filtered and reconstructed by the framelet model, which contains one low-pass and two high-pass filters (i.e., $J=2$). Then, the result (denoted $\mathbf{Y}$) is further regularized by a sequence of graph convolutions and diagonal rescalings induced by the p-Laplacian, which is generated from the graph gradient information and serves as an implicit layer of the model. By adjusting the value of $p$, the node features produced by this implicit layer can be smoothed or sharpened accordingly, allowing the model to adapt to both homophilic and heterophilic graphs. Lastly, the layer output is either forwarded to the task objective function or to the next framelet and p-Laplacian layers before the final prediction task.

For convenience, we write the graph gradient for the multi-channel signal $\mathbf{F}$ as

$\nabla_{W}\mathbf{F}([i,j])=\sqrt{\frac{w_{i,j}}{d_{j,j}}}\mathbf{f}_{j}-\sqrt{\frac{w_{i,j}}{d_{i,i}}}\mathbf{f}_{i},$

or simply $\nabla\mathbf{F}$ if no confusion arises. For an undirected graph we have $\nabla_{W}\mathbf{F}([i,j])=-\nabla_{W}\mathbf{F}([j,i])$. With this notation, we can re-write the p-Laplacian regularizer in (13) (the element-wise p-norm) as follows:

$\mathcal{S}_{p}(\mathbf{F})=\frac{1}{2}\sum_{(v_{i},v_{j})\in\mathcal{E}}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{p}=\frac{1}{2}\sum_{v_{i}\in\mathcal{V}}\left[\left(\sum_{v_{j}\sim v_{i}}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{p}\right)^{\frac{1}{p}}\right]^{p}=\frac{1}{2}\sum_{v_{i}\in\mathcal{V}}\|\nabla_{W}\mathbf{F}(v_{i})\|_{p}^{p},$ (20)

where $v_{j}\sim v_{i}$ denotes a node $v_{j}$ connected to node $v_{i}$, $\nabla_{W}\mathbf{F}(v_{i})=\left(\nabla_{W}\mathbf{F}([i,j])\right)_{v_{j}:(v_{i},v_{j})\in\mathcal{E}}$ is the node gradient vector at each node $v_{i}$, and $\|\cdot\|_{p}$ is the vector p-norm. In fact, $\|\nabla_{W}\mathbf{F}(v_{i})\|_{p}$ measures the variation of $\mathbf{F}$ in the neighbourhood of each node $v_{i}$. Next, we generalize the regularizer by considering any positive convex function $\phi$ as

$\mathcal{S}^{\phi}_{p}(\mathbf{F})=\frac{1}{2}\sum_{v_{i}\in\mathcal{V}}\phi(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p}).$ (21)

Clearly, when $\phi(\xi)=\xi^{p}$, we recover the p-Laplacian regularizer. In the image processing field, several penalty functions have been proposed in the literature. For example, $\phi(\xi)=\xi^{2}$ is known in the context of Tikhonov regularization (Li et al, 2022; Belkin et al, 2004; Assis et al, 2021). When $\phi(\xi)=\xi$ (i.e., $p=1$), it is the classic total variation regularization. When $\phi(\xi)=\sqrt{\xi^{2}+\epsilon^{2}}-\epsilon$, it is referred to as regularized total variation; an example can be found in Cheng and Wang (2020). The case $\phi(\xi)=r^{2}\log(1+\xi^{2}/r^{2})$ corresponds to nonlinear diffusion (Oka and Yamada, 2023).
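For later reference, the sketch below collects these candidate penalties together with their derivatives $\phi'(\xi)$, which is the quantity consumed by the iterative algorithm of Theorem 3.1; the dictionary layout and default parameters are illustrative.

```python
import numpy as np

# Candidate penalty functions phi(xi) and their derivatives phi'(xi).
PENALTIES = {
    # p-Laplacian regularizer: phi(xi) = xi^p
    "p_power":   lambda p:   (lambda x: x ** p,
                              lambda x: p * x ** (p - 1)),
    # Tikhonov: phi(xi) = xi^2
    "tikhonov":  lambda _:   (lambda x: x ** 2,
                              lambda x: 2.0 * x),
    # regularized total variation: phi(xi) = sqrt(xi^2 + eps^2) - eps
    "reg_tv":    lambda eps: (lambda x: np.sqrt(x ** 2 + eps ** 2) - eps,
                              lambda x: x / np.sqrt(x ** 2 + eps ** 2)),
    # nonlinear diffusion: phi(xi) = r^2 log(1 + xi^2 / r^2)
    "diffusion": lambda r:   (lambda x: r ** 2 * np.log1p(x ** 2 / r ** 2),
                              lambda x: 2.0 * x / (1.0 + x ** 2 / r ** 2)),
}

phi, dphi = PENALTIES["p_power"](1.5)   # e.g., p = 1.5
```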

In pL-UFG and pL-fUFG, we use $\mathbf{Y}$ to denote $\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}$ and $\textrm{diag}(\theta_{r,\ell})\mathcal{W}_{r,\ell}\mathbf{X}$, respectively. We then propose the generalized p-Laplacian regularization model as

$\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\ \mathcal{S}^{\phi}_{p}(\mathbf{F})+\mu\|\mathbf{F}-\mathbf{Y}\|^{2}_{F}.$ (22)

3.4 The Algorithm

We derive an iterative algorithm for solving the generalized p-Laplacian regularization problem (22), presented in the following theorem.

Theorem 3.1.

For a given positive convex function $\phi(\xi)$, define

$M_{i,j}=\frac{w_{i,j}}{2}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{p-2}\cdot\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{j})\|^{p-1}_{p}}\right],$
$\alpha_{ii}=1\Big/\left(\sum_{v_{j}\sim v_{i}}\frac{M_{i,j}}{d_{i,i}}+2\mu\right),\qquad\beta_{ii}=2\mu\alpha_{ii},$

and denote the matrices $\mathbf{M}=[M_{i,j}]$, $\boldsymbol{\alpha}=\text{diag}(\alpha_{11},\dots,\alpha_{NN})$ and $\boldsymbol{\beta}=\text{diag}(\beta_{11},\dots,\beta_{NN})$. Then the solution to problem (22) can be obtained by the following message passing process

$\mathbf{F}^{(k+1)}=\boldsymbol{\alpha}^{(k)}\mathbf{D}^{-1/2}\mathbf{M}^{(k)}\mathbf{D}^{-1/2}\mathbf{F}^{(k)}+\boldsymbol{\beta}^{(k)}\mathbf{Y},$ (23)

with an initial value, e.g., $\mathbf{F}^{(0)}=\mathbf{0}$.

Here $\boldsymbol{\alpha}^{(k)}$, $\boldsymbol{\beta}^{(k)}$ and $\mathbf{M}^{(k)}$ are calculated from the current features $\mathbf{F}^{(k)}$. When $\phi(\xi)=\xi^{p}$, Eq. (23) yields the algorithm for solving pL-UFG and pL-fUFG.

Proof: Define $\mathcal{L}^{\phi}_{p}(\mathbf{F})$ as the objective function in Eq. (22) and consider a node feature $\mathbf{f}_{i}$ in $\mathbf{F}$. Then we have

$\frac{\partial\mathcal{L}^{\phi}_{p}(\mathbf{F})}{\partial\mathbf{f}_{i}}=2\mu(\mathbf{f}_{i}-\mathbf{y}_{i})+\frac{1}{2}\sum_{v_{k}\in\mathcal{V}}\frac{\partial}{\partial\mathbf{f}_{i}}\phi(\|\nabla_{W}\mathbf{F}(v_{k})\|_{p})$

$=2\mu(\mathbf{f}_{i}-\mathbf{y}_{i})+\frac{1}{2}\frac{\partial}{\partial\mathbf{f}_{i}}\phi(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})+\frac{1}{2}\sum_{v_{j}\sim v_{i}}\frac{\partial}{\partial\mathbf{f}_{i}}\phi(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})$

$=2\mu(\mathbf{f}_{i}-\mathbf{y}_{i})+\frac{1}{2p}\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})\,\|\nabla_{W}\mathbf{F}(v_{i})\|_{p}^{1-p}\,\frac{\partial}{\partial\mathbf{f}_{i}}\Big(\sum_{v_{j}\sim v_{i}}\|\nabla_{W}\mathbf{F}([i,j])\|^{p}\Big)+\frac{1}{2p}\sum_{v_{j}\sim v_{i}}\frac{\partial}{\partial\mathbf{f}_{i}}\|\nabla_{W}\mathbf{F}([j,i])\|^{p}\cdot\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})\,\|\nabla_{W}\mathbf{F}(v_{j})\|_{p}^{1-p}$

$=2\mu(\mathbf{f}_{i}-\mathbf{y}_{i})+\frac{1}{2p}\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})\,\|\nabla_{W}\mathbf{F}(v_{i})\|_{p}^{1-p}\Big(\sum_{v_{j}\sim v_{i}}p\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}\sqrt{\tfrac{w_{i,j}}{d_{i,i}}}\,(-\nabla_{W}\mathbf{F}([i,j]))\Big)+\frac{1}{2}\sum_{v_{j}\sim v_{i}}\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})\,\|\nabla_{W}\mathbf{F}(v_{j})\|_{p}^{1-p}\,\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}\sqrt{\tfrac{w_{i,j}}{d_{i,i}}}\,\nabla_{W}\mathbf{F}([j,i])$

$=2\mu(\mathbf{f}_{i}-\mathbf{y}_{i})+\sum_{v_{j}\sim v_{i}}\frac{1}{2}\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}\sqrt{\tfrac{w_{i,j}}{d_{i,i}}}\,\nabla_{W}\mathbf{F}([j,i])\cdot\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{j})\|^{p-1}_{p}}\right].$

Setting $\frac{\partial\mathcal{L}^{\phi}_{p}(\mathbf{F})}{\partial\mathbf{f}_{i}}=0$ gives the following first-order condition:

$\left(\sum_{v_{j}\sim v_{i}}\frac{1}{2}\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{j})\|^{p-1}_{p}}\right]\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}\frac{w_{i,j}}{d_{i,i}}+2\mu\right)\mathbf{f}_{i}=\sum_{v_{j}\sim v_{i}}\frac{1}{2}\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}(v_{j})\|^{p-1}_{p}}\right]\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}\frac{w_{i,j}}{\sqrt{d_{i,i}d_{j,j}}}\mathbf{f}_{j}+2\mu\mathbf{y}_{i}.$

This equation defines the message passing on each node $v_{i}$ from its neighbouring nodes $v_{j}$. With the definitions of $M_{i,j}$, $\alpha_{ii}$ and $\beta_{ii}$, it can be turned into the iterative formula in Eq. (23). This completes the proof.
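A compact sketch of the message passing in Eq. (23) for the p-Laplacian case $\phi(\xi)=\xi^{p}$ (so that $\phi'(\xi)/\xi^{p-1}=p$) is given below; the initialization at $\mathbf{Y}$, the small $\varepsilon$ guard, and the dense NumPy implementation are illustrative choices and not the exact implementation used in our experiments.

```python
import numpy as np

def p_laplacian_framelet_iterations(W, Y, p=1.5, mu=0.5, K=10, eps=1e-8):
    """Iterate Eq. (23): F <- alpha D^{-1/2} M D^{-1/2} F + beta Y,
    for phi(xi) = xi^p, where phi'(xi) / xi^{p-1} = p."""
    d = np.maximum(W.sum(axis=1), eps)
    S = np.sqrt(W)
    # Initialised at the framelet output Y for numerical stability; the theorem
    # allows any initial value (e.g., F^(0) = 0).
    F = Y.astype(float).copy()
    for _ in range(K):
        # edge-wise gradient and its norms (Definition 2.2)
        grad = (S / np.sqrt(d)[None, :])[:, :, None] * F[None, :, :] \
             - (S / np.sqrt(d)[:, None])[:, :, None] * F[:, None, :]
        edge_norm = np.linalg.norm(grad, axis=2) + eps
        # M_ij = (w_ij / 2) * ||grad_ij||^{p-2} * 2p  when phi(xi) = xi^p
        M = p * W * edge_norm ** (p - 2)
        alpha = 1.0 / (M.sum(axis=1) / d + 2.0 * mu)   # diagonal of alpha
        beta = 2.0 * mu * alpha                        # diagonal of beta
        Dm12 = 1.0 / np.sqrt(d)
        F = alpha[:, None] * (Dm12[:, None] * (M @ (Dm12[:, None] * F))) \
            + beta[:, None] * Y
    return F
```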

Remark 2.

When $p\leq 1$, the objective function is not differentiable in some extreme cases, for example when neighbouring node signals are identical and the nodes have the same degrees. In these rare cases, the first-order condition cannot be applied in the above proof. In practice, however, we suggest the following alternative iterative algorithm to solve the optimization problem. In fact, we can split the terms in $\mathcal{S}_{p}(\mathbf{F})$ as

$\mathcal{S}_{p}(\mathbf{F})=\frac{1}{2}\sum_{(v_{i},v_{j})\in\mathcal{E}}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{p-2}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{2}.$ (24)

At iteration $k$, we take $w^{\text{new}}_{i,j}=w_{i,j}\left\|\nabla_{W}\mathbf{F}([i,j])\right\|^{p-2}$ as the new edge weights; the next iterate is then defined as the minimizer of the Dirichlet energy with the new weights, i.e.,

$\mathbf{F}^{(k+1)}=\operatorname*{arg\,min}\frac{1}{2}\sum_{(v_{i},v_{j})\in\mathcal{E}}\left\|\nabla_{W^{\text{new}}}\mathbf{F}([i,j])\right\|^{2}.$

Thus one step of the classic GCN can be applied at each iteration to solve the p-Laplacian regularized problem (14).
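The reweighting scheme of this remark can be sketched as follows: at each iteration the edge weights are rescaled by $\|\nabla_{W}\mathbf{F}([i,j])\|^{p-2}$ and one GCN-style smoothing step is taken with the reweighted normalized adjacency; the number of iterations and the particular smoothing operator are illustrative assumptions.

```python
import numpy as np

def reweighted_gcn_steps(W, X, p=1.0, K=5, eps=1e-8):
    """Remark 2: reweight edges by ||grad F([i,j])||^{p-2} and apply one
    smoothing step with D_new^{-1/2} W_new D_new^{-1/2}."""
    F = X.astype(float).copy()
    for _ in range(K):
        d = np.maximum(W.sum(axis=1), eps)
        S = np.sqrt(W)
        grad = (S / np.sqrt(d)[None, :])[:, :, None] * F[None, :, :] \
             - (S / np.sqrt(d)[:, None])[:, :, None] * F[:, None, :]
        edge_norm = np.linalg.norm(grad, axis=2) + eps
        W_new = W * edge_norm ** (p - 2)               # reweighted edges
        d_new = np.maximum(W_new.sum(axis=1), eps)
        A_hat = W_new / np.sqrt(np.outer(d_new, d_new))
        F = A_hat @ F                                  # one GCN-style step
    return F
```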

3.5 Interaction between p-Laplacian and Framelets

In this section, we present some theoretical support on how the p-Laplacian regularizer interacts with the framelets in the model. Specifically, we show that the proposed algorithm in Eq. (23) applies a sequence of (p-Laplacian based) spectral graph convolutions and diagonal scalings of the node features over the framelet-filtered graph signals. This is indicated by the following analysis.

First, consider the iteration in Eq. (23) with initial value $\mathbf{F}^{(0)}=\mathbf{Y}=\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}$ or $\textrm{diag}(\theta_{r,\ell})\mathcal{W}_{r,\ell}\mathbf{X}$, without loss of generality (in our practical algorithm we choose $\mathbf{F}^{(0)}=\mathbf{0}$, which gives $\mathbf{F}^{(1)}=\boldsymbol{\beta}^{(0)}\mathbf{Y}$ with a diagonal matrix $\boldsymbol{\beta}^{(0)}$). We have

$\mathbf{F}^{(K)}=\boldsymbol{\alpha}^{(K-1)}\mathbf{D}^{-1/2}\mathbf{M}^{(K-1)}\mathbf{D}^{-1/2}\mathbf{F}^{(K-1)}+\boldsymbol{\beta}^{(K-1)}\mathbf{Y}$
$=\boldsymbol{\alpha}^{(K-1)}\widetilde{\mathbf{M}}^{(K-1)}\mathbf{F}^{(K-1)}+\boldsymbol{\beta}^{(K-1)}\mathbf{Y}$
$=\boldsymbol{\alpha}^{(K-1)}\widetilde{\mathbf{M}}^{(K-1)}\left(\boldsymbol{\alpha}^{(K-2)}\widetilde{\mathbf{M}}^{(K-2)}\mathbf{F}^{(K-2)}+\boldsymbol{\beta}^{(K-2)}\mathbf{Y}\right)+\boldsymbol{\beta}^{(K-1)}\mathbf{Y}$
$=\boldsymbol{\alpha}^{(K-1)}\widetilde{\mathbf{M}}^{(K-1)}\boldsymbol{\alpha}^{(K-2)}\widetilde{\mathbf{M}}^{(K-2)}\mathbf{F}^{(K-2)}+\boldsymbol{\alpha}^{(K-1)}\widetilde{\mathbf{M}}^{(K-1)}\boldsymbol{\beta}^{(K-2)}\mathbf{Y}+\boldsymbol{\beta}^{(K-1)}\mathbf{Y}$
$=\left(\prod_{k=0}^{K-1}\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}\right)\mathbf{Y}+\boldsymbol{\beta}^{(K-1)}\mathbf{Y}+\sum_{k=0}^{K-1}\left(\prod_{l=K-k}^{K-1}\boldsymbol{\alpha}^{(l)}\widetilde{\mathbf{M}}^{(l)}\right)\boldsymbol{\beta}^{(K-k-1)}\mathbf{Y}.$

The result $\mathbf{F}^{(K)}$ depends on the key operations $\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}$ for $k=0,1,\dots,K-1$, where

$\widetilde{M}^{(k)}_{i,j}=\frac{w_{i,j}}{\sqrt{d_{i,i}d_{j,j}}}\left\|\nabla_{W}\mathbf{F}^{(k)}([i,j])\right\|^{p-2}\cdot\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}^{(k)}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}^{(k)}(v_{j})\|^{p-1}_{p}}\right]=\frac{w_{i,j}\cdot w^{(k)}_{i,j}(p)}{\sqrt{d_{i,i}d_{j,j}}}\left\|\nabla_{W}\mathbf{F}^{(k)}([i,j])\right\|^{p-2},$ (25)

$\alpha_{i,i}^{(k)}=1\Big/\left(\sum_{v_{j}\sim v_{i}}\widetilde{M}^{(k)}_{i,j}+2\mu\right),$ (26)

where $\widetilde{M}_{i,j}$ has absorbed $d_{i,i}$ and $d_{j,j}$ into $M_{i,j}$ defined in Theorem 3.1, i.e., $\widetilde{M}_{i,j}=M_{i,j}/\sqrt{d_{i,i}d_{j,j}}$, and we denote

$w^{(k)}_{i,j}(p)=\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|^{p-1}_{p}}+\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}^{(k)}(v_{j})\|_{p})}{\|\nabla_{W}\mathbf{F}^{(k)}(v_{j})\|^{p-1}_{p}}\right].$

For the following analysis, define the matrices $\widetilde{\mathbf{M}}^{(k)}=[\widetilde{M}^{(k)}_{i,j}]$, $\boldsymbol{\alpha}^{(k)}=\text{diag}(\alpha_{ii}^{(k)})$ and $\mathbf{W}^{(k)}_{p}=[w^{(k)}_{i,j}(p)]=\mathbf{w}^{(k)}_{p}\oplus(\mathbf{w}^{(k)}_{p})^{T}$ (the Kronecker sum) with the column vector

$\mathbf{w}^{(k)}_{p}=\left[\frac{\phi^{\prime}(\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|_{p})}{\|\nabla_{W}\mathbf{F}^{(k)}(v_{i})\|^{p-1}_{p}}\right]^{N}_{i=1}.$

Now, recall the definition of the classic p-Laplacian:

$\Delta_{p}(\mathbf{F}):=-\frac{1}{2}\text{div}(\|\nabla\mathbf{F}\|^{p-2}\nabla\mathbf{F}),\quad\text{for}\,\,\,p\geq 1.$ (27)

Our purpose is to show that the generalized algorithm derived above can be implemented as a linear combination of classic p-Laplacian filtering operations. First, based on Fu et al (2022), the operator in Eq. (27) used in the original p-Laplacian message passing algorithm can be written in matrix form as

$\Delta^{(k)}_{p}(\mathbf{F}^{(k)})=\left((\boldsymbol{\alpha}^{(k)}_{0})^{-1}-2\mu\mathbf{I}_{N}\right)\mathbf{F}^{(k)}-\mathbf{G}^{(k)}\mathbf{F}^{(k)},$ (28)

where the matrix $\mathbf{G}^{(k)}$ has elements

$G^{(k)}_{i,j}=\frac{w_{i,j}}{\sqrt{d_{i,i}d_{j,j}}}\left\|\nabla_{W}\mathbf{F}^{(k)}([i,j])\right\|^{p-2},$

and the diagonal matrix $\boldsymbol{\alpha}^{(k)}_{0}$ has diagonal elements

$(\alpha^{(k)}_{0})_{ii}=1\Big/\left(\sum_{v_{j}\sim v_{i}}G^{(k)}_{i,j}+2\mu/p\right).$

Eq. (28) shows that the operation $\Delta^{(k)}_{p}(\mathbf{F}^{(k)})$ can be represented as the product of a matrix (still denoted $\Delta^{(k)}_{p}$) and the signal matrix $\mathbf{F}^{(k)}$.

Noting that $\mathbf{G}^{(k)}=\widetilde{\mathbf{M}}^{(k)}\oslash\mathbf{W}^{(k)}_{p}$, multiplying the diagonal matrix $\boldsymbol{\alpha}^{(k)}$ on both sides of Eq. (28) and factoring out $\mathbf{F}^{(k)}$ give

$\boldsymbol{\alpha}^{(k)}\Delta^{(k)}_{p}=\left(\boldsymbol{\alpha}^{(k)}(\boldsymbol{\alpha}^{(k)}_{0})^{-1}-2\mu\boldsymbol{\alpha}^{(k)}\right)-\boldsymbol{\alpha}^{(k)}(\widetilde{\mathbf{M}}^{(k)}\oslash\mathbf{W}^{(k)}_{p}).$

As $\boldsymbol{\alpha}^{(k)}$ is diagonal, we can re-write the above equality as

$\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}=\left(\left(\boldsymbol{\alpha}^{(k)}(\boldsymbol{\alpha}^{(k)}_{0})^{-1}-2\mu\boldsymbol{\alpha}^{(k)}\right)-\boldsymbol{\alpha}^{(k)}\Delta^{(k)}_{p}\right)\odot\mathbf{W}^{(k)}_{p}.$

Given that $\mathbf{W}^{(k)}_{p}$ is the Kronecker sum of the vector $\mathbf{w}^{(k)}_{p}$ and its transpose, if we also use $\mathbf{w}^{(k)}_{p}$ to denote the corresponding diagonal matrix, then we have

$\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}=\left(\left(\boldsymbol{\alpha}^{(k)}/\boldsymbol{\alpha}^{(k)}_{0}-2\mu\boldsymbol{\alpha}^{(k)}\right)-\boldsymbol{\alpha}^{(k)}\Delta^{(k)}_{p}\right)\mathbf{w}^{(k)}_{p}+\mathbf{w}^{(k)}_{p}\left(\left(\boldsymbol{\alpha}^{(k)}/\boldsymbol{\alpha}^{(k)}_{0}-2\mu\boldsymbol{\alpha}^{(k)}\right)-\boldsymbol{\alpha}^{(k)}\Delta^{(k)}_{p}\right).$ (29)

This demonstrates that the key terms $\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}$ in our generalized algorithm are linear in the p-Laplacian operator $\Delta^{(k)}_{p}$. As demonstrated in Fu et al (2022), the operator $\Delta^{(k)}_{p}$ approximately performs a spectral graph convolution. Hence we conclude that the generalized p-Laplacian iteration (23) indeed performs a sequence of graph spectral convolutions (see Lemma 1 below) and gradient-based diagonal transformations (i.e., node feature scalings) over the framelet-filtered graph signals.

Lemma 1.

The matrix $\Delta^{(k)}_{p}$ is SPD (see Bozorgnia et al (2019); Ly (2005)) and hence has its own eigendecomposition, offering a graph spectral convolution interpretation.

Remark 3.

Eq. (29) indicates that the p-Laplacian regularizer provides a graph spectral convolution on top of the framelet filtering, which conceptually produces a second layer of filtering and hence further restricts the solution space. See Fig. 1 for an illustration. Interestingly, the combination of p-Laplacian regularization and framelets offers a more adaptive smoothness that suits both homophilic and heterophilic data, as shown in our experiments.

Remark 4.

Regarding the asymptotic behavior of the expansion of $\mathbf{F}^{(K)}$ above, one can show roughly that the elements of $\boldsymbol{\alpha}^{(k)}\widetilde{\mathbf{M}}^{(k)}$ lie between 0 and 1, so the corresponding product converges to zero if $K$ is too large. Therefore a large $K$ annihilates the first term in the expansion but leaves the second term and a partial sum from the third. A larger value of $\mu$ appears to speed up this convergence further, thereby shortening the time needed to find the solution; however, choosing $\mu$ is a model selection problem for generalizability. Moreover, the inclusion of the source term $\boldsymbol{\beta}^{(K-1)}\mathbf{Y}$ guarantees a certain amount of variation of the node features from both the low and high frequency domains of the graph framelet, and it has been shown that such an inclusion can help the model escape the over-smoothing issue (Chien et al, 2021).

Remark 5.

Compared to GPRGNN (Chien et al, 2021), in which the model outcome is obtained by

$\hat{\mathbf{F}}=\sum_{k=0}^{K-1}\boldsymbol{\gamma}_{k}\mathbf{F}^{(k)},$

where the $\boldsymbol{\gamma}_{k}$ are learnable generalized PageRank coefficients, our proposed algorithm in Eq. (23), as shown in Section 3.5, applies a mixed operation (spectral convolution and diagonal scaling) to the framelet graph signal outputs and thus further directs the solution space of the framelet to a reasonably defined region. Furthermore, due to the use of Chebyshev polynomials in the framelet, the computational complexity of the first filtering stage is low, which is helpful for defining a provably correct space for the second operation. This shows the efficiency of incorporating framelets into p-Laplacian based regularization.

4 Discussions on the Proposed Model

In this section, we provide comprehensive discussions of our proposed models. We mainly focus on exploring the properties of the generalized p-Laplacian based framelet GNN (Eq. (22)) and the iterative algorithm in Eq. (23), although we also present some conclusions for pL-UFG and pL-fUFG. Specifically, in Section 4.1 we discuss the denoising power of our proposed model. Section 4.2 analyzes the model's computational complexity. Section 4.3 compares the p-Laplacian regularizer with other regularizers applied to GNNs for different learning tasks. Finally, Section 4.4 concludes the section by illustrating the limitations of the model and potential directions for future study.

4.1 Discussion on the Denoising Power

The denoising power of framelets was developed in Dong (2017) and has been empirically studied in subsequent works (Dong, 2017; Zhou et al, 2021a; Yang et al, 2022). Let $\mathbf{X}=\mathbf{F}+\tau$ be the input node feature matrix with noise $\tau$. In Dong (2017) and Zhou et al (2021a), a framelet-based regularizer was proposed for the optimization problem:

\min_{\mathbf{F}}\|\mathcal{D}\mathbf{F}\|_{1,\mathcal{G}}+f(\mathbf{F},\mathbf{X}), \qquad (30)

where f(·,·) is a fidelity function that takes different forms for different applications. For any graph signal F, the graph 𝕃_p-norm is defined as ‖F‖_{p,𝒢} := (∑_i |f_i|^p d_{i,i})^{1/p}, where d_{i,i} is the degree of the i-th node. 𝒟 is a linear transformation generated from a discrete transform, for example the graph Fourier transform or the framelet transform. Thus, the regularizer assigns a higher penalty to nodes with a larger number of neighbours. Specifically, taking 𝒟 to be the framelet transform, the first term in Eq. (30) can be written as:

\|\mathcal{D}\mathbf{F}\|_{1,\mathcal{G}}=\left(\sum_{(r,\ell)\in\mathcal{Z}}\sum_{i}|\mathbf{F}_{r,\ell}[i]|^{p}\, d_{i,i}\right)^{1/p}, \qquad (31)

where 𝒵 = {(r,ℓ) : r = 1,…,R, ℓ = 0,…,J} ∪ {(0,J)}. Replacing the p-Laplacian regularizer in Eq. (15) by the framelet regularizer defined in Eq. (31), we have:

\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\ \left(\sum_{(r,\ell)\in\mathcal{Z}}\sum_{i}|\mathbf{F}_{r,\ell}[i]|^{p}\, d_{i,i}\right)^{1/p}+\mu\|\mathbf{F}-\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}\|^{2}_{F}. \qquad (32)

Following Dong (2017), the framelet regularizer-based denoising problem in Eq. (30) is associated with a variational regularization problem of a form similar to Eq. (15), where the variational term S_p(F) regularizes the framelet objective function. Simply replacing the input of the first term in Eq. (32) by ∇F and omitting the node degree information, we have:

\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\sum_{(r,\ell)\in\mathcal{Z}}\|\nabla\mathbf{F}_{r,\ell}\|_{p}^{p}+\mu\|\mathbf{F}-\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}\|^{2}_{F}. \qquad (33)

Therefore our proposed model is naturally equipped with a denoising capability, and the corresponding (denoising) regularizing term aligns with the denoising regularizer developed by Zhou and Schölkopf (2005), up to the node degree information. However, Zhou and Schölkopf (2005) only handle p = 1 and p = 2, whereas our p-Laplacian framelet model covers all values of p ∈ ℝ_+. This allows more flexibility and effectiveness in the model.

Furthermore, the major difference between most wavelet frame models and classic variational models is the choice of the underlying transformation (i.e., 𝒟 applied to the graph signal) that maps the data to a transformed domain in which it can be sparsely approximated. Given that both the p-Laplacian regularizer (Eq. (15)) and the framelet regularizer (Eq. (31)) act on the graph signal F_𝒵 produced by the sparse and tight framelet transform (approximated by Chebyshev polynomials), this further illustrates the equivalence between the two denoising regularizers. Please refer to Dong (2017) for more details.
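As a minimal illustration of the regularizer in Eq. (33), the sketch below evaluates the objective on a toy graph; the edge list, the placeholder "framelet bands" and the reconstructed signal X_hat are illustrative assumptions rather than an actual framelet transform.

```python
import numpy as np

rng = np.random.default_rng(1)

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # toy graph
F = rng.standard_normal((4, 2))            # candidate node features
F_bands = [0.7 * F, 0.3 * F]               # placeholders for the framelet bands F_{r,l}
X_hat = rng.standard_normal((4, 2))        # stands in for W^T diag(theta) W X

def graph_gradient(Fb, edges):
    """Edge-wise feature differences f_u - f_v (one row per edge)."""
    return np.stack([Fb[u] - Fb[v] for u, v in edges])

def objective(F, F_bands, X_hat, p=1.5, mu=1.0):
    # First term of Eq. (33): sum over bands of ||grad F_{r,l}||_p^p.
    reg = sum(np.sum(np.abs(graph_gradient(Fb, edges)) ** p) for Fb in F_bands)
    # Second term: Frobenius fidelity to the framelet-reconstructed signal.
    fit = mu * np.sum((F - X_hat) ** 2)
    return reg + fit

print(objective(F, F_bands, X_hat))
```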

4.2 Discussion on the Computational Complexity

In this section, we briefly discuss the computational complexity of our proposed model, specifically the generalized model defined in Eq. (22), whose results can be approximated by the algorithm presented in Eq. (23). The primary computational cost of our model stems from the generation of the framelet decomposition matrices (𝒲), or equivalently the framelet transformation applied to the node features. Generating 𝒲 exactly involves a high computational complexity, as traditional framelet methods typically employ an eigen-decomposition of the graph Laplacian for this purpose. Consequently, the framelet transform itself exhibits a time complexity of O(N²(nj+1)Kc) and a space complexity of O(N²(nj+1)c) for a graph with N nodes and c-dimensional features. Notably, the constants n, j, and K are independent of the graph data. Existing literature (Zheng et al, 2021b) has demonstrated that when the input graph contains fewer than 4×10⁴ nodes (with fixed feature dimension), the computational time for framelet convolution is comparable to that of a graph attention network with 8 attention heads (Veličković et al, 2018). As the number of nodes increases, the performance of graph attention networks degrades rapidly, while graph framelets maintain faster computation. Moreover, our use of Chebyshev polynomials to approximate 𝒲 significantly reduces the associated computational cost compared to traditional methods. We also acknowledge that the inclusion of the p-Laplacian based implicit layer introduces additional computational cost to the original framelet model, primarily arising from the computation of the norm of the graph gradient, |∇F|. Taking the Euclidean norm as an example, the computational cost of |∇F| scales as O(N×c), where N and c are the number of nodes and the feature dimension, respectively. Thus, even when accounting for the implicit layer, the overall cost of our proposed method remains comparable to that of the framelet model.
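The following sketch shows how a Chebyshev polynomial approximation replaces the eigendecomposition: the spectral filter is applied through repeated (sparse) matrix products on a rescaled Laplacian. The coefficients and the toy graph are illustrative, not the ones used by our model.

```python
import numpy as np

def chebyshev_filter(L, X, coeffs, lmax=2.0):
    """Apply sum_k c_k T_k(L_hat) to X without eigendecomposing L.

    L_hat rescales the spectrum of L into [-1, 1]; each extra polynomial term
    costs only one additional matrix product.
    """
    N = L.shape[0]
    L_hat = (2.0 / lmax) * L - np.eye(N)
    T_prev, T_curr = X, L_hat @ X                  # T_0(L_hat) X and T_1(L_hat) X
    out = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_next = 2 * L_hat @ T_curr - T_prev       # Chebyshev recurrence
        out += c * T_next
        T_prev, T_curr = T_curr, T_next
    return out

# Toy example: normalized Laplacian of a 4-node path graph (spectrum within [0, 2]).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt
X = np.random.default_rng(2).standard_normal((4, 3))
print(chebyshev_filter(L, X, coeffs=[0.5, 0.3, 0.2]).shape)   # (4, 3)
```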

4.3 Comparison with Other Regularizers and Potential Application Scenarios

As illustrated in Section 3.3, GNN performance can be enhanced across different learning tasks with the assistance of different forms of regularizers. In this discussion, we position our study within the wider research landscape that investigates various regularizers for improving the performance of graph neural networks. We discussed the denoising power of pL-UFG in Section 4.1, establishing that it can be expressed as a denoising formulation. This is comparable to the approach of Ma et al (2021), who used a different regularization term to highlight the denoising capabilities of GNNs and showed that their regularizer can effectively reduce noise and enhance the robustness of GNN models.

Our study, however, emphasizes the unique advantages that the p-Laplacian brings, a theme also echoed in the works by Liu et al (2010) and Candès et al (2011). Both studies incorporated the L_1-norm as a regularization term in robust principal component analysis (RPCA), showcasing its ability to recover low-rank matrices even in the presence of significant noise. Furthermore, the study by Liu et al (2021b) reinforces the benefits of the L_1-norm in preserving discontinuity and enhancing local smoothness. Characterized by its heavy-tail property, the L_1-norm imposes less penalty on large values, thereby making it effective in handling data with substantial corruption or outliers.

Furthermore, in terms of potential practical applications, our method can be deployed in various domains. For example, the p-Laplacian based regularizer can be used in image processing and computer vision, where it can help smooth out noisy or jagged images while preserving important features. In addition, since we implement our graph learning method via an optimization (regularization) framework, practical applications of GNNs such as graph generation (Ho et al, 2020) and graph-based time series forecasting (Wen et al, 2023) can also be carried out under our proposed methods with greater flexibility.

4.4 Limitation and Potential Future Studies

While our proposed model exhibits outstanding properties, how the p-Laplacian based regularization framework can be deployed on other types of graphs (e.g., dynamic or spatio-temporal graphs) remains to be explored. In fact, when the input graph is dynamic, the corresponding GNN may be required to capture the pattern of the graph's evolution (e.g., by combining the GNN with an LSTM or Transformer (Manessi et al, 2020)). We note that such a requirement is beyond the capability of the current model. Furthermore, as the p-Laplacian based regularizer restricts the solution of the graph framelet, it would be interesting to make the value of p learnable throughout the training process. We leave this to future work.

Moreover, following the idea of making p a learnable parameter, one can also explore assigning different values of p to different frequency domains of the framelet. It has been verified that the low-frequency domain of the framelet induces a smoothing effect on the graph signal, whereas a sharpening effect is produced by the high-frequency domains (Han et al, 2022). Therefore, when the input graph is highly homophilic, one may prefer a relatively large value of p in the low-frequency domain to enhance the smoothing effect of the framelet, as one prefers to predict identical labels for connected nodes. On the other hand, a smaller value of p is preferable in the high-frequency domains to further sharpen the differences between node features (i.e., ∇F) so that distinct labels can be produced when the input graph is heterophilic. Finally, as the p-Laplacian-based implicit layers are applied before the framelet reconstruction, it would be interesting to explore how such implicit layers affect the signal reconstruction of the framelet.

5 Experiments

In this section, we present empirical studies of pL-UFG and pL-fUFG on real-world node classification tasks with both heterophilic and homophilic graphs. We also test the robustness of the proposed models against noise. Both experiments are presented with detailed discussions of their results. In addition, we note that by adjusting the value of p in our proposed model, an ablation study is automatically conducted within our experiments. The code for our experiments can be accessed via https://github.com/superca729/pL-UFG. Lastly, it is worth noting that our proposed method has the potential to be applied to graph learning tasks other than node classification, such as graph-level classification (pooling) (Zheng et al, 2021b) and link prediction (Lin and Gao, 2023). Although we have yet to delve into these tasks, we believe that with some simple modifications to our method, such as adding a readout function for graph pooling or computing the log-likelihood for link prediction, our method is capable of handling them. Similarly, our model could be beneficial in community detection tasks by identifying clusters with similar characteristics or behaviors. We leave these promising research directions to future work.

5.1 Datasets, Baseline Models and the Parameter Setting

Datasets. We use both homophilic and heterophilic graphs from https://www.pyg.org/ to assess pL-UFG and pL-fUFG, including the benchmark heterophilic datasets Chameleon, Squirrel, Actor, Wisconsin, Texas and Cornell, and the homophilic datasets Cora, CiteSeer, PubMed, Computers, Photos, CS and Physics. In our experiments, homophilic graphs are undirected and heterophilic graphs are directed (we observe improved performance when direction information is provided). Summary statistics of the datasets, together with their homophily index and split ratios, are given in Table 1.
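The homophily index ℋ(𝒢) in Table 1 can be computed along the lines of the sketch below, which uses the common edge-homophily definition (fraction of edges whose endpoints share a label); the exact variant used for the table may differ.

```python
import numpy as np

def edge_homophily(edge_index, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    src, dst = edge_index
    return float(np.mean(labels[src] == labels[dst]))

# Toy example: 4 nodes with labels and 4 directed edges (source row / target row).
labels = np.array([0, 0, 1, 1])
edge_index = np.array([[0, 1, 2, 0],
                       [1, 0, 3, 2]])
print(edge_homophily(edge_index, labels))   # 0.75
```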

Baseline models. We consider nine baseline models for comparison:

  • MLP: standard feedforward multilayer perceptron.

  • GCN (Kipf and Welling, 2017): GCN is the first of its kind to implement a linear approximation to spectral graph convolutions.

  • SGC (Wu et al, 2019): SGC reduces GCNs’ complexity by removing nonlinearities and collapsing weight matrices between consecutive layers.

  • GAT (Veličković et al, 2018): GAT is a graph neural network that applies the attention mechanism to node features to learn edge weights.

  • JKNet (Xu et al, 2018b): JKNet can flexibly leverage different neighbourhood ranges to enable better structure-aware representation for each node.

  • APPNP (Gasteiger et al, 2019): APPNP combines GNN with personalized PageRank to separate the neural network from the propagation scheme.

  • GPRGNN (Chien et al, 2021): GPRGNN adaptively learns the GPR (Generalized PageRank) weights so as to jointly optimize node feature and topological information extraction, regardless of the level of homophily of a graph.

  • p-GNN (Fu et al, 2022): p-GNN is a p-Laplacian based GNN model whose message-passing mechanism is derived from a discrete regularization framework.

  • UFG (Zheng et al, 2021b): UFG is a GNN based on framelet transforms, whose framelet decomposition naturally aggregates the graph features into low-pass and high-pass spectra.

The test results are reproduced using code run on our machine and might differ from the originally reported results. In addition, compared to p-GNN, the extra time and space complexity induced by our algorithm is O(N²(RJ+1)d).

Hyperparameter tuning. We use grid search to tune hyperparameters. We test p ∈ {1.0, 1.5, 2.0, 2.5} for pGNN, pL-UFG and pL-fUFG. The learning rate is chosen from {0.01, 0.005}. We consider the number of iterations T in the p-Laplacian message passing from {4, 5} after 10 warm-up steps. For homophilic datasets we tune μ ∈ {0.1, 0.5, 1, 5, 10}, and for heterophilic graphs μ ∈ {3, 5, 10, 20, 30, 50, 70}. The framelet type is fixed as Linear (see (Yang et al, 2022)) and the level J is set to 1. The dilation scale s ∈ {1, 1.5, 2, 3, 6}, and n, the degree of the Chebyshev polynomial approximation to all g's in (15)-(18), is drawn from {2, 3, 7}. It is worth noting that in graph framelets the Chebyshev polynomial is utilized to approximate the spectral filtering of the Laplacian eigenvalues; thus a degree-d polynomial aggregates d-hop neighbourhood information for each node of the graph. Therefore, when the input graph is heterophilic, one may require d to be relatively larger, as node labels tend to differ between directly connected (1-hop) nodes. The number of epochs is set to 200, the same as for the baseline model (Fu et al, 2022).
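For reproducibility, the grid search can be organized as in the sketch below; train_and_eval is a placeholder for one full training and validation run of pL-UFG/pL-fUFG and is not part of our released code.

```python
from itertools import product

# Search grids taken from the text above (homophilic setting for mu).
grid = {
    "p":  [1.0, 1.5, 2.0, 2.5],
    "lr": [0.01, 0.005],
    "T":  [4, 5],
    "mu": [0.1, 0.5, 1, 5, 10],   # use [3, 5, 10, 20, 30, 50, 70] for heterophilic graphs
    "s":  [1, 1.5, 2, 3, 6],
    "n":  [2, 3, 7],
}

def train_and_eval(**cfg):
    # Placeholder: substitute one real training run returning validation accuracy.
    return 0.0

best_cfg, best_acc = None, -1.0
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    acc = train_and_eval(**cfg)
    if acc > best_acc:
        best_cfg, best_acc = cfg, acc

print(best_cfg, best_acc)
```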

Table 1: Statistics of the datasets. ℋ(𝒢) denotes the level of homophily of each benchmark dataset.
Datasets Class Feature Node Edge Train/Valid/Test ℋ(𝒢)
Cora 7 1433 2708 5278 20%/10%/70% 0.825
CiteSeer 6 3703 3327 4552 20%/10%/70% 0.717
PubMed 3 500 19717 44324 20%/10%/70% 0.792
Computers 10 767 13381 245778 20%/10%/70% 0.802
Photo 8 745 7487 119043 20%/10%/70% 0.849
CS 15 6805 18333 81894 20%/10%/70% 0.832
Physics 5 8415 34493 247962 20%/10%/70% 0.915
Chameleon 5 2325 2277 31371 60%/20%/20% 0.247
Squirrel 5 2089 5201 198353 60%/20%/20% 0.216
Actor 5 932 7600 26659 60%/20%/20% 0.221
Wisconsin 5 1703 251 499 60%/20%/20% 0.150
Texas 5 1703 183 279 60%/20%/20% 0.097
Cornell 5 1703 183 277 60%/20%/20% 0.386

5.2 Experiment Results and Discussion

Node classification. Tables 2 and 3 summarize the results on homophilic and heterophilic datasets. The values after ± are standard deviations, and the top results are highlighted in bold. From the experimental results, we observe that pL-UFG and pL-fUFG achieve competitive performance against the baselines on most of the homophilic and heterophilic benchmarks. For the models' performance on homophilic undirected graphs, Table 2 shows that pL-UFG2^{1.5} has the top accuracy on Cora, Photos and CS. For CiteSeer, pL-UFG1^{2.0}, pL-UFG2^{2.0} and pL-fUFG^{2.0} have the best performance. In terms of the performance on heterophilic directed graphs (Table 3), we observe that our two models achieve the highest accuracy on all benchmark datasets except Actor, where the top performance is obtained by MLP; we note, however, that on Actor our pL-UFG1^{1.0} achieves almost identical results to MLP.

In line with our theoretical prediction in Remark 4, we find in the experiments that the value of the trade-off term μ for pL-UFG and pL-fUFG is significantly higher than that in pGNN, indicating that the framelet has a major impact on the performance of the model. The performance of pL-UFG, pL-fUFG and the baselines on heterophilic graphs is shown in Table 3: both pL-UFG and pL-fUFG outperform MLP and other state-of-the-art GNNs under a low homophily rate. In terms of denoising capacity, pL-UFG and pL-fUFG are far better than the baseline models; Figs. 6 and 7 show that both produce the top accuracies across different noise levels.

Discussion for p < 2. To better adapt to heterophilic graphs, an ideal GNN model should induce a relatively high variation in the node features it generates. Therefore, compared to the models with p = 2, the model with a p < 2 regularizer imposes a lesser penalty and thus produces outputs with a higher variation. This is supported by the empirical observation from Table 3, in which the highest prediction accuracies for heterophilic graphs are usually achieved by our model with p < 2.

Discussion for p = 1. Here we specifically focus on p = 1 in our models. Recall that when p = 1, the p-Laplacian operator acting on the graph signal F is Δ_1 F = -½ div(∇F / ‖∇F‖). Based on Remark 1, when p = 1, Δ_1 F is equivalent to the mean curvature operator defined on the embedded surface. By analogy with differential geometry, points at which the curvature cannot be properly defined are the so-called singular points. One can similarly conclude that when a graph contains a singular node (i.e., the graph is crumpled (Burda et al, 2001)), then both Δ_1 F and the corresponding S_p(F) cannot be properly defined. Furthermore, one can easily check that, when p = 1, the regularization term φ(ξ) = ξ^p produces a higher penalty than its S_p(F) counterpart. This is because, when p = 1, S_p(F) = (∑_i^c (∇f_i)²)^{1/2} whereas φ(ξ) = ξ^1 = ∑_i |∇f_i|. Clearly, S_p(F) < φ(ξ) unless all nodes of the graph are singular, or only one node is non-singular while the rest are singular.
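A quick numerical check of the last inequality, with toy feature differences for a single node across c = 3 channels:

```python
import numpy as np

# Toy entries of the gradient of f at one node, across c = 3 feature channels.
grad_f = np.array([0.5, -1.2, 0.3])

S_p = np.sqrt(np.sum(grad_f ** 2))   # S_p(F) at p = 1: Euclidean norm of the gradient
phi = np.sum(np.abs(grad_f))         # phi(xi) = xi^1: sum of absolute differences

print(S_p, phi)                      # S_p ~ 1.33 < phi = 2.0
assert S_p <= phi                    # equality only when at most one entry is nonzero
```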

5.3 Visualization of the Effect of p

Based on the claim made in the previous sections, increasing the value of p leads our model to exhibit a stronger smoothing effect on the node features, making it more suitable for homophilic graphs. Conversely, when p is small, the model preserves distinct node features, making it a better fit for heterophilic graphs. To validate this idea, we visualize the changes in the relative positions (distances) between node features for p = 1 (the minimum value) and p = 2.5 (the maximum value). We specifically selected the Cora and Actor datasets and employed Isomap (Tenenbaum et al, 2000) to map both the initial node features and the node features generated by our model to ℝ², while preserving their distances. The results are depicted in Figures 2 and 3.

From both Figures 2 and 3, it can be observed that when p = 1, the sharpening effect induced by our model causes the node features to become more distinct from each other: under the Isomap projection, the nodes are scattered apart compared to the input features. Conversely, when a relatively large value of p = 2.5 is assigned, the model exhibits a stronger smoothing effect, resulting in all nodes being aggregated towards the center of the Isomap visualization. These observations provide direct support for our main claim regarding the proposed model.
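The visualization can be reproduced along the lines of the sketch below (scikit-learn's Isomap); the two feature matrices are random placeholders for the initial features and the features produced by our model.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(3)
X_input = rng.standard_normal((200, 64))    # placeholder for the initial node features
X_model = rng.standard_normal((200, 64))    # placeholder for features produced by pL-UFG

# Distance-preserving embeddings into R^2, one fit per feature matrix.
Z_input = Isomap(n_neighbors=10, n_components=2).fit_transform(X_input)
Z_model = Isomap(n_neighbors=10, n_components=2).fit_transform(X_model)
print(Z_input.shape, Z_model.shape)         # (200, 2) (200, 2)
```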

Figure 2: Isomap visualization of the homophilic dataset (Cora) under varying p.
Figure 3: Isomap visualization of the heterophilic dataset (Actor) under varying p.
Table 2: Test accuracy (%) on homophilic undirected graphs.
Method Cora CiteSeer PubMed Computers Photos CS Physics
MLP 66.04±1.11 68.99±0.48 82.03±0.24 71.89±5.36 86.11±1.35 93.50±0.24 94.56±0.11
GCN 84.72±0.38 75.04±1.46 83.19±0.13 78.82±1.87 90.00±1.49 93.00±0.12 95.55±0.09
SGC 83.79±0.37 73.52±0.89 75.92±0.26 77.56±0.88 86.44±0.35 OOM OOM
GAT 84.37±1.13 74.80±1.00 83.92±0.28 78.68±2.09 89.63±1.75 92.57±0.14 95.13±0.15
JKNet 83.69±0.71 74.49±0.74 82.59±0.54 69.32±3.94 86.12±1.12 91.11±0.22 94.45±0.33
APPNP 83.69±0.71 75.84±0.64 80.42±0.29 73.73±2.49 87.03±0.95 91.52±0.14 94.71±0.11
GPRGNN 83.79±0.93 75.94±0.65 82.32±0.25 74.26±2.94 88.69±1.32 91.89±0.08 94.85±0.23
UFG 80.64±0.74 73.30±0.19 81.52±0.80 66.39±6.09 86.60±4.69 95.27±0.04 95.77±0.04
PGNN^{1.0} 84.21±0.91 75.38±0.82 84.34±0.33 81.22±2.62 87.64±5.05 94.88±0.12 96.15±0.12
PGNN^{1.5} 84.42±0.71 75.44±0.98 84.48±0.21 82.68±1.15 91.83±0.77 94.13±0.08 96.14±0.08
PGNN^{2.0} 84.74±0.67 75.62±1.07 84.25±0.35 83.40±0.68 91.71±0.93 94.28±0.10 96.03±0.07
PGNN^{2.5} 84.48±0.77 75.22±0.73 83.94±0.47 82.91±1.34 91.41±0.66 93.40±0.07 95.75±0.05
pL-UFG1^{1.0} 84.54±0.62 75.88±0.60 85.56±0.18 82.07±2.78 85.57±19.92 **95.03±0.22** **96.19±0.06**
pL-UFG1^{1.5} 84.96±0.38 76.04±0.85 **85.59±0.18** 85.04±1.06 **92.92±0.37** **95.03±0.22** **96.27±0.06**
pL-UFG1^{2.0} 85.20±0.42 **76.12±0.82** **85.59±0.17** **85.26±1.15** **92.65±0.65** 94.77±0.27 96.04±0.07
pL-UFG1^{2.5} 85.30±0.60 **76.11±0.82** 85.54±0.18 85.18±0.88 91.49±1.29 94.86±0.14 95.96±0.11
pL-UFG2^{1.0} 84.42±0.32 74.79±0.62 85.45±0.18 84.88±0.84 85.30±19.50 **95.03±0.19** 96.06±0.11
pL-UFG2^{1.5} **85.60±0.36** 75.61±0.60 **85.59±0.18** 84.55±1.57 **93.00±0.61** **95.03±0.19** 96.14±0.09
pL-UFG2^{2.0} 85.20±0.42 **76.12±0.82** **85.59±0.17** **85.27±1.15** 92.50±0.40 94.77±0.27 96.05±0.07
pL-UFG2^{2.5} **85.31±0.41** 75.88±0.67 85.53±0.19 85.11±0.81 92.44±1.51 **94.96±0.13** 95.99±0.12
pL-fUFG^{1.0} 84.45±0.43 75.64±0.64 **85.62±0.19** 84.57±1.08 86.09±16.35 **94.98±0.23** **96.23±0.09**
pL-fUFG^{1.5} **85.40±0.45** 75.94±0.63 **85.62±0.17** 85.00±1.21 92.12±1.35 **94.98±0.23** 96.12±0.08
pL-fUFG^{2.0} 85.20±0.42 **76.12±0.82** **85.58±0.16** **85.29±1.16** 92.47±0.48 94.77±0.27 96.05±0.07
pL-fUFG^{2.5} 85.21±0.44 76.01±0.97 85.54±0.20 84.94±0.91 92.26±1.26 94.95±0.11 96.02±0.04
Table 3: Test accuracy (%) on heterophilic directed graphs.
Method Chameleon Squirrel Actor Wisconsin Texas Cornell
MLP 48.82±1.43 34.30±1.13 **41.66±0.83** 93.45±2.09 71.25±12.99 83.33±4.55
GCN 33.71±2.27 26.19±1.34 33.46±1.42 67.90±8.16 53.44±11.23 55.68±10.57
SGC 33.83±1.69 26.89±0.98 32.08±2.22 59.56±11.19 64.38±7.53 43.18±16.41
GAT 41.95±2.65 25.66±1.72 33.64±3.45 60.65±11.08 50.63±28.36 34.09±29.15
JKNet 33.50±3.46 26.95±1.29 31.14±3.63 60.42±8.70 63.75±5.38 45.45±9.99
APPNP 34.61±3.15 32.61±0.93 39.11±1.11 82.41±2.17 80.00±5.38 60.98±13.44
GPRGNN 34.23±4.09 34.01±0.82 34.63±0.58 86.11±1.31 84.38±11.20 66.29±11.20
UFG 50.11±1.67 31.48±2.05 40.13±1.11 93.52±2.36 84.69±4.87 83.71±3.28
PGNN^{1.0} 49.04±1.16 34.79±1.01 40.91±1.41 94.35±2.16 82.00±11.31 82.73±6.92
PGNN^{1.5} 49.12±1.14 34.86±1.25 40.87±1.47 94.72±1.91 81.50±10.70 81.97±10.16
PGNN^{2.0} 49.34±1.15 34.97±1.41 40.83±1.81 94.44±1.75 84.38±11.52 81.06±10.18
PGNN^{2.5} 49.16±1.40 34.94±1.57 40.78±1.51 94.35±2.16 83.38±12.95 81.82±8.86
pL-UFG1^{1.0} **56.81±1.69** 38.81±1.97 **41.26±1.66** **96.48±0.94** 86.13±7.47 86.06±3.16
pL-UFG1^{1.5} **56.89±1.17** **39.73±1.22** 40.95±0.93 **96.48±1.07** 87.00±5.16 **86.52±2.29**
pL-UFG1^{2.0} 56.24±1.02 39.72±1.86 40.95±0.93 **96.59±0.72** 86.50±8.84 85.30±2.35
pL-UFG1^{2.5} 56.11±1.25 39.38±1.78 41.04±0.99 95.34±1.64 **89.00±4.99** 83.94±3.53
pL-UFG2^{1.0} 55.51±1.53 36.94±5.69 29.28±19.25 93.98±2.94 85.00±5.27 **87.73±2.49**
pL-UFG2^{1.5} **57.22±1.19** **39.80±1.42** 40.89±0.75 **96.48±0.94** 87.63±5.32 86.82±1.67
pL-UFG2^{2.0} 56.19±0.99 **39.74±1.66** 41.01±0.80 96.14±1.16 86.50±8.84 85.30±2.35
pL-UFG2^{2.5} 55.69±1.15 39.30±1.68 40.86±0.74 95.80±1.44 86.38±2.98 84.55±3.31
pL-fUFG^{1.0} 55.80±1.93 38.43±1.26 32.84±16.54 93.98±3.47 86.25±6.89 **87.27±2.27**
pL-fUFG^{1.5} 55.65±1.96 38.40±1.52 41.00±0.99 **96.48±1.29** 87.25±3.61 86.21±2.19
pL-fUFG^{2.0} 55.95±1.29 38.33±1.71 **41.25±0.84** **96.25±1.25** **88.75±4.97** 83.94±3.78
pL-fUFG^{2.5} 55.56±1.66 38.39±1.48 40.55±0.50 95.28±2.24 **88.50±7.37** 83.64±3.88

5.4 Experiments on Denoising Capacity

Following the discussion in Section 4.1, in this section we evaluate the denoising capacity of our proposed models (pL-UFG and pL-fUFG) on both a homophilic (Cora) and a heterophilic (Chameleon) graph. Since the node features of the included datasets are binary, we randomly inject binary noise into different proportions of the node features (i.e., r ∈ {5%, 10%, 15%, 20%}). From Figs. 6 and 7 we see that pL-UFG2^{1.5}, defined in Eq. (16), outperforms the other baselines and shows the strongest robustness to the contaminated datasets. This is expected based on our discussion of the denoising power in Section 4.1 and the formulation of our models (defined in Eqs. (15), (16) and (17)). The denoising capacity of our proposed model stems from two sources: the sparse approximation of the framelet decomposition and reconstruction, i.e., 𝒲 and 𝒲^T, and the variational regularizer S_p(F) (Dong, 2017). Compared to pL-UFG1 defined in Eq. (15), pL-UFG2 applies the denoising operators 𝒲, 𝒲^T and S_p(F) to node inputs that are reconstructed from both the sparsely decomposed low- and high-frequency domains, so that the denoising operators target the original graph inputs at different scales, naturally resulting in better denoising performance. In addition, without the reconstruction performed first, the term ‖F_{r,j} - diag(θ_{r,j})𝒲_{r,j}X‖²_F in pL-fUFG is less sparse than its pL-UFG2 counterpart. Thus, when regularized with the same S_p(F), the insufficiently eliminated noise in pL-fUFG leads it to a solution space that is incorrect (with respect to pL-UFG2) and cannot be recovered by the additional reconstruction with 𝒲^T; this is the likely reason why pL-UFG2 outperforms our other proposed models.
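One natural reading of the noise model above is to flip a random fraction r of the binary feature entries, as in the sketch below; the exact corruption scheme used in our experiments may differ in detail.

```python
import numpy as np

def add_binary_noise(X, r, seed=0):
    """Flip a random fraction r of the entries of a binary feature matrix X."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < r        # entries selected for corruption
    return np.where(mask, 1 - X, X)

X = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0]])
for r in (0.05, 0.10, 0.15, 0.20):
    X_noisy = add_binary_noise(X, r, seed=42)
    print(r, float(np.mean(X_noisy != X)))   # realized corruption rate
```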

In addition to the above experiment, we show how the value of p affects the denoising power of the proposed model while keeping all other parameters constant. The resulting changes in denoising power are shown in Fig. 4 (homophilic graph) and Fig. 5 (heterophilic graph).

Figure 4: Change of the denoising power with the value of p on a homophilic graph.
Figure 5: Change of the denoising power with the value of p on a heterophilic graph.

Based on these results, the performance for different values of p varies with the noise rate. For the homophilic graph, pL-UFG2^{1.5} and pL-UFG2^{2.0} are relatively stable across noise rates, while the other values of p fluctuate more. For the heterophilic graph, pL-UFG1^{2.5} and pL-UFG2^{2.5} are relatively stable across noise rates, while the other values of p fluctuate more.

We also note that these fluctuations are expected, as we have shown that the denoising process of our proposed model can be written as:

\mathbf{F}=\operatorname*{arg\,min}_{\mathbf{F}}\sum_{(r,\ell)\in\mathcal{Z}}\|\nabla\mathbf{F}_{(r,\ell)}\|_{p}^{p}+\mu\|\mathbf{F}-\mathcal{W}^{T}\textrm{diag}(\theta)\mathcal{W}\mathbf{X}\|^{2}_{F}, \qquad (34)

where 𝒵 = {(r,ℓ) : r = 1,…,R, ℓ = 1,…,J} ∪ {(0,J)} is the index set of all framelet-decomposed frequency domains. It is not hard to verify that once a larger value of p is assigned, the penalty on the node feature differences (∇F) becomes greater, and therefore a stronger denoising power is induced. In terms of graph types, when the input graph is heterophilic, the connected nodes tend to have distinct features in most cases; after noise is added to those features, a larger value of p is preferred if the feature differences become larger. Meanwhile, adding noise to a heterophilic graph could also make the feature differences smaller, in which case a large value of p may not be appropriate; this explains why most of the lines in Fig. 5 fluctuate strongly. Similar reasoning applies to the denoising performance of our model on homophilic graphs, which we omit here.

5.5 Regarding the Ablation Study

It is worth noting that one advantage of an adjustable p-Laplacian-based regularizer is the convenience of conducting an ablation study. The key principle of an ablation study is to test whether a new component of a proposed method (in our case, the regularizer) consistently adds advantages over the baseline counterparts, regardless of the newly introduced parameters. This means that the ablation study of our method is naturally conducted by comparing our models (the regularizers together with the underlying networks) against the baselines on both the node classification tasks and the denoising tests. It is not hard to observe that in most cases, regardless of the change of p, our proposed model keeps outperforming the baseline models. However, as our proposed model is derived from a regularization framework, another parameter that could affect the model performance is the value of μ. Based on Eq. (22), an increase in μ increases the model's convexity and could consequently guide the model closer to the global optimum when p ≠ 2. Another observation regarding μ is that a smaller value of μ together with a relatively larger value of p appears more suitable for homophilic/heterophilic graphs, implying that μ seems to have an effect on the model's adaptation power opposite to that of p. However, quantifying the effect of μ via a suitable measure (e.g., potentially the Dirichlet energy (Han et al, 2022)) is beyond the scope of this paper, and we leave it to future discussions.

Figure 6: Denoising power on the homophilic graph (Cora).
Figure 7: Denoising power on the heterophilic graph (Chameleon).

6 Conclusion and Further Work

This paper showcases the application of p-Laplacian based regularization in conjunction with graph framelets. The incorporation of p-Laplacian regularization brings remarkable flexibility, enabling effective adaptation to both homophilic undirected and heterophilic directed graphs and thereby significantly enhancing the predictive capability of the model. To validate the efficacy of our proposed model, we conducted extensive numerical experiments on diverse graph datasets, demonstrating its superiority over baseline methods. Notably, our model exhibits robustness against noise perturbation, even under high noise levels. These promising findings highlight the potential of our approach and warrant further investigation in several directions. For instance, delving into the mathematical properties of our model, including weak and strong convergence, analyzing the behavior of the (Dirichlet) energy, and establishing connections with non-linear diffusion equations, opens up fascinating avenues for future research.

References

  • Ahmedt-Aristizabal et al (2021) Ahmedt-Aristizabal D, Armin MA, Denman S, Fookes C, Petersson L (2021) Graph-based deep learning for medical diagnosis and analysis: past, present and future. Sensors 21(14):4758
  • Assis et al (2021) Assis AD, Torres LC, Araújo LR, Hanriot VM, Braga AP (2021) Neural networks regularization with graph-based local resampling. IEEE Access 9:50727–50737
  • Belkin et al (2004) Belkin M, Matveeva I, Niyogi P (2004) Tikhonov regularization and semi-supervised learning on large graphs. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, vol 3, pp iii–1000
  • Bozorgnia et al (2019) Bozorgnia F, Mohammadi SA, Vejchodskỳ T (2019) The first eigenvalue and eigenfunction of a nonlinear elliptic system. Applied Numerical Mathematics 145:159–174
  • Bronstein et al (2017) Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34(4):18–42
  • Bruna et al (2014) Bruna J, Zaremba W, Szlam A, LeCun Y (2014) Spectral networks and locally connected networks on graphs. In: Proceedings of International Conference on Learning Representations
  • Burda et al (2001) Burda Z, Correia J, Krzywicki A (2001) Statistical ensemble of scale-free random graphs. Physical Review E 64(4):046118
  • Candès et al (2011) Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? Journal of the ACM (JACM) 58(3):1–37
  • Chen et al (2022a) Chen J, Wang Y, Bodnar C, Liò P, Wang YG (2022a) Dirichlet energy enhancement of graph neural networks by framelet augmentation. https://yuguangwang.github.io/papers/EEConv.pdf
  • Chen et al (2022b) Chen Q, Wang Y, Wang Y, Yang J, Lin Z (2022b) Optimization-induced graph implicit nonlinear diffusion. In: Proceedings of the 39th International Conference on Machine Learning
  • Cheng and Wang (2020) Cheng T, Wang B (2020) Graph and total variation regularized low-rank representation for hyperspectral anomaly detection. IEEE Transactions on Geoscience and Remote Sensing 58(1):391–406, DOI 10.1109/TGRS.2019.2936609
  • Chien et al (2021) Chien E, Peng J, Li P, Milenkovic O (2021) Adaptive universal generalized pagerank graph neural network. In: Proceedings of International Conference on Learning Representations
  • Ciotti et al (2016) Ciotti V, Bonaventura M, Nicosia V, Panzarasa P, Latora V (2016) Homophily and missing links in citation networks. EPJ Data Science 5:1–14
  • Defferrard et al (2016) Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems 29
  • Dong (2017) Dong B (2017) Sparse representation on graphs by tight wavelet frames and applications. Applied and Computational Harmonic Analysis 42(3):452–479, DOI 10.1016/j.acha.2015.09.005
  • Fan et al (2019) Fan W, Ma Y, Li Q, He Y, Zhao E, Tang J, Yin D (2019) Graph neural networks for social recommendation. In: Proceedings of WWW, pp 417–426
  • Frecon et al (2022) Frecon J, Gasso G, Pontil M, Salzo S (2022) Bregman neural networks. In: International Conference on Machine Learning, PMLR, pp 6779–6792
  • Fu et al (2022) Fu G, Zhao P, Bian Y (2022) pp-Laplacian based graph neural networks. In: Proceedings of the 39th International Conference on Machine Learning, PMLR, vol 162, pp 6878–6917
  • Gasteiger et al (2019) Gasteiger J, Bojchevski A, Günnemann S (2019) Predict then propagate: Graph neural networks meet personalized pagerank. In: Proceedings of International Conference on Learning Representations
  • Gilmer et al (2017) Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: International Conference on Machine Learning, PMLR, pp 1263–1272
  • Gu et al (2020) Gu F, Chang H, Zhu W, Sojoudi S, El Ghaoui L (2020) Implicit graph neural networks. In: Advances in Neural Information Processing Systems
  • Hammond et al (2011) Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis 30(2):129–150
  • Han et al (2022) Han A, Shi D, Shao Z, Gao J (2022) Generalized energy and gradient flow via graph framelets. arXiv:2210.04124
  • He et al (2021) He M, Wei Z, Xu H, et al (2021) Bernnet: Learning arbitrary graph spectral filters via bernstein approximation. Advances in Neural Information Processing Systems 34:14239–14251
  • He and Kempe (2015) He X, Kempe D (2015) Stability of influence maximization. In: Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining, p 1256–1265
  • Ho et al (2020) Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33:6840–6851
  • Hu et al (2018) Hu X, Sun Y, Gao J, Hu Y, Yin B (2018) Locality preserving projection based on F-norm. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp 1330–1337
  • Kipf and Welling (2017) Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representations
  • Leskovec and Faloutsos (2006) Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining, p 631–636, DOI 10.1145/1150402.1150479
  • Li et al (2022) Li J, Lin S, Blanchet J, Nguyen VA (2022) Tikhonov regularization is optimal transport robust under martingale constraints. arXiv:2210.01413
  • Lin and Gao (2023) Lin L, Gao J (2023) A magnetic framelet-based convolutional neural network for directed graphs. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 1–5, DOI 10.1109/ICASSP49357.2023.10097148
  • Liu et al (2022) Liu A, Li B, Li T, Zhou P, Wang R (2022) An-gcn: An anonymous graph convolutional network against edge-perturbing attacks. IEEE Transactions on Neural Networks and Learning Systems pp 1–15, DOI 10.1109/TNNLS.2022.3172296
  • Liu et al (2010) Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 663–670
  • Liu et al (2021a) Liu J, Kawaguchi K, Hooi B, Wang Y, Xiao X (2021a) Eignn: Efficient infinite-depth graph neural networks. In: Advances in Neural Information Processing Systems
  • Liu et al (2021b) Liu X, Jin W, Ma Y, Li Y, Liu H, Wang Y, Yan M, Tang J (2021b) Elastic graph neural networks. In: International Conference on Machine Learning, PMLR, pp 6837–6849
  • Luo et al (2010) Luo D, Huang H, Ding CHQ, Nie F (2010) On the eigenvectors of pp-Laplacian. Machine Learning 81(1):37–51
  • Ly (2005) Ly I (2005) The first eigenvalue for the p-laplacian operator. JIPAM J Inequal Pure Appl Math 6:91
  • Ma et al (2021) Ma Y, Liu X, Zhao T, Liu Y, Tang J, Shah N (2021) A unified view on graph neural networks as graph signal denoising. URL https://openreview.net/forum?id=MD3D5UbTcb1
  • Manessi et al (2020) Manessi F, Rozza A, Manzo M (2020) Dynamic graph convolutional networks. Pattern Recognition 97:107000
  • Oka and Yamada (2023) Oka T, Yamada T (2023) Topology optimization method with nonlinear diffusion. Computer Methods in Applied Mechanics and Engineering 408:115940, DOI https://doi.org/10.1016/j.cma.2023.115940, URL https://www.sciencedirect.com/science/article/pii/S0045782523000634
  • Ortega et al (2018) Ortega A, Frossard P, Kovacević J, Moura JMF, Vandergheynst P (2018) Graph signal processing: Overview, challenges, and applications. Proceedings of the IEEE 106(5):808–828
  • Pandit et al (2007) Pandit S, Chau DH, Wang S, Faloutsos C (2007) Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th international conference on World Wide Web, pp 201–210
  • Park et al (2021) Park J, Choo J, Park J (2021) Convergent graph solvers. In: Proceedings of International Conference on Learning Representations
  • Scarselli et al (2009) Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Transactions on Neural Networks 20(1):61–80
  • Sweldens (1998) Sweldens W (1998) The lifting scheme: A construction of second generation wavelets. SIAM Journal on Mathematical Analysis 29(2):511–546, DOI 10.1137/S0036141095289051
  • Tenenbaum et al (2000) Tenenbaum JB, Silva Vd, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. science 290(5500):2319–2323
  • Veličković et al (2018) Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks. In: Proceedings of International Conference on Learning Representations
  • Wen et al (2023) Wen H, Lin Y, Xia Y, Wan H, Zimmermann R, Liang Y (2023) Diffstg: Probabilistic spatio-temporal graph forecasting with denoising diffusion models. arXiv:2301.13629
  • Wu et al (2019) Wu F, Zhang T, Souza AHd, Fifty C, Yu T, Weinberger KQ (2019) Simplifying graph convolutional networks. In: Proceedings of International Conference on Machine Learning
  • Wu et al (2021) Wu J, Sun J, Sun H, Sun G (2021) Performance analysis of graph neural network frameworks. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, pp 118–127, DOI 10.1109/ISPASS51385.2021.00029
  • Wu et al (2022) Wu S, Sun F, Zhang W, Xie X, Cui B (2022) Graph neural networks in recommender systems: a survey. ACM Computing Surveys to appear
  • Wu et al (2020) Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2020) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32(1):4–24
  • Xu et al (2018a) Xu K, Hu W, Leskovec J, Jegelka S (2018a) How powerful are graph neural networks? In: Proceedings of International Conference on Learning Representations
  • Xu et al (2018b) Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi Ki, Jegelka S (2018b) Representation learning on graphs with jumping knowledge networks. In: Proceedings of International Conference on Machine Learning
  • Yang et al (2022) Yang M, Zheng X, Yin J, Gao J (2022) Quasi-Framelets: Another improvement to graph neural networks. arXiv:2201.04728
  • Zhang et al (2003) Zhang Q, Gu Y, Mateusz M, Baktashmotlagh M, Eriksson A (2003) Implicitly defined layers in neural networks. arXiv:2003.01822
  • Zhang et al (2022) Zhang Z, Cui P, Zhu W (2022) Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering 34(1):249–270, DOI 10.1109/TKDE.2020.2981333
  • Zheng et al (2021a) Zheng X, Liu Y, Pan S, Zhang M, Jin D, Yu PS (2021a) Graph neural networks for graphs with heterophily: A survey. In: Proceedings of the AAAI Conference on Artificial Intelligence
  • Zheng et al (2021b) Zheng X, Zhou B, Gao J, Wang YG, Lio P, Li M, Montufar G (2021b) How framelets enhance graph neural networks. In: Proceedings of International Conference on Machine Learning
  • Zheng et al (2022) Zheng X, Zhou B, Wang YG, Zhuang X (2022) Decimated framelet system on graphs and fast g-framelet transforms. Journal of Machine Learning Research 23:18–1
  • Zhou et al (2021a) Zhou B, Li R, Zheng X, Wang YG, Gao J (2021a) Graph denoising with framelet regularizer. arXiv:2111.03264
  • Zhou et al (2021b) Zhou B, Liu X, Liu Y, Huang Y, Lio P, Wang Y (2021b) Spectral transform forms scalable transformer. arXiv:2111.07602
  • Zhou and Schölkopf (2005) Zhou D, Schölkopf B (2005) Regularization on discrete spaces. In: The 27th DAGM Symposium, August 31 – September 2, 2005, Vienna, Austria, vol 3663, pp 361–368
  • Zhou et al (2020) Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2020) Graph neural networks: A review of methods and applications. AI Open 1:57–81
  • Zhu et al (2020a) Zhu J, Yan Y, Zhao L, Heimann M, Akoglu L, Koutra D (2020a) Beyond homophily in graph neural networks: Current limitations and effective designs. In: Advances in Neural Information Processing Systems, vol 33, pp 7793–7804
  • Zhu et al (2021) Zhu M, Wang X, Shi C, Ji H, Cui P (2021) Interpreting and unifying graph neural networks with an optimization framework. In: Proceedings of WWW
  • Zhu et al (2020b) Zhu S, Pan S, Zhou C, Wu J, Cao Y, Wang B (2020b) Graph geometry interaction learning. Advances in Neural Information Processing Systems 33:7548–7558
  • Zou et al (2022) Zou C, Han A, Lin L, Gao J (2022) A simple yet effective SVD-GCN for directed graphs. arXiv:2205.09335