
Local Vertex Colouring Graph Neural Networks

Shouheng Li    Dongwoo Kim    Qing Wang
Abstract

In recent years, there has been a significant amount of research focused on expanding the expressivity of Graph Neural Networks (GNNs) beyond the Weisfeiler-Lehman (1-WL) framework. While many of these studies have yielded advancements in expressivity, they have frequently come at the expense of decreased efficiency or have been restricted to specific types of graphs. In this study, we investigate the expressivity of GNNs from the perspective of graph search. Specifically, we propose a new vertex colouring scheme and demonstrate that classical search algorithms can efficiently compute graph representations that extend beyond the 1-WL. We show the colouring scheme inherits useful properties from graph search that can help solve problems like graph biconnectivity. Furthermore, we show that under certain conditions, the expressivity of GNNs increases hierarchically with the radius of the search neighbourhood. To further investigate the proposed scheme, we develop a new type of GNN based on two search strategies, breadth-first search and depth-first search, highlighting the graph properties they can capture on top of 1-WL. Our code is available at https://github.com/seanli3/lvc.



1 Introduction

Graph neural networks (GNNs) have emerged as the de facto method for representation learning on graphs. One popular class of GNN architectures is message-passing neural networks (MPNNs), which propagate information between vertices along edges (Gilmer et al., 2017; Kipf & Welling, 2017; Velickovic et al., 2017). Xu et al. (2019) show that the design of MPNNs aligns with the Weisfeiler-Lehman (1-WL) test, a classical algorithm for testing graph isomorphism; therefore, the expressivity of MPNNs is upper-bounded by the 1-WL test. Intuitively, if two vertices have the same computational (message-passing) graph in an MPNN, they are indistinguishable.

Recent studies have attempted to increase the expressivity of GNNs beyond the 1-WL. One direction is to extend GNNs to match higher-order WL tests (Morris et al., 2019, 2020; Maron et al., 2019; Geerts & Reutter, 2022). While these methods offer improved expressivity, they come at the cost of decreased efficiency, as higher-order WL tests are known to be computationally expensive. Another line of research focuses on incorporating graph substructures into feature aggregation (Bodnar et al., 2021b, a). However, these approaches often rely on task-specific, hand-picked substructures. Yet another strategy is to enhance vertex or edge features with additional distance information relative to target vertices (You et al., 2019; Li et al., 2020). Despite these efforts, Zhang et al. (2023) have shown that MPNNs cannot solve the biconnectivity problem, which can be efficiently solved using the depth-first search algorithm (Tarjan, 1974).

In light of the aforementioned understanding, one may question how the design of GNNs can surpass the limitations of 1-WL to address issues that cannot be resolved by MPNNs. In this work, we systematically study an alternative approach to MPNN, which propagates information along graph search trees. This paper makes the following contributions:

  • We design a novel colouring scheme, called local vertex colouring (LVC), based on breadth-first and depth-first search algorithms, that goes beyond 1-WL.

  • We show that LVC can learn representations that distinguish several graph properties, such as biconnectivity, cycles, cut vertices and edges, and ego shortest-path graphs (ESPGs), which 1-WL and MPNNs cannot.

  • We analyse the expressivity of LVC in terms of breadth-first colouring and depth-first colouring, and provide systematic comparisons with 1-WL and 3-WL.

  • We further design a graph search-guided GNN architecture, Search-guided Graph Neural Network (SGN), which inherits the properties of LVC.

2 Related Work

Graph isomorphism and colour refinement.

The graph isomorphism problem concerns whether two graphs are topologically identical, i.e., for two graphs $G$ and $H$, whether there is a bijection $f: V_G \rightarrow V_H$ such that any two vertices $u$ and $v$ are adjacent in $G$ if and only if $f(u)$ and $f(v)$ are adjacent in $H$. If such a bijection exists, we say $G$ and $H$ are isomorphic ($G \simeq H$).

Let $C$ be a set of colours. A vertex colouring refinement function $\lambda: V \rightarrow C$ assigns each vertex $v \in V$ a colour $\lambda(v) \in C$. This assignment is performed iteratively until the vertex colours no longer change. Colour refinement can be used to test graph isomorphism by comparing the multisets of vertex colours $\{\!\!\{\lambda(v) : v \in V_G\}\!\!\}$ and $\{\!\!\{\lambda(u) : u \in V_H\}\!\!\}$, given that $\lambda(\cdot)$ is invariant under isomorphic permutations. A classic example of such tests is the Weisfeiler-Lehman (WL) test (Weisfeiler & Leman, 1968), which assigns a colour to a vertex based on the colours of its neighbours. Cai et al. (1992) extend 1-WL to compute a colour on each $k$-tuple of vertices; this extension is known as the $k$-dimensional Folklore Weisfeiler-Lehman algorithm ($k$-FWL). We may apply a colour refinement to all vertices in $G$ iteratively until vertex colours are stabilised.

Let $(G, \lambda^i)$ denote a colouring on the vertices of $G$ after applying a colour refinement function $i$ times, i.e., after the $i$-th iteration, and $P(\lambda^i)$ a partition of the vertex set induced by the colouring $(G, \lambda^i)$. For two vertex partitions $P(\lambda^i)$ and $P(\lambda^j)$ on $G$, if every element of $P(\lambda^i)$ is a (not necessarily proper) subset of an element of $P(\lambda^j)$, we say $P(\lambda^i)$ refines $P(\lambda^j)$. When $P(\lambda^j) \equiv P(\lambda^{j+1})$, we call $\lambda^j$ a stable colouring and $P(\lambda^j)$ a stable partition of $G$.

GNNs beyond 1-WL.

MPNN is a widely adopted graph representation learning approach in many applications. However, it has a few caveats. Firstly, as shown by Xu et al. (2019), the expressive power of MPNN is upper-bounded by 1-WL, which is known to have limited power in distinguishing non-isomorphic graphs. Secondly, to increase its receptive field, an MPNN needs to be stacked deeply, which causes over-smoothing (Zhao & Akoglu, 2019; Chen et al., 2020) and over-squashing (Topping et al., 2022) that degrade performance. Recent research aims to design more powerful GNNs by incorporating higher-order neighbourhoods (Maron et al., 2019; Morris et al., 2019). However, these methods incur high computational costs and are thus not feasible for large datasets. Other methods alter the MPNN framework or introduce extra heuristics to improve expressivity (Bouritsas et al., 2022; Bodnar et al., 2021b, a; Bevilacqua et al., 2022; Wijesinghe & Wang, 2022). However, while these methods are shown to be more powerful than 1-WL, it is still unclear what additional properties they can capture beyond 1-WL.

GNNs in learning graph algorithms.

Velickovic et al. (2020) show that MPNN can imitate classical graph algorithms to learn shortest paths (the Bellman-Ford algorithm) and minimum spanning trees (Prim's algorithm). Georgiev & Lió (2020) show that MPNNs can execute the more complex Ford-Fulkerson algorithm for finding maximum flow, which consists of several composable subroutines. Loukas (2020) further shows that certain GNNs can solve graph problems like cycle detection and minimum cut, but only when their depth and width reach a certain level. Xu et al. (2020) show that the ability of MPNN to imitate complex graph algorithms is constrained by the alignment between its computation structure and the algorithmic structure of the relevant reasoning process. One such example, as shown by Zhang et al. (2023), is that MPNN cannot solve the biconnectivity problem, even though this problem has an efficient algorithmic solution that runs in time linear in the graph size.

3 Preliminaries

Let $\{\cdot\}$ denote sets and $\{\!\!\{\cdot\}\!\!\}$ multisets. We consider undirected simple graphs $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. We use $\lvert\cdot\rvert$ to denote the cardinality of a set/multiset/sequence, $e_{vu} = \{v, u\}$ an undirected edge connecting vertices $v$ and $u$, and $\vec{e}_{vu} = (v, u)$ a directed edge that starts from vertex $v$ and ends at vertex $u$. Thus, $e_{vu} = e_{uv}$ and $\vec{e}_{vu} \neq \vec{e}_{uv}$. We use $d(v, u)$ to denote the shortest-path distance between vertices $v$ and $u$. A $\delta$-neighbourhood of a vertex $v$ is the set of vertices within distance $\delta$ from $v$, i.e., $N_\delta(v) = \{u \in V : 1 \leq d(v, u) \leq \delta\}$. A path $\mathcal{P}_{w_0 w_k}$ of length $k$ in $G$, called a $k$-path, is a sequence $(w_0, w_1, \ldots, w_k)$ of distinct vertices such that $(w_{i-1}, w_i) \in E$ for $i = 1, 2, \ldots, k$.

Graph searching.

Graph traversal, or graph searching, visits each vertex in a graph. Breadth-First Search (BFS) and Depth-First Search (DFS) are the two most widely used graph search algorithms, which differ in the order in which vertices are visited. Both methods start at a vertex $v$. BFS first visits the direct neighbours of $v$ in $N_1(v)$, then the neighbours in $N_2(v)$ that have not been visited, etc. The idea is to process all vertices at distance (or level) $i$ from $v$ before processing vertices at level $i+1$ or greater. The process is repeated until all vertices reachable from $v$ have been visited. In DFS, we instead go as "deep" as possible from vertex $v$ until no new vertices can be visited, then backtrack and try the neighbours that were missed, starting from the farthest point in the search paths, until all vertices are visited. The visited vertices and the edges along the search paths form a BFS/DFS search tree. For example, the solid lines in Figures 1(b) and 1(c) represent two different search trees for the graph in Figure 1(a). Given a graph, there are generally many ways to construct different search trees by selecting different starting points and different edges to visit vertices.

Visiting order subscripting.

Vertices visited in a graph search form a linear order. We use subscripts to label this order based on discovery (first-visited) time (Cormen et al., 2022). Given a search tree, we say $v_i$ precedes $v_j$, denoted $v_i \prec v_j$, if $v_i$ is discovered before $v_j$. The subscripts $i$ and $j$ indicate this relation: for all $i$ and $j$, $v_i \prec v_j$ if and only if $i < j$. Each vertex is discovered exactly once, so subscripts range from $0$ to $N-1$ in a graph $G$ with $N$ vertices. $v_0$ is called the root. Figure 1 shows how subscripting is used to indicate vertex visiting orders.

Tree and back edges.

For an undirected simple graph $G = (V, E)$, BFS/DFS categorise the edges in $E$ into tree edges and back edges.

Tree edges and back edges are both directed, i.e., $(v, u) \neq (u, v)$. Let $T_v$ denote a search tree rooted at vertex $v$. A tree edge is an edge in the search tree $T_v$, directed from an earlier-visited vertex to a later-visited vertex. A back edge is directed from a later-visited vertex to an earlier-visited vertex. A directed edge $(v_i, v_j)$ is thus either a tree edge (if $i < j$) or a back edge (if $i > j$). For example, the dashed lines in Figures 1(b) and 1(c) represent back edges in BFS and DFS search trees, respectively. In BFS, a back edge $(v_i, v_j)$ connects vertices at the same or adjacent levels, i.e., $d(v_0, v_i) = d(v_0, v_j)$ or $|d(v_0, v_i) - d(v_0, v_j)| = 1$ (Cormen et al., 2022). For this reason, back edges in BFS are sometimes referred to as cross edges. In DFS, a back edge $(v_i, v_j)$ connects a vertex $v_i$ and its non-parent ancestor $v_j$; therefore we always have $v_j \prec v_i$ (i.e., $i > j$).

We use $E_{\text{tree}}^{T_v}$ and $E_{\text{back}}^{T_v}$ to denote the tree edge set and the back edge set of a search tree $T_v$ rooted at $v$.
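To make this classification concrete, the following minimal Python sketch runs BFS from a root and splits the edges reachable from it into tree and back edges under the orientation convention above. The adjacency-dict input and the 4-cycle example are illustrative, not from the paper:

```python
from collections import deque

def bfs_edge_classification(adj, root):
    """Classify edges reachable from `root` into BFS tree edges and back
    edges, following the convention above: tree edges point from the
    earlier-visited endpoint to the later-visited one, back edges the
    other way round. `adj` maps each vertex to an iterable of neighbours.
    (Sketch only; the visiting order within a tie is arbitrary.)"""
    order = {root: 0}                    # discovery order (subscript i of v_i)
    tree, back = set(), set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in order:
                order[w] = len(order)
                tree.add((u, w))         # tree edge: u discovered before w
                queue.append(w)
    # every remaining edge is a non-tree edge; orient it from the
    # later-discovered endpoint to the earlier-discovered one
    for u in order:
        for w in adj[u]:
            if (u, w) not in tree and (w, u) not in tree and order[u] > order[w]:
                back.add((u, w))
    return order, tree, back

# toy graph: a 4-cycle 0-1-2-3-0; one non-tree edge becomes a back edge
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(bfs_edge_classification(adj, 0))
```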

Figure 1: (a) A graph; (b) BFS; (c) DFS. Tree edges (green solid lines) and back edges (black dashed lines) classified by BFS and DFS. The subscripted labels $v_0$, ..., $v_5$ denote the visiting order of the vertices, e.g., $v_1$ is visited after $v_0$.

4 Local Vertex Colouring

In this section, we introduce the building blocks of a search-guided colouring scheme and demonstrate its expressive power in distinguishing non-isomorphic graphs. We also discuss how this search-guided colouring scheme can solve the biconnectivity and ego shortest-path graph problems that cannot be solved by MPNN.

4.1 Search-guided Vertex Colour Update

We first describe a way to update vertex colours guided by graph searching. As described in Section 3, graph searching w.r.t. a search tree $T_v$ categorises edges into two sets: a tree edge set $E_{\text{tree}}^{T_v}$ and a back edge set $E_{\text{back}}^{T_v}$. Instead of propagating messages along all edges as MPNN does, we design a scheme that propagates vertex information, i.e., vertex colours, along tree edges and back edges. Given a search tree $T_v$, we define the vertex colouring refinement function on each vertex $u$ as follows:

$$\lambda^{l+1}(u) := \rho\left(\lambda^{l}(u), \{\!\!\{\lambda_{v}^{l+1}(u) : v \in N_{\delta}(u)\}\!\!\}\right) \quad (1)$$

where $\rho(\cdot)$ is an injective function and $\lambda_v^{l+1}(u)$ is the vertex colour computed, based on the search tree $T_v$, as

$$\lambda_{v}^{l+1}(u) := \phi\left(\lambda^{l}(u), \psi\left(\{\!\!\{\lambda_{v}^{l}(w) : w \in \eta(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v})\}\!\!\}\right)\right) \quad (2)$$

where $\phi(\cdot)$ and $\psi(\cdot)$ are injective functions, and $l$ is the index of the current iteration. Let $\mathbb{P}(E)$ and $\mathbb{P}(V)$ be the power sets of $E$ and $V$, respectively. The function $\eta: V \times \mathbb{P}(E) \times \mathbb{P}(E) \rightarrow \mathbb{P}(V)$ takes as input a vertex $u \in V$ to be coloured, a tree edge set $E_{\text{tree}}^{T_v} \in \mathbb{P}(E)$, and a back edge set $E_{\text{back}}^{T_v} \in \mathbb{P}(E)$, and produces a vertex set in $\mathbb{P}(V)$.

At the first iteration, $\lambda_v^0(w)$ and $\lambda^0(w)$ are the initial colours of $w$. There are two steps in the colouring scheme. The first step is search-guided colour propagation (Equation 2), where vertex colours are propagated along the search paths of each $T_v$ based on $\eta(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v})$. After this step, a vertex $u$ obtains a colour $\lambda_v^{l+1}(u)$ w.r.t. each root vertex $v \in N_\delta(u)$, i.e., $|N_\delta(u)|$ colours in total. The second step is neighbourhood aggregation (Equation 1), where the $|N_\delta(u)|$ colours obtained from the first step are aggregated and used to compute a new colour $\lambda^{l+1}(u)$ for $u$. These two steps are repeated with the new vertex colours.

When $N_\delta(v) \neq V$, we call this vertex colouring scheme $\delta$-local vertex colouring (LVC-$\delta$). LVC-$\delta$ is applied to colour vertices in $G$ iteratively until vertex colours are stabilised. We omit the superscript and use $\lambda$ to refer to a stable colouring.
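For concreteness, a minimal Python sketch of one LVC-$\delta$ refinement step is given below. The injective functions $\rho$, $\phi$, $\psi$ are realised by building nested tuples (with multisets canonicalised by sorting string representations), and the search trees and $\eta$ are assumed to be precomputed; all names are illustrative, not the authors' implementation:

```python
def lvc_step(lam, lam_root, N_delta, eta):
    """One refinement step of LVC-delta (a sketch of Equations 1 and 2).

    lam[u]         -- current colour lambda^l(u)
    lam_root[v][u] -- current colour lambda_v^l(u) w.r.t. the search tree T_v
    N_delta[u]     -- the delta-neighbourhood N_delta(u)
    eta(v, u)      -- stands in for eta(u, E_tree^{T_v}, E_back^{T_v})
    """
    # Equation 2: propagate colours along each search tree T_v
    new_root = {
        v: {u: (lam[u], tuple(sorted(repr(lam_root[v][w]) for w in eta(v, u))))
            for u in lam}
        for v in lam_root
    }
    # Equation 1: aggregate the |N_delta(u)| per-root colours of u
    new_lam = {
        u: (lam[u], tuple(sorted(repr(new_root[v][u]) for v in N_delta[u])))
        for u in lam
    }
    return new_lam, new_root
```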

Search order permutation.

Graph searching may encounter cases where a search algorithm needs to decide a priority between two or more unvisited vertices (a tie). In such cases, the search algorithm can visit any vertex in the tie. This yields different vertex visiting orders; we call such a permutation in vertex visiting order a search order permutation. For example, in Figure 1(b), the BFS rooted at vertex $v_0$ faces a tie, where it can visit either $v_1$ or $v_2$ first since both are adjacent to $v_0$. Figure 1(b) shows the search trajectory when visiting $v_1$ first; if the BFS instead visits $v_2$ first, the tree edges and back edges will be different.

Design considerations.

A key function that controls the colouring in Equation 2 is $\eta$. For instance, we can make LVC-$\delta$ identical to 1-WL if we define $\eta(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v}) = \{w : (u, w) \in E_{\text{tree}}^{T_v} \vee (w, u) \in E_{\text{tree}}^{T_v} \vee (u, w) \in E_{\text{back}}^{T_v} \vee (w, u) \in E_{\text{back}}^{T_v}\}$. We name this the 1-WL-equivalent LVC. Under this definition, $\eta$ effectively yields all neighbouring vertices of $u$, making the scheme equivalent to 1-WL.
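As a sketch, this 1-WL-equivalent $\eta$ can be written as follows, with the directed edge sets of Section 3 as inputs (the function name is ours, for illustration):

```python
def eta_1wl(u, tree_edges, back_edges):
    """The 1-WL-equivalent eta: ignore the search structure and simply
    return every neighbour of u, i.e. the other endpoint of any tree or
    back edge incident to u, in either direction."""
    edges = tree_edges | back_edges
    return ({o for (o, w) in edges if w == u} |
            {w for (o, w) in edges if o == u})
```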

Since the design of $\eta$ is crucial, we introduce below two key points that should be considered when designing $\eta$.

  • $\eta$ should be invariant to search order permutation, i.e., a change of vertex visiting order should not alter the output of $\eta$. If $\eta$ is not invariant to search order permutation, the colouring scheme will not be permutation invariant. The 1-WL-equivalent LVC example in the previous paragraph is invariant to search order permutation.

  • $\eta$ should inherit properties of a search algorithm that are informative for identifying graph structure. Graph searches like BFS and DFS are widely used in graph algorithms to capture structural properties such as cycles and biconnectivity. $\eta$ should be designed to incorporate such structural information into vertex colours.

Figure 2: A graph and its vertex colours after the first BFC iteration. (a) shows the uncoloured graph, with the vertex colours obtained from (b) and (c) shown next to each vertex. (b) and (c) show BFC with two different roots (marked with *), where each vertex is assigned a new colour by BFC. The colour map is shown in the bottom left corner. The subscripted labels $v_0$, ..., $v_5$ denote the visiting order, e.g., $v_1$ is visited after $v_0$. For brevity, BFC is shown with only two roots.

4.2 BFS-guided Colouring

BFS is perhaps the most widely used graph search algorithm; its applications include finding shortest paths, computing minimum spanning trees, cycle detection, and bipartite graph testing. We design $\eta$ for BFS, denoted $\eta_{\text{bfc}}$, as

$$\eta_{\text{bfc}}(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v}) = \left\{o : (o, u) \in E_{\text{tree}}^{T_v} \vee \left((o, u) \in E_{\text{back}}^{T_v} \wedge d(v, o) \neq d(v, u)\right)\right\} \quad (3)$$

The part $\{o : (o, u) \in E_{\text{tree}}^{T_v}\}$ preserves vertices that lead to vertex $u$ via tree edges, while $\{o : (o, u) \in E_{\text{back}}^{T_v}\}$ preserves vertices that lead to vertex $u$ via back edges. The condition $d(v, o) \neq d(v, u)$ ensures that only back edges connecting vertices at different levels are included. The vertex colouring scheme using $\eta_{\text{bfc}}$ is called breadth-first colouring (BFC).
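A direct transcription of Equation 3 into Python might look as follows, assuming the BFS levels $d(v, \cdot)$ and the two directed edge sets have already been computed (a sketch, not the authors' implementation):

```python
def eta_bfc(u, tree_edges, back_edges, level):
    """A literal transcription of Equation 3. `tree_edges` and
    `back_edges` are the directed edge sets of a BFS tree T_v
    (Section 3), and `level[x]` is the BFS level d(v, x)."""
    # predecessors of u along tree edges: {o : (o, u) in E_tree}
    preds = {o for (o, w) in tree_edges if w == u}
    # back-edge endpoints on a different level lie on alternative
    # shortest paths; same-level back edges are dropped
    preds |= {o for (o, w) in back_edges if w == u and level[o] != level[u]}
    return preds
```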

Figure 2 shows BFC rooted at two different vertices, where the grey dashed lines indicate back edges that are excluded by BFC. In Figure 2(b), when $u = v_4$, $\eta_{\text{bfc}}$ returns $v_1$ and $v_2$, but excludes $v_3$ because it is at the same level as $v_4$.

An important property of BFS is that tree edges form shortest paths from the root to other vertices; e.g., in Figure 2(b), $(v_0, v_1)$ and $(v_1, v_4)$ form the shortest path $(v_0, v_1, v_4)$ between $v_0$ and $v_4$. There are two categories of back edges in a BFS tree: those connecting vertices across two adjacent levels and those connecting vertices at the same level (Cormen et al., 2022). The first category forms alternative shortest paths; e.g., $(v_0, v_2)$ and $(v_2, v_4)$ form another shortest path $(v_0, v_2, v_4)$ between $v_0$ and $v_4$. The second category does not participate in any shortest path. $\eta_{\text{bfc}}$ preserves only the first category of back edges and excludes the second via the condition $d(v, o) \neq d(v, u)$. All shortest paths from a root to a vertex, e.g., $(v_0, v_1, v_4)$ and $(v_0, v_2, v_4)$, form an induced shortest-path graph (SPG) (Wang et al., 2021), which is permutation invariant. For example, if we swap the search order of $v_1$ and $v_2$ in Figure 2(b), $(v_1, v_4)$ becomes a back edge and $(v_2, v_4)$ becomes a tree edge, but $\eta_{\text{bfc}}$ returns the same vertex set. Hence, BFC, defined by Equations 1, 2, and 3, is also permutation invariant.

BFC is referred to as BFC-$\delta$ if the search range is limited to a $\delta$-hop neighbourhood for each root vertex.

Distinguishing shortest-path graphs.

Velickovic et al. (2020) show that MPNN can imitate classical graph algorithms to learn shortest paths (the Bellman-Ford algorithm) and minimum spanning trees (Prim's algorithm). Xu et al. (2020) further show that MPNN is theoretically suitable for learning tasks that are solvable using dynamic programming. Since the base form of BFC (i.e., BFC-1) aligns with MPNN (as will be discussed later), BFC also inherits these properties. However, we are more interested in tasks that BFC can do but MPNN cannot. We show that one such task is distinguishing ego shortest-path graphs.

Definition. For two vertices $v, u \in V_G$, a shortest-path graph $SPG(v, u)$ is the subgraph of $G$ that contains all and only the vertices and edges occurring in the shortest paths between $u$ and $v$.

Lemma. Let $(u, v)$ and $(u', v')$ be two pairs of vertices. Then $SPG(u, v) \simeq SPG(u', v')$ if and only if one of the following conditions holds under BFC: (1) $\lambda_v(u) = \lambda_{v'}(u')$ and $\lambda_u(v) = \lambda_{u'}(v')$; (2) $\lambda_v(u) = \lambda_{u'}(v')$ and $\lambda_u(v) = \lambda_{v'}(u')$.

Definition. Given a vertex $v \in V_G$ and a fixed $\delta \geq 1$, an ego shortest-path graph (ESPG) $S_v = (V_{S_v}, E_{S_v})$ is a subgraph of $G$, where $V_{S_v} = N_\delta(v)$ and $E_{S_v}$ is the set of all edges that form the shortest paths between $v$ and each $u \in N_\delta(v)$.

Figure 3: A pair of non-isomorphic 3-regular graphs. Pink edges form the ESPGs ($\delta = 2$) of vertices $v$ and $u$.
Lemma. Let $v, u \in V$ be any two vertices, and $\lambda(v)$ and $\lambda(u)$ be their stable colours obtained by running BFC. Then $S_v \simeq S_u$ if and only if $\lambda(v) = \lambda(u)$, where $S_v$ and $S_u$ are the ESPGs of $v$ and $u$, respectively.

Lemma. MPNN cannot distinguish one or more pairs of graphs that have non-isomorphic ESPGs. Figure 3 shows a pair of graphs with non-isomorphic ESPGs that BFC can distinguish but MPNN cannot.

4.3 DFS-guided Colouring

DFS is also a fundamental graph search method, used for graph problems including cycle detection, topological sorting, and biconnectivity. Unlike BFS, whose search range grows incrementally hop by hop, DFS explores vertices as far as possible along each branch before backtracking. Recall that for any DFS back edge $(v_i, v_j)$ we must have $v_j \prec v_i$ (i.e., $i > j$). Before introducing a DFS-guided vertex colouring, we introduce two concepts used in defining $\eta$ for DFS.

Definition (Back edge crossover). Let $T_v$ be a search tree rooted at vertex $v \in V$ and $E_{\text{back}}^{T_v}$ be the set of back edges of $T_v$. The crossover relation $\nmid$ on $E_{\text{back}}^{T_v}$ is defined as $\vec{e}_1 \nmid \vec{e}_2$, where $\vec{e}_1 = (v_{i_1}, v_{j_1})$ and $\vec{e}_2 = (v_{i_2}, v_{j_2})$, if one of the following four conditions is satisfied:

  1. $j_2 < j_1 < i_2 < i_1$

  2. $j_1 < j_2 < i_1 < i_2$

  3. $j_1 = j_2$ and $i_1 \neq i_2$

  4. $i_1 = i_2$ and $j_1 \neq j_2$

It is easy to see that $\nmid$ is symmetric, i.e., $\vec{e}_2 \nmid \vec{e}_1$ if and only if $\vec{e}_1 \nmid \vec{e}_2$. For instance, in Figure 4(b) we have $(v_2, v_0) \nmid (v_3, v_1)$. In Figure 1(c), we have $(v_3, v_1) \nmid (v_5, v_2)$ and $(v_5, v_0) \nmid (v_5, v_2)$.

Definition (Back edge cover). Let $T_v$ be a search tree rooted at vertex $v \in V$, and $\vec{e} = (v_i, v_j) \in E_{\text{back}}^{T_v}$ be a back edge of $T_v$. We say that a vertex $u \in V$ is covered by $\vec{e}$, denoted $u \dashv \vec{e}$, if and only if $u \in \mathcal{P}_{v_j v_i}^{T_v}$, where $\mathcal{P}_{v_j v_i}^{T_v}$ is the path from $v_j$ to $v_i$ formed by tree edges of $T_v$. In Figure 1(c), we have $v_2 \dashv (v_3, v_1)$. In Figure 4(c), we have $v_1 \dashv (v_2, v_0)$.

Figure 4: An uneven barbell graph and vertex colours after the first DFC iteration. (a) shows the uncoloured graph, with the vertex colours obtained from (b) and (c) shown next to each vertex. (b) and (c) show DFC with two different roots (marked with *), where each vertex is assigned a new colour by DFC. The colour map is shown on the left. The subscripted labels $v_0$, ..., $v_5$ denote the visiting order, e.g., $v_1$ is visited after $v_0$. For brevity, DFC is shown with only two roots.

Depth-first colouring (DFC).

Given a vertex $u \in V$, we define the set of back edges that cover vertex $u$ as

$$Q_u^{T_v} = \{\vec{e} : u \dashv \vec{e},\ \vec{e} \in E_{\text{back}}^{T_v}\} \quad (4)$$

We then find all back edges that are transitively crossed over by $Q_u^{T_v}$. We define $D_u^{T_v, 0} = Q_u^{T_v}$ for the first iteration. At the $k$-th iteration, we define

$$\Delta D_u^{T_v, k} = \{\vec{e}_2 : \vec{e}_1 \nmid \vec{e}_2,\ \vec{e}_1 \in D_u^{T_v, k-1},\ \vec{e}_2 \in E_{\text{back}}^{T_v}\}$$
$$D_u^{T_v, k} := \Delta D_u^{T_v, k} \cup D_u^{T_v, k-1}$$

We use $D_u^{T_v}$ to denote the set of all back edges that are transitively crossed over by edges in $Q_u^{T_v}$, i.e., the set obtained when the fixed point is reached. The set of vertices covered by at least one back edge in $D_u^{T_v}$ is defined as

$$B_u^{T_v} = \{o : o \dashv \vec{e},\ \vec{e} \in D_u^{T_v},\ o \in V\}. \quad (5)$$

We design $\eta$ for DFS, denoted $\eta_{\text{dfc}}$, as

$$\eta_{\text{dfc}}(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v}) = \{o : (o, u) \in E_{\text{tree}}^{T_v}\} \cup \{o : o \in B_u^{T_v}\} \quad (6)$$

The part $\{o : (o, u) \in E_{\text{tree}}^{T_v}\}$ preserves vertices that lead to $u$ via tree edges, while $\{o : o \in B_u^{T_v}\}$ preserves vertices that are covered by the same back edges covering $u$, or by back edges that transitively cross over those covering $u$.
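Putting the pieces together, $\eta_{\text{dfc}}$ can be sketched as the following fixed-point computation, reusing `crossover` and `covered` from the previous sketch (again with vertices identified by their discovery subscripts; all names are illustrative):

```python
def eta_dfc(u, vertices, tree_edges, back_edges, parent):
    """A sketch of eta_dfc (Equations 4-6). Edges are (i, j) subscript
    pairs, and `parent` maps each non-root vertex to its DFS tree parent."""
    # Q_u: back edges covering u (Equation 4)
    Q = {e for e in back_edges if covered(u, e, parent)}
    # D_u: close Q_u transitively under the crossover relation
    D = set(Q)
    while True:
        new = {e2 for e2 in back_edges
               if e2 not in D and any(crossover(e1, e2) for e1 in D)}
        if not new:
            break
        D |= new
    # B_u: vertices covered by at least one back edge in D_u (Equation 5)
    B = {o for o in vertices if any(covered(o, e, parent) for e in D)}
    # Equation 6: tree-edge predecessors of u, joined with B_u
    return {o for (o, w) in tree_edges if w == u} | B
```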

The vertex colouring scheme defined using Equations 1, 2, and 6 is called depth-first colouring (DFC). DFC is referred to as DFC-$\delta$ if the search range is limited to a $\delta$-hop neighbourhood for each $u \in V$. Because $\eta_{\text{dfc}}$ is permutation invariant, it is easy to see that DFC-$\delta$ is also permutation invariant. Figure 4 shows an example of colouring an uneven barbell graph using DFC. It can be seen that the vertices at the two ends of the barbell share the same colour, while the vertices on the connecting path have different colours.

Lemma. $\eta_{\text{dfc}}$ is invariant under search order permutation.

Distinguishing biconnectivity.

We now show that DFC can distinguish graphs by their biconnectivity properties. Graph biconnectivity is a well-studied topic in graph theory and is often discussed in the context of network flow and planar graph isomorphism (Hopcroft & Tarjan, 1973). Zhang et al. (2023) first draw attention to biconnectivity in the context of GNNs and show that most GNNs cannot learn biconnectivity.

A vertex $v \in V$ is a cut vertex (or articulation point) of $G$ if removing it disconnects $G$; that is, the removal of a cut vertex increases the number of connected components in a graph (a connected component is an induced subgraph of $G$ in which each pair of vertices is connected via a path). Similarly, an edge $(v, u) \in E$ is a cut edge (or bridge) if removing it increases the number of connected components. A graph is vertex-biconnected if it is connected and has no cut vertices; similarly, a graph is edge-biconnected if it is connected and has no cut edges.
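As a quick sanity check of these definitions, networkx's DFS-based routines (in the spirit of Hopcroft & Tarjan, 1973) can enumerate cut vertices and cut edges. The bowtie graph below (two triangles sharing a vertex) has one cut vertex and no cut edge, so it is edge-biconnected but not vertex-biconnected:

```python
import networkx as nx

# Bowtie graph: two triangles {0,1,2} and {2,3,4} sharing vertex 2.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])

print(sorted(nx.articulation_points(G)))  # [2]: vertex 2 is a cut vertex
print(list(nx.bridges(G)))                # []: no cut edges, every edge lies on a cycle
```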

Lemma. Let $G$ and $H$ be two graphs, and $\{\!\!\{\lambda(u) : u \in V_G\}\!\!\}$ and $\{\!\!\{\lambda(u') : u' \in V_H\}\!\!\}$ be the corresponding multisets of stable vertex colours obtained by running DFC. Then the following statements hold:

  • For any two vertices $u \in V_G$ and $u' \in V_H$, if $\lambda(u) = \lambda(u')$, then $u$ is a cut vertex if and only if $u'$ is a cut vertex.

  • For any two edges $(u_1, u_2) \in E_G$ and $(u'_1, u'_2) \in E_H$, if $\{\!\!\{\lambda(u_1), \lambda(u_2)\}\!\!\} = \{\!\!\{\lambda(u'_1), \lambda(u'_2)\}\!\!\}$, then $(u_1, u_2)$ is a cut edge if and only if $(u'_1, u'_2)$ is a cut edge.

  • If $G$ is vertex/edge-biconnected but $H$ is not, then $\{\!\!\{\lambda(u) : u \in V_G\}\!\!\} \neq \{\!\!\{\lambda(u') : u' \in V_H\}\!\!\}$.

Corollary. Let $u \in V_G$ and $u' \in V_H$ be two vertices with $\lambda(u) = \lambda(u')$. Then $u$ is in a cycle if and only if $u'$ is in a cycle.

4.4 Expressivity Analysis

Comparison with 1-WL.

When $\delta = 1$, it is easy to see that search trees in BFC contain only the direct neighbours of the root and the edges between the root and its neighbours, and there are no back edges. Thus, we have the lemma below.

Lemma. BFC-1 is equivalent to 1-WL.

When $\delta = 1$, search trees in DFC contain the direct neighbours of the root, the edges between the root and its neighbours, and the edges between the neighbours. We have the following lemma.

Lemma. DFC-1 is more expressive than 1-WL.

Corollary. When $\delta > 1$, BFC-$\delta$ can distinguish one or more pairs of graphs that cannot be distinguished by 1-WL.

Corollary. When $\delta \geq 1$, DFC-$\delta$ can distinguish one or more pairs of graphs that cannot be distinguished by 1-WL.

Take, for example, a pair of graphs where one consists of two triangles and the other of a single six-cycle: these two graphs cannot be distinguished by 1-WL, but they can be distinguished by BFC-2 and DFC-1 (Figure 5).

Figure 5: (Left) A graph pair that can be distinguished by BFC-2 and DFC-1 but not by BFC-1 or 1-WL. (Right) A graph pair that can be distinguished by BFC-3 and DFC-2 but not by BFC-2, DFC-1, or 1-WL.

Comparison with 3-WL.

We can also show that, regardless of the choice of $\delta$, BFC is no more expressive than 3-WL. This leads to the following theorem.

Theorem. The expressive power of BFC-$\delta$ is strictly upper bounded by 3-WL.

However, unlike BFC, DFC can distinguish graphs that cannot be distinguished by 3-WL. For example, DFC-1 can distinguish the strongly regular graph pair shown in Figure 6, which 3-WL cannot: the 4x4 Rook's graph on 16 vertices (Laskar & Wallis, 1999) and the Shrikhande graph (Shrikhande, 1959). In the 4x4 Rook's graph, each vertex's 1-hop subgraph has a cut vertex, while in the Shrikhande graph no vertex's 1-hop subgraph has one. So, by the lemma in Section 4.3, DFC-1 can distinguish these two graphs. On the other hand, there are also graphs that 3-WL can distinguish but DFC-1 cannot, e.g., the right graph pair in Figure 5. Thus, we have the following theorem.

Theorem. The expressive powers of DFC-$\delta$ and 3-WL are incomparable.

Figure 6: The 4x4 Rook's graph and the Shrikhande graph (Arvind et al., 2020). Some edges are dashed for readability.

Expressivity hierarchy.

The following theorem states that there exists a hierarchy among the expressive powers of BFC-$\delta$ as $\delta$ increases.

Theorem. BFC-($\delta$+1) is strictly more expressive than BFC-$\delta$ in distinguishing non-isomorphic graphs.

The theorem above implies that BFC can be used as an alternative way, separate from the WL test hierarchy, to measure the expressivity of GNNs.

Nevertheless, DFC-$\delta$ does not exhibit such a hierarchy. Figure 7 depicts two pairs of non-isomorphic graphs: one pair can be distinguished by DFC-1 but not by DFC-2, while the other pair can be distinguished by DFC-2 but not by DFC-3. This leads to the theorem below.

Theorem. DFC-($\delta$+1) is not necessarily more expressive than DFC-$\delta$ in distinguishing non-isomorphic graphs.

Figure 7: Non-isomorphic graph pairs. (Left) A pair that can be distinguished by DFC-1 but not by DFC-2. (Right) A pair that can be distinguished by DFC-2 but not by DFC-3.

5 Search Guided Graph Neural Network

We propose a graph neural network that adopts the same colouring mechanisms as BFC and DFC for feature propagation. Let $h_u^{(l)}$ be the embedding of vertex $u$ after the $l$-th layer. Our search-guided graph neural network propagates embeddings over a graph via the following update rule:

$$h_u^{(l+1)} = \text{MLP}\left(\left(1 + \epsilon^{(l+1)}\right) \cdot h_u^{(l)} \parallel \sum_{v \in N_\delta(u)} h_{u \leftarrow v}^{(l+1)}\right) \quad (7)$$

where

$$h_{u \leftarrow v}^{(l+1)} = \left(h_u^{(l)} + \sum_{w \in \eta_v(u)} h_{w \leftarrow v}^{(l)}\right) W_c \quad (8)$$

where $h_u^{(0)}$ is the input vertex feature of $u$ and $\parallel$ is the concatenation operator. $h_{w \leftarrow v}^{(l)}$ is the vertex representation of $w$ obtained from a graph search rooted at $v$. When $w = v$, we have $h_{v \leftarrow v}^{(l)} = h_v^{(l)}$. Here, $W_c \in \mathbb{R}^{F \times H}$ is a learnable parameter matrix, where $F$ and $H$ are the dimensions of the input feature and the hidden layer, respectively. Note that we use the same $W_c$ for all vertices in $\{u \in V : c = d(u, v)\}$, i.e., all vertices with the same shortest-path distance $c = d(u, v)$ to vertex $v$. For brevity, we use $\eta_v(u)$ to denote $\eta(u, E_{\text{tree}}^{T_v}, E_{\text{back}}^{T_v})$.

For the model to learn an injective transformation, we use a multilayer perceptron (MLP) and a learnable scalar parameter $\epsilon$, adopted from Xu et al. (2019). The model architecture closely matches Equations 1 and 2, except that $\psi(\cdot)$ is replaced with summation, $\phi(\cdot)$ is replaced by matrix multiplication, and an MLP is used in place of $\rho(\cdot)$.

We term the model Search-guided Graph Neural Network (SGN). When $\eta_{\text{bfc}}(\cdot)$ is used in place of $\eta_v(\cdot)$, we call it SGN-BF; when $\eta_{\text{dfc}}(\cdot)$ is used, we call it SGN-DF.
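A minimal PyTorch sketch of one SGN layer is given below. It assumes the search trees and $\eta_v(u)$ are precomputed and, for brevity, uses a single shared $W_c$ and one embedding dimension throughout, so it is a simplification of Equation 8 rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class SGNLayer(nn.Module):
    """A sketch of one SGN layer (Equations 7 and 8). `eta(v, u)` returns
    eta_v(u) and `neigh[u]` returns N_delta(u), both precomputed from the
    graph search; a single W_c replaces the per-distance matrices."""

    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))     # epsilon^{(l+1)}
        self.W_c = nn.Linear(dim, dim, bias=False)  # W_c in Equation 8
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, h, h_rooted, neigh, eta):
        # h[u]: h_u^{(l)}; h_rooted[v][u]: h_{u<-v}^{(l)}
        new_rooted = {                                        # Equation 8
            v: {u: self.W_c(h[u] + sum(h_rooted[v][w] for w in eta(v, u)))
                for u in h_rooted[v]}
            for v in h_rooted}
        out = []
        for u in range(h.size(0)):                            # Equation 7
            agg = sum(new_rooted[v][u] for v in neigh[u])     # over N_delta(u)
            out.append(self.mlp(torch.cat([(1 + self.eps) * h[u], agg])))
        return torch.stack(out), new_rooted
```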

Theorem. SGN, defined by the update rules in Equations 7 and 8, is as powerful as LVC when it has sufficiently many layers.

Table 1: Time and space complexity comparison.
       MPNN         Graphormer-GD   SGN-BF                   SGN-DF
Time   $|V|+|E|$    $|V|^2$         $|V|+|V|d^{\delta-1}$    $|V|+|V|d^{2\delta}$
Space  $|V|$        $|V|$           $|V|d^{\delta-1}$        $|V|d^{2\delta}$

Choice of $\delta$.

As established in Section 4.4, a larger $\delta$ implies higher expressivity for SGN-BF in distinguishing non-isomorphic graphs. However, increasing $\delta$ also increases the size of $\eta_v(u)$, which makes it more expensive to compute as more aggregation operations are needed. Further, a larger $\delta$ means a larger receptive field at each layer, which is more likely to cause over-squashing (Topping et al., 2022), degrading performance on vertex-level tasks, e.g., vertex classification on heterophilic graphs.

For SGN-DF, increasing $\delta$ does not necessarily increase expressivity (Section 4.4). However, a larger $\delta$ means that the model can detect larger biconnected components. Therefore, in practice, we combine vertex representations from $\delta = 1$ with those from a larger $\delta$ for SGN-DF. This guarantees that the model is more expressive than 1-WL and allows the additional $\delta$ to be fine-tuned for each dataset and task.

Complexity.

$\eta_v(u)$ in Equation 8 only needs to be computed once, during graph searching. Therefore, the time complexity of computing $\eta_v(u)$ is on par with that of the adopted graph search. Both BFS and DFS have worst-case complexity $O(|V| + |E|)$ for searching the whole graph. Assuming $d$ is the average vertex degree, when the search is limited to $N_\delta(v)$, the complexity is $O(d^\delta(1 + d))$. We compute $\eta_v(u)$ for each $v \in V$, so in total it is $O(|V| d^\delta (1 + d))$. Each layer in SGN (Equation 7) aggregates $|N_\delta(u)|$ vertex representation vectors for each vertex. Each vertex representation further aggregates $|\eta_v(u)|$ vectors in Equation 8, i.e., $|N_\delta(u)| \cdot |\eta_v(u)|$ operations, or $O(d^\delta |\eta_v(u)|)$ in total. The magnitude of $|\eta_v(u)|$ can be very different for BFS and DFS, and varies from graph to graph. In the worst case for BFS, every vertex at hop $\delta+1$ is connected to every vertex at hop $\delta$, giving $|\eta_v(u)| = \frac{1 - d^\delta}{1 - d}$. In the worst case for DFS, $\eta_v(u)$ includes all vertices in $N_\delta(u)$, which has size $d^\delta$. We compare the time and space complexity of our model with MPNN and Graphormer-GD (Zhang et al., 2023) in Table 1.

6 Experiments

We evaluate SGN on two prediction tasks: vertex classification and graph classification. SGN is implemented using PyTorch and PyTorch Geometric (Fey & Lenssen, 2019). Experiments are run on a single NVIDIA RTX A6000 GPU with 48GB memory.

Table 2: Vertex classification results on homophilic datasets.
           Computers  Photo      CiteSeer   Cora       Pubmed
MLP        82.9±0.4   84.7±0.3   76.6±0.9   77.0±1.0   85.9±0.2
GCN        83.3±0.3   88.3±0.7   79.9±0.7   87.1±1.0   86.7±0.3
GCN+JK†    -          -          74.5±1.8   85.8±0.9   88.4±0.5
GAT        83.3±0.4   90.9±0.7   80.5±0.7   88.0±0.8   87.0±0.2
APPNP      85.3±0.4   88.5±0.3   80.5±0.7   88.1±0.7   88.1±0.3
ChebNet    87.5±0.4   93.8±0.3   79.1±0.8   86.7±0.8   88.0±0.3
GPRGNN     86.9±0.3   93.9±0.3   80.1±0.8   88.6±0.7   88.5±0.3
BernNet    87.6±0.4   93.6±0.4   80.1±0.8   88.5       88.5±1.0
H2GCN†     -          -          77.1±1.6   87.8±1.4   89.6±0.3
SGN-BF     90.7       96.1±0.2   78.0±1.0   88.7±0.1   90.2±3.5
SGN-DF     90.9±0.4   95.2±0.8   79.7±0.7   89.5±0.6   89.5±0.6
Table 3: Vertex classification results on heterophilic datasets.
           Wisconsin  Cornell    Texas      Chameleon  Squirrel
MLP        85.3±3.6   90.8±1.6   91.5±1.1   46.9±1.5   31.0±1.2
GCN        59.8±7.0   65.9±4.4   77.4±3.3   59.6±2.2   46.8±0.9
GCN+JK†    74.3±6.4   74.3±6.4   64.6±8.7   63.4±2.0   40.5±1.6
GAT        55.3±8.7   78.2±3.0   80.8±2.1   63.1±1.9   44.5±0.9
APPNP      -          91.8±2.0   91.0±1.6   51.8±1.8   34.7±0.6
ChebNet    82.6±4.6   83.9±2.1   86.2±2.5   59.3±1.3   40.6±0.4
GPRGNN     -          91.4±1.8   93.0±1.3   67.3±1.1   50.2±1.9
H2GCN†     86.7±4.7   82.2±4.8   84.5±6.8   59.4±2.0   37.9±2.0
SGN-BF     91.2±1.0   89.5±2.7   88.7±4.3   72.8±0.2   59.0±0.3
SGN-DF     84.1±3.6   83.2±5.8   86.8±5.2   56.6±3.0   47.0±1.5

6.1 Vertex Classification

Datasets.

We use three citation graphs, Cora, CiteSeer, and Pubmed, and two Amazon co-purchase graphs, Computers and Photo. As shown in Chien et al. (2021), these five datasets are homophilic graphs, on which adjacent vertices tend to share the same label. We also use two Wikipedia graphs, Chameleon and Squirrel, and three webpage graphs, Wisconsin, Texas, and Cornell, from WebKB (Pei et al., 2020). These five datasets are heterophilic, i.e., adjacent vertices tend to have different labels. Details about these datasets are shown in Table 7.

Setup and baselines.

We adopt the same experimental setup as He et al. (2021), where each dataset is randomly split into train/validation/test set with the ratio of 60%/20%/20%. In total, we use 10 random splits for each dataset, and the reported results are averaged over all splits.

We compare SGN with seven baseline models: MLP, GCN (Kipf & Welling, 2017), GAT (Velickovic et al., 2017), APPNP (Klicpera et al., 2019), ChebNet (Defferrard et al., 2016), GPRGNN (Chien et al., 2021), and BernNet (He et al., 2021).

We perform a hyperparameter search on four parameters in the following ranges: number of layers $\in \{1, 2, 3, 4, 5\}$, dropout probability $\in \{0.2, 0.5, 0.7, 0.9\}$, $\delta \in \{1, 2, 3, 4\}$, and hidden layer dimension $\in \{64, 128\}$.

Observation.

Results on homophilic and heterophilic datasets are presented in Tables 2 and 3, respectively. Results marked with † are obtained from Zhu et al. (2020); other baseline results are taken from He et al. (2021).

From Tables 2 and 3, we can see that SGN generalises to both homophilic and heterophilic graphs in vertex classification tasks. Specifically, SGN-BF outperforms the baselines on 7 out of 10 datasets, while SGN-DF does so on 3 out of 10. SGN-BF performs better than SGN-DF on heterophilic graphs. We find that the number of vertices in $N_\delta(v)$ grows much faster for SGN-DF than for SGN-BF as $\delta$ increases. This is because the paths from $v$ to each $u \in N_\delta(v)$ are longer in DFS than in BFS (in BFS the path lengths are always at most $\delta$), so more vertices are included in $N_\delta(v)$ for DFS. This further implies that more vertex features are aggregated for each vertex in SGN-DF, resulting in over-squashing, which degrades the performance of SGN-DF on heterophilic graphs.

We also list the training runtime in Table 4. As expected, the runtime of SGN-BF increases as $\delta$ increases, but stays on par with that of GCN. When $\delta = 1$, SGN-DF aggregates more vertices and is thus slower than SGN-BF.

Table 4: Training runtime per epoch in seconds.
           GCN    SGN-BF ($\delta$=1)  SGN-BF ($\delta$=2)  SGN-BF ($\delta$=3)  SGN-DF ($\delta$=1)
Cora       0.177  0.125                0.213                0.239                0.179
Pubmed     0.349  0.224                0.315                1.271                0.219
Chameleon  0.205  0.198                0.457                -                    0.346

6.2 Graph Classification

Datasets.

We evaluate SGN on graph classification for chemical compounds, using four molecular datasets: D&D (Dobson & Doig, 2003), PROTEINS (Borgwardt et al., 2005), NCI1 (Wale et al., 2008), and ENZYMES (Schomburg et al., 2004). We also include a social dataset, IMDB-BINARY. Following Errica et al. (2020), for the molecular datasets, vertex features are a one-hot encoding of the atom type, except for ENZYMES, where 18 additional features are used; for IMDB-BINARY, the degree of each vertex is the sole vertex feature. Details about these datasets are shown in Table 8.

Setup and baselines.

We adopt the fair and reproducible evaluation setup of Errica et al. (2020), which performs an internal hold-out model selection within each fold of a stratified 10-fold cross-validation.

We compare SGN against five GNNs: DGCNN (Zhang et al., 2018), DiffPool (Ying et al., 2018), ECC (Simonovsky & Komodakis, 2017), GIN (Xu et al., 2019) and GraphSAGE (Hamilton et al., 2017), as well as two variants of the contextual graph Markov model: E-CGMM (Atzeni et al., 2021) and ICGMM (Castellana et al., 2022). We also include a competitive structure-agnostic baseline method, dubbed Baseline, from Errica et al. (2020).

We perform a hyperparameter search in the following ranges: number of layers $\in \{1, 2, 3, 4, 5\}$, dropout probability $\in \{0.2, 0.5, 0.7\}$, $\delta \in \{1, 2, 3\}$, and hidden layer dimension $\in \{64\}$.

Observation.

Results are presented in Table 5. Results marked with ‡ are obtained from Castellana et al. (2022); other baseline results are taken from Errica et al. (2020).

We first observe that SGN-DF consistently outperforms the other GNNs and the CGMM variants. SGN-DF also outperforms Baseline on 4 out of 5 datasets. Although SGN-BF also yields competitive results on several benchmarks, it does not perform better than SGN-DF. This suggests that the graph properties captured by SGN-DF, such as biconnectivity, may be useful for classifying such graphs.

Table 5: Graph classification results.
           D&D        NCI1       PROTEINS   ENZYMES    IMDB-BINARY
Baseline   78.4±4.5   69.8±2.2   75.8±3.7   65.2±6.4   70.8±5.0
DGCNN      76.6±4.3   76.4±1.7   72.9±3.5   38.9±5.7   69.2±3.0
DiffPool   75.0±3.5   76.9±1.9   73.7±3.5   59.5±5.6   68.4±3.3
ECC        72.6±4.1   76.2±1.4   72.3±3.4   29.5±8.2   67.7±2.8
GIN        75.3±2.9   80.0±1.4   73.3±4.0   59.6±4.5   71.2±3.9
GraphSAGE  72.9±2.0   76.0±1.8   73.0±4.5   58.2±6.0   68.8±4.5
E-CGMM‡    73.9±4.1   78.5±1.7   73.3±4.1   -          70.7±3.8
ICGMM‡     76.3±5.6   77.6±1.5   73.3±2.9   -          73.0±4.3
SGN-BF     76.3±3.2   78.8±2.9   74.0±3.9   64.8±7.2   71.4±7.1
SGN-DF     78.01±4.0  81.0±1.4   76.1±1.6   66.9±7.5   72.3±5.4

7 Conclusion

Inspired by the 1-WL test, we propose a new graph colouring scheme, called local vertex colouring (LVC). LVC iteratively refines the colours of vertices based on a graph search algorithm, and surpasses the expressivity limitations of 1-WL. We also prove that combining LVC with breadth-first and depth-first searches can solve graph problems that cannot be solved by the 1-WL test.

Based on LVC, we propose a novel variant of graph neural networks, named the search-guided graph neural network (SGN). SGN is permutation invariant and inherits the properties of LVC by adopting its colouring scheme to learn vertex embeddings. Through experiments on vertex classification tasks, we show that SGN generalises to both homophilic and heterophilic graphs. The results on graph classification further verify the effectiveness of the proposed model.

Acknowledgements

We thank Professor Brendan McKay for helpful suggestions. We acknowledge the support of the Australian Research Council under Discovery Project DP210102273. This work was also partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2021R1C1C1011375).

References

  • Arvind et al. (2020) Arvind, V., Fuhlbrück, F., Köbler, J., and Verbitsky, O. On weisfeiler-leman invariance: Subgraph counts and related graph properties. Journal of Computer and System Sciences, 113:42–59, 2020. ISSN 0022-0000. doi: https://doi.org/10.1016/j.jcss.2020.04.003.
  • Atzeni et al. (2021) Atzeni, D., Bacciu, D., Errica, F., and Micheli, A. Modeling edge features with deep bayesian graph networks. In International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18-22, 2021, pp.  1–8. IEEE, 2021. doi: 10.1109/IJCNN52387.2021.9533430.
  • Bevilacqua et al. (2022) Bevilacqua, B., Frasca, F., Lim, D., Srinivasan, B., Cai, C., Balamurugan, G., Bronstein, M. M., and Maron, H. Equivariant subgraph aggregation networks. International Conference on Learning Representations, 2022.
  • Bodnar et al. (2021a) Bodnar, C., Frasca, F., Otter, N., Wang, Y., Lio, P., Montufar, G. F., and Bronstein, M. Weisfeiler and lehman go cellular: Cw networks. Advances in Neural Information Processing Systems, 34:2625–2640, 2021a.
  • Bodnar et al. (2021b) Bodnar, C., Frasca, F., Wang, Y., Otter, N., Montufar, G. F., Lio, P., and Bronstein, M. Weisfeiler and lehman go topological: Message passing simplicial networks. In International Conference on Machine Learning (ICML), pp. 1026–1037, 2021b.
  • Borgwardt et al. (2005) Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., and Kriegel, H.-P. Protein function prediction via graph kernels. Bioinformatics, 21 Suppl 1:i47–56, June 2005.
  • Bouritsas et al. (2022) Bouritsas, G., Frasca, F., Zafeiriou, S. P., and Bronstein, M. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • Cai et al. (1992) Cai, J.-Y., Fürer, M., and Immerman, N. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, 1992.
  • Castellana et al. (2022) Castellana, D., Errica, F., Bacciu, D., and Micheli, A. The infinite contextual graph markov model. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., and Sabato, S. (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp.  2721–2737. PMLR, 2022.
  • Chen et al. (2020) Chen, D., Lin, Y., Li, W., Li, P., Zhou, J., and Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In AAAI Conference on Artificial Intelligence, volume 34, pp. 3438–3445, 2020.
  • Chien et al. (2021) Chien, E., Peng, J., Li, P., and Milenkovic, O. Adaptive universal generalized pagerank graph neural network. In 9th International Conference on Learning Representations, ICLR, 2021.
  • Cormen et al. (2022) Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, fourth edition. MIT Press, April 2022.
  • Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Neural Information Processing Systems (NeurIPS), pp. 3844–3852, 2016.
  • Dobson & Doig (2003) Dobson, P. D. and Doig, A. J. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330(4):771–783, July 2003.
  • Errica et al. (2020) Errica, F., Podda, M., Bacciu, D., and Micheli, A. A fair comparison of graph neural networks for graph classification. In International Conference on Learning Representations (ICLR), 2020.
  • Fey & Lenssen (2019) Fey, M. and Lenssen, J. E. Fast graph representation learning with pytorch geometric. In Representation Learning on Graphs and Manifolds Workshop, International Conference on Learning Representations ICLR, 2019.
  • Geerts & Reutter (2022) Geerts, F. and Reutter, J. L. Expressiveness and approximation properties of graph neural networks. In International Conference on Learning Representations (ICLR), 2022.
  • Georgiev & Lió (2020) Georgiev, D. and Lió, P. Neural bipartite matching. In Workshop of the 37th International Conference on Machine Learning ICML, 2020.
  • Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning (ICML), pp. 1263–1272, 2017.
  • Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NeurIPS), pp.  1024–1034, 2017.
  • He et al. (2021) He, M., Wei, Z., Huang, Z., and Xu, H. Bernnet: Learning arbitrary graph spectral filters via bernstein approximation. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  14239–14251, 2021.
  • Hopcroft & Tarjan (1973) Hopcroft, J. and Tarjan, R. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM, 16(6):372–378, June 1973.
  • Hornik (1991) Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. doi: 10.1016/0893-6080(91)90009-T.
  • Hornik et al. (1989) Hornik, K., Stinchcombe, M. B., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989. doi: 10.1016/0893-6080(89)90020-8.
  • Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Klicpera et al. (2019) Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
  • Laskar & Wallis (1999) Laskar, R. and Wallis, C. Chessboard graphs, related designs, and domination parameters. Journal of Statistical Planning and Inference, 76(1):285–294, 1999. ISSN 0378-3758.
  • Li et al. (2020) Li, P., Wang, Y., Wang, H., and Leskovec, J. Distance encoding: Design provably more powerful neural networks for graph representation learning. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  • Loukas (2020) Loukas, A. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations (ICLR), 2020.
  • Maron et al. (2019) Maron, H., Ben-Hamu, H., Serviansky, H., and Lipman, Y. Provably powerful graph networks. Advances in Neural Information Processing Systems (NeurIPS), 32, 2019.
  • Morris et al. (2019) Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and leman go neural: Higher-order graph neural networks. In AAAI Conference on Artificial Intelligence, volume 33, pp. 4602–4609, 2019.
  • Morris et al. (2020) Morris, C., Rattan, G., and Mutzel, P. Weisfeiler and leman go sparse: Towards scalable higher-order graph embeddings. Neural Information Processing Systems (NeurIPS), 33, 2020.
  • Pei et al. (2020) Pei, H., Wei, B., Chang, K. C., Lei, Y., and Yang, B. Geom-gcn: Geometric graph convolutional networks. In 8th International Conference on Learning Representations, ICLR, 2020.
  • Schomburg et al. (2004) Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., and Schomburg, D. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res., 32(Database issue):D431–3, January 2004.
  • Shrikhande (1959) Shrikhande, S. S. The uniqueness of the $\mathrm{L}_2$ association scheme. The Annals of Mathematical Statistics, 30(3):781–798, 1959.
  • Simonovsky & Komodakis (2017) Simonovsky, M. and Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp.  29–38. IEEE Computer Society, 2017. doi: 10.1109/CVPR.2017.11.
  • Tarjan (1974) Tarjan, R. E. A note on finding the bridges of a graph. Inf. Process. Lett., 2(6):160–161, April 1974.
  • Topping et al. (2022) Topping, J., Giovanni, F. D., Chamberlain, B. P., Dong, X., and Bronstein, M. M. Understanding over-squashing and bottlenecks on graphs via curvature. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  • Velickovic et al. (2017) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. International Conference on Learning Representations (ICLR), 2017.
  • Velickovic et al. (2020) Velickovic, P., Ying, R., Padovano, M., Hadsell, R., and Blundell, C. Neural execution of graph algorithms. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.
  • Wale et al. (2008) Wale, N., Watson, I. A., and Karypis, G. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst., 14(3):347–375, March 2008.
  • Wang et al. (2023) Wang, Q., Chen, D. Z., Wijesinghe, A., Li, S., and Farhan, M. $\mathscr{N}$-WL: A new hierarchy of expressivity for graph neural networks. In The Eleventh International Conference on Learning Representations, 2023.
  • Wang et al. (2021) Wang, Y., Wang, Q., Koehler, H., and Lin, Y. Query-by-Sketch: Scaling shortest path graph queries on very large networks. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD ’21, pp.  1946–1958, New York, NY, USA, June 2021. Association for Computing Machinery.
  • Weisfeiler & Leman (1968) Weisfeiler, B. and Leman, A. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Series, 2(9):12–16, 1968.
  • Wijesinghe & Wang (2022) Wijesinghe, A. and Wang, Q. A new perspective on "How graph neural networks go beyond Weisfeiler-Lehman?". In International Conference on Learning Representations, 2022.
  • Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR), 2019.
  • Xu et al. (2020) Xu, K., Li, J., Zhang, M., Du, S. S., Kawarabayashi, K., and Jegelka, S. What can neural networks reason about? In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.
  • Ying et al. (2018) Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W. L., and Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pp.  4805–4815, 2018.
  • You et al. (2019) You, J., Ying, R., and Leskovec, J. Position-aware graph neural networks. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pp.  7134–7143. PMLR, 2019.
  • Zhang et al. (2023) Zhang, B., Luo, S., Wang, L., and He, D. Rethinking the expressive power of GNNs via graph biconnectivity. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023.
  • Zhang et al. (2018) Zhang, M., Cui, Z., Neumann, M., and Chen, Y. An end-to-end deep learning architecture for graph classification. In AAAI Conference on Artificial Intelligence, 2018.
  • Zhao & Akoglu (2019) Zhao, L. and Akoglu, L. Pairnorm: Tackling oversmoothing in gnns. In International Conference on Learning Representations (ICLR), 2019.
  • Zhu et al. (2020) Zhu, J., Yan, Y., Zhao, L., Heimann, M., Akoglu, L., and Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in Neural Information Processing Systems, 33:7793–7804, 2020.

Appendix A Summary of Notations

We summarise the notations used throughout the paper in Table 6 below.

Table 6: Summary of notations.

Symbol  Description
$\{\cdot\}$  a set
$\{\!\!\{\cdot\}\!\!\}$  a multiset
$|\cdot|$  cardinality of a set/multiset/sequence
$\mathbb{P}(\cdot)$  a power set
$G$, $H$  undirected graphs
$V$  a vertex set
$E$  an edge set
$\vec{e}_{vu}$  directed edge from vertex $v$ to $u$
$(v,u)$  a vertex sequence of a directed edge
$d(v,u)$  shortest distance between vertices $v$ and $u$
$N_{\delta}(v)$  the set of vertices within $\delta$ hops of vertex $v$
$\mathcal{P}_{vu}$  a path from vertex $v$ to $u$
$(w_0,w_1,\dots,w_k)$  a vertex sequence in a path
$T_v$  a search tree rooted at vertex $v$
$v\prec u$  $v$ precedes $u$ in the search visiting order
$v\succ u$  $v$ succeeds $u$ in the search visiting order
$E_{\text{back}}^{T_v}$  the back edge set of the search tree $T_v$ rooted at vertex $v$
$E_{\text{tree}}^{T_v}$  the tree edge set of the search tree $T_v$ rooted at vertex $v$
$G\simeq H$  $G$ and $H$ are isomorphic
$C$  a set of colours
$\lambda:V\rightarrow C$  a colour refinement function that assigns a colour in $C$ to a vertex in $V$
$\lambda_v:V\rightarrow C$  a colour refinement function that assigns a colour in $C$ to a vertex in $V$, based on a graph search rooted at vertex $v$
$\eta(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$  a function that takes a vertex $u$, a tree edge set $E_{\text{tree}}^{T_v}$, and a back edge set $E_{\text{back}}^{T_v}$ as input and outputs a vertex set
$\vee$  logical or
$\wedge$  logical and
$\parallel$  concatenation
$\phi(\cdot)$, $\psi(\cdot)$, $\rho(\cdot)$  injective functions
$h_v$  vector representation of vertex $v$
MLP  multilayer perceptron
$\epsilon$  a learnable parameter
$W_c$  a learnable matrix with respect to the shortest-path distance $c$ between two vertices

Appendix B Dataset Statistics

Statistics of the datasets used in our experiments are listed in Table 7 and Table 8.

Table 7: Statistics of datasets used for vertex classification.
Dataset # Vertices # Edges # Features # Classes
Cora 2708 5278 1433 7
CiteSeer 3327 4552 3703 6
Pubmed 19717 44324 500 5
Computers 13752 245861 767 10
Photo 7650 119081 745 8
Chameleon 2277 31371 2325 5
Squirrel 5201 198353 2089 5
Texas 183 279 1703 5
Cornell 183 277 1703 5
Wisconsin 251 466 1703 5
Table 8: Statistics of datasets used for graph classification.
Dataset # Graphs # Classes Avg. # vertices Avg. # edges # Features
D&D 1178 2 284.32 715.66 89
ENZYMES 600 6 32.63 64.14 3
NCI1 4110 2 29.78 32.30 37
PROTEINS 1113 2 39.06 72.82 3
IMDB-BINARY 1000 2 19.77 96.53 -

Appendix C Proofs

We first introduce several lemmas that are useful for later proofs.

{lemma}

Given two pairs of vertices $(u,v)$ and $(u',v')$, if $d(u,v)\neq d(u',v')$, $d(u,v)\leq\delta$, and $d(u',v')\leq\delta$, then $\lambda_v(u)\neq\lambda_{v'}(u')$ under BFC.

Proof.

We only show that $\lambda_v(u)\neq\lambda_{v'}(u')$ when $d(u,v)<d(u',v')$; the case $d(u,v)>d(u',v')$ can be shown in the same way. We prove this by contradiction and thus assume that $d(u,v)=\delta'$, $d(u',v')=\delta'+\Delta$, and $\lambda_v(u)=\lambda_{v'}(u')$, where $\delta'\leq\delta$ and $\Delta\geq 1$.

Since $\lambda_v(u)=\lambda_{v'}(u')$, by Equations 1 and 2, $\lambda^l(u)=\lambda^l(u')$ must hold for any $l\geq\delta'$; otherwise, the injectivity of the functions $\rho$ and $\phi$ would lead to $\lambda_v(u)\neq\lambda_{v'}(u')$, contradicting the assumption.

Then, by Equations 1 and 2 again, we have the following for any $l\geq\delta'$:

$$\psi\left(\{\!\!\{\lambda^{l}_{v}(w):w\in\eta_{\text{bfc}}(u,E_{\text{tree}}^{T_{v}},E_{\text{back}}^{T_{v}})\}\!\!\}\right)=\psi\left(\{\!\!\{\lambda^{l}_{v'}(w'):w'\in\eta_{\text{bfc}}(u',E_{\text{tree}}^{T_{v'}},E_{\text{back}}^{T_{v'}})\}\!\!\}\right). \qquad (9)$$

Because $\psi$ is injective, the above leads to

$$|\eta_{\text{bfc}}(u,E_{\text{tree}}^{T_{v}},E_{\text{back}}^{T_{v}})|=|\eta_{\text{bfc}}(u',E_{\text{tree}}^{T_{v'}},E_{\text{back}}^{T_{v'}})|\ \text{ and }\ d(u,u)=d(u',u')=0. \qquad (10)$$

Again, for each $w\in\eta_{\text{bfc}}(u,E_{\text{tree}}^{T_{v}},E_{\text{back}}^{T_{v}})$ and its matched counterpart $w'\in\eta_{\text{bfc}}(u',E_{\text{tree}}^{T_{v'}},E_{\text{back}}^{T_{v'}})$, we know that $\lambda^{l-1}(w)=\lambda^{l-1}(w')$ because of the injectivity of $\rho$ and $\phi$. Similarly, we have

$$|\eta_{\text{bfc}}(w,E_{\text{tree}}^{T_{v}},E_{\text{back}}^{T_{v}})|=|\eta_{\text{bfc}}(w',E_{\text{tree}}^{T_{v'}},E_{\text{back}}^{T_{v'}})|\ \text{ and }\ d(u,w)=d(u',w')=1. \qquad (11)$$

The above can be shown recursively for all vertices in $SPG(u,v)$ and $SPG(u',v')$. So we know that, for any vertex $z'$ in $SPG(u',v')$, there must exist a vertex $z$ in $SPG(u,v)$ such that the following conditions hold:

$$|\eta_{\text{bfc}}(z,E_{\text{tree}}^{T_{v}},E_{\text{back}}^{T_{v}})|=|\eta_{\text{bfc}}(z',E_{\text{tree}}^{T_{v'}},E_{\text{back}}^{T_{v'}})|\ \text{ and }\ d(u,z)=d(u',z'). \qquad (12)$$

However, since $d(u,v)=\delta'$ and $d(u',v')=\delta'+\Delta$, there must exist at least one vertex $z'$ in $SPG(u',v')$ with $\delta'<d(u',z')\leq\delta'+\Delta$ that cannot satisfy the above conditions. This implies that $\lambda_v(u)=\lambda_{v'}(u')$ does not hold, contradicting our assumption. The proof is done. ∎
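For concreteness, the following is a minimal Python sketch, assuming (as in the main text) that $SPG(u,v)$ denotes the subgraph formed by all vertices and edges that lie on at least one shortest path between $u$ and $v$. It computes $SPG(u,v)$ with two breadth-first searches; the adjacency-dict representation and the function names are our own illustration, not the paper's released code.

from collections import deque

def bfs_dist(adj, src):
    # Single-source shortest-path distances by BFS (unweighted graph).
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def spg(adj, u, v):
    # A vertex w lies on a shortest u-v path iff d(u,w) + d(w,v) = d(u,v);
    # an edge (a,b) lies on one iff d(u,a) + 1 + d(b,v) = d(u,v).
    du, dv = bfs_dist(adj, u), bfs_dist(adj, v)
    d = du[v]
    vertices = {w for w in adj if w in du and w in dv and du[w] + dv[w] == d}
    edges = {(a, b) for a in vertices for b in adj[a]
             if b in vertices and du[a] + 1 + dv[b] == d}
    return vertices, edges

For example, on the 4-cycle 0-1-2-3, spg(adj, 0, 2) returns all four vertices and the four edge orientations lying on the two shortest paths between 0 and 2.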

{lemma}

$|N_{\delta}(v)|=|N_{\delta}(u)|$ if $\lambda(v)=\lambda(u)$.

Proof.

According to Equation 1, when $\lambda(v)=\lambda(u)$, we have

$$\phi\left(\lambda(v),\psi\left(\{\!\!\{\lambda_{w_1}(v):w_1\in N_{\delta}(v)\}\!\!\}\right)\right)=\phi\left(\lambda(u),\psi\left(\{\!\!\{\lambda_{w_2}(u):w_2\in N_{\delta}(u)\}\!\!\}\right)\right).$$

Because $\phi(\cdot)$ and $\psi(\cdot)$ are injective, we have

$$\{\!\!\{\lambda_{w_1}(v):w_1\in N_{\delta}(v)\}\!\!\}=\{\!\!\{\lambda_{w_2}(u):w_2\in N_{\delta}(u)\}\!\!\}.$$

Hence, the multisets on the left and right sides must have the same number of elements, which leads to $|N_{\delta}(v)|=|N_{\delta}(u)|$. The proof is done. ∎

See 4.2

Proof.

We first show $SPG(u,v)\simeq SPG(u',v')$ if one of the two conditions holds.

We only show the statement under the first condition, i.e. "$SPG(u,v)\simeq SPG(u',v')$ if $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$". The case of the second condition, i.e. "$SPG(u,v)\simeq SPG(u',v')$ if $\lambda_v(u)=\lambda_{u'}(v')$ and $\lambda_u(v)=\lambda_{v'}(u')$", can be shown in the same way.

Assuming $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$, we show $SPG(u,v)\simeq SPG(u',v')$ by induction.

  • When $d(u,v)=0$ and $d(u',v')=0$, we must have $u=v$ and $u'=v'$. Accordingly, both $SPG(u,v)$ and $SPG(u',v')$ contain only one vertex. Thus we must have $SPG(u,v)\simeq SPG(u',v')$.

  • When $d(u,v)=1$ and $d(u',v')=1$, we have $(u,v)\in E$ and $(u',v')\in E$. Then both $SPG(u,v)$ and $SPG(u',v')$ contain only one edge. Thus we must have $SPG(u,v)\simeq SPG(u',v')$.

  • When $d(u,v)=2$ and $d(u',v')=2$, $SPG(u,v)$ and $SPG(u',v')$ can only have one isomorphism type: assuming there are $N$ vertices in total in $SPG(u,v)$ and in $SPG(u',v')$, there are $N-2$ vertices adjacent to both $u$/$u'$ and $v$/$v'$. It is easy to see $SPG(u,v)\simeq SPG(u',v')$.

  • Now assume that the statement "$SPG(u,v)\simeq SPG(u',v')$ if $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$ under BFC" holds for any two pairs of vertices $(u,v)$ and $(u',v')$ with $d(u,v)=d(u',v')\leq\Delta$. We want to show that this statement holds for the case $d(u,v)=d(u',v')=\Delta+1$. It is easy to see that $\lambda_v(u)=\lambda_{v'}(u')$ if and only if $\lambda_u(v)=\lambda_{u'}(v')$, so we just need to show $SPG(u,v)\simeq SPG(u',v')$ if $\lambda_v(u)=\lambda_{v'}(u')$.

    When $d(u,v)=\Delta+1$, we may express $SPG(u,v)$ as a tree rooted at vertex $u$ with children $\{\!\!\{SPG(u_1,v),\dots,SPG(u_q,v)\}\!\!\}$, where $d(u_i,v)=\Delta$ and $(u,u_i)\in E$ for $1\leq i\leq q$. Accordingly, we may express $SPG(u',v')$ as a tree rooted at vertex $u'$ with children $\{\!\!\{SPG(u'_1,v'),\dots,SPG(u'_p,v')\}\!\!\}$, where $d(u'_j,v')=\Delta$ and $(u',u'_j)\in E$ for $1\leq j\leq p$.

    Because $\lambda_v(u)=\phi(\lambda(u),\psi(\{\!\!\{\lambda_v(u_1),\dots,\lambda_v(u_q)\}\!\!\}))$ and $\lambda_{v'}(u')=\phi(\lambda(u'),\psi(\{\!\!\{\lambda_{v'}(u'_1),\dots,\lambda_{v'}(u'_p)\}\!\!\}))$, we must have $p=q$ and $\{\!\!\{\lambda_v(u_1),\dots,\lambda_v(u_q)\}\!\!\}=\{\!\!\{\lambda_{v'}(u'_1),\dots,\lambda_{v'}(u'_p)\}\!\!\}$. Without loss of generality, assume $\lambda_v(u_1)=\lambda_{v'}(u'_1)$, ..., $\lambda_v(u_q)=\lambda_{v'}(u'_p)$. By the induction hypothesis for the case $d(u,v)=d(u',v')\leq\Delta$, we have $SPG(u_1,v)\simeq SPG(u'_1,v')$, ..., $SPG(u_q,v)\simeq SPG(u'_p,v')$. This implies $\{\!\!\{SPG(u_1,v),\dots,SPG(u_q,v)\}\!\!\}\simeq\{\!\!\{SPG(u'_1,v'),\dots,SPG(u'_p,v')\}\!\!\}$. Thus, we must have $SPG(u,v)\simeq SPG(u',v')$, so the statement "$SPG(u,v)\simeq SPG(u',v')$ if $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$ under BFC" holds for the case $d(u,v)=d(u',v')=\Delta+1$.

Now we show $SPG(u,v)\simeq SPG(u',v')$ only if one of the two conditions holds. Assuming $SPG(u,v)\simeq SPG(u',v')$, we show this by induction.

  • When $d(u,v)=0$ and $d(u',v')=0$, we must have $u=v$ and $u'=v'$. Accordingly, both $SPG(u,v)$ and $SPG(u',v')$ contain only one vertex. Thus both conditions hold.

  • When $d(u,v)=1$ and $d(u',v')=1$, we have $(u,v)\in E$ and $(u',v')\in E$. Then both $SPG(u,v)$ and $SPG(u',v')$ contain only one edge. Thus both conditions hold.

  • When $d(u,v)=2$ and $d(u',v')=2$, $SPG(u,v)$ and $SPG(u',v')$ can only have one isomorphism type: assuming there are $N$ vertices in total in $SPG(u,v)$ and in $SPG(u',v')$, there are $N-2$ vertices adjacent to both $u$/$u'$ and $v$/$v'$. It is easy to see both conditions hold.

  • Now assume that the statement for the first condition, i.e. "$SPG(u,v)\simeq SPG(u',v')$ only if $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$ under BFC", holds for any two pairs of vertices $(u,v)$ and $(u',v')$ with $d(u,v)=d(u',v')\leq\Delta$. We want to show that this statement also holds for the case $d(u,v)=d(u',v')=\Delta+1$.

    It is easy to see that $\lambda_v(u)=\lambda_{v'}(u')$ if and only if $\lambda_u(v)=\lambda_{u'}(v')$, so we just need to show $SPG(u,v)\simeq SPG(u',v')$ only if $\lambda_v(u)=\lambda_{v'}(u')$. As before, when $d(u,v)=\Delta+1$, we express $SPG(u,v)$ as a tree rooted at vertex $u$ with children $\{\!\!\{SPG(u_1,v),\dots,SPG(u_q,v)\}\!\!\}$, where $d(u_i,v)=\Delta$ and $(u,u_i)\in E$ for $1\leq i\leq q$, and we express $SPG(u',v')$ as a tree rooted at vertex $u'$ with children $\{\!\!\{SPG(u'_1,v'),\dots,SPG(u'_p,v')\}\!\!\}$, where $d(u'_j,v')=\Delta$ and $(u',u'_j)\in E$ for $1\leq j\leq p$. Because $SPG(u,v)\simeq SPG(u',v')$, we must have $p=q$. This further implies $\{\!\!\{SPG(u_1,v),\dots,SPG(u_q,v)\}\!\!\}\simeq\{\!\!\{SPG(u'_1,v'),\dots,SPG(u'_p,v')\}\!\!\}$. By the induction hypothesis for the case $d(u,v)=d(u',v')\leq\Delta$, we have $\{\!\!\{\lambda_v(u_1),\dots,\lambda_v(u_q)\}\!\!\}=\{\!\!\{\lambda_{v'}(u'_1),\dots,\lambda_{v'}(u'_p)\}\!\!\}$, which further leads to $\lambda_v(u)=\lambda_{v'}(u')$. So the statement "$SPG(u,v)\simeq SPG(u',v')$ only if $\lambda_v(u)=\lambda_{v'}(u')$ and $\lambda_u(v)=\lambda_{u'}(v')$ under BFC" holds for the case $d(u,v)=d(u',v')=\Delta+1$.

    Now assume that the statement for the second condition, i.e. "$SPG(u,v)\simeq SPG(u',v')$ only if $\lambda_v(u)=\lambda_{u'}(v')$ and $\lambda_u(v)=\lambda_{v'}(u')$ under BFC", holds for any two pairs of vertices $(u,v)$ and $(u',v')$ with $d(u,v)=d(u',v')\leq\Delta$. The case $d(u,v)=d(u',v')=\Delta+1$ can be shown in the same way as before.

The proof is done. ∎

See 3

Proof.

We first show that, if $\lambda(v)=\lambda(u)$, then we have $S_v\simeq S_u$. We prove this by induction:

  • When $\delta=0$, there is only one vertex in $S_v$ and $S_u$, so it is trivial to see $S_v\simeq S_u$.

  • When $\delta=1$, $S_v$ contains only $v$ and its direct neighbours, and $S_u$ contains only $u$ and its direct neighbours. $v$ and $u$ must have the same degree for $\lambda(v)=\lambda(u)$ to hold, so we have $S_v\simeq S_u$.

  • Now assume that the statement "$S_v\simeq S_u$ if $\lambda(v)=\lambda(u)$ under BFC" holds for any two vertices $v$ and $u$ when $\delta\leq\Delta$. We want to show that this statement holds for the case $\delta=\Delta+1$. When $\delta=\Delta+1$, we only consider the case where $S_v\simeq S_u$ for $\delta=\Delta$; otherwise, we immediately have $\lambda(v)\neq\lambda(u)$ according to Appendix C. Assume that there are $q$ vertices $\{v_1,\dots,v_q\}$ in $N_{\Delta+1}(v)\setminus N_{\Delta}(v)$ and $p$ vertices $\{u_1,\dots,u_p\}$ in $N_{\Delta+1}(u)\setminus N_{\Delta}(u)$, where $p\geq 1$ and $q\geq 1$. If $p\neq q$, we have $\lambda(v)\neq\lambda(u)$, so we only consider the case where $p=q$. In this case, for $\lambda(v)=\lambda(u)$ to hold, we must have $\{\!\!\{\lambda_{v_1}(v),\dots,\lambda_{v_q}(v)\}\!\!\}=\{\!\!\{\lambda_{u_1}(u),\dots,\lambda_{u_p}(u)\}\!\!\}$. Then we have $\{\!\!\{SPG(v_1,v),\dots,SPG(v_q,v)\}\!\!\}\simeq\{\!\!\{SPG(u_1,u),\dots,SPG(u_p,u)\}\!\!\}$ according to Section 4.2. Thus, we must have $S_v\simeq S_u$.

We then show that, if $S_v\simeq S_u$, then we have $\lambda(v)=\lambda(u)$. We prove this by induction:

  • When $\delta=0$, there is only one vertex in $S_v$ and $S_u$, so it is trivial to see $\lambda(v)=\lambda(u)$.

  • When $\delta=1$, $S_v$ contains only $v$ and its direct neighbours, and $S_u$ contains only $u$ and its direct neighbours. $v$ and $u$ must have the same degree for $S_v\simeq S_u$ to hold, so we have $\lambda(v)=\lambda(u)$.

  • Now assume that the statement "$\lambda(v)=\lambda(u)$ if $S_v\simeq S_u$ under BFC" holds for any two vertices $v$ and $u$ when $\delta\leq\Delta$. We want to show that this statement holds for the case $\delta=\Delta+1$. Assume that there are $q$ vertices $\{v_1,\dots,v_q\}$ in $N_{\Delta+1}(v)\setminus N_{\Delta}(v)$ and $p$ vertices $\{u_1,\dots,u_p\}$ in $N_{\Delta+1}(u)\setminus N_{\Delta}(u)$, where $p\geq 1$ and $q\geq 1$. If $p\neq q$, we have $S_v\not\simeq S_u$, so we only consider the case where $p=q$. Because $S_v\simeq S_u$ for $\delta=\Delta+1$, we know $\{\!\!\{SPG(v_1,v),\dots,SPG(v_q,v)\}\!\!\}\simeq\{\!\!\{SPG(u_1,u),\dots,SPG(u_p,u)\}\!\!\}$. According to Section 4.2, we must have $\{\!\!\{\lambda_{v_1}(v),\dots,\lambda_{v_q}(v)\}\!\!\}=\{\!\!\{\lambda_{u_1}(u),\dots,\lambda_{u_p}(u)\}\!\!\}$. Thus, we have $\lambda(v)=\lambda(u)$.

The proof is done. ∎

See 3

Proof.

Consider vertices $v$ and $u$ and their 1-hop neighbour vertex sets $N_1(v)$ and $N_1(u)$, respectively. We use $\lambda_{\text{WL}}(\cdot)$ to denote the colour mapping of 1-WL. We show an example where $\lambda_{\text{WL}}(v)=\lambda_{\text{WL}}(u)$ but $S_v\not\simeq S_u$. In Figure 3, $S_v$ and $S_u$ are non-isomorphic; however, $\lambda_{\text{WL}}(v)=\lambda_{\text{WL}}(u)$ because each vertex is adjacent to the same number of vertices. We know from Xu et al. (2019) that the expressivity of MPNNs is upper-bounded by 1-WL; thus, the proof is done. ∎

See 4.3

Proof.

By showing that $\eta_{\text{dfc}}$ is invariant to vertex search orders, we can prove that $\eta_{\text{dfc}}$ is permutation invariant.

Note that for DFS rooted at vertex $v$, there are multiple search trees if and only if there exist back edges. For a DFS tree $T_v$, any alternative DFS tree can be formed by changing a tree edge to a back edge and swapping their directions. If there are no back edges in a DFS tree, the DFS tree is canonical, so the output of $\eta_{\text{dfc}}$ does not change. Below we therefore only consider the case where back edges exist.

For a non-root vertex $u$ in $G$, there must be exactly one tree edge leading to $u$, i.e. exactly one vertex in $\{o:(o,u)\in E_{\text{tree}}^{T_v}\}$. If $Q_u^{T_v}$ in Equation 4 is empty, $u$ is not covered by a back edge, and $u$ is not in any cycle because a cycle is formed by at least one back edge (Cormen et al., 2022). Since $u$ is not in any cycle, the vertex $w$ that precedes $u$ in $T_v$ is deterministic.

Now we consider the case where $Q_u^{T_v}$ is not empty, which has two further cases. Let $w$ be the vertex preceding $u$ in the tree edge $(w,u)$.

  • The first case is when $w$ and $u$ are covered together by a back edge, or by two back edges that cross over each other. In this case, $u$ and $w$ are in the same cycle(s), so $w$ appears in $B_u^{T_v}$. For any alternative search tree to $T_v$, $w$ and $u$ must also be covered together by a back edge, or by two back edges that cross over each other. So the union $\{o:(o,u)\in E_{\text{tree}}^{T_v}\}\cup\{o:o\in B_u^{T_v}\}$ in Equation 6 remains the same for all possible search trees.

  • The second case is when $w$ and $u$ are not covered by a back edge, or are covered by two back edges that do not cross over each other. In this case, $(w,u)$ appears as a tree edge in all possible search trees rooted at $v$, because the tree edge $(w,u)$ is not in any cycle. Then $D_u^{T_v}$ remains the same for all search trees, which makes the union $\{o:(o,u)\in E_{\text{tree}}^{T_v}\}\cup\{o:o\in B_u^{T_v}\}$ invariant to the traversal order.

Thus, $\eta_{\text{dfc}}$ is invariant to the vertex traversal order. The proof is done. ∎
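To make the two edge sets used throughout this proof concrete, here is a minimal Python sketch (our own illustration under an adjacency-dict graph representation, not the paper's code) that splits the edges reachable from a root $v$ into DFS tree edges $E_{\text{tree}}^{T_v}$ and back edges $E_{\text{back}}^{T_v}$. In an undirected DFS, every non-tree edge connects a vertex to one of its ancestors, which is why only these two sets appear.

def dfs_tree_and_back_edges(adj, root):
    # Returns (tree_edges, back_edges) for a DFS rooted at `root`.
    # Tree edges are (parent, child); back edges are (descendant, ancestor).
    visited = set()
    on_path = set()           # vertices on the current root-to-here path
    tree_edges, back_edges = [], []

    def visit(v, parent):
        visited.add(v)
        on_path.add(v)
        for w in adj[v]:
            if w not in visited:
                tree_edges.append((v, w))
                visit(w, v)
            elif w in on_path and w != parent:
                back_edges.append((v, w))   # w is an ancestor of v
        on_path.remove(v)

    visit(root, None)
    return tree_edges, back_edges

# Example: a path 0-1 attached to the triangle 1-2-3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [2, 1]}
print(dfs_tree_and_back_edges(adj, 0))
# ([(0, 1), (1, 2), (2, 3)], [(3, 1)])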

In the following, we introduce a lemma that is useful for proving Section 4.3.

{lemma}

Each vertex in a biconnected component is covered by at least one DFS back edge.

Proof.

A biconnected component is an induced subgraph of $G$ that stays connected after removing any one vertex. Therefore, each vertex in a biconnected component participates in at least one cycle. We know that a vertex is in a cycle if and only if it is covered by a DFS back edge. Hence, every vertex in a biconnected component is covered by at least one DFS back edge. ∎

See 4.3

Proof.

We prove the statements in Section 4.3 one by one.

  • By the definition that a vertex $u$ is a cut vertex of $G$ if removing $u$ increases the number of connected components, Hopcroft & Tarjan (1973) show that $u$ is a cut vertex if one of the following two conditions is true (a code sketch of this computation follows the proof):

    1. $u$ is not the root of a DFS tree, and it has a child $c$ such that no vertex in the subtree rooted at $c$ has a back edge to one of the ancestors of $u$ in the DFS tree.

    2. $u$ is the root of a DFS tree, and it has at least two children.

    Equation 1 only computes vertex colours based on search trees where $u$ is not the root, so we can focus on condition 1. There are two types of cut vertices: one that does not form biconnected components, called the type-1 cut vertex (e.g. $v_4$ and $v_5$ in Figure 4(b)), and the other that forms biconnected components, called the type-2 cut vertex ($v_2$ and $v_6$ in Figure 4(b)). We first show that the first statement holds for type-1 cut vertices.

    For a type-1 cut vertex $u$, it is easy to see that $D_u^{T_v}=\varnothing$ because of Appendix C. Hence, $\eta_{\text{dfc}}(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ returns a single-vertex set containing $w$, where $w$ is the vertex preceding $u$ in the tree edge $(w,u)$. Note that both a type-1 cut vertex and a leaf vertex (a vertex without children) obtain a single-vertex set from Equation 6; however, according to condition 1, a leaf vertex is not a cut vertex. We show that if $u$ is a cut vertex and $x$ is a leaf vertex, then $\lambda_G(u)\neq\lambda_G(x)$. We show this by contradiction. A leaf vertex has degree 1, while a cut vertex must have degree larger than 1 (because it connects at least two biconnected components). For $\lambda_G(u)=\lambda_G(x)$ to hold, according to Equation 1, there must be the same number of vertices in $N_\delta(u)$ and $N_\delta(x)$. Because $x$ is a leaf vertex, there is only one vertex, $w_1$, that contributes to the colour of $x$ without involving other vertices. Because $u$ is a cut vertex, there are at least two vertices, $w'_1$ and $w'_2$, that are adjacent to $u$. Assuming $\lambda_{w_1}(x)=\lambda_{w'_1}(u)$, there must exist another vertex $w_2\in N_\delta(x)$ such that $\lambda_{w_2}(x)=\lambda_{w'_2}(u)$ and $w_2$ does not have an edge with $x$. For simplicity, we omit the superscript in Equation 2 and unpack $\lambda_{w_2}(x)$ and $\lambda_{w'_2}(u)$. We then have $\lambda_{w_2}(x)=\phi\big(\lambda(x),\psi(\{\!\!\{\lambda_{w_2}(w_1)\}\!\!\})\big)$ and $\lambda_{w'_2}(u)=\phi\big(\lambda(u),\psi(\{\!\!\{\lambda_{w'_2}(w'_2)\}\!\!\})\big)$. Hence, we need $\lambda_{w_2}(w_1)=\lambda_{w'_2}(w'_2)$. Because $\lambda_{w_2}(w_1)=\phi\big(\lambda(w_1),\psi(\{\!\!\{\lambda_{w_2}(w_2)\}\!\!\})\big)$, $\lambda_{w_2}(w_2)=\lambda(w_2)$, and $\lambda_{w'_2}(w'_2)=\lambda(w'_2)$, we would need $\lambda(w'_2)=\phi\big(\lambda(w_1),\psi(\{\!\!\{\lambda_{w_2}(w_2)\}\!\!\})\big)$, which obviously does not hold. Therefore, if $u$ is a cut vertex and $x$ is a leaf vertex, $\lambda_G(u)\neq\lambda_G(x)$.

    For a type-2 cut vertex $u$, it is obvious that $D_u^{T_v}\neq\varnothing$, and thus the colour of $u$ clearly differs from that of any leaf vertex or any type-1 cut vertex. So we just need to show $\lambda_G(u)\neq\lambda_H(x)$ when $x$ is a non-cut vertex in a biconnected component. Because $u$ is a type-2 cut vertex, $\eta_{\text{dfc}}(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ includes all vertices of the biconnected components in which $u$ participates (at least two biconnected components), while for a non-cut vertex $x$, $\eta_{\text{dfc}}(x,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ includes only vertices of a single biconnected component. So a type-2 cut vertex has a different number of vertices in its $\eta_{\text{dfc}}$ output compared with a non-cut vertex. Therefore, it is easy to see that $\lambda_G(u)\neq\lambda_H(x)$ when $u$ is a type-2 cut vertex and $x$ is a non-cut vertex in a biconnected component.

    Hence, if $\lambda_G(u)=\lambda_G(x)$, then $x$ is a cut vertex if and only if $u$ is a cut vertex. The proof of the first statement is done.

  • According to Tarjan's algorithm for finding cut edges (Tarjan, 1974), in a DFS tree, a tree edge $(u_1,u_2)$ is a cut edge if every path from $u_1$ to $u_2$ contains the edge $(u_1,u_2)$. This can be interpreted as follows: $(u_1,u_2)$ is a cut edge if $u_1$ and $u_2$ are not covered by the same back edge, nor by two back edges that cross over each other. We prove the second statement by contradiction. Suppose $(u_1,u_2)$ is a cut edge, $(x_1,x_2)$ is not a cut edge, and $\{\!\!\{\lambda(u_1),\lambda(u_2)\}\!\!\}=\{\!\!\{\lambda(x_1),\lambda(x_2)\}\!\!\}$. Then $x_1$ and $x_2$ must be covered either by the same back edge or by different back edges that cross over each other. Hence, based on Equation 6, $\eta_{\text{dfc}}(x_1,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})=\eta_{\text{dfc}}(x_2,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$. Since $(u_1,u_2)$ is a cut edge, we know $\eta_{\text{dfc}}(u_1,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})\neq\eta_{\text{dfc}}(u_2,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$. This contradicts $\{\!\!\{\lambda(u_1),\lambda(u_2)\}\!\!\}=\{\!\!\{\lambda(x_1),\lambda(x_2)\}\!\!\}$; thus $(x_1,x_2)$ must be a cut edge. We can swap $(x_1,x_2)$ and $(u_1,u_2)$ in the proof to show that if $(x_1,x_2)$ is a cut edge, $(u_1,u_2)$ must also be a cut edge. The proof of the second statement is done.

  • For a graph $G$ to be cut vertex/edge connected, there must be, in a DFS tree, a set of back edges that cross over each other and cover all vertices of $G$. Therefore, if $\eta_{\text{dfc}}(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ includes all vertices of $G$, then we have $\eta(u_0,E_\succ,E_\prec)=\eta(u_1,E_\succ,E_\prec)=\dots=\eta(u_{N-1},E_\succ,E_\prec)$ for $u_0,u_1,\dots,u_{N-1}\in V_G$. If $H$ is not cut vertex/edge connected, then there must exist at least one vertex $w\in V_H$ such that $w$ is not covered by the same set of crossing back edges that cover the other vertices. Thus, $\eta_{\text{dfc}}(w,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})\neq\eta_{\text{dfc}}(x,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ for $x\in V_H\setminus\{w\}$. Hence $\{\!\!\{\lambda(u):u\in V_G\}\!\!\}\neq\{\!\!\{\lambda(u'):u'\in V_H\}\!\!\}$ if $G$ is vertex/edge-biconnected but $H$ is not. The proof of the third statement is done. ∎
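As referenced in the first statement's proof, the following is a minimal Python sketch of the DFS low-link computation underlying the two cut-vertex conditions of Hopcroft & Tarjan (1973) and Tarjan's cut-edge test (Tarjan, 1974). It assumes a simple, connected, undirected graph given as an adjacency dict; it is an illustration, not the paper's implementation.

def cut_vertices_and_edges(adj, root):
    disc, low = {}, {}                  # discovery time and low-link value
    cut_vertices, cut_edges = set(), []

    def visit(v, parent):
        disc[v] = low[v] = len(disc)
        children = 0
        for w in adj[v]:
            if w not in disc:
                children += 1
                visit(w, v)
                low[v] = min(low[v], low[w])
                # condition 1: no back edge escapes the subtree rooted at w
                if parent is not None and low[w] >= disc[v]:
                    cut_vertices.add(v)
                # cut edge: no back edge covers the tree edge (v, w)
                if low[w] > disc[v]:
                    cut_edges.append((v, w))
            elif w != parent:
                low[v] = min(low[v], disc[w])
        # condition 2: the root is a cut vertex iff it has >= 2 children
        if parent is None and children >= 2:
            cut_vertices.add(v)

    visit(root, None)
    return cut_vertices, cut_edges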

See 4.3

Proof.

From the proofs of Sections 4.3 and C, we know that, if $u$ is in a cycle, $u$ is covered by at least one back edge, so $\eta_{\text{dfc}}(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ contains more than two vertices. Since $\lambda(u)=\lambda(u')$, $\eta_{\text{dfc}}(u',E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ also needs to contain more than two vertices, which indicates that $u'$ is in a cycle. ∎

See 4.4

Proof.

When $\Delta=1$, the vertex colouring function in Equation 1 becomes

$$\lambda^{i+1}(u):=\rho\left(\lambda^{i}(u),\{\!\!\{\lambda^{i+1}_{v}(u)\,|\,(u,v)\in E\}\!\!\}\right). \qquad (13)$$

Accordingly, Equation 2 becomes

$$\lambda^{i+1}_{v}(u):=\phi\left(\lambda^{i}(u),\psi(\{\!\!\{\lambda^{i}(v)\}\!\!\})\right), \qquad (14)$$

which leads to

$$\lambda^{i+1}(u):=\rho\left(\lambda^{i}(u),\{\!\!\{\phi\left(\lambda^{i}(u),\psi\left(\{\!\!\{\lambda^{i}(v)\}\!\!\}\right)\right)\,|\,(u,v)\in E\}\!\!\}\right). \qquad (15)$$

Elements in the multiset $\{\!\!\{\phi\left(\lambda^{i}(u),\psi\left(\{\!\!\{\lambda^{i}(v)\}\!\!\}\right)\right)\,|\,(u,v)\in E\}\!\!\}$ only differ in the input of $\psi(\cdot)$. Thus, we can simplify it by defining a new injective function $\kappa(\lambda^{i}(v))=\phi\left(\lambda^{i}(u),\psi\left(\{\!\!\{\lambda^{i}(v)\}\!\!\}\right)\right)$. Equation 15 becomes

$$\lambda^{i+1}(u):=\rho\left(\lambda^{i}(u),\{\!\!\{\kappa(\lambda^{i}(v))\,|\,(u,v)\in E\}\!\!\}\right), \qquad (16)$$

which is exactly the vertex colouring function of 1-WL. The proof is done. ∎
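As a reference point for Equation 16, here is a minimal Python sketch of the 1-WL colour refinement it reduces to: in each round, a vertex's new colour is an injective relabelling of its old colour together with the multiset of its neighbours' old colours. The adjacency-dict format and names are our own illustration.

def wl1_refine(adj):
    # Start from a uniform colouring; refine until the partition is stable.
    colours = {v: 0 for v in adj}
    for _ in range(len(adj)):
        sigs = {v: (colours[v], tuple(sorted(colours[w] for w in adj[v])))
                for v in adj}
        # relabel each distinct signature with a small integer (injective)
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: palette[sigs[v]] for v in adj}
        if new == colours:
            break
        colours = new
    return colours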

See 4.4

Proof.

Consider two vertices $v$ and $u$ and their 1-hop neighbour vertex sets $N_1(v)$ and $N_1(u)$. We use $\lambda_{\text{dfc}}(\cdot)$ and $\lambda_{\text{wl}}(\cdot)$ to denote the colour refinements by DFC-1 and 1-WL, respectively. We first show that if $\lambda_{\text{dfc}}(v)=\lambda_{\text{dfc}}(u)$, then $\lambda_{\text{wl}}(v)=\lambda_{\text{wl}}(u)$. For $\lambda_{\text{dfc}}(v)=\lambda_{\text{dfc}}(u)$ to hold, the number of vertices in $N_1(v)$ must be the same as in $N_1(u)$ because of Appendix C. This means that $v$ and $u$ have the same degree, and thus $\lambda_{\text{wl}}(v)=\lambda_{\text{wl}}(u)$ must hold.

We now show that if $\lambda_{\text{wl}}(v)=\lambda_{\text{wl}}(u)$, $\lambda_{\text{dfc}}(v)=\lambda_{\text{dfc}}(u)$ may not hold. Consider the left graph pair in Figure 5: each vertex has degree 2, so all vertices have the same colour under 1-WL. However, the number of vertices returned by Equation 6 is not the same across the two graphs. Specifically, $\eta_{\text{dfc}}(u,E_{\text{tree}}^{T_v},E_{\text{back}}^{T_v})$ returns a set of 3 vertices for each vertex of the left graph and a set of 2 vertices for each vertex of the right graph. Thus, $\lambda_{\text{dfc}}(v)\neq\lambda_{\text{dfc}}(u)$. This means that DFC-1 can distinguish some pairs of non-isomorphic graphs that 1-WL cannot. The proof is done. ∎

See 4.4

Proof.

The left graph pair in Figure 5 can be distinguished by BFC-2 but not by 1-WL. According to Section 4.4, for any $\delta>2$, BFC-$\delta$ can also distinguish this graph pair. Similarly, the right graph pair in Figure 5 can be distinguished by BFC-3 but not by 1-WL. So for any $\delta>3$, BFC-$\delta$ can also distinguish this graph pair. ∎

See 4.4

Proof.

The left graph pair in Figure 5 can be distinguished by DFC-$\delta$ for any $\delta\geq 1$. The right graph pair in Figure 5 can be distinguished by DFC-$\delta$ for any $\delta\geq 2$. Neither graph pair can be distinguished by 1-WL. ∎

Before proving Section 4.4, we first define the version of the $k$-WL test studied by Cai et al. (1992), which is also called folklore-WL (FWL) (Morris et al., 2019). Let $\vec{v}=(v_1,v_2,v_3,\dots,v_k)$ be a $k$-tuple of vertices. The neighbourhood of $k$-FWL is defined as a set of $n=|V|$ elements:

$$N^F(\vec{v})=\{N^F_w(\vec{v})\,|\,w\in V\},$$

where each $N^F_w(\vec{v})$ is defined as:

$$N^F_w(\vec{v})=\{(w,v_2,v_3,\dots,v_k),(v_1,w,v_3,\dots,v_k),\dots,(v_1,v_2,\dots,v_{k-1},w)\}.$$

It is known that $k$-WL is equivalent to $(k-1)$-FWL in distinguishing non-isomorphic graphs when $k>2$ (Cai et al., 1992). Thus, below we only need to show that the expressivity of BFC is strictly upper-bounded by 2-FWL.

When $k=2$, the neighbourhood of 2-FWL is a set of $n$ elements, each a pair of 2-tuples:

$$N^F(u,v)=\{(u,w),(w,v)\,|\,w\in V\}. \qquad (17)$$

Equivalently, the above equation may be written as:

$$N^F(u,v)=\{(u,w,v)\,|\,w\in V\}. \qquad (18)$$

Let $G=(V,E)$ be an input graph. 2-FWL assigns colours to all pairs of vertices of $G$. Initially, three colours are assigned: edge, non-edge, and self. The colours of these vertex pairs are then refined iteratively by assigning a new colour to each pair $(u,v)$ depending on the colours of $\{(u,w,v)\,|\,w\in V\}$ in $G$. This process continues until the colours of all vertex pairs stabilise.
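To make the refinement above concrete, here is a minimal Python sketch of 2-FWL (our own illustration for small graphs stored as adjacency dicts): pairs start with the edge/non-edge/self colours and are repeatedly relabelled by the multiset of colour pairs $(Col(u,w),Col(w,v))$ over all $w\in V$.

from itertools import product

def fwl2_refine(adj):
    V = list(adj)
    colours = {}
    for u, v in product(V, V):          # initial colouring of all pairs
        if u == v:
            colours[(u, v)] = "self"
        elif v in adj[u]:
            colours[(u, v)] = "edge"
        else:
            colours[(u, v)] = "nonedge"
    while True:
        sigs = {(u, v): (colours[(u, v)],
                         tuple(sorted((colours[(u, w)], colours[(w, v)])
                                      for w in V)))
                for u, v in product(V, V)}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {p: palette[sigs[p]] for p in sigs}
        if new == colours:              # partition stabilised
            return colours
        colours = new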

Let $Col(u,v)$ denote the colour of the vertex pair $(u,v)$ under 2-FWL, and $d(u,v)$ the shortest-path distance between $u$ and $v$. We have the following lemma.

{lemma}

Given two pairs of vertices $(u,v)$ and $(u',v')$, if $SPG(u,v)\not\simeq SPG(u',v')$, then $Col(u,v)\neq Col(u',v')$.

Proof.

We prove this by induction.

  • When $d(u,v)=0$ and $d(u',v')=0$, we must have $u=v$ and $u'=v'$. Accordingly, both $SPG(u,v)$ and $SPG(u',v')$ contain only one vertex. Thus $SPG(u,v)\simeq SPG(u',v')$, and it is impossible to have $SPG(u,v)\not\simeq SPG(u',v')$.

  • When $d(u,v)=1$ and $d(u',v')=1$, we have $(u,v)\in E$ and $(u',v')\in E$. Then both $SPG(u,v)$ and $SPG(u',v')$ contain only one edge. Thus $SPG(u,v)\simeq SPG(u',v')$, and it is impossible to have $SPG(u,v)\not\simeq SPG(u',v')$.

  • When $d(u,v)=2$ and $d(u',v')=2$, we must have $(u,v)\notin E$ and $(u',v')\notin E$. By Equation 18, we also know that

$$\begin{aligned} N^F(u,v)=\;&\{(u,w,v)\,|\,(u,w)\in E,(w,v)\in E,w\in V\setminus\{u,v\}\}\;\cup\\ &\{(u,w,v)\,|\,(u,w)\in E,(w,v)\notin E,w\in V\setminus\{u,v\}\}\;\cup\\ &\{(u,w,v)\,|\,(u,w)\notin E,(w,v)\in E,w\in V\setminus\{u,v\}\}\;\cup\\ &\{(u,w,v)\,|\,(u,w)\notin E,(w,v)\notin E,w\in V\setminus\{u,v\}\}\;\cup\\ &\{(u,w,v)\,|\,w=u\}\;\cup\\ &\{(u,w,v)\,|\,w=v\}. \end{aligned}$$

    Note that each of the subsets in the above equation corresponds to a different kind of neighbour in the neighbourhood of $(u,v)$. In this case, equivalently, $SPG(u,v)=(V_{uv},E_{uv})$ can also be expressed via the subsets of $N^F(u,v)$ that correspond to three kinds of neighbours in the neighbourhood of $(u,v)$ (the first, fifth, and sixth subsets in the above equation):

$$V_{uv}=\{u,v\}\cup\{w\,|\,(u,w)\in E,(w,v)\in E,w\in V\setminus\{u,v\}\},$$
$$E_{uv}=\{(u,w),(w,v)\,|\,(u,w)\in E,(w,v)\in E,w\in V\setminus\{u,v\}\}.$$

    We know that the colouring of 2-FWL preserves injectivity. Thus, if $SPG(u,v)\not\simeq SPG(u',v')$, the corresponding subsets in $N^F(u,v)$ and $N^F(u',v')$ are not isomorphic, and hence $Col(u,v)\neq Col(u',v')$.

  • Now assume that the statement "if $SPG(u,v)\not\simeq SPG(u',v')$, then $Col(u,v)\neq Col(u',v')$" holds for any two pairs of vertices $(u,v)$ and $(u',v')$ with $d(u,v)=d(u',v')\leq\Delta$. We want to show that the statement holds for the case $d(u,v)=d(u',v')=\Delta+1$.

    When $d(u,v)=\Delta+1$, we may express $SPG(u,v)$ as a tree rooted at vertex $u$ with children $\{SPG(u_1,v),\dots,SPG(u_q,v)\}$, where $d(u_i,v)=\Delta$ for $1\leq i\leq q$. Accordingly, we may express $SPG(u',v')$ in a similar way. Thus, if $SPG(u,v)\not\simeq SPG(u',v')$, there are two cases:

    (1) $\{SPG(u,u),SPG(v,v)\}\not\simeq\{SPG(u',u'),SPG(v',v')\}$. By our assumption for the case $d(u,v)=d(u',v')\leq\Delta$, we have $\{Col(u,u),Col(v,v)\}\neq\{Col(u',u'),Col(v',v')\}$. Hence, we know that $Col(u,v)\neq Col(u',v')$.

    (2) $\{SPG(u,u),SPG(v,v)\}\simeq\{SPG(u',u'),SPG(v',v')\}$. Without loss of generality, we assume that $SPG(u,u)\simeq SPG(u',u')$ and $SPG(v,v)\simeq SPG(v',v')$. In this case, $SPG(u,v)\not\simeq SPG(u',v')$ implies that $\{SPG(u_1,v),\dots,SPG(u_q,v)\}\not\simeq\{SPG(u'_1,v'),\dots,SPG(u'_p,v')\}$, where $(u,u_i)\in E$ and $d(u_i,v)=\Delta$ for $1\leq i\leq q$, and $(u',u'_j)\in E$ and $d(u'_j,v')=\Delta$ for $1\leq j\leq p$. Here, $p=q$ must hold; otherwise we immediately have $Col(u,v)\neq Col(u',v')$. By our assumption for the case $d(u,v)=d(u',v')\leq\Delta$, we have $\{Col(u_1,v),\dots,Col(u_q,v)\}\neq\{Col(u'_1,v'),\dots,Col(u'_p,v')\}$. Thus, $Col(u,v)\neq Col(u',v')$ must hold. ∎

See 4.4

Proof.

We first show that 3-WL is at least as powerful as BFC-$\delta$. Let $G=(V_G,E_G)$ and $H=(V_H,E_H)$ be two input graphs. According to Figure 3, BFC can distinguish two graphs only when they have different ESPGs, i.e. $\{\!\!\{\lambda_{S_v}(v):v\in V_G\}\!\!\}\neq\{\!\!\{\lambda_{S_u}(u):u\in V_H\}\!\!\}$. So we just need to show $\{\!\!\{Col(v,v')\,|\,v'\in V_G\}\!\!\}\neq\{\!\!\{Col(u,u')\,|\,u'\in V_H\}\!\!\}$ for any $v\in V_G$ and $u\in V_H$ when $S_v\not\simeq S_u$. $S_v\not\simeq S_u$ implies $\{\!\!\{SPG(v',v):v'\in V_G\}\!\!\}\not\simeq\{\!\!\{SPG(u',u):u'\in V_H\}\!\!\}$. According to Appendix C, we must then have $\{\!\!\{Col(v,v')\,|\,v'\in V_G\}\!\!\}\neq\{\!\!\{Col(u,u')\,|\,u'\in V_H\}\!\!\}$. Thus 3-WL can distinguish two graphs whenever they have different ESPGs, so 3-WL is at least as powerful as BFC-$\delta$.

Now we show the strictness of this bound, i.e. that 3-WL is strictly more expressive than BFC. We show this with the example graph pair in Figure 8: 3-WL can distinguish the two graphs, but BFC cannot because they have isomorphic ESPGs. Therefore, the expressivity of BFC is strictly upper-bounded by 3-WL. ∎

Figure 8: A pair of non-isomorphic graphs that can be distinguished by 3-WL but not by BFC.

To prove the result in Section 4.4, we first introduce the following lemma.

{lemma}

BFC-($\delta+1$) is at least as expressive as BFC-$\delta$ in distinguishing non-isomorphic graphs.

Proof.

This lemma requires showing that, for any $i$-th iteration, if BFC-($\delta+1$) produces the same multiset of vertex colours for $G_1$ and $G_2$, then BFC-$\delta$ must also produce the same multiset of vertex colours for $G_1$ and $G_2$. Below we show that, for any iteration $i$, if the colours of any two vertices in $G_1$ and $G_2$ are the same under BFC-($\delta+1$), then their colours under BFC-$\delta$ must also be the same. We show this by induction.

  • For $i=0$, it is obvious that the initial vertex colours are the same for BFC-($\delta+1$) and BFC-$\delta$.

  • For $i>0$, we assume that the statement "if $\lambda^i(u_1)=\lambda^i(u_2)$ under BFC-($\delta+1$) then $\lambda^i(u_1)=\lambda^i(u_2)$ under BFC-$\delta$" holds for the $i$-th iteration, and seek to show that the statement also holds for the $(i+1)$-th iteration. We show this by contradiction. Assuming $\lambda^{i+1}(u_1)=\lambda^{i+1}(u_2)$ holds under BFC-($\delta+1$) but not under BFC-$\delta$, we have

$$\rho(\lambda^{i}(u_1),\{\!\!\{\lambda^{i}_{v}(u_1)\,|\,v\in N_{\delta+1}(u_1)\}\!\!\})=\rho(\lambda^{i}(u_2),\{\!\!\{\lambda^{i}_{v}(u_2)\,|\,v\in N_{\delta+1}(u_2)\}\!\!\}) \qquad (19)$$

    and

$$\rho(\lambda^{i}(u_1),\{\!\!\{\lambda^{i}_{v}(u_1)\,|\,v\in N_{\delta}(u_1)\}\!\!\})\neq\rho(\lambda^{i}(u_2),\{\!\!\{\lambda^{i}_{v}(u_2)\,|\,v\in N_{\delta}(u_2)\}\!\!\}). \qquad (20)$$

    Because $\lambda^{i+1}(u_1)=\lambda^{i+1}(u_2)$ under BFC-($\delta+1$), we must have $\lambda^{i}(u_1)=\lambda^{i}(u_2)$ under BFC-($\delta+1$). According to our assumption "if $\lambda^i(u_1)=\lambda^i(u_2)$ under BFC-($\delta+1$) then $\lambda^i(u_1)=\lambda^i(u_2)$ under BFC-$\delta$", $\lambda^{i}(u_1)=\lambda^{i}(u_2)$ also holds under BFC-$\delta$. Thus, we can simplify Equations 19 and 20 as

$$\{\!\!\{\lambda^{i}_{v}(u_1)\,|\,v\in N_{\delta+1}(u_1)\}\!\!\}=\{\!\!\{\lambda^{i}_{v}(u_2)\,|\,v\in N_{\delta+1}(u_2)\}\!\!\} \qquad (21)$$
$$\{\!\!\{\lambda^{i}_{v}(u_1)\,|\,v\in N_{\delta}(u_1)\}\!\!\}\neq\{\!\!\{\lambda^{i}_{v}(u_2)\,|\,v\in N_{\delta}(u_2)\}\!\!\} \qquad (22)$$

    Because $N_{\delta}(u)\subseteq N_{\delta+1}(u)$ and $N_{\delta+1}(u)\setminus N_{\delta}(u)=\{v\,|\,d(u,v)=\delta+1\}$, there must exist at least one pair of vertices $u'$ and $u''$, where $d(v,u')=\delta+1$ and $d(v,u'')\leq\delta$, such that $\lambda^{i}_{v}(u')=\lambda^{i}_{v}(u'')$. This contradicts Appendix C. So the assumption "$\lambda^{i+1}(u_1)\neq\lambda^{i+1}(u_2)$ under BFC-$\delta$" cannot hold. Thus, the statement "if $\lambda^{i+1}(u_1)=\lambda^{i+1}(u_2)$ under BFC-($\delta+1$) then $\lambda^{i+1}(u_1)=\lambda^{i+1}(u_2)$ under BFC-$\delta$" holds.

This means that, for any iteration $i$, if the colours of any two vertices in $G_1$ and $G_2$ are the same under BFC-($\delta+1$), then their colours under BFC-$\delta$ must also be the same. The proof is done. ∎

Now we prove the result in Section 4.4. See 4.4

Proof.

By Appendix C, we know that BFC-($\delta+1$) is at least as expressive as BFC-$\delta$. Now we just need to show that there exists at least one pair of non-isomorphic graphs $(\hat{G}_1,\hat{G}_2)$ that can be distinguished by BFC-($\delta+1$) but not by BFC-$\delta$.

Inspired by Wang et al. (2023), we show a specific construction of such graph pairs using cycles. For any $\delta\geq 1$, we construct $\hat{G}_1$ as two disjoint cycles of length $2\delta+1$ each, and $\hat{G}_2$ as one cycle of length $4\delta+2$. $\hat{G}_1$ and $\hat{G}_2$ can be distinguished by BFC-($\delta+1$) but not by BFC-$\delta$. Figure 5 shows two example graph pairs constructed using this method. The proof is done. ∎
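To illustrate the construction, the following sketch (our own, with assumed helper names) builds $\hat{G}_1$ and $\hat{G}_2$ and compares, for every vertex, the number of vertices within $h$ hops, a quantity visible to a breadth-first search of radius $h$. The counts agree at $h=\delta$ but differ at $h=\delta+1$, which is exactly where the larger search radius begins to pay off.

from collections import deque

def cycle(n, offset=0):
    # Adjacency dict of a cycle on vertices offset, ..., offset + n - 1.
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}

def ball_sizes(adj, h):
    # Sorted multiset of |N_h(v)| over all vertices v, computed by BFS.
    sizes = []
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            if dist[x] == h:
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        sizes.append(len(dist))
    return sorted(sizes)

delta = 2
n = 2 * delta + 1
g1 = {**cycle(n), **cycle(n, offset=n)}   # two disjoint (2*delta+1)-cycles
g2 = cycle(4 * delta + 2)                 # one (4*delta+2)-cycle

print(ball_sizes(g1, delta) == ball_sizes(g2, delta))          # True
print(ball_sizes(g1, delta + 1) == ball_sizes(g2, delta + 1))  # False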

See 5

Proof.

The concatenation operator in Equation 7 preserves injectivity, so Equation 7 is equivalent to replacing $\rho(\cdot)$ in Equation 1 with an MLP. The summation operator in Equation 8 is the injective variant of $\psi(\cdot)$ in Equation 2. The multiplication with $W_c$ is a single-layer MLP used in place of $\phi(\cdot)$ in Equation 2. Since MLPs are universal approximators (Hornik, 1991; Hornik et al., 1989), they can learn injective functions. Hence, SGN can be shown to be as expressive as LVC following the proof of Theorem 3 in Xu et al. (2019). We leave out the details for brevity. ∎