
Graph Neural Networks with Local Graph Parameters

Pablo Barceló (PUC Chile and IMFD Chile), Floris Geerts (University of Antwerp), Juan Reutter (PUC Chile and IMFD Chile), Maksimilian Ryschkov (University of Antwerp)
Abstract

Various recent proposals increase the distinguishing power of graph neural networks ($\mathsf{GNN}$s) by propagating features between $k$-tuples of vertices. The distinguishing power of these "higher-order" $\mathsf{GNN}$s is known to be bounded by the $k$-dimensional Weisfeiler-Leman ($\mathsf{WL}$) test, yet their $\mathcal{O}(n^{k})$ memory requirements limit their applicability. Other proposals infuse $\mathsf{GNN}$s with local higher-order graph structural information from the start, thereby inheriting the desirable $\mathcal{O}(n)$ memory requirement from $\mathsf{GNN}$s at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled $\mathsf{GNN}$s as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, both in terms of a variant of the $\mathsf{WL}$ test and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any $\mathsf{GNN}$ architecture and are cheap to compute. In terms of expressive power, our proposal lies in the middle ground between $\mathsf{GNN}$s and their higher-order counterparts. Further, we propose several techniques to aid in choosing the right local graph parameters. Our results connect $\mathsf{GNN}$s with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of $\mathsf{GNN}$s, datasets and graph learning tasks.

1 Introduction

Context.

Graph neural networks ($\mathsf{GNN}$s) (Merkwirth & Lengauer, 2005; Scarselli et al., 2009), and their important subclass of message passing neural networks ($\mathsf{MPNN}$s) (Gilmer et al., 2017), are among the most popular methods for graph learning tasks. $\mathsf{MPNN}$s use an iterative message passing scheme, based on the adjacency structure of the underlying graph, to compute vertex (and graph) embeddings in some real Euclidean space. The expressive (or discriminative) power of $\mathsf{MPNN}$s is, however, rather limited (Xu et al., 2019; Morris et al., 2019). Indeed, $\mathsf{MPNN}$s will always identically embed two vertices (graphs) when these vertices (graphs) cannot be distinguished by the one-dimensional Weisfeiler-Leman ($\mathsf{WL}$) algorithm. Two graphs $G_1$ and $H_1$, with vertices $v$ and $w$, that cannot be distinguished by $\mathsf{WL}$ (and thus by any $\mathsf{MPNN}$) are shown in Fig. 1. The expressive power of $\mathsf{WL}$ is well-understood (Cai et al., 1992; Arvind et al., 2020; Dell et al., 2018): essentially, it can only use tree-based structural information in the graphs to distinguish vertices. As a consequence, no $\mathsf{MPNN}$ can detect that vertex $v$ in Fig. 1 is part of a $3$-clique, whereas $w$ is not. Similarly, $\mathsf{MPNN}$s cannot detect that $w$ is part of a $4$-cycle, whereas $v$ is not. Further limitations of $\mathsf{WL}$ in terms of graph properties can be found, e.g., in Fürer (2017); Arvind et al. (2020); Chen et al. (2020); Tahmasebi & Jegelka (2020).

To remedy the weak expressive power of $\mathsf{MPNN}$s, so-called higher-order $\mathsf{MPNN}$s were proposed (Morris et al., 2019; Maron et al., 2019b; Morris et al., 2020), whose expressive power is measured in terms of the $k$-dimensional $\mathsf{WL}$ procedures ($k\text{-}\mathsf{WL}$) (Maron et al., 2019a; Chen et al., 2019a; Azizian & Lelarge, 2021; Geerts, 2020; Sato, 2020; Damke et al., 2020). In a nutshell, $k\text{-}\mathsf{WL}$ operates on $k$-tuples of vertices and can distinguish vertices (graphs) based on structural information related to graphs of treewidth $k$ (Dvorak, 2010; Dell et al., 2018). By definition, $\mathsf{WL}=1\text{-}\mathsf{WL}$. As an example, $2\text{-}\mathsf{WL}$ can detect that vertex $v$ in Fig. 1 belongs to a $3$-clique or a $4$-cycle, since both have treewidth two. While more expressive than $\mathsf{WL}$, the $\mathsf{GNN}$s based on $k\text{-}\mathsf{WL}$ require $\mathcal{O}(n^{k})$ operations in each iteration, where $n$ is the number of vertices, thereby hampering their applicability.

Figure 1: Two graphs $G_1$ (containing vertex $v$) and $H_1$ (containing vertex $w$) that are indistinguishable by the $\mathsf{WL}$-test. The numbers between round brackets indicate how many homomorphic images of the $3$-clique each vertex is involved in: $2$ for every vertex of $G_1$ and $0$ for every vertex of $H_1$.

A more practical approach is to extend the expressive power of $\mathsf{MPNN}$s whilst preserving their $\mathcal{O}(n)$ cost in each iteration. Various such extensions (Kipf & Welling, 2017; Chen et al., 2019a; Li et al., 2019; Ishiguro et al., 2020; Bouritsas et al., 2020; Geerts et al., 2020) achieve this by infusing $\mathsf{MPNN}$s with local graph structural information from the start. That is, the iterative message passing scheme of $\mathsf{MPNN}$s is run on vertex labels that contain quantitative information about local graph structures. It is easy to see that such architectures can go beyond the $\mathsf{WL}$ test: for example, adding triangle counts to $\mathsf{MPNN}$s suffices to distinguish the vertices $v$ and $w$ and the graphs $G_1$ and $H_1$ in Fig. 1. Moreover, the cost is a single preprocessing step to count local graph parameters, thus maintaining the $\mathcal{O}(n)$ cost in the iterations of the $\mathsf{MPNN}$. While there are some partial results showing that local graph parameters increase expressive power (Bouritsas et al., 2020; Li et al., 2019), their precise expressive power and their relationship to higher-order $\mathsf{MPNN}$s were unknown, and there is little guidance on which local parameters help $\mathsf{MPNN}$s and which ones do not. The main contribution of this paper is a precise characterization of the expressive power of $\mathsf{MPNN}$s with local graph parameters and of its relationship to the hierarchy of higher-order $\mathsf{MPNN}$s.

Our contributions.

In order to formalize local graph parameters, we propose to extend vertex labels with homomorphism counts of small graph patterns. (We recall that homomorphisms are edge-preserving mappings between vertex sets.) More precisely, given graphs $P$ and $G$, and vertices $r$ in $P$ and $v$ in $G$, we add the number of homomorphisms from $P$ to $G$ that map $r$ to $v$, denoted by $\mathsf{hom}(P^{r},G^{v})$, to the initial features of $v$. Such counts are both informative and efficiently computable. Indeed, homomorphism counts are known to measure the similarity of vertices and graphs (Lovász, 1967; Grohe, 2020a), and they serve as a basis for the efficient computation of a number of other important graph parameters, e.g., subgraph and induced subgraph counts (Curticapean et al., 2017; Zhang et al., 2020). Furthermore, homomorphism counts underlie characterisations of the expressive power of $\mathsf{MPNN}$s and higher-order $\mathsf{MPNN}$s. As an example, two vertices $v$ and $w$ in graphs $G$ and $H$, respectively, are indistinguishable by $\mathsf{WL}$, and hence by $\mathsf{MPNN}$s, precisely when $\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w})$ for every rooted tree $T^{r}$ (Dvorak, 2010; Dell et al., 2018).

Concretely, we propose $\mathcal{F}$-$\mathsf{MPNN}$s, where $\mathcal{F}=\{P_{1}^{r},\dots,P_{\ell}^{r}\}$ is a set of (graph) patterns, obtained by (i) first allowing a pre-processing step that labels each vertex $v$ of a graph $G$ with the vector $\bigl(\mathsf{hom}(P^{r}_{1},G^{v}),\ldots,\mathsf{hom}(P^{r}_{\ell},G^{v})\bigr)$, and (ii) then running an $\mathsf{MPNN}$ on this labelling. As such, we can turn any $\mathsf{MPNN}$ into an $\mathcal{F}$-$\mathsf{MPNN}$ by simply augmenting the initial vertex embedding. Furthermore, several recently proposed extensions of $\mathsf{MPNN}$s fit in this approach, including $\mathsf{MPNN}$s extended with information about vertex degrees (Kipf & Welling, 2017), walk counts (Chen et al., 2019a), tree-based counts (Ishiguro et al., 2020) and subgraph counts (Bouritsas et al., 2020). Hence, $\mathcal{F}$-$\mathsf{MPNN}$s can also be regarded as a unifying theoretical formalism.

Our main results can be summarised as follows:

  1. We precisely characterise the expressive power of $\mathcal{F}$-$\mathsf{MPNN}$s by means of an extension of $\mathsf{WL}$, denoted by $\mathcal{F}\text{-}\mathsf{WL}$. To do so, we use $\mathcal{F}$-pattern trees, which are obtained from standard trees by joining an arbitrary number of copies of the patterns in $\mathcal{F}$ to each of their vertices. Our result states that vertices $v$ and $w$ in graphs $G$ and $H$, respectively, are indistinguishable by $\mathcal{F}\text{-}\mathsf{WL}$, and hence by $\mathcal{F}$-$\mathsf{MPNN}$s, precisely when $\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w})$ for every $\mathcal{F}$-pattern tree $T^{r}$. This characterisation gracefully extends the characterisation for standard $\mathsf{MPNN}$s, mentioned earlier, by setting $\mathcal{F}=\emptyset$. Furthermore, $\mathcal{F}$-$\mathsf{MPNN}$s provide insights into the expressive power of existing $\mathsf{MPNN}$ extensions, most notably the Graph Structure Networks of Bouritsas et al. (2020).

  2. We compare $\mathcal{F}$-$\mathsf{MPNN}$s to higher-order $\mathsf{MPNN}$s, which are characterized in terms of the $k\text{-}\mathsf{WL}$-test. On the one hand, while $\mathcal{F}$-$\mathsf{MPNN}$s strictly increase the expressive power of the $\mathsf{WL}$-test, for any finite set $\mathcal{F}$ of patterns, $2\text{-}\mathsf{WL}$ can distinguish graphs which $\mathcal{F}$-$\mathsf{MPNN}$s cannot. On the other hand, for each $k\geq 1$ there are patterns $P$ such that $\{P\}$-$\mathsf{MPNN}$s can distinguish graphs which $k\text{-}\mathsf{WL}$ cannot.

  3. We deal with the technically challenging problem of pattern selection and of comparing $\mathcal{F}$-$\mathsf{MPNN}$s based on the patterns included in $\mathcal{F}$. We prove two partial results: one establishing when a pattern $P$ in $\mathcal{F}$ is redundant, based on whether or not $P$ is the join of other patterns in $\mathcal{F}$, and another indicating when $P$ does add expressive power, based on the treewidth of $P$ compared to the treewidth of the other patterns in $\mathcal{F}$.

  4. Our theoretical results are complemented by an experimental study in which we show that for various $\mathsf{GNN}$ architectures, datasets and graph learning tasks, all part of the recent benchmark by Dwivedi et al. (2020), augmenting initial features with homomorphism counts of graph patterns often has a positive effect, and computing these counts incurs little to no overhead.

As such, we believe that $\mathcal{F}$-$\mathsf{MPNN}$s not only provide an elegant theoretical framework for understanding local graph parameter enabled $\mathsf{MPNN}$s, but are also a valuable alternative to higher-order $\mathsf{MPNN}$s as a way to increase the expressive power of $\mathsf{MPNN}$s. In addition, as explained in Section 2, $\mathcal{F}$-$\mathsf{MPNN}$s provide a unifying framework for understanding the expressive power of several other existing extensions of $\mathsf{MPNN}$s. Proofs of our results and further details on the relationship to existing approaches and on the experiments can be found in the appendix.

Related Work.

Works related to the distinguishing power of the $\mathsf{WL}$-test, $\mathsf{MPNN}$s and their higher-order variants are cited throughout the paper. Beyond distinguishability, $\mathsf{GNN}$s are analyzed in terms of universality and generalization properties (Maron et al., 2019c; Keriven & Peyré, 2019; Chen et al., 2019b; Garg et al., 2020), local distributed algorithms (Sato et al., 2019; Loukas, 2020), randomness in features (Sato et al., 2020; Abboud et al., 2020) and local context matrix features (Vignac et al., 2020). Other extensions of $\mathsf{GNN}$s are surveyed, e.g., in Wu et al. (2021); Zhou et al. (2018) and Chami et al. (2021). Related are the Graph Homomorphism Convolutions of NT & Maehara (2020), which apply $\mathsf{SVM}$s directly to the representation of vertices by homomorphism counts. Finally, our approach is reminiscent of graph representations by means of graphlet kernels (Shervashidze et al., 2009), but at the level of vertices.

2 Local Graph Parameter Enabled $\mathsf{MPNN}$s

In this section we introduce $\mathsf{MPNN}$s with local graph parameters. We start with some preliminary concepts.

Graphs.

We consider undirected vertex-labelled graphs $G=(V,E,\chi)$, with $V$ the set of vertices, $E$ the set of edges and $\chi$ a mapping assigning a label to each vertex in $V$. The set of neighbours of a vertex $v$ is denoted by $N_{G}(v)=\{u\in V\mid\{u,v\}\in E\}$. A rooted graph is a graph in which one of its vertices is declared as the root. We denote a rooted graph by $G^{v}$, where $v\in V$ is the root, and depict rooted graphs with the root drawn as a blackened vertex. Given graphs $G=(V_{G},E_{G},\chi_{G})$ and $H=(V_{H},E_{H},\chi_{H})$, a homomorphism $h$ is a mapping $h:V_{G}\to V_{H}$ such that (i) $\{h(u),h(v)\}\in E_{H}$ for every $\{u,v\}\in E_{G}$, and (ii) $\chi_{G}(u)=\chi_{H}(h(u))$ for every $u\in V_{G}$. For rooted graphs $G^{v}$ and $H^{w}$, a homomorphism must additionally map $v$ to $w$. We denote by $\mathsf{hom}(G,H)$ the number of homomorphisms from $G$ to $H$; similarly for rooted graphs. For simplicity of exposition we focus on vertex-labelled undirected graphs, but all our results extend to edge-labelled directed graphs.
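To make the central quantity concrete, here is a minimal brute-force sketch (our own illustration, not the tool used later in the experiments) of counting (rooted) homomorphisms by enumerating all vertex maps; it is exponential in $|V_P|$ and thus only suited to the small patterns considered in this paper:

```python
from itertools import product

def hom(P_vertices, P_edges, P_label, G_vertices, G_edges, G_label, root_pair=None):
    """Count homomorphisms from P to G per the definition above: every edge
    of P must map to an edge of G and labels must be preserved. If
    root_pair = (r, v) is given, only maps sending r to v are counted,
    i.e. hom(P^r, G^v)."""
    G_adj = {frozenset(e) for e in G_edges}
    count = 0
    for images in product(G_vertices, repeat=len(P_vertices)):
        h = dict(zip(P_vertices, images))
        if root_pair is not None and h[root_pair[0]] != root_pair[1]:
            continue  # not a rooted homomorphism
        if any(P_label[u] != G_label[h[u]] for u in P_vertices):
            continue  # labels not preserved
        if all(frozenset((h[u], h[v])) in G_adj for u, v in P_edges):
            count += 1  # every edge of P lands on an edge of G
    return count

# Example: the triangle has exactly 6 homomorphisms into itself.
tri_V, tri_E = [0, 1, 2], [(0, 1), (1, 2), (2, 0)]
lab = {u: "a" for u in tri_V}
assert hom(tri_V, tri_E, lab, tri_V, tri_E, lab) == 6
```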

Message passing neural networks.

The basic architecture for $\mathsf{MPNN}$s (Gilmer et al., 2017), and the one used in recent studies on $\mathsf{GNN}$ expressiveness (Morris et al., 2019; Xu et al., 2019; Barceló et al., 2020), consists of a sequence of rounds that update the feature vector of every vertex in the graph by combining its current feature vector with the result of an aggregation over the feature vectors of its neighbours. Formally, for a graph $G=(V,E,\chi)$, let $\mathbf{x}_{M,G,v}^{(d)}$ denote the feature vector computed for vertex $v\in V$ by an $\mathsf{MPNN}$ $M$ in round $d$. The initial feature vector $\mathbf{x}_{M,G,v}^{(0)}$ is a one-hot encoding of the label $\chi(v)$. This feature vector is iteratively updated in a number of rounds. In particular, in round $d$,

$$\mathbf{x}_{M,G,v}^{(d)} := \textsc{Upd}^{(d)}\Big(\mathbf{x}_{M,G,v}^{(d-1)},\, \textsc{Comb}^{(d)}\big(\{\!\!\{\mathbf{x}_{M,G,u}^{(d-1)} \mid u\in N_{G}(v)\}\!\!\}\big)\Big),$$

where $\textsc{Comb}^{(d)}$ and $\textsc{Upd}^{(d)}$ are an aggregation and an update function, respectively. Thus, the feature vectors $\mathbf{x}_{M,G,u}^{(d-1)}$ of all neighbours $u$ of $v$ are combined by the aggregation function $\textsc{Comb}^{(d)}$ into a single vector, and this vector is then used together with $\mathbf{x}_{M,G,v}^{(d-1)}$ to produce $\mathbf{x}_{M,G,v}^{(d)}$ by applying the update function $\textsc{Upd}^{(d)}$.
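As one concrete instantiation (a sketch of our own, assuming sum aggregation for $\textsc{Comb}$ and a single ReLU perceptron for $\textsc{Upd}$; the framework permits any such choice of functions):

```python
import numpy as np

def mpnn_round(X, adj, W_self, W_nbr, b):
    """One message passing round. X is the (n, d) matrix of feature vectors
    x^{(d-1)}, adj the (n, n) 0/1 adjacency matrix. Row v of adj @ X is the
    sum over the multiset {{x_u : u in N(v)}} (the Comb step); the affine
    map followed by a ReLU plays the role of Upd."""
    agg = adj @ X
    return np.maximum(X @ W_self + agg @ W_nbr + b, 0.0)

# Tiny usage example: a single edge between two vertices.
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
X = rng.normal(size=(2, 4))
W_self, W_nbr = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
X_next = mpnn_round(X, adj, W_self, W_nbr, np.zeros(4))
```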

$\mathsf{MPNN}$s with local graph parameters.

The $\mathsf{GNN}$s studied in this paper leverage the power of $\mathsf{MPNN}$s by enhancing the initial features of vertices with local graph parameters that are beyond their classification power. To illustrate the idea, consider the graphs in Fig. 1. As mentioned, these graphs cannot be distinguished by the $\mathsf{WL}$-test, and therefore cannot be distinguished by the broad class of $\mathsf{MPNN}$s (see, e.g., Xu et al. (2019); Morris et al. (2019)). If we allow a pre-processing stage, however, in which the initial labelling of every vertex $v$ is extended with the number of (homomorphic images of) $3$-cliques in which $v$ participates (indicated by the numbers between brackets in Fig. 1), then clearly vertices $v$ and $w$ (and the graphs $G_1$ and $H_1$) can be distinguished based on this extra structural information. In fact, the initial labelling already suffices for this purpose.

Let $\mathcal{F}=\{P_{1}^{r},\dots,P_{\ell}^{r}\}$ be a set of (rooted) graphs, which we refer to as patterns. Then, $\mathcal{F}$-enabled $\mathsf{MPNN}$s, or just $\mathcal{F}$-$\mathsf{MPNN}$s, are defined in the same way as $\mathsf{MPNN}$s, with the crucial difference that the initial feature vector of a vertex $v$ now consists of a one-hot encoding of the label $\chi_{G}(v)$ together with all the homomorphism counts of patterns in $\mathcal{F}$. Formally, in each round $d$ an $\mathcal{F}$-$\mathsf{MPNN}$ $M$ labels each vertex $v$ in graph $G$ with a feature vector $\mathbf{x}_{M,\mathcal{F},G,v}^{(d)}$, inductively defined as follows:

$$\mathbf{x}_{M,\mathcal{F},G,v}^{(0)} := \big(\chi_{G}(v),\,\mathsf{hom}(P_{1}^{r},G^{v}),\dots,\mathsf{hom}(P_{\ell}^{r},G^{v})\big)$$
$$\mathbf{x}_{M,\mathcal{F},G,v}^{(d)} := \textsc{Upd}^{(d)}\Big(\mathbf{x}_{M,\mathcal{F},G,v}^{(d-1)},\, \textsc{Comb}^{(d)}\big(\{\!\!\{\mathbf{x}_{M,\mathcal{F},G,u}^{(d-1)} \mid u\in N_{G}(v)\}\!\!\}\big)\Big).$$

We note that standard $\mathsf{MPNN}$s are $\mathcal{F}$-$\mathsf{MPNN}$s with $\mathcal{F}=\emptyset$. As for $\mathsf{MPNN}$s, we can equip $\mathcal{F}$-$\mathsf{MPNN}$s with a Readout function that aggregates all final feature vectors into a single feature vector in order to classify or distinguish graphs.

We emphasise that any $\mathsf{MPNN}$ architecture can be turned into an $\mathcal{F}$-$\mathsf{MPNN}$ by a simple homomorphism-counting preprocessing step. As such, we propose a generic plug-in for a large class of $\mathsf{GNN}$ architectures. Better still, homomorphism counts of small graph patterns can be efficiently computed even on large datasets (Zhang et al., 2020), and they form the basis for counting (induced) subgraphs and other notions of subgraphs (Curticapean et al., 2017). Despite the simplicity of $\mathcal{F}$-$\mathsf{MPNN}$s, we will show that they can substantially increase the power of $\mathsf{MPNN}$s by varying $\mathcal{F}$, paying only the one-time cost of preprocessing.
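A sketch of this plug-in (our own illustration; labels are omitted for brevity, and in practice one would use a specialised counting tool rather than brute force): compute the hom-count columns once, concatenate them to the one-hot labels, and hand the result to any $\mathsf{MPNN}$ unchanged:

```python
from itertools import product
import numpy as np

def rooted_hom_counts(G_vertices, G_adj, patterns):
    """counts[i, j] = hom(P_j^r, G^v) for vertex v = G_vertices[i].
    patterns is a list of (P_vertices, P_edges, root) triples; G_adj maps
    each vertex to its set of neighbours. Vertex labels are handled exactly
    as in the hom(...) sketch above."""
    counts = np.zeros((len(G_vertices), len(patterns)))
    for j, (PV, PE, r) in enumerate(patterns):
        for i, v in enumerate(G_vertices):
            for images in product(G_vertices, repeat=len(PV)):
                h = dict(zip(PV, images))
                if h[r] == v and all(h[b] in G_adj[h[a]] for a, b in PE):
                    counts[i, j] += 1
    return counts

def f_mpnn_initial_features(one_hot, counts):
    """x^{(0)}_v = (one-hot label of v, hom counts of v): the only change
    an F-MPNN makes to the underlying MPNN architecture."""
    return np.concatenate([one_hot, counts], axis=1)
```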

$\mathcal{F}$-$\mathsf{MPNN}$s as a unifying framework.

An important aspect of $\mathcal{F}$-$\mathsf{MPNN}$s is that they allow a principled analysis of the power of existing extensions of $\mathsf{MPNN}$s. For example, taking $\mathcal{F}=\{L_{1}\}$, where $L_{1}$ is the rooted single edge, suffices to capture degree-aware $\mathsf{MPNN}$s (Geerts et al., 2020), such as the Graph Convolutional Networks ($\mathsf{GCN}$s) (Kipf & Welling, 2017), which use the degrees of vertices; taking $\mathcal{F}=\{L_{1},L_{2},\ldots,L_{\ell}\}$ for rooted paths $L_{i}$ of length $i$ suffices to model the walk counts used in Chen et al. (2019a); and taking $\mathcal{F}$ to be the set of labelled trees of depth one precisely corresponds to the use of the $\mathsf{WL}$-labelling obtained after one round by Ishiguro et al. (2020). Furthermore, $\{C_{\ell}\}$-$\mathsf{MPNN}$s, where $C_{\ell}$ denotes the cycle of length $\ell$, correspond to the extension proposed in Section 4 of Li et al. (2019).

In addition, $\mathcal{F}$-$\mathsf{MPNN}$s can also capture the Graph Structure Networks ($\mathsf{GSN}$s) of Bouritsas et al. (2020), which use subgraph isomorphism counts of graph patterns. We recall that an isomorphism from $G$ to $H$ is a bijective homomorphism $h$ from $G$ to $H$ which additionally satisfies (i) $\{h^{-1}(u),h^{-1}(v)\}\in E_{G}$ for every $\{u,v\}\in E_{H}$, and (ii) $\chi_{G}(h^{-1}(u))=\chi_{H}(u)$ for every $u\in V_{H}$. When $G$ and $H$ are rooted graphs, isomorphisms should preserve the roots as well. Now, in a $\mathsf{GSN}$, the feature vector of each vertex $v$ is augmented with the subgraph isomorphism counts of each rooted pattern $P^{r}$ in a set of patterns $\mathcal{P}$ into $G^{v}$ (the use of orbits in Bouritsas et al. (2020) is ignored here, but explained in the appendix), and this is followed by the execution of an $\mathsf{MPNN}$, just as for our $\mathcal{F}$-$\mathsf{MPNN}$s. It now remains to observe that subgraph isomorphism counts can be computed in terms of homomorphism counts, and vice versa (Curticapean et al., 2017). That is, $\mathsf{GSN}$s can be viewed as $\mathcal{F}$-$\mathsf{MPNN}$s, and thus our results for $\mathcal{F}$-$\mathsf{MPNN}$s carry over to $\mathsf{GSN}$s. We adopt homomorphism counts instead of subgraph isomorphism counts because homomorphism counts underlie existing characterizations of the expressive power of $\mathsf{MPNN}$s and because homomorphism counts are in general more efficient to compute. Also, Curticapean et al. (2017) show that all common graph counts are interchangeable in terms of expressive power.
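For cliques the translation between the two kinds of counts is a single constant: every homomorphism from $K_3$ is injective (adjacent pattern vertices cannot collapse onto one vertex without a self-loop), so $\mathsf{hom}(K_3,G)=|\mathrm{Aut}(K_3)|\cdot\mathsf{sub}(K_3,G)=6\,\mathsf{sub}(K_3,G)$. A toy check of this identity, on an example graph of our own choosing:

```python
from itertools import combinations, product

V = [0, 1, 2, 3]  # K4 minus the edge {0, 3}: exactly two triangles
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}

# homomorphisms from K3: ordered triples of pairwise adjacent vertices
n_hom = sum(1 for a, b, c in product(V, repeat=3)
            if b in adj[a] and c in adj[b] and a in adj[c])

# subgraph (isomorphism) count: unordered triangles
n_sub = sum(1 for a, b, c in combinations(V, 3)
            if b in adj[a] and c in adj[a] and c in adj[b])

assert n_hom == 6 * n_sub == 12  # |Aut(K3)| = 3! = 6
```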

Figure 2: Two graphs $G_2$ and $H_2$, with vertices $v$ and $w$, that cannot be distinguished by the $\mathsf{WL}$-test, and therefore not by standard $\mathsf{MPNN}$s. When considering $\mathcal{F}$-$\mathsf{MPNN}$s with $\mathcal{F}=\{K_{3}^{r}\}$ (the rooted triangle), one can show that the two graphs cannot be distinguished just by focusing on the initial labelling, but they can be distinguished by an $\mathcal{F}$-$\mathsf{MPNN}$ with just one aggregation layer.

3 Expressive Power of $\mathcal{F}$-$\mathsf{MPNN}$s

Recall that the standard $\mathsf{WL}$-test (Weisfeiler & Lehman, 1968; Grohe, 2017) iteratively constructs a labelling of the vertices of a graph $G=(V,E,\chi)$ as follows. In round $d$, for each vertex $v$ the algorithm first collects the label of $v$ and the labels of all of its neighbours after round $d-1$, and then hashes this aggregated multiset of labels into a new label for $v$. The initial label of $v$ is $\chi(v)$. As shown in Xu et al. (2019) and Morris et al. (2019), the $\mathsf{WL}$-test bounds the classification power of $\mathsf{MPNN}$s: if two vertices or two graphs are indistinguishable by the $\mathsf{WL}$-test, then they will not be distinguished by any $\mathsf{MPNN}$.

In turn, the expressive power of the $\mathsf{WL}$-test, and thus of $\mathsf{MPNN}$s, can be characterised in terms of homomorphism counts of trees (Dvorak, 2010; Dell et al., 2018). This result can be seen as a characterisation of the expressiveness of the $\mathsf{WL}$-test in terms of a particular infinite-dimensional graph kernel: the one defined by the number of homomorphisms from every tree $T$ into the underlying graph $G$.

In this section we show that both characterisations extend in an elegant way to the setting of $\mathcal{F}$-$\mathsf{MPNN}$s, confirming that $\mathcal{F}$-$\mathsf{MPNN}$s are not just a useful, but also a well-behaved, generalisation of standard $\mathsf{MPNN}$s.

3.1 Characterisation in terms of $\mathcal{F}\text{-}\mathsf{WL}$

We bound the expressive power of $\mathcal{F}$-$\mathsf{MPNN}$s in terms of what we call the $\mathcal{F}\text{-}\mathsf{WL}$-test. Formally, the $\mathcal{F}\text{-}\mathsf{WL}$-test extends $\mathsf{WL}$ in the same way as $\mathcal{F}$-$\mathsf{MPNN}$s extend standard $\mathsf{MPNN}$s: by including homomorphism counts of patterns in $\mathcal{F}$ in the initial labelling. That is, let $\mathcal{F}=\{P_{1}^{r},\dots,P_{\ell}^{r}\}$. The $\mathcal{F}\text{-}\mathsf{WL}$-test is a vertex labelling algorithm that iteratively computes a label $\chi_{\mathcal{F},G,v}^{(d)}$ for each vertex $v$ of a graph $G$, defined as follows:

$$\chi_{\mathcal{F},G,v}^{(0)} := \bigl(\chi_{G}(v),\,\mathsf{hom}(P_{1}^{r},G^{v}),\dots,\mathsf{hom}(P_{\ell}^{r},G^{v})\bigr)$$
$$\chi_{\mathcal{F},G,v}^{(d)} := \textsc{Hash}\bigl(\chi_{\mathcal{F},G,v}^{(d-1)},\,\{\!\!\{\chi_{\mathcal{F},G,u}^{(d-1)} \mid u\in N_{G}(v)\}\!\!\}\bigr).$$

The $\mathcal{F}\text{-}\mathsf{WL}$-test stops in round $d$ when no new pairs of vertices are distinguished by $\chi_{\mathcal{F},G,v}^{(d)}$, that is, when for any two vertices $v_1$ and $v_2$ of $G$, $\chi_{\mathcal{F},G,v_{1}}^{(d-1)}=\chi_{\mathcal{F},G,v_{2}}^{(d-1)}$ implies $\chi_{\mathcal{F},G,v_{1}}^{(d)}=\chi_{\mathcal{F},G,v_{2}}^{(d)}$. Notice that $\mathsf{WL}=\emptyset\text{-}\mathsf{WL}$.
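A minimal sketch of this refinement loop (our own illustration): starting from initial labels that already pack the homomorphism counts, each round "hashes" (re-indexes) the pair consisting of a vertex's colour and the multiset of its neighbours' colours, stopping once the colour partition no longer refines. With plain labels ($\mathcal{F}=\emptyset$) this is exactly the standard $\mathsf{WL}$-test:

```python
def f_wl(vertices, adj, init_labels):
    """Run the F-WL test to stability. init_labels[v] must encode the tuple
    (chi(v), hom(P_1^r, G^v), ..., hom(P_l^r, G^v)); adj maps each vertex
    to its set of neighbours. Returns the stable colouring."""
    colour = dict(init_labels)
    while True:
        # pair each colour with the sorted multiset of neighbour colours ...
        refined = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
                   for v in vertices}
        # ... and "hash" the pairs back to small integers
        palette = {c: i for i, c in enumerate(sorted(set(refined.values())))}
        refined = {v: palette[refined[v]] for v in vertices}
        if len(set(refined.values())) == len(set(colour.values())):
            return refined  # stable: no new pair of vertices distinguished
        colour = refined
```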

We can use the $\mathcal{F}\text{-}\mathsf{WL}$-test to compare vertices of the same graph or of different graphs. We say that the $\mathcal{F}\text{-}\mathsf{WL}$-test cannot distinguish two vertices if their final labels are the same, and that it cannot distinguish graphs $G$ and $H$ if the multiset of labels computed for $G$ is the same as that for $H$.

Similarly as for $\mathsf{MPNN}$s and the $\mathsf{WL}$-test (Xu et al., 2019; Morris et al., 2019), we obtain that the $\mathcal{F}\text{-}\mathsf{WL}$-test provides an upper bound on the expressive power of $\mathcal{F}$-$\mathsf{MPNN}$s.

Proposition 1.

If two vertices of a graph cannot be distinguished by the $\mathcal{F}\text{-}\mathsf{WL}$-test, then they cannot be distinguished by any $\mathcal{F}$-$\mathsf{MPNN}$ either. Moreover, if two graphs cannot be distinguished by the $\mathcal{F}\text{-}\mathsf{WL}$-test, then they cannot be distinguished by any $\mathcal{F}$-$\mathsf{MPNN}$ either.

We can also construct $\mathcal{F}$-$\mathsf{MPNN}$s that mimic the $\mathcal{F}\text{-}\mathsf{WL}$-test: simply adding local parameters from a set $\mathcal{F}$ of patterns to the $\mathsf{GIN}$ architecture of Xu et al. (2019) results in an $\mathcal{F}$-$\mathsf{MPNN}$ that classifies vertices and graphs exactly as the $\mathcal{F}\text{-}\mathsf{WL}$-test does.

3.2 Characterisation in terms of $\mathcal{F}$-pattern trees

At the core of several results about the $\mathsf{WL}$-test lies a characterisation linking the test to homomorphism counts of (rooted) trees (Dvorak, 2010; Dell et al., 2018). In view of the connection to $\mathsf{MPNN}$s, it tells us that $\mathsf{MPNN}$s only use quantitative tree-based structural information from the underlying graphs. We next extend this characterisation to $\mathcal{F}\text{-}\mathsf{WL}$ by using homomorphism counts of so-called $\mathcal{F}$-pattern trees. In view of the connection with $\mathcal{F}$-$\mathsf{MPNN}$s (Proposition 1), this reveals that $\mathcal{F}$-$\mathsf{MPNN}$s can use quantitative information about richer graph structures than $\mathsf{MPNN}$s.

To define $\mathcal{F}$-pattern trees we need the graph join operator $\star$. Given two rooted graphs $G^{v}$ and $H^{w}$, the join graph $(G\star H)^{v}$ is obtained by taking the disjoint union of $G^{v}$ and $H^{w}$, followed by identifying $w$ with $v$. The root of the join graph is $v$. For example, the join of two rooted triangles is the "bowtie" graph rooted at the shared vertex. Further, if $G$ is a graph and $P^{r}$ is a rooted graph, then joining a vertex $v$ in $G$ with $P^{r}$ results in the disjoint union of $G$ and $P^{r}$, where $r$ is identified with $v$.

Let $\mathcal{F}=\{P_{1}^{r},\ldots,P_{\ell}^{r}\}$. An $\mathcal{F}$-pattern tree $T^{r}$ is obtained from a standard rooted tree $S^{r}=(V,E,\chi)$, called the backbone of $T^{r}$, by joining every vertex $s\in V$ with any number of copies of patterns from $\mathcal{F}$.

We define the depth of an $\mathcal{F}$-pattern tree as the depth of its backbone. (The figure of examples is omitted here: for $\mathcal{F}=\{K_{3}^{r}\}$, the examples are rooted trees whose backbone vertices, drawn in grey, are joined with copies of the triangle.) Standard trees are also $\mathcal{F}$-pattern trees, as illustrated by the sketch below.
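The join operator and the construction of $\mathcal{F}$-pattern trees are easy to make concrete; a small sketch, using our own encoding of rooted graphs as (vertices, edges, root) triples:

```python
def join(G, H):
    """(G * H)^v: disjoint union of rooted graphs G and H with H's root
    identified with G's root. (Joining a pattern onto an arbitrary backbone
    vertex works the same way, with that vertex as the join point.)"""
    gv, ge, gr = G
    hv, he, hr = H
    ren = {u: (gr if u == hr else ("h", u)) for u in hv}  # avoid name clashes
    return (gv + [ren[u] for u in hv if u != hr],
            ge + [(ren[a], ren[b]) for a, b in he],
            gr)

# An F-pattern tree for F = {rooted triangle}: a one-edge backbone whose
# root is joined with one copy of the triangle.
edge = ([0, 1], [(0, 1)], 0)
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], "a")
pattern_tree = join(edge, triangle)  # backbone vertex 0 now lies on a triangle
```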

We next use $\mathcal{F}$-pattern trees to characterise the expressive power of $\mathcal{F}\text{-}\mathsf{WL}$ and thus, by Proposition 1, of $\mathcal{F}$-$\mathsf{MPNN}$s.

Theorem 1.

For any finite collection $\mathcal{F}$ of patterns, vertices $v$ and $w$ in a graph $G$ are indistinguishable by the $\mathcal{F}\text{-}\mathsf{WL}$-test if and only if $\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},G^{w})$ for every rooted $\mathcal{F}$-pattern tree $T^{r}$. Similarly, graphs $G$ and $H$ are indistinguishable by the $\mathcal{F}\text{-}\mathsf{WL}$-test if and only if $\mathsf{hom}(T,G)=\mathsf{hom}(T,H)$ for every (unrooted) $\mathcal{F}$-pattern tree $T$.

The proof of this theorem, which can be found in the appendix, requires extending techniques from Grohe (2020a, b) that were used to characterise the expressiveness of $\mathsf{WL}$ in terms of homomorphism counts of trees.

In fact, we can make the above theorem more precise: when $\mathcal{F}\text{-}\mathsf{WL}$ is run for $d$ rounds, only $\mathcal{F}$-pattern trees of depth at most $d$ are required. This tells us that increasing the number of rounds of $\mathcal{F}\text{-}\mathsf{WL}$ results in more complicated structural information being taken into account. For example, consider the two graphs $G_2$ and $H_2$ and vertices $v\in V_{G_{2}}$ and $w\in V_{H_{2}}$ shown in Fig. 2, and let $\mathcal{F}=\{K_{3}^{r}\}$. By definition, $\mathcal{F}\text{-}\mathsf{WL}$ cannot distinguish $v$ from $w$ based on the initial labelling. If run for one round, Theorem 1 implies that $\mathcal{F}\text{-}\mathsf{WL}$ cannot distinguish $v$ from $w$ if and only if $\mathsf{hom}(T^{r},G_{2}^{v})=\mathsf{hom}(T^{r},H_{2}^{w})$ for every $\mathcal{F}$-pattern tree of depth at most $1$. It is readily verified that

$$\mathsf{hom}(T^{r},G_{2}^{v})=0\neq 4=\mathsf{hom}(T^{r},H_{2}^{w})$$ for a particular depth-one $\mathcal{F}$-pattern tree $T^{r}$ (depicted in the omitted figure),

and thus $\mathcal{F}\text{-}\mathsf{WL}$ distinguishes $v$ from $w$ after one round. Similarly, $G_2$ and $H_2$ can be distinguished by $\mathcal{F}\text{-}\mathsf{WL}$ after one round. We observe that $G_2$ and $H_2$ are indistinguishable by $\mathsf{WL}$. Hence, $\mathcal{F}$-$\mathsf{MPNN}$s are more expressive than $\mathsf{MPNN}$s.
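A concrete toy check of this claim (on a classic pair of $\mathsf{WL}$-indistinguishable graphs of our own choosing, not the graphs of Fig. 2): the $6$-cycle and the disjoint union of two triangles are both $2$-regular, so $\mathsf{WL}$ colours all twelve vertices identically forever, while the rooted-triangle counts in the initial $\{K_{3}^{r}\}\text{-}\mathsf{WL}$ labelling already separate them:

```python
from itertools import product

def adj_from(edges, n):
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def rooted_triangle_count(adj, v):
    # hom(K3^r, G^v): ordered pairs (a, b) of neighbours of v with a ~ b
    return sum(1 for a, b in product(adj[v], repeat=2) if b in adj[a])

C6 = adj_from([(i, (i + 1) % 6) for i in range(6)], 6)              # 6-cycle
TT = adj_from([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], 6)  # 2 triangles

print([rooted_triangle_count(C6, v) for v in range(6)])  # [0, 0, 0, 0, 0, 0]
print([rooted_triangle_count(TT, v) for v in range(6)])  # [2, 2, 2, 2, 2, 2]
```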

Importantly, Theorem 1 discloses the boundaries of $\mathcal{F}$-$\mathsf{MPNN}$s. To illustrate this for some specific instances of $\mathcal{F}$-$\mathsf{MPNN}$s mentioned earlier: the expressive power of degree-based $\mathsf{MPNN}$s (Kipf & Welling, 2017; Geerts et al., 2020) is captured by $\{L_{1}\}$-pattern trees, and walk-count $\mathsf{MPNN}$s (Chen et al., 2019a) are captured by $\{L_{1},\ldots,L_{\ell}\}$-pattern trees. These pattern trees are just trees, since joining paths to trees only results in bigger trees. Thus, Theorem 1 tells us that all these extensions are still bounded by $\mathsf{WL}$ (albeit needing fewer rounds). In contrast, beyond $\mathsf{WL}$, $\{C_{\ell}\}$-pattern trees capture cycle-count $\mathsf{MPNN}$s (Li et al., 2019), and for $\mathsf{GSN}$s (Bouritsas et al., 2020), which use subgraph isomorphism counts of patterns $P\in\mathcal{P}$, the expressive power is captured by $\mathsf{spasm}(\mathcal{P})$-pattern trees, where $\mathsf{spasm}(\mathcal{P})$ consists of all surjective homomorphic images of patterns in $\mathcal{P}$ (Curticapean et al., 2017).

4 A Comparison with the $k\text{-}\mathsf{WL}$-test

We propose $\mathcal{F}$-$\mathsf{MPNN}$s as an alternative and efficient way to extend the expressive power of $\mathsf{MPNN}$s (and thus of the $\mathsf{WL}$-test), compared to the computationally intensive higher-order $\mathsf{MPNN}$s based on the $k\text{-}\mathsf{WL}$-test (Morris et al., 2019, 2020; Maron et al., 2019a). In this section we situate $\mathcal{F}\text{-}\mathsf{WL}$ in the $k\text{-}\mathsf{WL}$ hierarchy. The definition of $k\text{-}\mathsf{WL}$ is deferred to the appendix.

We have seen that $\mathcal{F}\text{-}\mathsf{WL}$ can distinguish graphs that $\mathsf{WL}$ cannot: it suffices to consider $\{K_{3}\}\text{-}\mathsf{WL}$ for the $3$-clique $K_3$. In order to generalise this observation we need some notation. Let $\mathcal{F}$ and $\mathcal{G}$ be two sets of patterns and consider an $\mathcal{F}$-$\mathsf{MPNN}$ $M$ and a $\mathcal{G}$-$\mathsf{MPNN}$ $N$. We say that $M$ is upper bounded in expressive power by $N$ if for any graph $G$, whenever $N$ cannot distinguish vertices $v$ and $w$ (just as for the $\mathcal{F}\text{-}\mathsf{WL}$-test, an $\mathcal{F}$-$\mathsf{MPNN}$ cannot distinguish two vertices if the label computed for both of them is the same), then neither can $M$. A similar notion is in place for pairs of graphs: if $N$ cannot distinguish graphs $G$ and $H$, then neither can $M$.

More generally, let $\mathcal{M}$ be a class of $\mathcal{F}$-$\mathsf{MPNN}$s and $\mathcal{N}$ be a class of $\mathcal{G}$-$\mathsf{MPNN}$s. We say that the class $\mathcal{M}$ is upper bounded in expressive power by $\mathcal{N}$ if every $M\in\mathcal{M}$ is upper bounded in expressive power by some $N\in\mathcal{N}$ (which may depend on $M$). When $\mathcal{M}$ is upper bounded by $\mathcal{N}$ and vice versa, then $\mathcal{M}$ and $\mathcal{N}$ are said to have the same expressive power. A class $\mathcal{N}$ is more expressive than a class $\mathcal{M}$ when $\mathcal{M}$ is upper bounded in expressive power by $\mathcal{N}$, but there exist graphs that can be distinguished by $\mathsf{MPNN}$s in $\mathcal{N}$ and not by any $\mathsf{MPNN}$ in $\mathcal{M}$.

Finally, we use the notion of treewidth of a graph, which measures how tree-like a graph is. For example, trees have treewidth one, cycles have treewidth two, and the $k$-clique $K_k$ has treewidth $k-1$ (for $k>1$). We recall this standard notion in the appendix and only note that we define the treewidth of a pattern $P^{r}$ as the treewidth of its unrooted version $P$.

Our first result is a consequence of the characterisation of $k\text{-}\mathsf{WL}$ in terms of homomorphism counts of graphs of treewidth $k$ (Dvorak, 2010; Dell et al., 2018).

Proposition 2.

For each finite set $\mathcal{F}$ of patterns, the expressive power of $\mathcal{F}\text{-}\mathsf{WL}$ is bounded by $k\text{-}\mathsf{WL}$, where $k$ is the largest treewidth of a pattern in $\mathcal{F}$.

For example, since the treewidth of $K_3$ is $2$, we have that $\{K_{3}\}\text{-}\mathsf{WL}$ is bounded by $2\text{-}\mathsf{WL}$. Similarly, $\{K_{k+1}\}\text{-}\mathsf{WL}$ is bounded in expressive power by $k\text{-}\mathsf{WL}$.

Our second result tells us how to increase the expressive power of $\mathcal{F}\text{-}\mathsf{WL}$ beyond $k\text{-}\mathsf{WL}$. A pattern $P^{r}$ is a core if every homomorphism from $P$ to itself is injective. For example, every clique $K_k$ and every cycle of odd length is a core.

Theorem 2.

Let $\mathcal{F}$ be a finite set of patterns. If $\mathcal{F}$ contains a pattern $P^{r}$ which is a core and has treewidth $k$, then there exist graphs that can be distinguished by $\mathcal{F}\text{-}\mathsf{WL}$ but not by $(k-1)\text{-}\mathsf{WL}$.

In other words, for such $\mathcal{F}$, $\mathcal{F}\text{-}\mathsf{WL}$ is not bounded by $(k-1)\text{-}\mathsf{WL}$. For example, since $K_3$ is a core, $\{K_{3}\}\text{-}\mathsf{WL}$ is not bounded in expressive power by $\mathsf{WL}=1\text{-}\mathsf{WL}$. More generally, $\{K_{k}\}\text{-}\mathsf{WL}$ is not bounded by $(k-1)\text{-}\mathsf{WL}$. The proof of Theorem 2 is based on extending deep techniques developed in finite model theory that have been used to understand the expressive power of finite variable logics (Atserias et al., 2007; Bova & Chen, 2019). This result is stronger than the one underlying the strictness of the $k\text{-}\mathsf{WL}$ hierarchy (Otto, 2017), which states that $k\text{-}\mathsf{WL}$ is strictly more expressive than $(k-1)\text{-}\mathsf{WL}$. Indeed, Otto (2017) only shows the existence of a pattern $P^{r}$ of treewidth $k$ such that $\{P^{r}\}\text{-}\mathsf{WL}$ is not bounded by $(k-1)\text{-}\mathsf{WL}$. In Theorem 2 we provide an explicit recipe for finding such a pattern $P^{r}$: it can be taken to be any core of treewidth $k$.

In summary, we have shown that there are sets $\mathcal{F}$ of patterns such that (i) $\mathcal{F}\text{-}\mathsf{WL}$ can distinguish graphs which cannot be distinguished by $(k-1)\text{-}\mathsf{WL}$, yet (ii) $\mathcal{F}\text{-}\mathsf{WL}$ cannot distinguish more graphs than $k\text{-}\mathsf{WL}$. This begs the question whether there is a finite set $\mathcal{F}$ such that $\mathcal{F}\text{-}\mathsf{WL}$ is equivalent in expressive power to $k\text{-}\mathsf{WL}$. We answer this negatively.

Proposition 3.

For any $k>1$, there does not exist a finite set $\mathcal{F}$ of patterns such that $\mathcal{F}\text{-}\mathsf{WL}$ is equivalent in expressive power to $k\text{-}\mathsf{WL}$.

In view of the connection between $\mathcal{F}$-$\mathsf{MPNN}$s and $\mathsf{GSN}$s mentioned earlier, we thus show that no $\mathsf{GSN}$ can match the power of $k\text{-}\mathsf{WL}$, which was a question left open in Bouritsas et al. (2020). We remark that if we allow $\mathcal{F}$ to consist of all (infinitely many) patterns of treewidth $k$, then $\mathcal{F}\text{-}\mathsf{WL}$ is equivalent in expressive power to $k\text{-}\mathsf{WL}$ (Dvorak, 2010; Dell et al., 2018).

5 When Do Patterns Extend Expressiveness?

Patterns are not learned, but must be passed as an input to $\mathsf{MPNN}$s together with the graph structure. Thus, knowing which patterns work well, and which do not, is of key importance for the power of the resulting $\mathcal{F}$-$\mathsf{MPNN}$s. This is a difficult question to answer, since determining which patterns work well is clearly application-dependent. From a theoretical point of view, however, we can still look into interesting questions related to the problem of which patterns to choose. One such question, and the one studied in this section, is when a pattern adds expressive power over the ones already selected. More formally, we study the following problem: given a finite set $\mathcal{F}$ of patterns, when does adding a new pattern $P^{r}$ to $\mathcal{F}$ extend the expressive power of the $\mathcal{F}\text{-}\mathsf{WL}$-test?

To answer this question in the affirmative, we need to find two graphs $G$ and $H$ that are indistinguishable by the $\mathcal{F}\text{-}\mathsf{WL}$-test but can be distinguished by the $\mathcal{F}\cup\{P^{r}\}\text{-}\mathsf{WL}$-test. As an example of this technique we show that longer cycles always add expressive power. We use $C_k$ to denote the cycle of length $k$.

Proposition 4.

For any $k>3$, $\{C^{r}_{3},\dots,C^{r}_{k}\}\text{-}\mathsf{WL}$ is more expressive than $\{C^{r}_{3},\dots,C^{r}_{k-1}\}\text{-}\mathsf{WL}$.

We also observe that, by Proposition 2, $\{C^{r}_{3},\dots,C^{r}_{k}\}\text{-}\mathsf{WL}$ is bounded by $2\text{-}\mathsf{WL}$ for any $k\geq 3$, because cycles have treewidth two. It is often the case, however, that finding such graphs and proving that they are indistinguishable is rather challenging. Instead, in this section we provide two techniques that can be used to partially answer the question posed above by only looking at properties of the sets of patterns. Our first result establishes when a pattern does not add expressive power to a given set $\mathcal{F}$ of patterns, and the second one when it does.

5.1 Detecting when patterns are superfluous

Our first result is a simple recipe for choosing local features: instead of choosing complex patterns that are the joins of smaller patterns, one should opt for the smaller patterns.

Proposition 5.

Let $P^{r}=P_{1}^{r}\star P_{2}^{r}$ be a pattern that is the join of two smaller patterns. Then for any set $\mathcal{F}$ of patterns, we have that $\mathcal{F}\cup\{P^{r}\}$ is upper bounded by $\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\}$.

Stated differently, this means that adding to \mathcal{F} any pattern which is the join of two patterns already in \mathcal{F} does not add expressive power.

Thus, instead of using, for example, the bowtie pattern (the join of two triangles), one should prefer the triangle itself. This result is in line with other advantages of smaller patterns: their homomorphism counts are easier to compute, and, since they are less specific, they should tend to produce less over-fitting.

5.2 Detecting when patterns add expressiveness

Joining patterns into new patterns does not give extra expressive power, but what about patterns which are not joins? We next provide a useful recipe for detecting when a pattern does add expressive power. We recall that the core of a graph $P$ is its unique (up to isomorphism) induced subgraph which is a core.

Theorem 3.

Let $\mathcal{F}$ be a finite set of patterns and let $Q^{r}$ be a pattern whose core has treewidth $k$. Then, $\mathcal{F}\cup\{Q^{r}\}\text{-}\mathsf{WL}$ is more expressive than $\mathcal{F}\text{-}\mathsf{WL}$ if every pattern $P^{r}\in\mathcal{F}$ satisfies one of the following conditions: (i) $P^{r}$ has treewidth $<k$; or (ii) $P^{r}$ does not map homomorphically to $Q^{r}$.

As an example, $\{K_{3},\ldots,K_{k}\}\text{-}\mathsf{WL}$ is more expressive than $\{K_{3},\ldots,K_{k-1}\}\text{-}\mathsf{WL}$ for any $k>3$, because of the first condition. Similarly, $\{K_{3},\ldots,K_{k},C_{\ell}\}\text{-}\mathsf{WL}$ is more expressive than $\{K_{3},\ldots,K_{k}\}\text{-}\mathsf{WL}$ for odd cycles $C_{\ell}$. Indeed, such cycles are cores and no clique $K_k$ with $k>2$ maps homomorphically to $C_{\ell}$.

6 Experiments

We next showcase that $\mathsf{GNN}$ architectures benefit when homomorphism counts of patterns are added as additional vertex features. For patterns where homomorphism and subgraph isomorphism counts differ (e.g., cycles) we compare with $\mathsf{GSN}$s (Bouritsas et al., 2020). We use the benchmark for $\mathsf{GNN}$s by Dwivedi et al. (2020), as it offers a broad choice of models, datasets and graph classification tasks.

Selected $\mathsf{GNN}$s.

We select the best architectures from Dwivedi et al. (2020): Graph Attention Networks ($\mathsf{GAT}$) (Velickovic et al., 2018), Graph Convolutional Networks ($\mathsf{GCN}$) (Kipf & Welling, 2017), $\mathsf{GraphSage}$ (Hamilton et al., 2017), Gaussian Mixture Models ($\mathsf{MoNet}$) (Monti et al., 2017) and $\mathsf{GatedGCN}$ (Bresson & Laurent, 2017). We leave out various linear architectures such as $\mathsf{GIN}$ (Xu et al., 2019), as they were shown to perform poorly on the benchmark.

Learning tasks and datasets.

As in Dwivedi et al. (2020) we consider (i) graph regression on the ZINC dataset (Irwin et al., 2012a; Dwivedi et al., 2020); (ii) vertex classification on the PATTERN and CLUSTER datasets (Dwivedi et al., 2020); and (iii) link prediction on the COLLAB dataset (Hu et al., 2020). We omit graph classification: for this task, the graph datasets from Dwivedi et al. (2020) originate from image data, and hence vertex neighborhoods carry little information.

Patterns.

We extend the initial features of vertices with homomorphism counts of cycles $C_{\ell}$ of length $\ell\leq 10$ for molecular data (ZINC), and with homomorphism counts of $k$-cliques $K_k$ for $k\leq 5$ for social or collaboration data (PATTERN, CLUSTER, COLLAB). We use the $z$-score of the logarithms of the homomorphism counts to make them standard-normally distributed and comparable to other features. Section 5 tells us that all these patterns increase expressive power (Theorem 3 and Proposition 4) and are "minimal" in the sense that they are not the join of smaller patterns (Proposition 5). Similar pattern choices were made in Bouritsas et al. (2020). We use DISC (Zhang et al., 2020), a tool specifically built to obtain homomorphism counts on large datasets. (We thank the authors for providing us with an executable.) Each model is trained and tested independently using combinations of patterns.
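The normalisation step can be sketched as follows; the small offset guarding $\log(0)$ is our own assumption, as the setup only fixes that the $z$-score of the log counts is used:

```python
import numpy as np

def normalise_hom_counts(counts):
    """Per-pattern (column-wise) z-score of log(1 + count), mapping the
    heavy-tailed homomorphism counts to roughly standard-normal features."""
    logs = np.log1p(counts.astype(float))
    mu = logs.mean(axis=0)
    sigma = logs.std(axis=0)
    return (logs - mu) / np.where(sigma > 0.0, sigma, 1.0)
```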

Higher-order $\mathsf{GNN}$s.

We do not compare to higher-order $\mathsf{GNN}$s, since this was already done by Dwivedi et al. (2020). They included ring-$\mathsf{GNN}$s (which outperform $2\mathsf{WL}$-$\mathsf{GNN}$s) and $3\mathsf{WL}$-$\mathsf{GNN}$s in their experiments, and these were outperformed by our selected "linear" architectures. Although the increased expressive power of higher-order $\mathsf{GNN}$s may be beneficial for learning, scalability and learning issues (e.g., loss divergence) hamper their applicability (Dwivedi et al., 2020). Our approach thus outperforms higher-order $\mathsf{GNN}$s with respect to the benchmark.

Methodology.

Graphs were divided between training and test as instructed by Dwivedi et al. (2020), and all numbers reported correspond to the test set. The reported performance is the average over four runs with different random seeds for the respective combinations of patterns in $\mathcal{F}$, model and dataset. Training times were comparable to the baseline of training models without any augmented features. Code to reproduce our experiments is available at https://github.com/LGP-GNN-2021/LGP-GNN. All models for ZINC, PATTERN and COLLAB were trained on a GeForce GTX 1080 Ti GPU; for CLUSTER, a Tesla V100-SXM3-32GB GPU was used.

Next we summarize our results for each learning task separately.

Graph regression.

The first task of the benchmark is the prediction of the solubility of molecules in the ZINC dataset (Irwin et al., 2012a; Dwivedi et al., 2020), a dataset of about 12,000 small graphs, each representing one molecule. The results in Table 1 show that each of our models indeed improves by adding homomorphism counts of cycles, and the best result is obtained by considering all cycles. $\mathsf{GSN}$s were applied to the ZINC dataset as well (Bouritsas et al., 2020). In Table 1 we also report results using subgraph isomorphism counts (as in $\mathsf{GSN}$s): homomorphism counts generally provide better results than subgraph isomorphism counts. Our best result (framed in Table 1) is competitive with the value of 0.139 reported in Bouritsas et al. (2020). Looking at the full results, we see that some cycles are much more important than others. Table 2 shows which cycles have the greatest impact for the worst-performing baseline, $\mathsf{GAT}$. Remarkably, adding homomorphism counts makes the $\mathsf{GAT}$ model competitive with the best performers of the benchmark.

Table 1: Results for the ZINC dataset show that homomorphism (hom) counts of cycles improve every model. We compare the mean absolute error (MAE) of each model without any homomorphism counts (baseline) against the model augmented with hom counts, and with subgraph isomorphism (iso) counts, of $C_3$–$C_{10}$.

Model | MAE (base) | MAE (hom) | MAE (iso)
$\mathsf{GAT}$ | 0.47±0.02 | 0.22±0.01 | 0.24±0.01
$\mathsf{GCN}$ | 0.35±0.01 | 0.20±0.01 | 0.22±0.01
$\mathsf{GraphSage}$ | 0.44±0.01 | 0.24±0.01 | 0.24±0.01
$\mathsf{MoNet}$ | 0.25±0.01 | 0.19±0.01 | 0.16±0.01
$\mathsf{GatedGCN}$ | 0.34±0.05 | 0.1353±0.01 | 0.1357±0.01
Table 2: The effect of different cycle sets for the $\mathsf{GAT}$ model on the ZINC dataset, measured by mean absolute error.

Set $\mathcal{F}$ | MAE
None | 0.47±0.02
$\{C_3\}$ | 0.45±0.01
$\{C_4\}$ | 0.34±0.02
$\{C_6\}$ | 0.31±0.01
$\{C_5,C_6\}$ | 0.28±0.01
$\{C_3,\ldots,C_6\}$ | 0.23±0.01
$\{C_3,\ldots,C_{10}\}$ | 0.22±0.01
Vertex classification.

The next task in the benchmark is vertex classification. Here we analyze two datasets, PATTERN and CLUSTER (Dwivedi et al., 2020), both containing over 12,000 artificially generated graphs resembling social networks or communities. The task is to predict whether a vertex belongs to a particular cluster or pattern, and all results are measured using the accuracy of the classifier. Here too, our results show that homomorphism counts, this time of cliques, tend to improve the accuracy of our models. Indeed, for the PATTERN dataset we see an improvement in all models but $\mathsf{GatedGCN}$ (Table 3), and three models are improved on the CLUSTER dataset (reported in the appendix). Once again, the best performer in this task is a model that uses homomorphism counts. We remark that for cliques, homomorphism counts coincide with subgraph isomorphism counts (up to a constant factor), so our extensions behave like $\mathsf{GSN}$s.

Table 3: Results for the PATTERN dataset show that homomorphism counts improve all models except $\mathsf{GatedGCN}$. We compare the weighted accuracy of each model without any homomorphism counts (baseline) against the model augmented with the counts of the set $\mathcal{F}$ that showed the best performance (best $\mathcal{F}$).

Model + best $\mathcal{F}$ | Accuracy baseline | Accuracy best
$\mathsf{GAT}$ + $\{K_3,K_4,K_5\}$ | 78.83±0.60 | 85.50±0.23
$\mathsf{GCN}$ + $\{K_3,K_4,K_5\}$ | 71.42±1.38 | 82.49±0.48
$\mathsf{GraphSage}$ + $\{K_3,K_4,K_5\}$ | 70.78±0.19 | 85.85±0.15
$\mathsf{MoNet}$ + $\{K_3,K_4,K_5\}$ | 85.90±0.03 | 86.63±0.03
$\mathsf{GatedGCN}$ + $\emptyset$ | 86.15±0.08 | 86.15±0.08
Link prediction.

In our final task we consider a single graph, COLLAB (Hu et al., 2020), with over 235,000 vertices, containing information about the collaborators in an academic network; the task is to predict future collaborations. The metric used in the benchmark is the Hits@50 evaluator (Hu et al., 2020): positive collaborations are ranked among randomly sampled negative collaborations, and the metric is the ratio of positive edges ranked at place 50 or above. Once again, homomorphism counts of cliques improve the performance of all models; see Table 4. An interesting observation is that this time the best set of features (cliques) depends on the model, although the best model again uses all cliques.

Table 4: All models improve on the Hits@50 metric for the COLLAB dataset. We compare each model without any homomorphism counts (baseline) against the model augmented with the counts of the set of patterns that showed the best performance (best $\mathcal{F}$).

Model + best $\mathcal{F}$ | Hits@50 baseline | Hits@50 best
$\mathsf{GAT}$ + $\{K_3\}$ | 50.32±0.55 | 52.87±0.87
$\mathsf{GCN}$ + $\{K_3,K_4,K_5\}$ | 51.35±1.30 | 54.60±1.01
$\mathsf{GraphSage}$ + $\{K_5\}$ | 50.33±0.68 | 51.39±1.23
$\mathsf{MoNet}$ + $\{K_4\}$ | 49.81±1.56 | 51.76±1.38
$\mathsf{GatedGCN}$ + $\{K_3\}$ | 51.00±2.54 | 51.57±0.68
Remarks.

The best performers in each task use homomorphism counts, in accordance with our theoretical results, showing that such counts do extend the power of $\mathsf{MPNN}$s. Homomorphism counts are also cheap to compute: for COLLAB, the largest graph in our experiments, the homomorphism counts of all patterns we used, for all vertices, could be computed by DISC (Zhang et al., 2020) in less than 3 minutes. One important remark is that selecting the best set of features is still a challenging endeavor. Our theoretical results help streamline this search, but for now it remains an exploratory task. In our experiments we first looked at adding each pattern individually, and then tried combinations of those that showed the best improvements. This feature selection strategy incurs considerable cost, both computational and environmental, and needs further investigation.

7 Conclusion

We propose local graph parameter enabled $\mathsf{MPNN}$s as an efficient way to increase the expressive power of $\mathsf{MPNN}$s. The take-away message is that enriching features with homomorphism counts of small patterns is a promising add-on to any $\mathsf{GNN}$ architecture, with little to no overhead. Regarding future work, the problem of which parameters to choose deserves further study. In particular, we plan to provide a complete characterisation of when adding a new pattern to $\mathcal{F}$ adds expressive power to the $\mathcal{F}\text{-}\mathsf{WL}$-test.

References

  • Abbe (2018) Abbe, E. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018. URL http://jmlr.org/papers/v18/16-480.html.
  • Abboud et al. (2020) Abboud, R., Ceylan, İ. İ., Grohe, M., and Lukasiewicz, T. The surprising power of graph neural networks with random node initialization. arXiv, 2020. URL https://arxiv.org/abs/2010.01179.
  • Arvind et al. (2020) Arvind, V., Fuhlbrück, F., Köbler, J., and Verbitsky, O. On Weisfeiler-Leman invariance: Subgraph counts and related graph properties. Journal of Computer and System Sciences, 113:42–59, 2020. URL https://doi.org/10.1016/j.jcss.2020.04.003.
  • Atserias et al. (2007) Atserias, A., Bulatov, A. A., and Dalmau, V. On the power of $k$-consistency. In Proceedings of the 34th International Colloquium on Automata, Languages and Programming, volume 4596 of Lecture Notes in Computer Science, pp. 279–290, 2007. URL https://doi.org/10.1007/978-3-540-73420-8_26.
  • Azizian & Lelarge (2021) Azizian, W. and Lelarge, M. Expressive power of invariant and equivariant graph neural networks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=lxHgXYN4bwl.
  • Barceló et al. (2020) Barceló, P., Kostylev, E. V., Monet, M., Pérez, J., Reutter, J., and Silva, J. P. The logical expressiveness of graph neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=r1lZ7AEKvB.
  • Belkin & Niyogi (2003) Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003. URL https://ieeexplore.ieee.org/document/6789755.
  • Bouritsas et al. (2020) Bouritsas, G., Frasca, F., Zafeiriou, S., and Bronstein, M. M. Improving graph neural network expressivity via subgraph isomorphism counting. arXiv, 2020. URL https://arxiv.org/abs/2006.09252.
  • Bova & Chen (2019) Bova, S. and Chen, H. How many variables are needed to express an existential positive query? Theory Comput. Syst., 63(7):1573–1594, 2019. URL https://doi.org/10.1007/s00224-018-9884-z.
  • Bresson & Laurent (2017) Bresson, X. and Laurent, T. Residual gated graph convnets. arXiv, 2017. URL https://arxiv.org/abs/1711.07553.
  • Cai et al. (1992) Cai, J., Fürer, M., and Immerman, N. An optimal lower bound on the number of variables for graph identifications. Combinatorica, 12(4):389–410, 1992. URL https://doi.org/10.1007/BF01305232.
  • Chami et al. (2021) Chami, I., Abu-El-Haija, S., Perozzi, B., Ré, C., and Murphy, K. Machine learning on graphs: A model and comprehensive taxonomy. arXiv, 2021. URL https://arxiv.org/abs/2005.03675.
  • Chen et al. (2019a) Chen, T., Bian, S., and Sun, Y. Are powerful graph neural nets necessary? A dissection on graph classification. arXiv, 2019a. URL https://arxiv.org/abs/1905.04579.
  • Chen et al. (2019b) Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, pp.  15894–15902, 2019b. URL https://proceedings.neurips.cc/paper/2019/file/71ee911dd06428a96c143a0b135041a4-Paper.pdf.
  • Chen et al. (2020) Chen, Z., Chen, L., Villar, S., and Bruna, J. Can graph neural networks count substructures? In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/2002.04025.
  • Curticapean et al. (2017) Curticapean, R., Dell, H., and Marx, D. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Symposium on Theory of Computing, pp.  210––223, 2017. URL http://dx.doi.org/10.1145/3055399.3055502.
  • Damke et al. (2020) Damke, C., Melnikov, V., and Hüllermeier, E. A novel higher-order Weisfeiler-Lehman graph convolution. arXiv, 2020. URL https://arxiv.org/abs/2007.00346.
  • Dell et al. (2018) Dell, H., Grohe, M., and Rattan, G. Lovász meets Weisfeiler and Leman. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, volume 107, pp.  40:1–40:14, 2018. URL https://doi.org/10.4230/LIPIcs.ICALP.2018.40.
  • Dvorak (2010) Dvorak, Z. On recognizing graphs by numbers of homomorphisms. Journal of Graph Theory, 64(4):330–342, 2010. URL https://doi.org/10.1002/jgt.20461.
  • Dwivedi et al. (2020) Dwivedi, V. P., Joshi, C. K., Laurent, T., Bengio, Y., and Bresson, X. Benchmarking graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2003.00982.
  • Fürer (2017) Fürer, M. On the combinatorial power of the Weisfeiler-Lehman algorithm. In Proceedings of the 10th International Conference on Algorithms and Complexity, volume 10236 of Lecture Notes in Computer Science, pp.  260–271, 2017. URL https://doi.org/10.1007/978-3-319-57586-5_22.
  • Garg et al. (2020) Garg, V. K., Jegelka, S., and Jaakkola, T. S. Generalization and representational limits of graph neural networks. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pp.  3419–3430, 2020. URL http://proceedings.mlr.press/v119/garg20c.html.
  • Geerts (2020) Geerts, F. The expressive power of kkth-order invariant graph networks. arXiv, 2020. URL https://arxiv.org/abs/2007.12035.
  • Geerts et al. (2020) Geerts, F., Mazowiecki, F., and Pérez, G. A. Let’s agree to degree: Comparing graph convolutional networks in the message-passing framework. arXiv, 2020. URL https://arxiv.org/abs/2004.02593.
  • Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp.  1263–1272, 2017. URL http://proceedings.mlr.press/v70/gilmer17a/gilmer17a.pdf.
  • Grohe (2017) Grohe, M. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781139028868.
  • Grohe (2020a) Grohe, M. word2vec, node2vec, graph2vec, x2vec: Towards a theory of vector embeddings of structured data. In Proceedings of the 39th Symposium on Principles of Database Systems, pp.  1–16, 2020a. URL https://doi.org/10.1145/3375395.3387641.
  • Grohe (2020b) Grohe, M. Counting bounded tree depth homomorphisms. In Proceedings of the 35th Symposium on Logic in Computer Science, pp.  507–520, 2020b. URL https://doi.org/10.1145/3373718.3394739.
  • Hamilton et al. (2017) Hamilton, W. L., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, volume 30, pp.  1024–1034, 2017.
  • Hella (1996) Hella, L. Logical hierarchies in PTIME. Inf. Comput., 129(1):1–19, 1996. URL https://doi.org/10.1006/inco.1996.0070.
  • Hu et al. (2020) Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/2005.00687.
  • Irwin et al. (2012a) Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S., and Coleman, R. G. Zinc: A free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7):1757–1768, 2012a. URL https://doi.org/10.1021/ci3001277.
  • Ishiguro et al. (2020) Ishiguro, K., Oono, K., and Hayashi, K. Weisfeiler-Lehman embedding for molecular graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2006.06909.
  • Keriven & Peyré (2019) Keriven, N. and Peyré, G. Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, volume 32, pp.  7092–7101, 2019. URL https://proceedings.neurips.cc/paper/2019/file/ea9268cb43f55d1d12380fb6ea5bf572-Paper.pdf.
  • Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
  • Lacoste et al. (2019) Lacoste, A., Luccioni, A., Schmidt, V., and Dandres, T. Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:1910.09700, 2019.
  • Li et al. (2019) Li, M. L., Dong, M., Zhou, J., and Rush, A. M. A hierarchy of graph neural networks based on learnable local features. arXiv, 2019. URL https://arxiv.org/abs/1911.05256.
  • Loukas (2020) Loukas, A. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1l2bp4YwS.
  • Lovász (1967) Lovász, L. Operations with structures. Acta Mathematica, 18:321–328, 1967. URL https://doi.org/10.1007/BF02280291.
  • Maron et al. (2019a) Maron, H., Ben-Hamu, H., Serviansky, H., and Lipman, Y. Provably powerful graph networks. In Advances in Neural Information Processing Systems, volume 32, pp.  2153–2164, 2019a. URL http://papers.nips.cc/paper/8488-provably-powerful-graph-networks.
  • Maron et al. (2019b) Maron, H., Ben-Hamu, H., Shamir, N., and Lipman, Y. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b. URL https://openreview.net/forum?id=Syx72jC9tm.
  • Maron et al. (2019c) Maron, H., Fetaya, E., Segol, N., and Lipman, Y. On the universality of invariant networks. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pp.  4363–4371, 2019c. URL http://proceedings.mlr.press/v97/maron19a.html.
  • Merkwirth & Lengauer (2005) Merkwirth, C. and Lengauer, T. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model., 45(5):1159–1168, 2005. URL https://pubs.acs.org/doi/pdf/10.1021/ci049613b.
  • Monti et al. (2017) Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  5115–5124, 2017. URL https://openaccess.thecvf.com/content_cvpr_2017/papers/Monti_Geometric_Deep_Learning_CVPR_2017_paper.pdf.
  • Morris et al. (2019) Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp.  4602–4609, 2019. URL https://doi.org/10.1609/aaai.v33i01.33014602.
  • Morris et al. (2020) Morris, C., Rattan, G., and Mutzel, P. Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/1904.01543.
  • NT & Maehara (2020) NT, H. and Maehara, T. Graph homomorphism convolution. arXiv, 2020. URL https://arxiv.org/abs/2005.01214.
  • Otto (2017) Otto, M. Bounded Variable Logics and Counting: A Study in Finite Models, volume 9 of Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781316716878.
  • Sato (2020) Sato, R. A survey on the expressive power of graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2003.04078.
  • Sato et al. (2019) Sato, R., Yamada, M., and Kashima, H. Approximation ratios of graph neural networks for combinatorial problems. In Advances in Neural Information Processing Systems, volume 32, pp.  4083–4092, 2019. URL https://papers.nips.cc/paper/2019/file/635440afdfc39fe37995fed127d7df4f-Paper.pdf.
  • Sato et al. (2020) Sato, R., Yamada, M., and Kashima, H. Random features strengthen graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2002.03155.
  • Scarselli et al. (2009) Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009. URL https://doi.org/10.1109/TNN.2008.2005605.
  • Shervashidze et al. (2009) Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., and Borgwardt, K. Efficient graphlet kernels for large graph comparison. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, volume 5, pp.  488–495, 2009. URL http://proceedings.mlr.press/v5/shervashidze09a.html.
  • Tahmasebi & Jegelka (2020) Tahmasebi, B. and Jegelka, S. Counting substructures with higher-order graph neural networks: Possibility and impossibility results. arXiv, 2020. URL https://arxiv.org/abs/2012.03174.
  • Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
  • Vignac et al. (2020) Vignac, C., Loukas, A., and Frossard, P. Building powerful and equivariant graph neural networks with structural message-passing. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/a32d7eeaae19821fd9ce317f3ce952a7-Abstract.html.
  • Weisfeiler & Lehman (1968) Weisfeiler, B. J. and Lehman, A. A. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya, 2(9):12–16, 1968. URL https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
  • Wu et al. (2021) Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021. URL https://doi.org/10.1109/TNNLS.2020.2978386.
  • Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
  • Zhang et al. (2020) Zhang, H., Yu, J. X., Zhang, Y., Zhao, K., and Cheng, H. Distributed subgraph counting: A general approach. Proc. VLDB Endow., 13(11):2493–2507, 2020. URL http://www.vldb.org/pvldb/vol13/p2493-zhang.pdf.
  • Zhou et al. (2018) Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., and Sun, M. Graph neural networks: A review of methods and applications. ArXiv, 2018. URL http://arxiv.org/abs/1812.08434.

Appendix

Appendix A Proofs of Section 3

We use the following notions. Let GG and HH be graphs, vVGv\in V_{G}, wVHw\in V_{H}, and d0d\geq 0. The vertices vv and ww are said to be indistinguishable by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} in round dd, denoted by (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w), iff χ,G,v(d)=χ,H,w(d)\chi_{\mathcal{F},G,v}^{(d)}=\chi_{\mathcal{F},H,w}^{(d)}. Similarly, GG and HH are said to be indistinguishable by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} in round dd, denoted by G-𝖶𝖫(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H, iff {{χ,G,v(d)vVG}}={{χ,H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F},H,w}^{(d)}\mid w\in V_{H}\}\!\!\}. Along the same lines, vv and ww are indistinguishable in round dd by an \mathcal{F}-𝖬𝖯𝖭𝖭\mathsf{MPNN} MM, denoted by (G,v)M,(d)(H,w)(G,v)\equiv_{M,\mathcal{F}}^{(d)}(H,w), iff 𝐱M,,G,v(d)=𝐱M,,H,w(d)\mathbf{x}_{M,\mathcal{F},G,v}^{(d)}=\mathbf{x}_{M,\mathcal{F},H,w}^{(d)}. Similarly, GG and HH are said to be indistinguishable by MM in round dd, denoted by GM,(d)HG\equiv_{M,\mathcal{F}}^{(d)}H, iff {{𝐱M,,G,v(d)vVG}}={{𝐱M,,H,w(d)wVH}}\{\!\!\{\mathbf{x}_{M,\mathcal{F},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\mathbf{x}_{M,\mathcal{F},H,w}^{(d)}\mid w\in V_{H}\}\!\!\}.
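For concreteness, the following Python sketch implements the -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} refinement underlying these definitions (our own illustration; the adjacency-dictionary representation and the function name are assumptions made for this sketch). The patterns in \mathcal{F} enter only through the precomputed rooted homomorphism counts folded into the initial colours:

```python
from collections import Counter

def f_wl(G, labels, hom_counts, rounds):
    """G: dict mapping each vertex to its set of neighbours;
    labels: vertex -> label chi_G(v);
    hom_counts: vertex -> tuple of hom(P^r, G^v) for the patterns P^r in F."""
    col = {v: (labels[v], hom_counts[v]) for v in G}  # chi^(0)
    for _ in range(rounds):
        # chi^(d) pairs the previous colour with the multiset of neighbour colours
        col = {v: (col[v], frozenset(Counter(col[u] for u in G[v]).items()))
               for v in G}
    return col
```

Two vertices vv and ww are then indistinguishable in round dd precisely when the colours returned after dd rounds agree, matching the definitions above.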

A.1 Proof of Proposition 1

We show that the class of \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} is upper bounded in expressive power by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}. The proof is analogous to the proof in Morris et al. (2019) showing that 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} are bounded by 𝖶𝖫\mathsf{WL}.

We show a stronger result by upper bounding \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} by the -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} test, layer by layer. More precisely, we show that for every \mathcal{F}-𝖬𝖯𝖭𝖭\mathsf{MPNN} MM, graphs GG and HH, vertices vVGv\in V_{G}, wVHw\in V_{H}, and d0d\geq 0,

  • (1) (G,v)-𝖶𝖫(d)(H,w)(G,v)M,(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w)\ \ \Longrightarrow\ \ (G,v)\equiv_{M,\mathcal{F}}^{(d)}(H,w); and

  • (2) G-𝖶𝖫(d)HGM,(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H\ \ \Longrightarrow\ \ G\equiv_{M,\mathcal{F}}^{(d)}H.

Clearly, these imply that \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} are bounded in expressive power by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}, both when vertex and graph distinguishability are concerned.

Proof of implication (1). We show this implication by induction on the number of rounds.

Base case. We first assume (G,v)-𝖶𝖫(0)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(0)}(H,w). In other words, χ,G,v(0)=χ,H,w(0)\chi_{\mathcal{F},G,v}^{(0)}=\chi_{\mathcal{F},H,w}^{(0)} and thus, χG(v)=χH(w)\chi_{G}(v)=\chi_{H}(w) and for every PrP^{r}\in\mathcal{F} we have 𝗁𝗈𝗆(Pr,Gv)=𝗁𝗈𝗆(Pr,Hw)\mathsf{hom}(P^{r},G^{v})=\mathsf{hom}(P^{r},H^{w}). By definition, 𝐱M,,G,v(0)\mathbf{x}^{(0)}_{M,\mathcal{F},G,v} is a one-hot encoding of χG(v)\chi_{G}(v) combined with 𝗁𝗈𝗆(Pr,Gv)\mathsf{hom}(P^{r},G^{v}) for PrP^{r}\in\mathcal{F}, for every 𝖬𝖯𝖭𝖭\mathsf{MPNN} MM, graph GG and vertex vVGv\in V_{G}. Since these agree with the labelling and homomorphism counts for vertex wVHw\in V_{H} in graph HH, we also have that 𝐱M,,G,v(0)=𝐱M,,H,w(0)\mathbf{x}^{(0)}_{M,\mathcal{F},G,v}=\mathbf{x}^{(0)}_{M,\mathcal{F},H,w}, as desired.

Inductive step. We next assume (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w). By the definition of -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} this is equivalent to (G,v)-𝖶𝖫(d1)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,w) and {{χ,G,v(d1)vNG(v)}}={{χ,H,w(d1)wNH(w)}}\{\!\!\{\chi_{\mathcal{F},G,v^{\prime}}^{(d-1)}\mid v^{\prime}\in N_{G}(v)\}\!\!\}=\{\!\!\{\chi_{\mathcal{F},H,w^{\prime}}^{(d-1)}\mid w^{\prime}\in N_{H}(w)\}\!\!\}. By the induction hypothesis, this implies (G,v)M,(d1)(H,w)(G,v)\equiv_{M,\mathcal{F}}^{(d-1)}(H,w) and there exists a bijection β:NG(v)NH(w)\beta:N_{G}(v)\to N_{H}(w) such that (G,v)M,(d1)(H,β(v))(G,v^{\prime})\equiv_{M,\mathcal{F}}^{(d-1)}(H,\beta(v^{\prime})) for every vNG(v)v^{\prime}\in N_{G}(v), and every \mathcal{F}-𝖬𝖯𝖭𝖭\mathsf{MPNN} MM. In other words, 𝐱M,,G,v(d1)=𝐱M,,H,w(d1)\mathbf{x}^{(d-1)}_{M,\mathcal{F},G,v}=\mathbf{x}^{(d-1)}_{M,\mathcal{F},H,w} and 𝐱M,,G,v(d1)=𝐱M,,H,β(v)(d1)\mathbf{x}^{(d-1)}_{M,\mathcal{F},G,v^{\prime}}=\mathbf{x}^{(d-1)}_{M,\mathcal{F},H,\beta(v^{\prime})} for every vNG(v)v^{\prime}\in N_{G}(v). By the definition of \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} this implies that Comb(d)({{𝐱M,,G,v(d1)vNG(v)}})\textsc{Comb}^{(d)}\bigl{(}\{\!\!\{\mathbf{x}^{(d-1)}_{M,\mathcal{F},G,v^{\prime}}\mid v^{\prime}\in N_{G}(v)\}\!\!\}\bigr{)} is equal to Comb(d)({{𝐱M,,H,w(d1)wNH(w)}})\textsc{Comb}^{(d)}\bigl{(}\{\!\!\{\mathbf{x}^{(d-1)}_{M,\mathcal{F},H,w^{\prime}}\mid w^{\prime}\in N_{H}(w)\}\!\!\}\bigr{)} and hence also, after applying Upd(d)\textsc{Upd}^{(d)}, 𝐱M,,G,v(d)=𝐱M,,H,w(d)\mathbf{x}^{(d)}_{M,\mathcal{F},G,v}=\mathbf{x}^{(d)}_{M,\mathcal{F},H,w}. That is, (G,v)M,(d)(H,w)(G,v)\equiv_{M,\mathcal{F}}^{(d)}(H,w), as desired.

Proof of implication (2). The implication G-𝖶𝖫(d)HGM,(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H\ \ \Longrightarrow\ \ G\equiv_{M,\mathcal{F}}^{(d)}H now easily follows. Indeed, G-𝖶𝖫(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H is equivalent to {{χ,G,v(d)vVG}}={{χ,H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F},H,w}^{(d)}\mid w\in V_{H}\}\!\!\}. In other words, there exists a bijection α:VGVH\alpha:V_{G}\to V_{H} such that χ,G,v(d)=χ,H,α(v)(d)\chi_{\mathcal{F},G,v}^{(d)}=\chi_{\mathcal{F},H,\alpha(v)}^{(d)} for every vVGv\in V_{G}. We have just shown that this implies 𝐱M,,G,v(d)=𝐱M,,H,α(v)(d)\mathbf{x}^{(d)}_{M,\mathcal{F},G,v}=\mathbf{x}^{(d)}_{M,\mathcal{F},H,\alpha(v)} for every vVGv\in V_{G} and for every \mathcal{F}-𝖬𝖯𝖭𝖭\mathsf{MPNN} MM. Hence, {{𝐱M,,G,v(d)vVG}}={{𝐱M,,H,w(d)wVH}}\{\!\!\{\mathbf{x}^{(d)}_{M,\mathcal{F},G,v}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\mathbf{x}^{(d)}_{M,\mathcal{F},H,w}\mid w\in V_{H}\}\!\!\}, or GM,(d)HG\equiv_{M,\mathcal{F}}^{(d)}H, as desired. ∎

A.2 Proof of Theorem 1

We show that for any finite collection \mathcal{F} of patterns, graphs GG and HH, vertices vVGv\in V_{G} and wVHw\in V_{H}, and d0d\geq 0:

(G,v)-𝖶𝖫(d)(H,w)𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw),(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w)\ \Longleftrightarrow\ \mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}), (1)

for every \mathcal{F}-pattern tree TrT^{r} of depth at most dd. Similarly,

G-𝖶𝖫(d)H𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H),G\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H\ \Longleftrightarrow\ \mathsf{hom}(T,G)=\mathsf{hom}(T,H), (2)

for every (unrooted) \mathcal{F}-pattern tree of depth at most dd.

For a given set ={P1r,,Pr}\mathcal{F}=\{P_{1}^{r},\ldots,P_{\ell}^{r}\} of patterns and 𝐬=(s1,,s)\mathbf{s}=(s_{1},\ldots,s_{\ell})\in\mathbb{N}^{\ell}, we denote by 𝐬\mathcal{F}^{\mathbf{s}} the graph pattern of the form (P1s1Ps)r(P_{1}^{s_{1}}\star\cdots\star P_{\ell}^{s_{\ell}})^{r}, that is, we join s1s_{1} copies of P1P_{1}, s2s_{2} copies of P2P_{2} and so on.

Proof of equivalence (1). The proof is by induction on the number of rounds dd.

\Longrightarrow We first consider the implication (G,v)-𝖶𝖫(d)(H,w)𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w)\Longrightarrow\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) for every \mathcal{F}-pattern tree TrT^{r} of depth at most dd.

Base case. Let us first consider the base case, that is, d=0d=0. In other words, we consider \mathcal{F}-pattern trees TrT^{r} consisting of a single root rr adorned with a pattern 𝐬\mathcal{F}^{\mathbf{s}} for some 𝐬=(s1,,s)\mathbf{s}=(s_{1},\ldots,s_{\ell})\in\mathbb{N}^{\ell}. We note that due to the properties of the graph join operator:

𝗁𝗈𝗆(Tr,Gv)=i=1(𝗁𝗈𝗆(Pir,Gv))si.\mathsf{hom}(T^{r},G^{v})=\prod_{i=1}^{\ell}\bigl{(}\mathsf{hom}(P_{i}^{r},G^{v})\bigr{)}^{s_{i}}. (3)

Since, (G,v)-𝖶𝖫(0)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(0)}(H,w), we know that χG(v)=χH(w)=a\chi_{G}(v)=\chi_{H}(w)=a for some aΣa\in\Sigma and 𝗁𝗈𝗆(Pir,Gv)=𝗁𝗈𝗆(Pir,Hw)\mathsf{hom}(P_{i}^{r},G^{v})=\mathsf{hom}(P_{i}^{r},H^{w}) for all PirP_{i}^{r}\in\mathcal{F}. This implies that the product in (3) is equal to

i=1(𝗁𝗈𝗆(Pir,Hw))si=𝗁𝗈𝗆(Tr,Hw),\prod_{i=1}^{\ell}\bigl{(}\mathsf{hom}(P_{i}^{r},H^{w})\bigr{)}^{s_{i}}=\mathsf{hom}(T^{r},H^{w}),

as desired.

Inductive step. Suppose next that we know that the implication holds for d1d-1. We assume now (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) and consider an \mathcal{F}-pattern tree TrT^{r} of depth at most dd. Assume that in the backbone of TrT^{r}, the root rr has mm children c1,,cmc_{1},\ldots,c_{m}, and denote by T1c1,,TmcmT_{1}^{c_{1}},\ldots,T_{m}^{c_{m}} the \mathcal{F}-pattern trees in TrT^{r} rooted at cic_{i}. Furthermore, we denote by Ti(r,ci)T_{i}^{(r,c_{i})} the \mathcal{F}-pattern tree obtained from TiciT_{i}^{c_{i}} by attaching rr to cic_{i}; Ti(r,ci)T_{i}^{(r,c_{i})} has root rr. Let 𝐬\mathcal{F}^{\mathbf{s}} be the pattern in TrT^{r} associated with rr. The following equalities are readily verified:

𝗁𝗈𝗆(Tr,Gv)\displaystyle\mathsf{hom}(T^{r},G^{v}) =𝗁𝗈𝗆(𝐬,Gv)i=1m𝗁𝗈𝗆(Ti(r,ci),Gv)=𝗁𝗈𝗆(𝐬,Gv)i=1m(vNG(v)𝗁𝗈𝗆(Tici,Gv)).\displaystyle=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},G^{v})\prod_{i=1}^{m}\mathsf{hom}(T_{i}^{(r,c_{i})},G^{v})=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},G^{v})\prod_{i=1}^{m}\Bigl{(}\sum_{v^{\prime}\in N_{G}(v)}\mathsf{hom}(T_{i}^{c_{i}},G^{v^{\prime}})\Bigr{)}. (4)

Recall now that we assume (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) and thus, in particular, (G,v)-𝖶𝖫(0)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(0)}(H,w). Hence, by induction, 𝗁𝗈𝗆(Sr,Gv)=𝗁𝗈𝗆(Sr,Hw)\mathsf{hom}(S^{r},G^{v})=\mathsf{hom}(S^{r},H^{w}) for every \mathcal{F}-pattern tree SrS^{r} of depth 0. In particular, this holds for Sr=𝐬S^{r}=\mathcal{F}^{\mathbf{s}} and hence

𝗁𝗈𝗆(𝐬,Gv)=𝗁𝗈𝗆(𝐬,Hw).\mathsf{hom}(\mathcal{F}^{\mathbf{s}},G^{v})=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},H^{w}).

Furthermore, (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) implies that there exists a bijection β:NG(v)NH(w)\beta:N_{G}(v)\to N_{H}(w) such that (G,v)-𝖶𝖫(d1)(H,β(v))(G,v^{\prime})\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,\beta(v^{\prime})) for every vNG(v)v^{\prime}\in N_{G}(v). By induction, for every vNG(v)v^{\prime}\in N_{G}(v) there thus exists a unique wNH(w)w^{\prime}\in N_{H}(w) such that 𝗁𝗈𝗆(Sr,Gv)=𝗁𝗈𝗆(Sr,Hw)\mathsf{hom}(S^{r},G^{v^{\prime}})=\mathsf{hom}(S^{r},H^{w^{\prime}}) for every \mathcal{F}-pattern tree SrS^{r} of depth at most d1d-1. In particular, for every vNG(v)v^{\prime}\in N_{G}(v) there exists a wNH(w)w^{\prime}\in N_{H}(w) such that

𝗁𝗈𝗆(Tici,Gv)=𝗁𝗈𝗆(Tici,Hw),\mathsf{hom}(T_{i}^{c_{i}},G^{v^{\prime}})=\mathsf{hom}(T_{i}^{c_{i}},H^{w^{\prime}}),

for each of the sub-trees TiciT_{i}^{c_{i}} in TrT^{r}. Hence, (4) is equal to

𝗁𝗈𝗆(𝐬,Hw)i=1m(wNH(w)𝗁𝗈𝗆(Tici,Hw)),\mathsf{hom}(\mathcal{F}^{\mathbf{s}},H^{w})\prod_{i=1}^{m}\Bigl{(}\sum_{w^{\prime}\in N_{H}(w)}\mathsf{hom}(T_{i}^{c_{i}},H^{w^{\prime}})\Bigr{)},

which in turn is equal to 𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},H^{w}), as desired.

\Longleftarrow We next consider the other direction, that is, we show that when 𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) holds for every \mathcal{F}-pattern tree TrT^{r} of depth at most dd, then (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) holds. This is again verified by induction on dd. This direction is more involved and uses techniques similar to those in Grohe (2020b). In our induction hypothesis we further include that a finite number of \mathcal{F}-pattern trees suffices to infer (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) for graphs GG and HH and vertices vVGv\in V_{G} and wVHw\in V_{H}.

Base case. Let us consider the base case d=0d=0 first. We need to show that χG(v)=χH(w)\chi_{G}(v)=\chi_{H}(w) and 𝗁𝗈𝗆(Pir,Gv)=𝗁𝗈𝗆(Pir,Hw)\mathsf{hom}(P_{i}^{r},G^{v})=\mathsf{hom}(P_{i}^{r},H^{w}) for every PirP_{i}^{r}\in\mathcal{F}, since this implies (G,v)-𝖶𝖫(0)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(0)}(H,w).

We first observe that 𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) for every \mathcal{F}-pattern tree TrT^{r} of depth 0 implies that vv and ww must be assigned the same label, say aa, by χG\chi_{G} and χH\chi_{H}, respectively.

Indeed, take TrT^{r} to consist of a single root rr labeled with aa (and thus rr is associated with the pattern 𝟎\mathcal{F}^{\mathbf{0}}). Then 𝗁𝗈𝗆(Tr,Gv)\mathsf{hom}(T^{r},G^{v}) is one if χG(v)=a\chi_{G}(v)=a and zero otherwise, and similarly for 𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},H^{w}). Since these two counts agree for every choice of aa, we must have χG(v)=χH(w)\chi_{G}(v)=\chi_{H}(w).

Next, we show that 𝗁𝗈𝗆(Pir,Gv)=𝗁𝗈𝗆(Pir,Hw)\mathsf{hom}(P_{i}^{r},G^{v})=\mathsf{hom}(P_{i}^{r},H^{w}) for every PirP_{i}^{r}\in\mathcal{F}. It suffices to consider the \mathcal{F}-pattern tree TirT^{r}_{i} consisting of a root rr joined with a single copy of PirP_{i}^{r}.

We observe that we only need a finite number of \mathcal{F}-pattern trees to infer (G,v)-𝖶𝖫(0)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(0)}(H,w). Indeed, suppose that χG\chi_{G} and χH\chi_{H} assign labels a1,,aLa_{1},\ldots,a_{L}; then we need LL single-vertex trees with no patterns attached and the root labeled by one of these labels. In addition, we need one \mathcal{F}-pattern tree for each pattern PirP_{i}^{r}\in\mathcal{F} and each label a1,,aLa_{1},\ldots,a_{L}. That is, we need L(+1)L(\ell+1) \mathcal{F}-pattern trees of depth 0.

Inductive step. We now assume that the implication holds for d1d-1 and consider dd. That is, we assume that if 𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) holds for every \mathcal{F}-pattern tree TrT^{r} of depth at most d1d-1, then (G,v)-𝖶𝖫(d1)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,w) holds. Furthermore, we assume that only a finite number KK of \mathcal{F}-pattern trees S1r,,SKrS_{1}^{r},\ldots,S_{K}^{r} of depth at most d1d-1 suffice to infer (G,v)-𝖶𝖫(d1)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,w).

So, for dd, let us assume that 𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) holds for every \mathcal{F}-pattern tree of depth at most dd. We need to show (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w) and that we can again assume that a finite number of \mathcal{F}-pattern trees of depth at most dd suffice to infer (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w).

By definition of (G,v)-𝖶𝖫(d)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}(H,w), we can, equivalently, show that (G,v)-𝖶𝖫(d1)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,w) and that there exists a bijection β:NG(v)NH(w)\beta:N_{G}(v)\to N_{H}(w) such that (G,v)-𝖶𝖫(d1)(H,β(v))(G,v^{\prime})\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,\beta(v^{\prime})) for every vNG(v)v^{\prime}\in N_{G}(v). That (G,v)-𝖶𝖫(d1)(H,w)(G,v)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(H,w) holds, is by induction, since 𝗁𝗈𝗆(Tr,Gv)=𝗁𝗈𝗆(Tr,Hw)\mathsf{hom}(T^{r},G^{v})=\mathsf{hom}(T^{r},H^{w}) for every \mathcal{F}-pattern tree of depth at most dd and thus also for every \mathcal{F}-pattern tree of depth at most d1d-1. We may thus focus on showing the existence of the bijection β\beta.

Let X,Y{G,H}X,Y\in\{G,H\}, xVXx\in V_{X} and yVYy\in V_{Y}. We know, by induction and the proof of the previous implication, that (X,x)-𝖶𝖫(d1)(Y,y)(X,x)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}(Y,y) if and only if 𝗁𝗈𝗆(Sir,Xx)=𝗁𝗈𝗆(Sir,Yy)\mathsf{hom}(S_{i}^{r},X^{x})=\mathsf{hom}(S_{i}^{r},Y^{y}) for each i[K]i\in[K]. Denote by R1,,ReR_{1},\ldots,R_{e} the equivalence classes on VGVHV_{G}\cup V_{H} induced by -𝖶𝖫(d1)\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d-1)}. Furthermore, define Nj,X(x):=NX(x)RjN_{j,X}(x):=N_{X}(x)\cap R_{j} and let nj=|Nj,G(v)|n_{j}=|N_{j,G}(v)| and mj=|Nj,H(w)|m_{j}=|N_{j,H}(w)|, for each j[e]j\in[e]. If we can show that nj=mjn_{j}=m_{j} for each j[e]j\in[e], then this implies the existence of the desired bijection.

Let Tir=aT_{i}^{r=a} be the \mathcal{F}-pattern tree of depth at most dd obtained by attaching SirS_{i}^{r} to a new root vertex rr labeled with aa. We may assume that vv and ww both have label aa, since their homomorphism counts agree on the single-vertex trees with labels from Σ\Sigma. The root vertex rr is not joined with any 𝐬\mathcal{F}^{\mathbf{s}} (or, alternatively, it is joined with 𝟎\mathcal{F}^{\mathbf{0}}). It will be convenient to denote the root of SirS_{i}^{r} by rir_{i} instead of rr. Then for each i[K]i\in[K]:

𝗁𝗈𝗆(Tir=a,Gv)\displaystyle\mathsf{hom}(T_{i}^{r=a},G^{v}) =vNG(v)𝗁𝗈𝗆(Siri,Gv)=j[e]nj𝗁𝗈𝗆(Siri,Gvj)\displaystyle=\sum_{v^{\prime}\in N_{G}(v)}\mathsf{hom}(S_{i}^{r_{i}},G^{v^{\prime}})=\sum_{j\in[e]}n_{j}\mathsf{hom}(S_{i}^{r_{i}},G^{v^{\prime}_{j}})
=j[e]mj𝗁𝗈𝗆(Siri,Hwj)=wNH(w)𝗁𝗈𝗆(Siri,Hw)=𝗁𝗈𝗆(Tir=a,Hw),\displaystyle=\sum_{j\in[e]}m_{j}\mathsf{hom}(S_{i}^{r_{i}},H^{w_{j}^{\prime}})=\sum_{w^{\prime}\in N_{H}(w)}\mathsf{hom}(S_{i}^{r_{i}},H^{w^{\prime}})=\mathsf{hom}(T_{i}^{r=a},H^{w}),

where vjv_{j}^{\prime} and wjw_{j}^{\prime} denote arbitrary vertices in Nj,G(v)N_{j,G}(v) and Nj,H(w)N_{j,H}(w), respectively. Let us denote 𝗁𝗈𝗆(Siri,Gvj)\mathsf{hom}(S_{i}^{r_{i}},G^{v_{j}^{\prime}}) by aija_{ij} and observe that this is equal to 𝗁𝗈𝗆(Siri,Hwj)\mathsf{hom}(S_{i}^{r_{i}},H^{w_{j}^{\prime}}). Hence, we know that for each i[K]i\in[K]:

j[e]aijnj=j[e]aijmj.\sum_{j\in[e]}a_{ij}n_{j}=\sum_{j\in[e]}a_{ij}m_{j}.

Let us call a set I[K]I\subseteq[K] compatible if the roots of SiriS_{i}^{r_{i}}, for iIi\in I, all have the same label. Consider a vector 𝐬=(s1,,sK)K\mathbf{s}=(s_{1},\ldots,s_{K})\in\mathbb{N}^{K} and define its support as 𝗌𝗎𝗉𝗉(𝐬):={i[K]si0}\mathsf{supp}(\mathbf{s}):=\{i\in[K]\mid s_{i}\neq 0\}. We say that 𝐬\mathbf{s} is compatible if its support is. For such a compatible 𝐬\mathbf{s} we now define Tr=a,𝐬T^{r=a,\mathbf{s}} to be the \mathcal{F}-pattern tree with root rr labeled with aa, with one child cc which is joined with (and inherits its label from) the following \mathcal{F}-pattern tree of depth d1d-1:

i𝗌𝗎𝗉𝗉(𝐬)Sisi.\bigstar_{i\in\mathsf{supp}(\mathbf{s})}S_{i}^{s_{i}}.

In other words, we simply join together powers of the SiriS_{i}^{r_{i}}’s that have roots with the same label. Then for every compatible 𝐬K\mathbf{s}\in\mathbb{N}^{K}:

𝗁𝗈𝗆(Tr=a,𝐬,Gv)\displaystyle\mathsf{hom}(T^{r=a,\mathbf{s}},G^{v}) =vNG(v)i[K](𝗁𝗈𝗆(Siri,Gv))si=j[e]nji[K](𝗁𝗈𝗆(Siri,Gvj))si\displaystyle=\sum_{v^{\prime}\in N_{G}(v)}\prod_{i\in[K]}\bigl{(}\mathsf{hom}(S_{i}^{r_{i}},G^{v^{\prime}})\bigr{)}^{s_{i}}=\sum_{j\in[e]}n_{j}\prod_{i\in[K]}\bigl{(}\mathsf{hom}(S_{i}^{r_{i}},G^{v_{j}^{\prime}})\bigr{)}^{s_{i}}
=j[e]mji[K](𝗁𝗈𝗆(Siri,Hwj))si=wNH(w)i[K](𝗁𝗈𝗆(Siri,Hw))si\displaystyle=\sum_{j\in[e]}m_{j}\prod_{i\in[K]}\bigl{(}\mathsf{hom}(S_{i}^{r_{i}},H^{w_{j}^{\prime}})\bigr{)}^{s_{i}}=\sum_{w^{\prime}\in N_{H}(w)}\prod_{i\in[K]}\bigl{(}\mathsf{hom}(S_{i}^{r_{i}},H^{w^{\prime}})\bigr{)}^{s_{i}}
=𝗁𝗈𝗆(Tr=a,𝐬,Hw),\displaystyle=\mathsf{hom}(T^{r=a,\mathbf{s}},H^{w}),

where, as before, vjv_{j}^{\prime} and wjw_{j}^{\prime} denote arbitrary vertices in Nj,G(v)N_{j,G}(v) and Nj,H(w)N_{j,H}(w), respectively. Hence, for any compatible 𝐬K\mathbf{s}\in\mathbb{N}^{K}:

j[e]nji[K]aijsi=j[e]mji[K]aijsi.\sum_{j\in[e]}n_{j}\prod_{i\in[K]}a_{ij}^{s_{i}}=\sum_{j\in[e]}m_{j}\prod_{i\in[K]}a_{ij}^{s_{i}}.

We now continue in the same way as in the proof of Lemma 4.2 in Grohe (2020b). We repeat the argument here for completeness. Define 𝐚j𝐬:=i=1Kaijsi\mathbf{a}_{j}^{\mathbf{s}}:=\prod_{i=1}^{K}a_{ij}^{s_{i}} for each j[e]j\in[e]. We assume, for the sake of contradiction, that there exists a j[e]j\in[e] such that njmjn_{j}\neq m_{j}. We choose such a j0[e]j_{0}\in[e] for which S=𝗌𝗎𝗉𝗉(𝐚j0)S=\mathsf{supp}(\mathbf{a}_{j_{0}}) is inclusion-wise maximal.

We first rule out that S=S=\emptyset. Indeed, suppose that S=S=\emptyset. This implies that 𝐚j0=𝟎\mathbf{a}_{j_{0}}=\mathbf{0}. Now observe that 𝐚j\mathbf{a}_{j} and 𝐚j\mathbf{a}_{j^{\prime}} are mutually distinct for all j,j[e]j,j^{\prime}\in[e], jjj\neq j^{\prime}. Indeed, if they were equal then this would imply that Rj=RjR_{j}=R_{j^{\prime}}. Hence, 𝗌𝗎𝗉𝗉(𝐚j)\mathsf{supp}(\mathbf{a}_{j})\neq\emptyset for any jj0j\neq j_{0}. We note that nj=mjn_{j}=m_{j} for all jj0j\neq j_{0} by the maximality of SS. Hence, writing n:=|NG(v)|=|NH(w)|n:=|N_{G}(v)|=|N_{H}(w)| (these degrees agree since the homomorphism counts of the depth-11 tree consisting of a single backbone edge agree), nj0=njj0nj=njj0mj=mj0n_{j_{0}}=n-\sum_{j\neq j_{0}}n_{j}=n-\sum_{j\neq{j_{0}}}m_{j}=m_{j_{0}}, contradicting our assumption. Hence, SS\neq\emptyset.

Consider J:={j[e]𝗌𝗎𝗉𝗉(𝐚j)=S}J:=\{j\in[e]\mid\mathsf{supp}(\mathbf{a}_{j})=S\}. For each jJj\in J, consider the truncated vector 𝐚^j:=(aijiS)\hat{\mathbf{a}}_{j}:=(a_{ij}\mid i\in S). We note that 𝐚^j\hat{\mathbf{a}}_{j}, for jJj\in J, all have positive entries and are mutually distinct. Lemma 4.1 in Grohe (2020b) implies that we can find a vector (with non-zero entries) 𝐬^=(s^iiS)\hat{\mathbf{s}}=(\hat{s}_{i}\mid i\in S) such that the numbers 𝐚^j𝐬^\hat{\mathbf{a}}_{j}^{\hat{\mathbf{s}}} for jJj\in J are mutually distinct as well. We next consider 𝐬=(s1,,sK)\mathbf{s}=(s_{1},\ldots,s_{K}) with si=s^is_{i}=\hat{s}_{i} if iSi\in S and si=0s_{i}=0 otherwise. Then by definition of 𝐬^\hat{\mathbf{s}}, also 𝐚j𝐬\mathbf{a}_{j}^{\mathbf{s}} for jJj\in J are mutually distinct.

We next note that for every pp\in\mathbb{N}, 𝐚jp𝐬=(𝐚j𝐬)p\mathbf{a}_{j}^{p\mathbf{s}}=(\mathbf{a}_{j}^{\mathbf{s}})^{p} and if we define 𝐀\mathbf{A} to be the |J|×|J||J|\times|J|-matrix such that Ajj:=𝐚jj𝐬A_{jj^{\prime}}:=\mathbf{a}_{j}^{j^{\prime}\mathbf{s}} then this will be an invertible matrix (Vandermonde). We use this invertibility to show that nj0=mj0n_{j_{0}}=m_{j_{0}}.
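The invertibility step can be checked with a small numerical experiment (the values below are arbitrary placeholders standing in for mutually distinct positive numbers 𝐚j𝐬\mathbf{a}_{j}^{\mathbf{s}}; this is only a sanity check, not part of the proof):

```python
import numpy as np

a_s = np.array([1.0, 2.0, 5.0])   # mutually distinct positive values a_j^s, j in J
# A[j, j'] = (a_j^s)^{j'} for j' = 1, ..., |J|: a Vandermonde-type matrix
A = np.array([[b ** jp for jp in range(1, len(a_s) + 1)] for b in a_s])
print(np.linalg.det(A))           # non-zero, hence A is invertible

# n_J . A = m_J . A therefore forces n_J = m_J:
n_J = np.array([2.0, 1.0, 3.0])
m_J = np.linalg.solve(A.T, A.T @ n_J)
assert np.allclose(n_J, m_J)
```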

Let 𝐧J:=(njjJ)\mathbf{n}_{J}:=(n_{j}\mid j\in J) and 𝐦J=(mjjJ)\mathbf{m}_{J}=(m_{j}\mid j\in J). If we inspect the jj^{\prime}th entry of 𝐧J𝐀\mathbf{n}_{J}\cdot\mathbf{A}, then this is equal to

jJnj𝐚jj𝐬=j[e]nj𝐚jj𝐬j[e]S𝗌𝗎𝗉𝗉(𝐚j)nj𝐚jj𝐬j[e]S𝗌𝗎𝗉𝗉(𝐚j)nj𝐚jj𝐬.\displaystyle\sum_{j\in J}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=\sum_{j\in[e]}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}-\sum_{\begin{subarray}{c}j\in[e]\\ S\not\subseteq\mathsf{supp}(\mathbf{a}_{j})\end{subarray}}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}-\sum_{\begin{subarray}{c}j\in[e]\\ S\subset\mathsf{supp}(\mathbf{a}_{j})\end{subarray}}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}.

We want to reduce the above expression to

jJnj𝐚jj𝐬=j[e]nj𝐚jj𝐬j[e]S𝗌𝗎𝗉𝗉(𝐚j)nj𝐚jj𝐬.\sum_{j\in J}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=\sum_{j\in[e]}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}-\sum_{\begin{subarray}{c}j\in[e]\\ S\subset\mathsf{supp}(\mathbf{a}_{j})\end{subarray}}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}.

To see that this holds, we verify that when S𝗌𝗎𝗉𝗉(𝐚j)S\not\subseteq\mathsf{supp}(\mathbf{a}_{j}) then 𝐚jj𝐬=0\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=0. Indeed, take an S\ell\in S such that 𝗌𝗎𝗉𝗉(𝐚j)\ell\not\in\mathsf{supp}(\mathbf{a}_{j}). Then, 𝐚jj𝐬\mathbf{a}_{j}^{j^{\prime}\mathbf{s}} contains the factor ajjs=0sa_{\ell j}^{j^{\prime}s_{\ell}}=0^{s_{\ell}} with s=s^0s_{\ell}=\hat{s}_{\ell}\neq 0. Hence, 𝐚jj𝐬=0\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=0.

Now, by the maximality of SS, for all jj with S𝗌𝗎𝗉𝗉(𝐚j)S\subset\mathsf{supp}(\mathbf{a}_{j}) we have nj=mjn_{j}=m_{j} and thus

j[e]S𝗌𝗎𝗉𝗉(𝐚j)nj𝐚jj𝐬=j[e]S𝗌𝗎𝗉𝗉(𝐚j)mj𝐚jj𝐬.\sum_{\begin{subarray}{c}j\in[e]\\ S\subset\mathsf{supp}(\mathbf{a}_{j})\end{subarray}}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=\sum_{\begin{subarray}{c}j\in[e]\\ S\subset\mathsf{supp}(\mathbf{a}_{j})\end{subarray}}m_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}.

Since j[e]nj𝐚jj𝐬=j[e]mj𝐚jj𝐬\sum_{j\in[e]}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=\sum_{j\in[e]}m_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}, we thus also have that

jJnj𝐚jj𝐬=jJmj𝐚jj𝐬.\sum_{j\in J}n_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}=\sum_{j\in J}m_{j}\mathbf{a}_{j}^{j^{\prime}\mathbf{s}}.

Since this holds for all jJj^{\prime}\in J, we have 𝐧J𝐀=𝐦J𝐀\mathbf{n}_{J}\cdot\mathbf{A}=\mathbf{m}_{J}\cdot\mathbf{A} and by the invertibility of 𝐀\mathbf{A}, 𝐧J=𝐦J\mathbf{n}_{J}=\mathbf{m}_{J}. In particular, since j0Jj_{0}\in J, nj0=mj0n_{j_{0}}=m_{j_{0}} contradicting our assumption.

As a consequence, nj=mjn_{j}=m_{j} for all j[e]j\in[e] and thus we have our desired bijection.

It remains to verify that we only need a finite number of \mathcal{F}-pattern trees to conclude that nj=mjn_{j}=m_{j} for all j[e]j\in[e]. In fact, the above proof shows that, for each root label aa, it suffices to check the identities for the finite number of pattern trees used to define the matrix 𝐀\mathbf{A}.

Proof of equivalence (2). This equivalence is shown along the lines of the proof of Theorem 4.4 in Grohe (2020a), combined with the additional techniques from Lemma 4.2 in Grohe (2020b).

\Longrightarrow We first show that G-𝖶𝖫(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H implies 𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H)\mathsf{hom}(T,G)=\mathsf{hom}(T,H) for unrooted \mathcal{F}-pattern trees TT of depth at most dd.

Assume, without loss of generality, that VGVH=V_{G}\cap V_{H}=\emptyset. For xVXx\in V_{X} and yVYy\in V_{Y} with X,Y{G,H}X,Y\in\{G,H\}, define xdyx\sim_{d}y if and only if 𝗁𝗈𝗆(Tr,Xx)=𝗁𝗈𝗆(Tr,Yy)\mathsf{hom}(T^{r},X^{x})=\mathsf{hom}(T^{r},Y^{y}) for all \mathcal{F}-pattern trees TrT^{r} of depth at most dd. Let R1,,ReR_{1},\ldots,R_{e} be the d\sim_{d}-equivalence classes and, for each j[e]j\in[e], let pj:=|RjVG|p_{j}:=|R_{j}\cap V_{G}| and qj:=|RjVH|q_{j}:=|R_{j}\cap V_{H}|. Suppose that G-𝖶𝖫(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H. By equivalence (1), vertices carrying the same -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} label in round dd lie in the same d\sim_{d}-class, and hence pj=qjp_{j}=q_{j} for every j[e]j\in[e].

Let TT be an unrooted \mathcal{F}-pattern tree of depth at most dd, let rr be any vertex on the backbone of TT, and let TrT^{r} be the rooted \mathcal{F}-pattern tree obtained from TT by declaring rr as its root. By definition, for X{G,H}X\in\{G,H\} and any xVXRjx\in V_{X}\cap R_{j}, the numbers 𝗁𝗈𝗆(Tr,Xx)\mathsf{hom}(T^{r},X^{x}) coincide and depend only on j[e]j\in[e]. Hence,

𝗁𝗈𝗆(T,G)\displaystyle\mathsf{hom}(T,G) =vV(G)𝗁𝗈𝗆(Tr,Gv)=j[e]pj𝗁𝗈𝗆(Tr,Gvj)\displaystyle=\sum_{v\in V(G)}\mathsf{hom}(T^{r},G^{v})=\sum_{j\in[e]}p_{j}\mathsf{hom}(T^{r},G^{v_{j}})
=j[e]qj𝗁𝗈𝗆(Tr,Hwj)=wV(H)𝗁𝗈𝗆(Tr,Hw)=𝗁𝗈𝗆(T,H),\displaystyle=\sum_{j\in[e]}q_{j}\mathsf{hom}(T^{r},H^{w_{j}})=\sum_{w\in V(H)}\mathsf{hom}(T^{r},H^{w})=\mathsf{hom}(T,H),

where vjv_{j} and wjw_{j} are arbitrary vertices in RjVGR_{j}\cap V_{G} and RjVHR_{j}\cap V_{H}, respectively, and where we used that 𝗁𝗈𝗆(Tr,Gvj)=𝗁𝗈𝗆(Tr,Hwj)\mathsf{hom}(T^{r},G^{v_{j}})=\mathsf{hom}(T^{r},H^{w_{j}}) and pj=qjp_{j}=q_{j}. Since this holds for any unrooted \mathcal{F}-pattern tree TT of depth at most dd, we have shown the desired implication.

\Longleftarrow We next check the other direction. That is, we assume that 𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H)\mathsf{hom}(T,G)=\mathsf{hom}(T,H) holds for any unrooted \mathcal{F}-pattern tree TT of depth at most dd and verify that G-𝖶𝖫(d)HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}^{(d)}H.

For xdyx\sim_{d}y to hold for xVXx\in V_{X}, yVYy\in V_{Y} and X,Y{G,H}X,Y\in\{G,H\}, we showed earlier that it suffices to check whether 𝗁𝗈𝗆(Tiri,Xx)=𝗁𝗈𝗆(Tiri,Yy)\mathsf{hom}(T_{i}^{r_{i}},X^{x})=\mathsf{hom}(T_{i}^{r_{i}},Y^{y}) for a finite number KK of rooted \mathcal{F}-pattern trees TiriT_{i}^{r_{i}}. By definition of the RjR_{j}’s, aij:=𝗁𝗈𝗆(Tiri,Xx)a_{ij}:=\mathsf{hom}(T_{i}^{r_{i}},X^{x}) for xRjx\in R_{j} is well-defined (independent of the choice of X{G,H}X\in\{G,H\} and xVXRjx\in V_{X}\cap R_{j}). For each rooted TiriT_{i}^{r_{i}} we denote by TiT_{i} its unrooted version. Similarly as before,

𝗁𝗈𝗆(Ti,G)=j[e]aijpj=j[e]aijqj=𝗁𝗈𝗆(Ti,H).\mathsf{hom}(T_{i},G)=\sum_{j\in[e]}a_{ij}p_{j}=\sum_{j\in[e]}a_{ij}q_{j}=\mathsf{hom}(T_{i},H).

We next show that pj=qjp_{j}=q_{j} for j[e]j\in[e]. In fact, this is shown in precisely the same way as in our previous characterisation and based on Lemma 4.2 in Grohe (2020b). That is, we again consider trees obtained by joining copies of the TiT_{i}’s, to obtain, for compatible 𝐬K\mathbf{s}\in\mathbb{N}^{K},

j[e]i[K]aijsipj=j[e]i[K]aijsiqj.\sum_{j\in[e]}\prod_{i\in[K]}a_{ij}^{s_{i}}\,p_{j}=\sum_{j\in[e]}\prod_{i\in[K]}a_{ij}^{s_{i}}\,q_{j}.

It now suffices to repeat the same argument as before (details omitted). ∎

Appendix B Proofs of Section 4

B.1 Additional details of standard concepts

Core and treewidth.

A graph GG is a core if all homomorphisms from GG to itself are injective. The treewidth of a graph G=(V,E,χ)G=(V,E,\chi) is a measure of how much GG resembles a tree. This is defined in terms of the tree decompositions of GG, which are pairs (T,λ)(T,\lambda), for a tree T=(VT,ET)T=(V_{T},E_{T}) and λ\lambda a mapping that associates each vertex tt of VTV_{T} with a set λ(t)V\lambda(t)\subseteq V, satisfying the following:

  • The union of λ(t)\lambda(t), for tVTt\in V_{T}, is equal to VV;

  • The set {tVTvλ(t)}\{t\in V_{T}\mid v\in\lambda(t)\} is connected, for all vVv\in V; and

  • For each {u,v}E\{u,v\}\in E there is tVTt\in V_{T} with {u,v}λ(t)\{u,v\}\subseteq\lambda(t).

The width of (T,λ)(T,\lambda) is maxtVT(|λ(t)|)1\max_{t\in V_{T}}(|\lambda(t)|)-1. The treewidth of GG is the minimum width over all its tree decompositions. For instance, trees have treewidth one, cycles have treewidth two, and the kk-clique KkK_{k} has treewidth k1k-1 (for k>1k>1).

If PrP^{r} is a pattern, then its treewidth is defined as the treewidth of the graph PP. Similarly, PrP^{r} is a core if PP is.
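As a sanity check of these conditions, the following self-contained Python snippet (our own illustration; networkx is used only for graph bookkeeping) verifies a width-22 tree decomposition of the 44-cycle:

```python
import networkx as nx

def is_tree_decomposition(G, T, bags):
    """G: the graph; T: a tree whose nodes index bags; bags: node of T -> set of G-vertices.
    Assumes every vertex occurs in some bag when checking condition 2."""
    covers_vertices = set().union(*bags.values()) == set(G.nodes())              # condition 1
    connected = all(
        nx.is_connected(T.subgraph([t for t in T if v in bags[t]]))
        for v in G.nodes())                                                      # condition 2
    covers_edges = all(any({u, v} <= bags[t] for t in T) for u, v in G.edges())  # condition 3
    return covers_vertices and connected and covers_edges

C4 = nx.cycle_graph(4)
T = nx.path_graph(2)                 # two bags joined by an edge
bags = {0: {0, 1, 2}, 1: {0, 2, 3}}  # width = 3 - 1 = 2
print(is_tree_decomposition(C4, T, bags))  # True, witnessing treewidth at most 2
```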

𝒌-𝗪𝗟k\text{-}\mathsf{WL}.

A partial isomorphism from a graph GG to a graph HH is a set πVG×VH\pi\subseteq V_{G}\times V_{H} such that all (v,w),(v,w)π(v,w),(v^{\prime},w^{\prime})\in\pi satisfy the equivalences v=vw=wv=v^{\prime}\Leftrightarrow w=w^{\prime}, {v,v}EG{w,w}EH\{v,v^{\prime}\}\in E_{G}\Leftrightarrow\{w,w^{\prime}\}\in E_{H}, χG(v)=χH(w)\chi_{G}(v)=\chi_{H}(w) and χG(v)=χH(w)\chi_{G}(v^{\prime})=\chi_{H}(w^{\prime}). We may view π\pi as a bijective mapping from a subset XVGX\subseteq V_{G} to a subset YVHY\subseteq V_{H} that is an isomorphism from the induced subgraph G[X]G[X] to the induced subgraph H[Y]H[Y]. The isomorphism type 𝗂𝗌𝗈𝗍𝗉(G,v¯)\mathsf{isotp}(G,\bar{v}) of a kk-tuple v¯=(v1,,vk)\bar{v}=(v_{1},\ldots,v_{k}) is a label in some alphabet Σ\Sigma such that 𝗂𝗌𝗈𝗍𝗉(G,v¯)=𝗂𝗌𝗈𝗍𝗉(H,w¯)\mathsf{isotp}(G,\bar{v})=\mathsf{isotp}(H,\bar{w}) if and only if π={(v1,w1),,(vk,wk)}\pi=\{(v_{1},w_{1}),\ldots,(v_{k},w_{k})\} is a partial isomorphism from GG to HH.

Let k1k\geq 1 and G=(V,E,χ)G=(V,E,\chi). The kk-dimensional Weisfeiler-Leman algorithm (k-𝖶𝖫k\text{-}\mathsf{WL}) computes a sequence of labellings χk,G(d):VkΣ\chi_{k,G}^{(d)}:V^{k}\to\Sigma. We denote by χk,G,v¯(d)\chi_{k,G,\bar{v}}^{(d)} the label assigned to the kk-tuple v¯Vk\bar{v}\in V^{k} in round dd. The initial labelling χk,G(0)\chi_{k,G}^{(0)} assigns to each kk-tuple v¯\bar{v} its isomorphism type 𝗂𝗌𝗈𝗍𝗉(G,v¯)\mathsf{isotp}(G,\bar{v}). Then, for round dd,

χk,G,v¯(d):=(χk,G,v¯(d1),Mv¯(d1)),\chi_{k,G,\bar{v}}^{(d)}:=\big{(}\chi_{k,G,\bar{v}}^{(d-1)},M^{(d-1)}_{\bar{v}}\big{)},

where Mv¯(d1)M^{(d-1)}_{\bar{v}} is the multiset

{{(𝗂𝗌𝗈𝗍𝗉(v1,,vk,w),χk,G,(v1,,vk1,w)(d1),χk,G,(v1,,vk2,w,vk)(d1),,χk,G,(w,v2,,vk)(d1))|wV}}.\Big{\{}\!\!\!\Big{\{}\big{(}\mathsf{isotp}(v_{1},\ldots,v_{k},w),\chi_{k,G,(v_{1},\ldots,v_{k-1},w)}^{(d-1)},\chi_{k,G,(v_{1},\ldots,v_{k-2},w,v_{k})}^{(d-1)},\ldots,\chi_{k,G,(w,v_{2},\ldots,v_{k})}^{(d-1)}\big{)}\>\Big{|}\;w\in V\Big{\}}\!\!\!\Big{\}}.

As observed in Dell et al. (2018), if k2k\geq 2 holds, then we can omit the entry 𝗂𝗌𝗈𝗍𝗉(v1,,vk,w)\mathsf{isotp}(v_{1},\ldots,v_{k},w) from the tuples in Mv¯M_{\bar{v}}, because all the information it contains is also contained in the entries χk,G,(d1)\chi_{k,G,\ldots}^{(d-1)} of these tuples. Also, 𝖶𝖫=1-𝖶𝖫\mathsf{WL}=1\text{-}\mathsf{WL} in the sense that χG,v(d)=χG,v(d)\chi_{G,v}^{(d)}=\chi_{G,v^{\prime}}^{(d)} if and only if χ1,G,v(d)=χ1,G,v(d)\chi_{1,G,v}^{(d)}=\chi_{1,G,v^{\prime}}^{(d)} for all v,vVv,v^{\prime}\in V. The k-𝖶𝖫k\text{-}\mathsf{WL} algorithm is run until the labelling stabilises, i.e., until for all v¯,w¯Vk\bar{v},\bar{w}\in V^{k}, χk,G,v¯(d)=χk,G,w¯(d)\chi_{k,G,\bar{v}}^{(d)}=\chi_{k,G,\bar{w}}^{(d)} if and only if χk,G,v¯(d+1)=χk,G,w¯(d+1)\chi_{k,G,\bar{v}}^{(d+1)}=\chi_{k,G,\bar{w}}^{(d+1)}. We say that k-𝖶𝖫k\text{-}\mathsf{WL} distinguishes two graphs GG and HH if the multisets of labels of all kk-tuples of vertices in GG and HH, respectively, differ. Similar notions are in place for distinguishing kk-tuples, and for distinguishing graphs (or vertices) based on the labels computed after a given number of rounds.

We remark that the k-𝖶𝖫k\text{-}\mathsf{WL} algorithm given here is sometimes referred to as the “folklore” version of the kk-dimensional Weisfeiler-Leman algorithm. It is known that indistinguishability of graphs by k-𝖶𝖫k\text{-}\mathsf{WL} is equivalent to indistinguishability by sentences in the (k+1)(k+1)-variable fragment of first-order logic with counting (Cai et al., 1992), and to 𝗁𝗈𝗆(P,G)=𝗁𝗈𝗆(P,H)\mathsf{hom}(P,G)=\mathsf{hom}(P,H) for every graph PP of treewidth at most kk (Dvorak, 2010; Dell et al., 2018).
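To make the algorithm concrete, here is a minimal Python sketch of folklore 22-𝖶𝖫\mathsf{WL} for unlabelled graphs (our own illustration; it uses the remark above to drop the 𝗂𝗌𝗈𝗍𝗉\mathsf{isotp} entry from the multisets):

```python
from collections import Counter
from itertools import product

def folklore_2wl(G):
    """G: dict mapping each vertex to its set of neighbours.
    Returns the stable colouring of all vertex pairs under folklore 2-WL."""
    V = list(G)
    # chi^(0): the isomorphism type of a pair (equality and adjacency)
    col = {(u, v): (u == v, v in G[u]) for u, v in product(V, V)}
    while True:
        new = {(u, v): (col[(u, v)],
                        frozenset(Counter((col[(u, w)], col[(w, v)])
                                          for w in V).items()))
               for u, v in product(V, V)}
        if len(set(new.values())) == len(set(col.values())):  # partition stable
            return new
        col = new
```

The colour table has 𝒪(n2)\mathcal{O}(n^{2}) entries, which is exactly the memory cost that motivates trading higher-order message passing for local graph parameters.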

B.2 Proof of Proposition 2

We show that for each finite set \mathcal{F} of patterns, the expressive power of -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} is bounded by k-𝖶𝖫k\text{-}\mathsf{WL}, where kk is the largest treewidth of a pattern in \mathcal{F}.

We first recall the following characterisation of k-𝖶𝖫k\text{-}\mathsf{WL}-equivalence (Dvorak, 2010; Dell et al., 2018). For any two graphs GG and HH,

Gk-𝖶𝖫H𝗁𝗈𝗆(P,G)=𝗁𝗈𝗆(P,H)G\equiv_{k\text{-}\mathsf{WL}}H\Longleftrightarrow\mathsf{hom}(P,G)=\mathsf{hom}(P,H)

for every graph PP of treewidth at most kk. On the other hand, we know from Theorem 1 that

G-𝖶𝖫H𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H)G\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Longleftrightarrow\mathsf{hom}(T,G)=\mathsf{hom}(T,H)

for every \mathcal{F}-pattern tree TT. Hence, we may conclude that

Gk-𝖶𝖫HG-𝖶𝖫HG\equiv_{k\text{-}\mathsf{WL}}H\Longrightarrow G\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H

if we can show that any \mathcal{F}-pattern tree has treewidth at most kk.

Suppose that kk is the maximal treewidth of a pattern in \mathcal{F}. To conclude the proof, we verify that the treewidth of any \mathcal{F}-pattern tree is bounded by kk.

Lemma 1.

If kk is the maximal treewidth of a pattern in \mathcal{F}, then the treewidth of any \mathcal{F}-pattern tree TT is bounded by kk.

Proof.

The proof is by induction on the number of patterns joined to TT. Clearly, if no patterns are joined, then TT is simply a tree and its treewidth is 11. Otherwise, consider an \mathcal{F}-pattern tree T=(V,E,χ)T=(V,E,\chi) whose treewidth is at most kk, and a pattern PrP^{r} of treewidth at most kk that is to be joined at vertex tt of TT. By the induction hypothesis, there is a tree decomposition (H,λ)(H,\lambda) for TT witnessing its bounded treewidth, that is,

  1. The union of all λ(h)\lambda(h), for hVHh\in V_{H}, is equal to VV;

  2. The set {hVHtλ(h)}\{h\in V_{H}\mid t\in\lambda(h)\} is connected, for all tVt\in V;

  3. For each {u,v}E\{u,v\}\in E there is hVHh\in V_{H} with {u,v}λ(h)\{u,v\}\subseteq\lambda(h); and

  4. The size of each set λ(h)\lambda(h) is at most k+1k+1.

Likewise, by assumption, for pattern PrP^{r} we have such a tree decomposition, say (HP,λP)(H^{P},\lambda^{P}).

Now consider any vertex hh of the decomposition of TT such that λ(h)\lambda(h) contains the vertex tt of TT to which PrP^{r} is to be joined at its root. We can create a joint tree decomposition for the join of PrP^{r} and TT (at node tt) by merging HH and HPH^{P} with an edge from vertex hh in HH to the root of HPH^{P} (recall that HPH^{P} is a tree by definition; we may assume that its root bag contains the root rr of PrP^{r}). It is readily verified that this decomposition maintains all necessary properties. Indeed, condition 1 is clearly satisfied since λ\lambda and λP\lambda^{P} combined cover all vertices of the join of TT with PrP^{r}. Furthermore, since the only vertex shared by TT and PrP^{r} is the join vertex tt, and we merge HH and HPH^{P} by putting an edge from node hh in HH (whose bag contains tt) to the root of HPH^{P} (whose bag contains rr, identified with tt), connectivity is guaranteed and condition 2 is satisfied. Moreover, since joining TT and PrP^{r} creates no edges beyond those of TT and PrP^{r}, condition 3 is immediately verified. Finally, condition 4 holds because we create no new bags, neither in HH nor in HPH^{P}, and all bags of λ\lambda and λP\lambda^{P} have size at most k+1k+1. ∎

B.3 Proof of Theorem 2

We show that if \mathcal{F} contains a pattern PrP^{r} which is a core and has treewidth kk, then -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} is not bounded by (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}. In other words, we construct two graphs GG and HH that can be distinguished by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} but not by (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}. It suffices to find such graphs that can be distinguished by {Pr}-𝖶𝖫\{P^{r}\}\text{-}\mathsf{WL} but not by (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}. The proof relies on the characterisation of (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL} indistinguishability in terms of the kk-variable fragment 𝖢k\mathsf{C}^{k} of first-order logic with counting, and of kk-pebble bijective games in particular (Cai et al., 1992; Hella, 1996). More precisely, G(k1)-𝖶𝖫HG\equiv_{(k-1)\text{-}\mathsf{WL}}H if and only if no sentence in 𝖢k\mathsf{C}^{k} can distinguish GG from HH; that is, for any sentence φ\varphi in 𝖢k\mathsf{C}^{k}, GφG\models\varphi if and only if HφH\models\varphi. We denote indistinguishability by 𝖢k\mathsf{C}^{k} by G𝖢kHG\equiv_{\mathsf{C}^{k}}H. We heavily rely on the constructions used in Atserias et al. (2007) and Bova & Chen (2019). In fact, we show that the graphs GG and HH constructed in those works suffice for our purpose, by extending their strategy for the kk-pebble game to kk-pebble bijective games.

Construction of the graphs 𝑮G and 𝑯H.

Let PrP^{r} be a pattern in \mathcal{F} which is a core and has treewidth kk. For a vertex vVPv\in V_{P}, we gather all its edges in Ev:={{v,v}{v,v}EP}E_{v}:=\bigl{\{}\{v,v^{\prime}\}\mid\{v,v^{\prime}\}\in E_{P}\bigr{\}}. Let v1v_{1} be one of the vertices in VPV_{P}.

For GG, as vertex set VGV_{G} we take vertices of the form (v,f)(v,f) with vVPv\in V_{P} and f:Ev{0,1}f:E_{v}\to\{0,1\}. We require that

eEv1f(e)mod2=1 and eEvf(e)mod2=0 for vv1vVP.\sum_{e\in E_{v_{1}}}f(e)\!\!\mod 2=1\text{ and }\sum_{e\in E_{v}}f(e)\!\!\mod 2=0\text{ for $v\neq v_{1}$, $v\in V_{P}$}.

For HH, as vertex set VHV_{H} we take vertices of the form (v,f)(v,f) with vVPv\in V_{P} and f:Ev{0,1}f:E_{v}\to\{0,1\}. We require that eEvf(e)mod2=0\sum_{e\in E_{v}}f(e)\mod 2=0, for all vVPv\in V_{P}. We observe that GG and HH have the same number of vertices.

The edge sets EGE_{G} and EHE_{H} of GG and HH, respectively, are defined as follows: (v,f)(v,f) and (v,f)(v^{\prime},f^{\prime}) are adjacent if and only if vvv\neq v^{\prime} and furthermore,

f({v,v})=f({v,v}).f(\{v,v^{\prime}\})=f^{\prime}(\{v,v^{\prime}\}).

It is known that 𝗁𝗈𝗆(P,G)=0\mathsf{hom}(P,G)=0 (here it is used that PP is a core), that 𝗁𝗈𝗆(P,H)0\mathsf{hom}(P,H)\neq 0, and that GG and HH are indistinguishable by means of sentences in the kk-variable fragment 𝖥𝖮k\mathsf{FO}^{k} of first-order logic (Atserias et al., 2007; Bova & Chen, 2019). To show our theorem, we thus need to verify that G𝖢kHG\equiv_{\mathsf{C}^{k}}H as well. Indeed, if this holds, then G(k1)-𝖶𝖫HG\equiv_{(k-1)\text{-}\mathsf{WL}}H, yet G{P}-𝖶𝖫(0)HG\not\equiv_{\{P\}\text{-}\mathsf{WL}}^{(0)}H: by Theorem 1, G{P}-𝖶𝖫(0)HG\equiv_{\{P\}\text{-}\mathsf{WL}}^{(0)}H would require 𝗁𝗈𝗆(P,G)=𝗁𝗈𝗆(P,H)\mathsf{hom}(P,G)=\mathsf{hom}(P,H), which we know to be false. Hence, G{P}-𝖶𝖫HG\not\equiv_{\{P\}\text{-}\mathsf{WL}}H, as desired.
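A small Python sketch of this construction may be helpful (our own illustration; the concrete encodings of vertices and edges are assumptions made for the sketch). It builds the vertex sets of GG and HH and exposes the shared adjacency rule:

```python
from itertools import product

def build_twisted_pair(VP, EP, v1):
    """VP: list of vertices of the pattern P; EP: list of edges as sorted tuples;
    v1: the vertex of P carrying the odd-parity twist in G."""
    inc = {v: [e for e in EP if v in e] for v in VP}   # E_v, the edges incident to v

    def vertices(parity):  # all (v, f) with sum_{e in E_v} f(e) = parity(v) mod 2
        return [(v, dict(zip(inc[v], bits)))
                for v in VP
                for bits in product([0, 1], repeat=len(inc[v]))
                if sum(bits) % 2 == parity(v)]

    VG = vertices(lambda v: int(v == v1))   # odd parity at v1, even elsewhere
    VH = vertices(lambda v: 0)              # even parity everywhere

    def adjacent(x, y):                     # the same rule defines E_G and E_H
        (v, f), (w, g) = x, y
        e = tuple(sorted((v, w)))
        # requires v != w, {v, w} in E_P, and agreement of f and g on {v, w}
        return e in f and f[e] == g[e]

    return VG, VH, adjacent

VG, VH, adj = build_twisted_pair([0, 1, 2], [(0, 1), (0, 2), (1, 2)], v1=0)
print(len(VG), len(VH))  # 6 6: G and H have the same number of vertices
```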

Showing 𝗖𝒌\mathsf{C}^{k}-indistinguishability of 𝑮G and 𝑯H.

We next show that the graphs GG and HH are indistinguishable by sentences in 𝖢k\mathsf{C}^{k}. This will be shown by verifying that the Duplicator has a winning strategy for the kk-pebble bijective game on GG and HH (Hella, 1996).

The kk-pebble bijective game. We recall that the kk-pebble bijective game is played between two players, the Spoiler and the Duplicator, each placing at most kk pebbles on the vertices of GG and HH, respectively. The game is played in a number of rounds. The pebbles placed after round rr are typically represented by a partial function p(r):{1,,k}VG×VHp^{(r)}:\{1,\ldots,k\}\to V_{G}\times V_{H}. When p(r)(i)p^{(r)}(i) is defined, say, p(r)(i)=(v,w)p^{(r)}(i)=(v,w), this means that the Spoiler places the iith pebble on vertex vv and the Duplicator places the iith pebble on ww. Initially, no pebbles are placed on GG and HH and hence p(0)p^{(0)} is undefined everywhere.

Then in round r>0r>0, the game proceeds as follows:

  1. The Spoiler selects a pebble ii in [k][k]. All other already placed pebbles are kept on the same vertices. We define p(r)(j)=p(r1)(j)p^{(r)}(j)=p^{(r-1)}(j) for all j[k]j\in[k], jij\neq i.

  2. The Duplicator responds by choosing a bijection h:VGVHh:V_{G}\to V_{H}. This bijection should be consistent with the pebbles in the restriction of p(r1)p^{(r-1)} to [k]{i}[k]\setminus\{i\}. That is, for every j[k]j\in[k], jij\neq i, if p(r1)(j)=(v,w)p^{(r-1)}(j)=(v,w) then w=h(v)w=h(v).

  3. Next, the Spoiler selects an element vVGv\in V_{G}.

  4. The Duplicator defines p(r)(i)=(v,h(v))p^{(r)}(i)=(v,h(v)). Hence, after this round, the iith pebble is placed on vv by the Spoiler and on h(v)h(v) by the Duplicator.

Let 𝖽𝗈𝗆(p(r))\mathsf{dom}(p^{(r)}) be the set of elements in [k][k] for which p(r)p^{(r)} is defined. For i𝖽𝗈𝗆(p(r))i\in\mathsf{dom}(p^{(r)}) denote by (vi,wi)VG×VH(v_{i},w_{i})\in V_{G}\times V_{H} the pair of vertices on which the iith pebble is placed. The Duplicator wins round rr if the mapping viwiv_{i}\mapsto w_{i} is a partial isomorphism between GG and HH. More precisely, for all i,j𝖽𝗈𝗆(p(r))i,j\in\mathsf{dom}(p^{(r)}), it should hold that {vi,vj}EG\{v_{i},v_{j}\}\in E_{G} if and only if {wi,wj}EH\{w_{i},w_{j}\}\in E_{H}. In this case, the game continues to the next round. Infinite games are won by the Duplicator. A winning strategy consists of defining a bijection in Step 2 of each round that allows the game to continue, regardless of which vertex vv the Spoiler pebbles in Step 3.

Winning strategy. We will now provide a winning strategy for the Duplicator in the kk-pebble bijective game on our constructed graphs GG and HH. We recall that VGV_{G} and VHV_{H} have the same number of vertices, so a bijection between VGV_{G} and VHV_{H} exists. We show how the Duplicator can select a “good” bijection in Step 2 of the game, by induction on the number of rounds.

To state our induction hypothesis, we first recall some notions and properties from Atserias et al. (2007) and Bova & Chen (2019).

Let WW be a walk in PP and let ee be an edge in EPE_{P}. Then, 𝗈𝖼𝖼W(e)\mathsf{occ}_{W}(e) denotes the number of occurrences of the edge ee in the walk. More precisely, if W=(a1,,a)W=(a_{1},\ldots,a_{\ell}) is a walk in PP of length \ell, then

𝗈𝖼𝖼W(e):=|{i[1]e={ai,ai+1}}|.\mathsf{occ}_{W}(e):=|\{i\in[\ell-1]\mid e=\{a_{i},a_{i+1}\}\}|.

Furthermore, for a subset SVPS\subseteq V_{P}, we define

𝖺𝗏𝗈𝗂𝖽(S):={M,MS=}M,\mathsf{avoid}(S):=\bigcup_{\{M\in\mathcal{M},M\cap S=\emptyset\}}M,

where \mathcal{M} is an arbitrary bramble of PP of order >k>k. A bramble \mathcal{M} is a set of connected subsets of VPV_{P} such that for any two elements M1M_{1} and M2M_{2} in \mathcal{M}, either M1M2M_{1}\cap M_{2}\neq\emptyset, or there exists a vertex aM1a\in M_{1} and bM2b\in M_{2} such that {a,b}EP\{a,b\}\in E_{P}. The order of a bramble is the minimum size of a hitting set for \mathcal{M}. It is known that PP has treewidth k\geq k if and only if it has a bramble of order >k>k. In what follows, we let \mathcal{M} be any such bramble.

Lemma 2 (Lemma 14 in Bova & Chen (2019)).

For any 1k1\leq\ell\leq k, let (a1,f1),,(a,f)(a_{1},f_{1}),\ldots,(a_{\ell},f_{\ell}) be vertices in VGV_{G}. Let WW be a walk in PP from v1v_{1} to 𝖺𝗏𝗈𝗂𝖽({a1,,a})\mathsf{avoid}(\{a_{1},\ldots,a_{\ell}\}). For all i[]i\in[\ell], let fi:Eai{0,1}f_{i}^{\prime}:E_{a_{i}}\to\{0,1\} be defined by

fi(e)=fi(e)+𝗈𝖼𝖼W(e)mod2f^{\prime}_{i}(e)=f_{i}(e)+\mathsf{occ}_{W}(e)\mod 2

for all eEaie\in E_{a_{i}}. Then, the mapping (ai,fi)(ai,fi)(a_{i},f_{i})\mapsto(a_{i},f_{i}^{\prime}), for all i[]i\in[\ell], is a partial isomorphism from GG to HH.∎

We use this lemma to show that the bijection (to be defined shortly) selected by the Duplicator induces a partial isomorphism between GG and HH on the pebbled vertices.

We can now state our induction hypothesis: In each round rr, there exists a bijection h:VGVHh:V_{G}\to V_{H} which is

  • (a) consistent with the pebbles in the restriction of p(r1)p^{(r-1)} to [k]{i}[k]\setminus\{i\} (recall that pebble ii is selected by the Spoiler in Step 1); and

  • (b) If p(r)(j)=((aj,fj),h(aj,fj))p^{(r)}(j)=((a_{j},f_{j}),h(a_{j},f_{j})) for j𝖽𝗈𝗆(p(r))j\in\mathsf{dom}(p^{(r)}), then there exists a walk W(r)W^{(r)} in PP, from v1v_{1} to 𝖺𝗏𝗈𝗂𝖽({ajj𝖽𝗈𝗆(p(r))})\mathsf{avoid}(\{a_{j}\mid j\in\mathsf{dom}(p^{(r)})\}), such that

    h(aj,fj)=(aj,fj),h(a_{j},f_{j})=(a_{j},f_{j}^{\prime}),

    where fj(e)=fj(e)+𝗈𝖼𝖼W(r)(e)mod2f_{j}^{\prime}(e)=f_{j}(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2 for every eEaje\in E_{a_{j}}. In other words, on the vertices in VGV_{G} pebbled by p(r)p^{(r)}, the bijection hh is, by the previous Lemma, a partial isomorphism from GG to HH.

If this holds, then the strategy for the Duplicator is selecting that bijection hh in each round.

Verification of the induction hypothesis. We assume that the special vertex v1v_{1} in PP has at least two neighbours. Such a vertex exists since otherwise PP would consist of a single edge, whereas we assume PP to have treewidth at least two.

Base case. For the base case (r=0)(r=0) we define two walks: W1=v1,v2W_{1}=v_{1},v_{2} and W2=v1,tW_{2}=v_{1},t, where v2tv_{2}\neq t and both v2v_{2} and tt are neighbours of v1v_{1}. We define h(ai,f)=(ai,f)h(a_{i},f)=(a_{i},f^{\prime}) with f(e)=f(e)+𝗈𝖼𝖼W1(e)mod2f^{\prime}(e)=f(e)+\mathsf{occ}_{W_{1}}(e)\mod 2 if aita_{i}\neq t, and h(t,f)=(t,f)h(t,f)=(t,f^{\prime}) with f(e)=f(e)+𝗈𝖼𝖼W2(e)mod2f^{\prime}(e)=f(e)+\mathsf{occ}_{W_{2}}(e)\mod 2.

The mapping hh is a bijection from VGV_{G} to VHV_{H}. We note that it suffices to show that hh is injective, since VGV_{G} and VHV_{H} contain the same number of vertices. Since h(ai,fi)h(aj,fj)h(a_{i},f_{i})\neq h(a_{j},f_{j}) whenever aiaja_{i}\neq a_{j}, we can focus on comparing h(ai,f)h(a_{i},f) and h(ai,g)h(a_{i},g) with fgf\neq g. Then f(e)g(e)f(e)\neq g(e) for at least one edge eEaie\in E_{a_{i}}, and for this edge f(e)=f(e)+𝗈𝖼𝖼W(e)mod2g(e)+𝗈𝖼𝖼W(e)mod2=g(e)f^{\prime}(e)=f(e)+\mathsf{occ}_{W}(e)\mod 2\neq g(e)+\mathsf{occ}_{W}(e)\mod 2=g^{\prime}(e). In fact, this holds for any walk WW, and thus in particular for W1W_{1} and W2W_{2}. We further observe that hh is consistent simply because no pebbles have been placed yet. For the same reason we can take the walk W(0)W^{(0)} to be either W1W_{1} or W2W_{2}.

Inductive case. Assume that the induction hypothesis holds for round rr and consider round r+1r+1. Let p(r)(j)=(aj,fj,aj,fj)p^{(r)}(j)=(a_{j},f_{j},a_{j},f_{j}^{\prime}) for j𝖽𝗈𝗆(p(r))j\in\mathsf{dom}(p^{(r)}). By induction, there exists a bijection h:VGVHh^{\prime}:V_{G}\to V_{H} such that h(aj,fj)=(aj,fj)h^{\prime}(a_{j},f_{j})=(a_{j},f_{j}^{\prime}) and, furthermore, fj(e)=fj(e)+𝗈𝖼𝖼W(r)(e)mod2f_{j}^{\prime}(e)=f_{j}(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2 for every eEaje\in E_{a_{j}}, for some walk W(r)W^{(r)} from v1v_{1} to t𝖺𝗏𝗈𝗂𝖽({ajj𝖽𝗈𝗆(p(r))})t\in\mathsf{avoid}(\{a_{j}\mid j\in\mathsf{dom}(p^{(r)})\}).

Assume that the Spoiler selects i[k]i\in[k] in Step 1 in round r+1r+1. We define the Duplicator’s bijection h:VGVHh:V_{G}\to V_{H} for round r+1r+1, as follows. Recall that tVPt\in V_{P} is the vertex in which the walk W(r)W^{(r)} ends.

  • For all (a,f)VG(a,f)\in V_{G} such that ata\neq t, we define h(a,f)=(a,f)h(a,f)=(a,f^{\prime}) where for each eEae\in E_{a}:

    f(e)=f(e)+𝗈𝖼𝖼W(r)(e)mod2.f^{\prime}(e)=f(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2.
  • For all (t,f)VG(t,f)\in V_{G}, we will extend W(r)W^{(r)} with a walk WW^{\prime} so that it ends in a vertex tt^{\prime} different from tt. Let MM\in\mathcal{M} be such that tMt\in M. We want to find an MM^{\prime}\in\mathcal{M} such that M({ajj𝖽𝗈𝗆(p(r)),ji}{t})=M^{\prime}\cap(\{a_{j}\mid j\in\mathsf{dom}(p^{(r)}),j\neq i\}\cup\{t\})=\emptyset. We can then take tt^{\prime} to be a vertex in MM^{\prime} and, since MM and MM^{\prime} are both connected and either share a vertex or are joined by an edge, we can let WW^{\prime} be a walk from tt to tt^{\prime} lying entirely within MM and MM^{\prime}. Such an MM^{\prime} exists since otherwise {ajj𝖽𝗈𝗆(p(r)),ji}{t}\{a_{j}\mid j\in\mathsf{dom}(p^{(r)}),j\neq i\}\cup\{t\} would be a hitting set for \mathcal{M} of size at most kk. We know, however, that any hitting set for \mathcal{M} must have size at least k+1k+1, since \mathcal{M} has order >k>k. We now define the bijection as h(t,f)=(t,f)h(t,f)=(t,f^{\prime}) where for each eEte\in E_{t}:

    f(e)=f(e)+𝗈𝖼𝖼W(r),W(e)mod2.f^{\prime}(e)=f(e)+\mathsf{occ}_{W^{(r)},W^{\prime}}(e)\mod 2.

This concludes the definition of h:VGVHh:V_{G}\to V_{H}. We need to verify three things: (i) hh is a bijection; (ii) hh is consistent with all pebbles in p(r)p^{(r)} except for the “unpebbled” one p(r)(i)p^{(r)}(i); and (iii) it induces a partial isomorphism on the pebbled vertices.

  • (i)

    hh is a bijection. Since VGV_{G} and VHV_{H} are of the same size, it suffices to show that hh is an injection. Clearly, h(a1,f1)h(a2,f2)h(a_{1},f_{1})\neq h(a_{2},f_{2}) whenever a1a2a_{1}\neq a_{2}. We can thus focus on h(a,f1)h(a,f_{1}) and h(a,f2)h(a,f_{2}) with f1f2f_{1}\neq f_{2}. Then, f1f_{1} and f2f_{2} differ in at least one edge eEae\in E_{a} and for this edge:

    f1(e)=f1(e)+𝗈𝖼𝖼W(e)mod2f2(e)+𝗈𝖼𝖼W(e)mod2=f2(e).f_{1}^{\prime}(e)=f_{1}(e)+\mathsf{occ}_{W}(e)\mod 2\neq f_{2}(e)+\mathsf{occ}_{W}(e)\mod 2=f_{2}^{\prime}(e).

    This holds for any walk WW, in particular for both walks used in the definition of hh: W(r)W^{(r)}, used when ata\neq t, and W(r),WW^{(r)},W^{\prime}, used when a=ta=t. Hence, hh is indeed a bijection.

  • (ii)

    hh is consistent. For each j𝖽𝗈𝗆(p(r+1))j\in\mathsf{dom}(p^{(r+1)}) with jij\neq i, let p(r+1)(j)=(aj,fj,aj,fj)p^{(r+1)}(j)=(a_{j},f_{j},a_{j},f_{j}^{\prime}). Now, by induction, W(r)W^{(r)} ends in a vertex tt distinct from all of these aja_{j}’s, and thus none of them equals tt. This implies that h(aj,fj)=(aj,fj′′)h(a_{j},f_{j})=(a_{j},f_{j}^{\prime\prime}) with fj′′(e)=fj(e)+𝗈𝖼𝖼W(r)(e)mod2f_{j}^{\prime\prime}(e)=f_{j}(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2. But this is precisely how p(r)p^{(r)} placed its pebbles, by induction. Hence, fj′′(e)=fj(e)f_{j}^{\prime\prime}(e)=f_{j}^{\prime}(e) and thus hh is consistent.

  • (iii)

    p(r+1)p^{(r+1)} induces a partial isomorphism. After the Spoiler has picked an element (ai,fi)VG(a_{i},f_{i})\in V_{G}, we know that p(r+1)(j)=(aj,fj,h(aj,fj))p^{(r+1)}(j)=(a_{j},f_{j},h(a_{j},f_{j})) for all j𝖽𝗈𝗆(p(r+1))j\in\mathsf{dom}(p^{(r+1)}). We recall that hh is defined in two possible ways, using two distinct walks: W(r)W^{(r)} for vertices in VGV_{G} not involving tt, and the extended walk W(r),WW^{(r)},W^{\prime} for vertices in VGV_{G} involving tt.

    Hence, when all aja_{j}’s for p(r+1)p^{(r+1)} are distinct from tt, then h(aj,fj)=(aj,fj)h(a_{j},f_{j})=(a_{j},f_{j}^{\prime}) with fj(e)=fj(e)+𝗈𝖼𝖼W(r)(e)mod2f_{j}^{\prime}(e)=f_{j}(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2 and we can simply take the new walk W(r+1)W^{(r+1)} to be W(r)W^{(r)}. Then, Lemma 2 implies that the mapping (aj,fj)h(aj,fj)(a_{j},f_{j})\to h(a_{j},f_{j}), for j𝖽𝗈𝗆(p(r+1))j\in\mathsf{dom}(p^{(r+1)}) is a partial isomorphism from GG to HH, as desired.

    Otherwise, we know that ajta_{j}\neq t for jij\neq i but ai=ta_{i}=t. That is, the Spoiler places the iith pebble on a vertex of the form (t,f)(t,f) in VGV_{G}. Now hh is defined in two ways for the pebbled elements, using the two distinct walks. We next show that W(r),WW^{(r)},W^{\prime} can be used for both types of pebbled elements in p(r+1)p^{(r+1)}: those of the form (aj,f)(a_{j},f) with ajta_{j}\neq t, and (t,f)(t,f). For the latter type this is obvious, since we defined h(t,f)h(t,f) in terms of W(r),WW^{(r)},W^{\prime}. For the former type, we note that ajMa_{j}\not\in M and ajMa_{j}\not\in M^{\prime} for jij\neq i. If we take an edge eEaje\in E_{a_{j}}, then 𝗈𝖼𝖼W(r),W(e)=𝗈𝖼𝖼W(r)(e)\mathsf{occ}_{W^{(r)},W^{\prime}}(e)=\mathsf{occ}_{W^{(r)}}(e) because WW^{\prime} lies entirely in MM and MM^{\prime}. As a consequence, for (aj,fj)(a_{j},f_{j}) with jij\neq i and all eEaje\in E_{a_{j}}:

    fj(e)\displaystyle f^{\prime}_{j}(e) =fj(e)+𝗈𝖼𝖼W(r)(e)mod2\displaystyle=f_{j}(e)+\mathsf{occ}_{W^{(r)}}(e)\mod 2
    =fj(e)+𝗈𝖼𝖼W(r),W(e)mod2.\displaystyle=f_{j}(e)+\mathsf{occ}_{W^{(r)},W^{\prime}}(e)\mod 2.

    Then, Lemma 2 implies that the mapping (aj,fj)h(aj,fj)(a_{j},f_{j})\mapsto h(a_{j},f_{j}), for j𝖽𝗈𝗆(p(r+1))j\in\mathsf{dom}(p^{(r+1)}), is a partial isomorphism from GG to HH, because we can use the same walk W(r),WW^{(r)},W^{\prime} for all pebbled vertices.

B.4 Proof of Proposition 3

We show that no finite set \mathcal{F} of patterns suffices for -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} to be equivalent to k-𝖶𝖫k\text{-}\mathsf{WL}, for k>1k>1, in terms of expressive power. The proof is by contradiction. That is, suppose that there exists a set \mathcal{F} such that G-𝖶𝖫HGk-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Leftrightarrow G\equiv_{k\text{-}\mathsf{WL}}H for any two graphs GG and HH. In particular, G-𝖶𝖫HGk-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Rightarrow G\equiv_{k\text{-}\mathsf{WL}}H and thus also G-𝖶𝖫HG2-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Rightarrow G\equiv_{2\text{-}\mathsf{WL}}H, since the 2-𝖶𝖫2\text{-}\mathsf{WL}-test is upper bounded by any k-𝖶𝖫k\text{-}\mathsf{WL}-test for k>2k>2. We argue that no finite set \mathcal{F} exists satisfying G-𝖶𝖫HG2-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Rightarrow G\equiv_{2\text{-}\mathsf{WL}}H.

Let mm denote the maximum number of vertices of any pattern in \mathcal{F}. (Strictly speaking, we could use the maximum diameter of the patterns in \mathcal{F} instead, but the proof is easier to convey in terms of the number of vertices.) Furthermore, consider graphs GG and HH, where GG is the disjoint union of m+2m+2 copies of the cycle Cm+1C_{m+1}, and HH is the disjoint union of m+1m+1 copies of the cycle Cm+2C_{m+2}. Note that GG and HH have the same number of vertices.

We observe that any homomorphism from a pattern PrP^{r} in \mathcal{F} to GvG^{v} or HwH^{w}, for vertices vVGv\in V_{G} and wVHw\in V_{H}, maps PrP^{r} into either a copy of Cm+1C_{m+1} (for GG) or a copy of Cm+2C_{m+2} (for HH). Furthermore, the image of any such homomorphism is a connected subgraph of Cm+1C_{m+1} or Cm+2C_{m+2} on at most mm vertices, and every such subgraph is a path on at most mm vertices; up to isomorphism, Cm+1C_{m+1} and Cm+2C_{m+2} thus contain exactly the same possible images. This implies that 𝗁𝗈𝗆(Pr,Gv)=𝗁𝗈𝗆(Pr,Hw)\mathsf{hom}(P^{r},G^{v})=\mathsf{hom}(P^{r},H^{w}) for any vVGv\in V_{G} and wVHw\in V_{H}. Since the argument holds for any pattern PrP^{r} in \mathcal{F}, all vertices in GG and HH have the same homomorphism counts for patterns in \mathcal{F}. Furthermore, since both GG and HH are regular graphs (each vertex has degree two), this implies that -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} cannot distinguish between GG and HH. This is formalised in the following lemma. We recall that a tt-regular graph is a graph in which every vertex has degree tt.
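The rooted homomorphism counts used throughout this argument can be checked by brute force on small examples; a minimal Python sketch (names ours, exponential in the pattern size and for illustration only):

    from itertools import product

    def hom(P_edges, P_root, G_adj, v):
        # Count maps h from the vertices of P to those of G with
        # h(root) = v that send every edge of P to an edge of G.
        P_vertices = sorted({a for e in P_edges for a in e})
        others = [a for a in P_vertices if a != P_root]
        count = 0
        for images in product(G_adj, repeat=len(others)):
            h = dict(zip(others, images))
            h[P_root] = v
            if all(h[a] in G_adj[h[b]] for a, b in P_edges):
                count += 1
        return count

    # Rooted 4-cycle counted in the 4-cycle itself: every homomorphism
    # is a closed walk of length 4 from the root, and there are 8.
    C4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    C4_adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(hom(C4_edges, 0, C4_adj, 0))  # 8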

Lemma 3.

For any set \mathcal{F} of patterns and any two tt-regular (unlabelled) graphs GG and HH such that 𝗁𝗈𝗆(Pr,Xx)=𝗁𝗈𝗆(Pr,Yy)\mathsf{hom}(P^{r},X^{x})=\mathsf{hom}(P^{r},Y^{y}) holds for all PrP^{r}\in\mathcal{F}, X,Y{G,H}X,Y\in\{G,H\}, xVXx\in V_{X} and yVYy\in V_{Y}, we have G-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H.

Proof.

The lemma is readily verified by induction on the number dd of rounds of -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}. We show a stronger result, namely that χ,X,x(d)=χ,Y,y(d)\chi_{\mathcal{F},X,x}^{(d)}=\chi_{\mathcal{F},Y,y}^{(d)} for any dd, X,Y{G,H}X,Y\in\{G,H\}, xVXx\in V_{X} and yVYy\in V_{Y}, from which G-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H follows. By our Theorem 1, it suffices to show that 𝗁𝗈𝗆(Tr,Xx)=𝗁𝗈𝗆(Tr,Yy)\mathsf{hom}(T^{r},X^{x})=\mathsf{hom}(T^{r},Y^{y}) for all \mathcal{F}-pattern trees TrT^{r} of depth at most dd. Let ={P1r,,Pr}\mathcal{F}=\{P_{1}^{r},\ldots,P_{\ell}^{r}\}. For the base case, let TrT^{r} be a join pattern 𝐬\mathcal{F}^{\mathbf{s}} for some 𝐬=(s1,,s)\mathbf{s}=(s_{1},\ldots,s_{\ell})\in\mathbb{N}^{\ell}. Then,

𝗁𝗈𝗆(Tr,Xx)=i=1(𝗁𝗈𝗆(Pir,Xx))si=i=1(𝗁𝗈𝗆(Pir,Yy))si=𝗁𝗈𝗆(Tr,Yy),\displaystyle\mathsf{hom}(T^{r},X^{x})=\prod_{i=1}^{\ell}\left(\mathsf{hom}(P_{i}^{r},X^{x})\right)^{s_{i}}=\prod_{i=1}^{\ell}\left(\mathsf{hom}(P_{i}^{r},Y^{y})\right)^{s_{i}}=\mathsf{hom}(T^{r},Y^{y}),

since 𝗁𝗈𝗆(Pir,Xx)=𝗁𝗈𝗆(Pir,Yy)\mathsf{hom}(P_{i}^{r},X^{x})=\mathsf{hom}(P_{i}^{r},Y^{y}) for any PirP_{i}^{r}\in\mathcal{F}. For the inductive case, assume that 𝗁𝗈𝗆(Sr,Xx)=𝗁𝗈𝗆(Sr,Yy)\mathsf{hom}(S^{r},X^{x})=\mathsf{hom}(S^{r},Y^{y}) for any \mathcal{F}-pattern tree SrS^{r} of depth at most d1d-1, X,Y{G,H}X,Y\in\{G,H\}, xVXx\in V_{X} and yVYy\in V_{Y}, and consider an \mathcal{F}-pattern tree TrT^{r} of depth dd. Let S1c1,,SpcpS_{1}^{c_{1}},\ldots,S_{p}^{c_{p}} be the \mathcal{F}-pattern trees of depth at most d1d-1 rooted at the children c1,,cpc_{1},\ldots,c_{p} of rr in the backbone of TrT^{r}. As before, let 𝐬\mathcal{F}^{\mathbf{s}} be the pattern joined at rr in TrT^{r}. Then,

𝗁𝗈𝗆(Tr,Xx)\displaystyle\mathsf{hom}(T^{r},X^{x}) =𝗁𝗈𝗆(𝐬,Xx)i=1pxNX(x)𝗁𝗈𝗆(Sici,Xx)=𝗁𝗈𝗆(𝐬,Xx)i=1pt𝗁𝗈𝗆(Sici,Xx~)\displaystyle=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},X^{x})\prod_{i=1}^{p}\sum_{x^{\prime}\in N_{X}(x)}\mathsf{hom}(S_{i}^{c_{i}},X^{x^{\prime}})=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},X^{x})\prod_{i=1}^{p}t\cdot\mathsf{hom}(S_{i}^{c_{i}},X^{\tilde{x}})
=𝗁𝗈𝗆(𝐬,Yy)i=1pt𝗁𝗈𝗆(Sici,Yy~)=𝗁𝗈𝗆(𝐬,Yy)i=1pyNY(y)𝗁𝗈𝗆(Sici,Yy)\displaystyle=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},Y^{y})\prod_{i=1}^{p}t\cdot\mathsf{hom}(S_{i}^{c_{i}},Y^{\tilde{y}})=\mathsf{hom}(\mathcal{F}^{\mathbf{s}},Y^{y})\prod_{i=1}^{p}\sum_{y^{\prime}\in N_{Y}(y)}\mathsf{hom}(S_{i}^{c_{i}},Y^{y^{\prime}})
=𝗁𝗈𝗆(Tr,Yy),\displaystyle=\mathsf{hom}(T^{r},Y^{y}),

where we used that NX(x)N_{X}(x) and NY(y)N_{Y}(y) both consist of tt vertices (regularity), that by the induction hypothesis all vertices have the same homomorphism counts for \mathcal{F}-pattern trees of depth at most d1d-1, and where x~\tilde{x} and y~\tilde{y} are arbitrary vertices in NX(x)N_{X}(x) and NY(y)N_{Y}(y), respectively. ∎

Hence, since GG and HH are 22-regular and satisfy the conditions of the lemma, we may indeed infer that G-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H. We note, however, that G2-𝖶𝖫HG\not\equiv_{2\text{-}\mathsf{WL}}H. Indeed, from Dvorak (2010) and Dell et al. (2018) we know that G2-𝖶𝖫HG\equiv_{2\text{-}\mathsf{WL}}H implies that 𝗁𝗈𝗆(P,G)=𝗁𝗈𝗆(P,H)\mathsf{hom}(P,G)=\mathsf{hom}(P,H) for any graph PP of treewidth at most two. In particular, G2-𝖶𝖫HG\equiv_{2\text{-}\mathsf{WL}}H implies that 𝗁𝗈𝗆(C,G)=𝗁𝗈𝗆(C,H)\mathsf{hom}(C_{\ell},G)=\mathsf{hom}(C_{\ell},H) for all cycles CC_{\ell}. We now conclude by observing that 𝗁𝗈𝗆(Cm+1,G)𝗁𝗈𝗆(Cm+1,H)\mathsf{hom}(C_{m+1},G)\neq\mathsf{hom}(C_{m+1},H) by construction. We have thus found two graphs which cannot be distinguished by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} but which can be distinguished by 2-𝖶𝖫2\text{-}\mathsf{WL}, contradicting our assumption that G-𝖶𝖫HG2-𝖶𝖫HG\equiv_{\mathcal{F}\text{-}\mathsf{WL}}H\Rightarrow G\equiv_{2\text{-}\mathsf{WL}}H.

Appendix C Proofs of Section 5

C.1 Proof of Proposition 4

We show that for any k>3k>3, {C3r,,Ckr}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k}\}\text{-}\mathsf{WL} is more expressive than {C3r,,Ck1r}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k-1}\}\text{-}\mathsf{WL}. More precisely, we construct two graphs GG and HH such that GG and HH cannot be distinguished by {C3r,,Ck1r}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k-1}\}\text{-}\mathsf{WL}, but they can be distinguished by {C3r,,Ckr}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k}\}\text{-}\mathsf{WL}.

The proof is analogous to the proof of Proposition 3. Indeed, it suffices to let GG consist of kk disjoint copies of Ck+1C_{k+1} and HH to consist of k+1k+1 disjoint copies of CkC_{k}. Then, as observed in the proof of Proposition 3, GG and HH will be indistinguishable by {C3r,,Ck1r}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k-1}\}\text{-}\mathsf{WL} simply because each pattern has at most k1k-1 vertices. Yet, by construction, 𝗁𝗈𝗆(Ck,G)𝗁𝗈𝗆(Ck,H)\mathsf{hom}(C_{k},G)\neq\mathsf{hom}(C_{k},H) and thus GG and HH are distinguishable (already by the initial labelling) by {C3r,,Ckr}-𝖶𝖫\{C^{r}_{3},\dots,C^{r}_{k}\}\text{-}\mathsf{WL}.

C.2 Proof of Proposition 5

Let Pr=P1rP2rP^{r}=P_{1}^{r}\star P_{2}^{r} be a pattern that is the join of two smaller patterns. We show that for any set \mathcal{F} of patterns, {Pr}-𝖶𝖫\mathcal{F}\cup\{P^{r}\}\text{-}\mathsf{WL} is upper bounded by {P1r,P2r}-𝖶𝖫\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\}\text{-}\mathsf{WL}. That is, for every two graphs GG and HH, G{P1r,P2r}-𝖶𝖫HG\equiv_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\}\text{-}\mathsf{WL}}H implies G{Pr}-𝖶𝖫HG\equiv_{\mathcal{F}\cup\{P^{r}\}\text{-}\mathsf{WL}}H. By definition, G{P1r,P2r}-𝖶𝖫HG\equiv_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\}\text{-}\mathsf{WL}}H is equivalent to {{χ{P1r,P2r},G,v(d)vVG}}={{χ{P1r,P2r},H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}\mid w\in V_{H}\}\!\!\}. In other words, with every vVGv\in V_{G} we can associate a unique wVHw\in V_{H} such that χ{P1r,P2r},G,v(d)=χ{P1r,P2r},H,w(d)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}. We show, by induction on dd, that this implies that χ{Pr},G,v(d)=χ{Pr},H,w(d)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(d)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(d)}. This suffices to conclude that {{χ{Pr},G,v(d)vVG}}={{χ{Pr},H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(d)}\mid w\in V_{H}\}\!\!\} and thus G{Pr}-𝖶𝖫HG\equiv_{\mathcal{F}\cup\{P^{r}\}\text{-}\mathsf{WL}}H.

Base case. We show that {{χ{P1r,P2r},G,v(d)vVG}}={{χ{P1r,P2r},H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}\mid w\in V_{H}\}\!\!\} implies that with every vVGv\in V_{G} we can associate a unique wVHw\in V_{H} satisfying χ{Pr},G,v(0)=χ{Pr},H,w(0)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(0)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(0)}. Indeed, as already observed, {{χ{P1r,P2r},G,v(d)vVG}}={{χ{P1r,P2r},H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}\mid w\in V_{H}\}\!\!\} implies that with every vVGv\in V_{G} we can associate a unique wVHw\in V_{H} such that χ{P1r,P2r},G,v(d)=χ{P1r,P2r},H,w(d)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}. This in turn implies that χ{P1r,P2r},G,v(0)=χ{P1r,P2r},H,w(0)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(0)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(0)}, which implies that 𝗁𝗈𝗆(P1r,Gv)=𝗁𝗈𝗆(P1r,Hw)\mathsf{hom}(P_{1}^{r},G^{v})=\mathsf{hom}(P_{1}^{r},H^{w}) and 𝗁𝗈𝗆(P2r,Gv)=𝗁𝗈𝗆(P2r,Hw)\mathsf{hom}(P_{2}^{r},G^{v})=\mathsf{hom}(P_{2}^{r},H^{w}) and 𝗁𝗈𝗆(Qr,Gv)=𝗁𝗈𝗆(Qr,Hw)\mathsf{hom}(Q^{r},G^{v})=\mathsf{hom}(Q^{r},H^{w}) for every QrQ^{r}\in\mathcal{F}. As a consequence, from properties of the graph join operators, since Pr=P1rP2rP^{r}=P_{1}^{r}\star P_{2}^{r}:

𝗁𝗈𝗆(Pr,Gv)=𝗁𝗈𝗆(P1r,Gv)𝗁𝗈𝗆(P2r,Gv)=𝗁𝗈𝗆(P1r,Hw)𝗁𝗈𝗆(P2r,Hw)=𝗁𝗈𝗆(Pr,Hw),\mathsf{hom}(P^{r},G^{v})=\mathsf{hom}(P_{1}^{r},G^{v})\cdot\mathsf{hom}(P_{2}^{r},G^{v})=\mathsf{hom}(P_{1}^{r},H^{w})\cdot\mathsf{hom}(P_{2}^{r},H^{w})=\mathsf{hom}(P^{r},H^{w}),

and thus also χ{Pr},G,v(0)=χ{Pr},H,w(0)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(0)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(0)}.

Inductive case. We assume that {{χ{P1r,P2r},G,v(d)vVG}}={{χ{P1r,P2r},H,w(d)wVH}}\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}\mid v\in V_{G}\}\!\!\}=\{\!\!\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}\mid w\in V_{H}\}\!\!\} implies χ{Pr},G,v(e)=χ{Pr},H,w(e)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(e)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(e)}, and want to show that it also implies χ{Pr},G,v(e+1)=χ{Pr},H,w(e+1)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(e+1)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(e+1)}. We again use the fact that we can associate with every vVGv\in V_{G} a unique vertex wVHw\in V_{H} such that χ{P1r,P2r},G,v(d)=χ{P1r,P2r},H,w(d)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(d)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(d)}. In particular, this implies that χ{P1r,P2r},G,v(e)=χ{P1r,P2r},H,w(e)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(e)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(e)} and χ{P1r,P2r},G,v(e+1)=χ{P1r,P2r},H,w(e+1)\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v}^{(e+1)}=\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w}^{(e+1)}. From the definition of the -𝖶𝖫\text{-}\mathsf{WL}-test, the multisets {χ{P1r,P2r},G,v(e)vNG(v)}\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},G,v^{\prime}}^{(e)}\mid v^{\prime}\in N_{G}(v)\} and {χ{P1r,P2r},H,w(e)wNH(w)}\{\chi_{\mathcal{F}\cup\{P_{1}^{r},P_{2}^{r}\},H,w^{\prime}}^{(e)}\mid w^{\prime}\in N_{H}(w)\} must then be equal as well, i.e., we can find a one-to-one correspondence between neighbours of vv in GG and neighbours of ww in HH that have the same label. From the induction hypothesis we then have that χ{Pr},G,v(e)=χ{Pr},H,w(e)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(e)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(e)} and also that the multisets {χ{Pr},G,v(e)vNG(v)}\{\chi_{\mathcal{F}\cup\{P^{r}\},G,v^{\prime}}^{(e)}\mid v^{\prime}\in N_{G}(v)\} and {χ{Pr},H,w(e)wNH(w)}\{\chi_{\mathcal{F}\cup\{P^{r}\},H,w^{\prime}}^{(e)}\mid w^{\prime}\in N_{H}(w)\} are equal, which implies, by the definition of the 𝖶𝖫\mathsf{WL}-test, that χ{Pr},G,v(e+1)=χ{Pr},H,w(e+1)\chi_{\mathcal{F}\cup\{P^{r}\},G,v}^{(e+1)}=\chi_{\mathcal{F}\cup\{P^{r}\},H,w}^{(e+1)}, as was to be shown.

C.3 Proof of Theorem 3

We show that {Qr}-𝖶𝖫\mathcal{F}\cup\{Q^{r}\}\text{-}\mathsf{WL}, where QrQ^{r} is a pattern whose core has treewidth kk, is more expressive than -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} if every pattern PrP^{r}\in\mathcal{F} satisfies one of the following conditions: (i) PrP^{r} has treewidth <k<k; or (ii) PrP^{r} does not map homomorphically to QrQ^{r}.

Let c(Q)rc(Q)^{r} denote the (rooted) core of QrQ^{r}, in which the root of c(Q)rc(Q)^{r} is any vertex that is the image of the root of QrQ^{r} under a homomorphism from QrQ^{r} to c(Q)rc(Q)^{r}. By assumption, c(Q)rc(Q)^{r} has treewidth kk.

Clearly, -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} is upper bounded by {Qr}-𝖶𝖫\mathcal{F}\cup\{Q^{r}\}\text{-}\mathsf{WL}. Thus, all we need for the proof is to find two graphs that are indistinguishable by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL} but are in fact distinguished by {Qr}-𝖶𝖫\mathcal{F}\cup\{Q^{r}\}\text{-}\mathsf{WL}.

Those two graphs are, in fact, the graphs GG and HH constructed for c(Q)rc(Q)^{r} (of treewidth kk) in the proof of Theorem 2. From that proof, we know that:

  • (a)

    𝗁𝗈𝗆(c(Q),G)=0\mathsf{hom}(c(Q),G)=0 and 𝗁𝗈𝗆(c(Q),H)0\mathsf{hom}(c(Q),H)\neq 0; and

  • (b)

    G𝖢kHG\equiv_{\mathsf{C}^{k}}H.

We note that (a) immediately implies that GG and HH can be distinguished by {Qr}-𝖶𝖫\mathcal{F}\cup\{Q^{r}\}\text{-}\mathsf{WL}. In fact, they are already distinguished by the initial labelling in round 0. We next show that GG and HH are indistinguishable by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}.

Let us first present a small structural result that helps us deal with patterns in \mathcal{F} satisfying the second condition of the Theorem.

Lemma 4.

If a rooted pattern PrP^{r} does not map homomorphically to QrQ^{r}, then 𝗁𝗈𝗆(P,G)=𝗁𝗈𝗆(P,H)=0\mathsf{hom}(P,G)=\mathsf{hom}(P,H)=0.

Proof.

We use the following property of graphs GG and HH, which can be directly observed from their construction (and was already noted in Atserias et al. (2007) and Bova & Chen (2019)). Define GrG^{r} and HrH^{r} by setting as their root any vertex (ar,f)(a_{r},f), for ara_{r} the root of c(Q)rc(Q)^{r}. Then there is a homomorphism from GrG^{r} to c(Q)rc(Q)^{r}, and there is a homomorphism from HrH^{r} to c(Q)rc(Q)^{r}.

Now, any homomorphism hh from PrP^{r} to GG would yield a homomorphism from PrP^{r} to QrQ^{r}: we compose hh with the homomorphism mentioned above from GrG^{r} to c(Q)rc(Q)^{r}, and c(Q)rc(Q)^{r}, being a subgraph of QrQ^{r}, in turn maps homomorphically to QrQ^{r}. Since by assumption PrP^{r} does not map homomorphically to QrQ^{r}, no such hh can exist. The proof for HH is analogous. ∎

Now, let \mathcal{F}^{\prime} be the set of patterns obtained by removing from \mathcal{F} all patterns which do not map homomorphically to QrQ^{r}. By Lemma 4, we have that GG and HH are distinguished by the -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}-test if and only if they are distinguished by -𝖶𝖫\mathcal{F}^{\prime}\text{-}\mathsf{WL}.

But all patterns in \mathcal{F}^{\prime} have treewidth less than kk, and by (b) GG and HH are indistinguishable by k-𝖶𝖫k\text{-}\mathsf{WL}. Proposition 2 then implies that GG and HH are indistinguishable by -𝖶𝖫\mathcal{F}^{\prime}\text{-}\mathsf{WL}, and hence, by the observation above, also by -𝖶𝖫\mathcal{F}\text{-}\mathsf{WL}, as desired.

Appendix D Connections to related work

We here provide more details on how \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} connect to 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} from the literature that also augment the initial labelling.

Vertex degrees.

We first consider so-called degree-aware 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} (Geerts et al., 2020) in which the message functions of the 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} may depend on the vertex degrees. The Graph Convolution Networks (𝖦𝖢𝖭\mathsf{GCN}s) (Kipf & Welling, 2017) are an example of such 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}. Degree-aware 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} are known to be equivalent, in terms of expressive power, to standard 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} in which the initial labelling is extended with vertex degrees (Geerts et al., 2020). Translated to our setting, we can simply let ={L1}\mathcal{F}=\{L_{1}\}, where L1L_{1} denotes the rooted single-edge pattern, since 𝗁𝗈𝗆(L1,Gv)\mathsf{hom}(L_{1},G^{v}) is equal to the degree of vertex vv in GG. When considering graphs without an initial vertex labelling (or with a uniform labelling which assigns every vertex the same label), our characterisation (Theorem 1) implies that GL1-𝖶𝖫(d)HG\equiv_{L_{1}\text{-}\mathsf{WL}}^{(d)}H if and only if 𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H)\mathsf{hom}(T,G)=\mathsf{hom}(T,H) for every {L1}\{L_{1}\}-pattern tree of depth at most dd. This in turn is equivalent to 𝗁𝗈𝗆(T,G)=𝗁𝗈𝗆(T,H)\mathsf{hom}(T,G)=\mathsf{hom}(T,H) for every (standard) tree of depth at most d+1d+1. Indeed, {L1}\{L_{1}\}-pattern trees of depth at most dd are simply trees of depth at most d+1d+1. Combining this with the characterisation of 𝖶𝖫\mathsf{WL} by Dvorak (2010) and Dell et al. (2018), we thus have for unlabelled graphs that GL1-𝖶𝖫(d)HG\equiv_{L_{1}\text{-}\mathsf{WL}}^{(d)}H if and only if G𝖶𝖫(d+1)HG\equiv_{\mathsf{WL}}^{(d+1)}H. So, by considering {L1}\{L_{1}\}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} one gains one round of computation compared to standard 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}. To lift this to labelled graphs, one instead has to include labelled versions of the single-edge pattern, in order to count, for each vertex, the number of neighbours of each specific label. This is done, e.g., by Ishiguro et al. (2020), who use the 𝖶𝖫\mathsf{WL} labelling obtained after the first round to augment the initial vertex labelling. This indeed corresponds to adding 𝗁𝗈𝗆(Tr,Gv)\mathsf{hom}(T^{r},G^{v}) as a feature for every labelled tree TrT^{r} of depth one. As a result, GL1-𝖶𝖫(d)HG\equiv_{L_{1}\text{-}\mathsf{WL}}^{(d)}H if and only if G𝖶𝖫(d+1)HG\equiv_{\mathsf{WL}}^{(d+1)}H also holds for labelled graphs.
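As a small illustration of these labelled single-edge counts, the following Python sketch (names ours, for illustration only) computes, for every vertex, the number of neighbours carrying each label:

    from collections import Counter

    def labelled_degree_features(adj, label):
        # adj: vertex -> set of neighbours; label: vertex -> vertex label.
        # Returns, per vertex, a map "label -> number of neighbours with it",
        # i.e., the hom counts of the labelled single-edge patterns.
        return {v: Counter(label[u] for u in adj[v]) for v in adj}

    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}      # a triangle
    label = {0: 'C', 1: 'O', 2: 'C'}
    print(labelled_degree_features(adj, label))
    # vertex 0 has one 'C' and one 'O' neighbour, vertex 1 has two 'C's, ...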

Walk counts.

The Graph Feature Networks by Chen et al. (2019a) can be regarded as a generalisation of the previous approach. Instead of simply adding vertex degrees, the numbers of walks of certain lengths emanating from each vertex are added. Translated to our setting, this corresponds to considering {L2,L3,,L}\{L_{2},L_{3},\ldots,L_{\ell}\}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, where LjL_{j} denotes a rooted path of length jj. For unlabelled graphs, our characterisation (Theorem 1) implies that G𝖶𝖫(d+)HG\equiv_{\mathsf{WL}}^{(d+\ell)}H implies G{L2,,L}-𝖶𝖫(d)HG\equiv_{\{L_{2},\ldots,L_{\ell}\}\text{-}\mathsf{WL}}^{(d)}H, i.e., dd rounds of the extended test are upper bounded by d+d+\ell rounds of 𝖶𝖫\mathsf{WL}, simply because every {L2,L3,,L}\{L_{2},L_{3},\ldots,L_{\ell}\}-pattern tree of depth dd is a standard tree of depth at most d+d+\ell.
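Since 𝗁𝗈𝗆(L,Gv)\mathsf{hom}(L_{\ell},G^{v}) is precisely the number of walks of length \ell starting at vv, these features can be computed with matrix-vector products; a minimal numpy sketch (names ours):

    import numpy as np

    def walk_counts(A, max_len):
        # Column j-1 holds, for every vertex, the number of walks of
        # length j starting there; length 1 recovers the vertex degrees.
        walks = np.ones(A.shape[0])
        cols = []
        for _ in range(max_len):
            walks = A @ walks        # extend every walk by one step
            cols.append(walks)
        return np.stack(cols, axis=1)

    A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # triangle
    print(walk_counts(A, 3))  # every vertex: 2, 4, 8 walks of length 1, 2, 3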

Cycles.

Li et al. (2019) extend 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} by varying the notion of neighbourhood over which features are aggregated. One particular instance corresponds to an aggregation of features weighted by the number of cycles of a certain length passing through each vertex (see the discussion at the end of Section 4 in Li et al. (2019)). Translated to our setting, this corresponds to considering {C}\{C_{\ell}\}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, where CC_{\ell} denotes the cycle of length \ell. As mentioned in the main body of the paper, these extend 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and result in architectures bounded by 2-𝖶𝖫2\text{-}\mathsf{WL} (Proposition 4). This is in line with Theorem 3 of Li et al. (2019), which states that their framework strictly extends 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and thus 1-𝖶𝖫1\text{-}\mathsf{WL}.
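For rooted cycles, 𝗁𝗈𝗆(Cr,Gv)\mathsf{hom}(C_{\ell}^{r},G^{v}) equals the number of closed walks of length \ell through vv, i.e., the vv-th diagonal entry of AA^{\ell}; a numpy sketch (names ours):

    import numpy as np
    from numpy.linalg import matrix_power

    def rooted_cycle_counts(A, ell):
        # (A^ell)_{vv} counts closed walks of length ell at v, which is
        # exactly the hom count of the rooted ell-cycle; for ell = 3 this
        # is twice the number of triangles through v.
        return np.diag(matrix_power(A, ell))

    A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # triangle
    print(rooted_cycle_counts(A, 3))  # [2 2 2]: one triangle, two orientations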

Isomorphism counts.

Another, albeit similar, approach to adding structural information to the initial labelling is taken in the paper Graph Structure Networks by Bouritsas et al. (2020). The idea there is to extend the initial features with information about how often a vertex vv appears in a subgraph of GG which is isomorphic to PP. More precisely, Bouritsas et al. (2020) consider a connected unlabelled graph PP as pattern and partition its vertex set VPV_{P} orbit-wise. That is, VP=i=1oPVPiV_{P}=\biguplus_{i=1}^{o_{P}}V_{P}^{i} where oPo_{P} denotes the number of orbits of PP. Here, v,vVPiv,v^{\prime}\in V_{P}^{i} whenever there is an automorphism hh in 𝖠𝗎𝗍(P)\mathsf{Aut}(P) mapping vv to vv^{\prime}. Next, they consider all distinct subgraphs G1,,GkG_{1},\ldots,G_{k} of GG which are isomorphic to PP, denoted by PGjP\cong G_{j} for j[k]j\in[k]. We write PfGjP\cong_{f}G_{j} when PGjP\cong G_{j} using a specific isomorphism ff. Then for each orbit part i[oP]i\in[o_{P}] and vertex vVGv\in V_{G}, they define:

𝗂𝗌𝗈(P,G,v,i)=|{GjPvVGj,and there exists an f s.t. GjfP and f(v)VPi,j[k]}|.\mathsf{iso}(P,G,v,i)=|\{G_{j}\cong P\mid v\in V_{G_{j}},\text{and there exists an $f$ s.t. }G_{j}\cong_{f}P\text{ and }f(v)\in V_{P}^{i},j\in[k]\}|.

That is, the number of subgraphs GjG_{j} in GG that can be isomorphically mapped to PP is counted, provided that this can be done by an isomorphism which maps vertex vv in GjG_{j} (and thus in GG) to one of the vertices in the iith orbit part VPiV_{P}^{i} of the pattern. A similar notion is proposed for edges, which we will not consider here. Similarly to our extended features, the initial features of each vertex vv are then augmented with (𝗂𝗌𝗈(P,Gv,i)P,i[oP])\big{(}\mathsf{iso}(P,G^{v},i)\mid P\in\mathcal{F},i\in[o_{P}]\big{)} for some set \mathcal{F} of patterns. Standard 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} are then executed on these augmented initial features. We refer to Bouritsas et al. (2020) for more details.

We can view the above approach as an instance of our framework. Indeed, given a pattern PP in \mathcal{F}, for each orbit partition, we replace PP by a different rooted version PriP^{r_{i}}, where rir_{i} is a vertex in VPiV_{P}^{i}. Which vertex in the orbit under consideration is selected as root is not important (because they are equivalent by definition of orbit). We then see that the standard notion of subgraph isomorphism counting directly translates to the quantity used in Bouritsas et al. (2020):

𝗌𝗎𝖻(Pri,Gv):=number of subgraphs in G containing v, isomorphic to Pri=𝗂𝗌𝗈(P,G,v,i).\mathsf{sub}(P^{r_{i}},G^{v}):=\text{number of subgraphs in $G$ containing $v$, isomorphic to $P^{r_{i}}$}=\mathsf{iso}(P,G,v,i).

It thus remains to express 𝗌𝗎𝖻(Pri,Gv)\mathsf{sub}(P^{r_{i}},G^{v}) in terms of homomorphism counts. This follows from Curticapean et al. (2017), in which it is shown that 𝗌𝗎𝖻(Pri,Gv)\mathsf{sub}(P^{r_{i}},G^{v}) can be computed as a linear combination of the counts 𝗁𝗈𝗆(Qri,Gv)\mathsf{hom}(Q^{r_{i}},G^{v}), where QriQ^{r_{i}} ranges over all graphs to which PriP^{r_{i}} can be mapped by means of a surjective homomorphism. For a given PriP^{r_{i}}, the finite set of such patterns is called the spasm of PriP^{r_{i}} in Curticapean et al. (2017) and can easily be computed.
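As a worked illustration (ours): every homomorphism from the rooted triangle K3riK_{3}^{r_{i}} is automatically injective, since its vertices are pairwise adjacent and simple graphs have no self-loops; the spasm of K3riK_{3}^{r_{i}} therefore consists of the triangle alone and 𝗌𝗎𝖻(K3ri,Gv)=𝗁𝗈𝗆(K3ri,Gv)/2\mathsf{sub}(K_{3}^{r_{i}},G^{v})=\mathsf{hom}(K_{3}^{r_{i}},G^{v})/2, the factor two accounting for the two automorphisms of the triangle that fix the root. For patterns with non-adjacent vertices, such as the 44-cycle, the spasm also contains proper homomorphic images (identifying an opposite pair of vertices of C4C_{4} yields a path with two edges), and the linear combination has more terms.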

In summary, given the set \mathcal{F} of patterns in Bouritsas et al. (2020), we first replace every PP\in\mathcal{F} by its rooted versions PriP^{r_{i}}, for i[oP]i\in[o_{P}], and then expand the resulting set of rooted patterns, by the spasms of each of these patterns. Let us denote by \mathcal{F}^{*} the final set of rooted patterns. It now follows that 𝗁𝗈𝗆(Qr,Gv)\mathsf{hom}(Q^{r},G^{v}) for QrQ^{r}\in\mathcal{F}^{*} provides sufficient information to extract 𝗌𝗎𝖻(Pri,Gv)\mathsf{sub}(P^{r_{i}},G^{v}) and thus also 𝗂𝗌𝗈(P,G,v,i)\mathsf{iso}(P,G,v,i) for every PP\in\mathcal{F} and orbit part i[oP]i\in[o_{P}]. As a consequence, the 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} from Bouritsas et al. (2020) are bounded by \mathcal{F}^{*}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and thus -𝖶𝖫\mathcal{F}^{*}\text{-}\mathsf{WL}. Conversely, given an \mathcal{F}-𝖬𝖯𝖭𝖭\mathsf{MPNN} one can, again using results by Curticapean et al. (2017), define a set +\mathcal{F}^{+} of patterns, such that the subgraph isomorphism counts of patterns in +\mathcal{F}^{+} can be used to compute the homomorphism counts of patterns in \mathcal{F}. Hence, \mathcal{F}-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} are upper bounded by the 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} considered in Bouritsas et al. (2020) using patterns in +\mathcal{F}^{+}. This is in agreement with Curticapean et al. (2017) in which it is shown that homomorphism counts, subgraph isomorphism counts and other notions of pattern counts are all interchangeable. Nevertheless, by using homomorphism counts one can gracefully extend known results about 𝖶𝖫\mathsf{WL} and 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, as we have shown in the paper, and add little overhead.

Appendix E Additional experimental information

E.1 Experimental setup

One of the crucial questions when studying the effect of adding structural information to the initial vertex labels is whether these additional labels enhance the performance of graph neural networks. In order to reduce the effect of specific implementation details of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} and the choice of hyper-parameters, we start from the 𝖦𝖭𝖭\mathsf{GNN} implementations and choices made in the benchmark by Dwivedi et al. (2020) (the original implementations can be found at https://github.com/graphdeeplearning/benchmarking-gnns) and only change the initial vertex labels, while leaving the 𝖦𝖭𝖭s\mathsf{GNN}\text{s} themselves unchanged. This ensures that we only measure the effect of augmenting initial features with homomorphism counts. We use the 𝖦𝖭𝖭s\mathsf{GNN}\text{s} from the benchmark, without extended features, as our baselines. For the same reasons, we use datasets proposed in the benchmark for their ability to statistically separate the performance of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. All other parameters are taken as in Dwivedi et al. (2020) and we refer to that paper for more details.

Selected 𝗚𝗡𝗡s\mathsf{GNN}\text{s}

Dwivedi et al. (2020) divide the benchmarked 𝖦𝖭𝖭s\mathsf{GNN}\text{s} into two classes: the 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and the “theoretically designed” 𝖶𝖫\mathsf{WL}-𝖦𝖭𝖭s\mathsf{GNN}\text{s}. The first class is found to perform better and to train faster. Hence, we chose to include the following five 𝖬𝖯𝖭𝖭\mathsf{MPNN} models from the benchmark:

  • Graph Attention Network (𝖦𝖠𝖳\mathsf{GAT}) as described in Velickovic et al. (2018)

  • Graph Convolutional Network (𝖦𝖢𝖭\mathsf{GCN}) as described in Kipf & Welling (2017)

  • 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} as described in Hamilton et al. (2017)

  • Mixed Model Convolutional Networks (𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet}) as described in Monti et al. (2017)

  • 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭\mathsf{GatedGCN} as described in Bresson & Laurent (2017).

For 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭\mathsf{GatedGCN} we used the version in which positional encoding (Belkin & Niyogi, 2003) is added to the vertex features, as it is empirically shown to be the strongest-performing version of this model for the selected datasets (Dwivedi et al., 2020). We denote this version by 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}, referring to the presence of edge features and this positional encoding. Details, background and a mathematical formalisation of the message passing layers of these models can be found in the supplementary material of Dwivedi et al. (2020).

As explained in the experimental section of the main paper, we enhance the vertex features with the log-normalised counts of the chosen patterns for every vertex of every graph of every dataset. The first layers of some models of Dwivedi et al. (2020) are adapted to accommodate this change in input size. All other layers were left identical to their original implementation as provided by Dwivedi et al. (2020).
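A minimal sketch of this augmentation step, assuming PyTorch tensors and a precomputed matrix of per-vertex pattern counts (all names ours, and log1p is our reading of the log-normalisation; this is not the benchmark's actual code):

    import torch

    def augment_features(x, counts):
        # x: [num_vertices, d] original features; counts: [num_vertices, |F|]
        # homomorphism counts of the chosen patterns. Concatenate the
        # log-normalised counts; the first MPNN layer must then accept
        # d + |F| input channels instead of d.
        return torch.cat([x, torch.log1p(counts.float())], dim=-1)

    x = torch.randn(5, 8)                      # 5 vertices, 8 features
    counts = torch.randint(0, 20, (5, 4))      # counts for 4 patterns
    print(augment_features(x, counts).shape)   # torch.Size([5, 12])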

Hardware, compute and resources

All models for ZINC, PATTERN and COLLAB were trained on a GeForce GTX 1080 Ti GPU; for CLUSTER a Tesla V100-SXM3-32GB GPU was used. Tables 7, 10, 13 and 16 report the training times for all combinations of models and additional feature sets. A rough estimate of the CO2 emissions, based on the total computing times of the reported experiments (2 0742\,074 hours GeForce GTX 1080, 372372 hours Tesla V100-SXM3-32GB), the computing times of not-included experiments (1 0371\,037 hours GeForce GTX 1080, 181181 hours Tesla V100-SXM3-32GB), the GPU types and the geographical location of our cluster, results in a carbon emission of 135135 kg CO2 equivalent. This estimation was conducted using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

E.2 Graph learning tasks

We here report the full results of our experimental evaluation for graph regression (Section E.2.1), link prediction (Section E.2.2) and vertex classification (Section E.2.3) as considered in Dwivedi et al. (2020). More precisely, a full listing of the patterns and combinations used and the obtained results for the test sets can be found in Tables 5, 8, 11 and 14. Average training time (in hours) and the number of epochs are reported in Tables 7, 10, 13 and 16. Finally, the total number of model parameters are reported in Tables 6, 9, 12 and 15. All averages and standard deviations are over 4 runs with different random seeds. The main take-aways from these results are included in the main paper.

E.2.1 Graph regression with the ZINC dataset

Just as in Dwivedi et al. (2020), we use a 1212K subset of the 250250K ZINC molecular graphs dataset (Irwin et al., 2012b) to regress a molecular property known as the constrained solubility. For each molecular graph, the vertex features are the types of heavy atoms and the edge features are the types of bonds between them. The following are taken from Dwivedi et al. (2020):
Splitting. ZINC has 10 00010\,000 train, 1 0001\,000 validation and 1 0001\,000 test graphs.
Training. (Here and in the next tasks we use the parameters from the code accompanying Dwivedi et al. (2020); the paper itself uses slightly different parameters.) For the learning rate strategy, the initial learning rate is set to 5×1055\times 10^{-5}, the reduce factor is 0.50.5, the patience value is 25, the stopping learning rate is 1×1061\times 10^{-6}, and the maximal training time is set to 1212 hours.
Performance measure. The performance measure is the mean absolute error (MAE) between the predicted and the ground-truth constrained solubility of each molecular graph.
Number of layers. 16 MPNN layers are used for every model.
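In PyTorch terms, this learning-rate strategy corresponds to a ReduceLROnPlateau schedule; a hedged sketch (we infer the scheduler from the description above, not from the benchmark code itself):

    import torch

    model = torch.nn.Linear(8, 1)   # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=25)
    # After each epoch: scheduler.step(val_loss). Training stops once the
    # learning rate drops below 1e-6 or the 12-hour budget is exhausted.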

Table 5: Full results of the mean absolute error (predicted constrained solubility vs. the ground truth) for selected cycle combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the ZINC data set. In the last two rows we compare between homomorphism counts (hom) and subgraph isomorphism counts (iso).
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 0,47±\pm0,02 0,35±\pm0,01 0,25±\pm0,01 0,44±\pm0,01 0,34±\pm0,05
{C3}\{C_{3}\} 0,45±\pm0,01 0,36±\pm0,01 0,25±\pm0,00 0,44±\pm0,00 0,30±\pm0,01
{C4}\{C_{4}\} 0,34±\pm0,02 0,29±\pm0,02 0,26±\pm0,01 0,30±\pm0,01 0,27±\pm0,06
{C5}\{C_{5}\} 0,44±\pm0,02 0,34±\pm0,02 0,23±\pm0,01 0,42±\pm0,01 0,27±\pm0,03
{C6}\{C_{6}\} 0,31±\pm0,00 0,27±\pm0,02 0,25±\pm0,01 0,30±\pm0,01 0,26±\pm0,09
{C3,C4}\{C_{3},C_{4}\} 0,33±\pm0,01 0,27±\pm0,01 0,24±\pm0,02 0,32±\pm0,01 0,23±\pm0,03
{C5,C6}\{C_{5},C_{6}\} 0,28±\pm0,01 0,26±\pm0,01 0,23±\pm0,01 0,28±\pm0,01 0,20±\pm0,03
{C4,C5,C6}\{C_{4},C_{5},C_{6}\} 0,24±\pm0,00 0,21±\pm0,00 0,20±\pm0,00 0,25±\pm0,01 0,16±\pm0,02
{C3,C4,C5,C6}\{C_{3},C_{4},C_{5},C_{6}\} 0,23±\pm0,00 0,21±\pm0,00 0,20±\pm0,01 0,26±\pm0,02 0,18±\pm0,02
{C3,,C10}\{C_{3},\ldots,C_{10}\} (hom) 0,22±\pm0,01 0,20±\pm0,00 0,19±\pm0,00 0,2376±\pm0,01 0,1352±\pm0,01
{C3,,C10}\{C_{3},\ldots,C_{10}\} (iso) 0,24±\pm0,01 0,22±\pm0,01 0,16±\pm0,01 0,2408±\pm0,01 0,1357 ±\pm 0,01
Table 6: Total model parameters for selected cycle combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the ZINC data set. In the last two rows we compare between homomorphism counts (hom) and subgraph isomorphism counts (iso).
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 358 273 360 742 388 963 401 148 408 135
{C3}\{C_{3}\} 358 417 360 887 389 071 401 238 408 205
{C4}\{C_{4}\} 358 417 360 887 389 071 401 238 408 205
{C5}\{C_{5}\} 358 417 360 887 389 071 401 238 408 205
{C6}\{C_{6}\} 358 417 360 887 389 071 401 238 408 205
{C3,C4}\{C_{3},C_{4}\} 358 561 361 032 389 179 401 328 408 275
{C5,C6}\{C_{5},C_{6}\} 358 561 361 032 389 179 401 328 408 275
{C4,C5,C6}\{C_{4},C_{5},C_{6}\} 358 705 361 177 389 287 401 418 408 345
{C3,C4,C5,C6}\{C_{3},C_{4},C_{5},C_{6}\} 358 849 361 322 389 395 401 508 408 415
{C3,,C10}\{C_{3},\ldots,C_{10}\} (hom) 359 425 361 902 389 827 401 868 408 695
{C3,,C10}\{C_{3},\ldots,C_{10}\} (iso) 359 425 361 902 389 827 401 868 408 695
Table 7: Average training time in hours and number of epochs for selected cycle combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the ZINC data set. In the last two rows we compare between homomorphism counts (hom) and subgraph isomorphism counts (iso).

Pattern set \mathcal{F}, then per model Time/Epochs: 𝖦𝖠𝖳\mathsf{GAT}, 𝖦𝖢𝖭\mathsf{GCN}, 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage}, 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet}, 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 2,40/377 10,99/463 2,46/420 1,53/345 12,08/136
{C3}\{C_{3}\} 2,88/444 12,03/363 2,03/500 0,91/298 12,07/148
{C4}\{C_{4}\} 2,30/351 11,36/324 2,31/396 1,70/382 12,06/139
{C5}\{C_{5}\} 2,42/375 12,03/333 1,70/444 1,06/370 12,06/202
{C6}\{C_{6}\} 2,40/369 9,98/421 2,58/446 1,25/288 12,08/136
{C3,C4}\{C_{3},C_{4}\} 2,98/461 12,03/332 2,56/458 1,41/321 12,09/132
{C5,C6}\{C_{5},C_{6}\} 2,76/422 12,04/319 2,67/464 1,53/356 12,06/137
{C4,C5,C6}\{C_{4},C_{5},C_{6}\} 2,45/381 10,13/419 1,67/463 1,04/382 12,04/229
{C3,C4,C5,C6}\{C_{3},C_{4},C_{5},C_{6}\} 2,65/408 10,38/420 2,09/503 1,26/364 12,08/135
{C3,,C10}\{C_{3},\ldots,C_{10}\} (hom) 2,65/428 12,03/350 2,76/478 1,48/363 12,06/175
{C3,,C10}\{C_{3},\ldots,C_{10}\} (iso) 2,78/497 11,72/419 2,63/547 1,58/440 11,62/148

E.2.2 Link Prediction with the Collab dataset

Another dataset used in Dwivedi et al. (2020) is COLLAB, a link prediction dataset proposed by the Open Graph Benchmark (OGB) (Hu et al., 2020), corresponding to a collaboration network between approximately 235235K scientists, indexed by Microsoft Academic Graph. Vertices represent scientists and edges denote collaborations between them. For vertex features, OGB provides 128128-dimensional vectors, obtained by averaging the word embeddings of a scientist’s papers. The year and number of co-authored papers in a given year are concatenated to form edge features. The graph can also be viewed as a dynamic multi-graph, since two vertices may have multiple temporal edges between them if they collaborate over multiple years. The following are taken from Dwivedi et al. (2020):

Splitting. We use the real-life training, validation and test edge splits provided by OGB. Specifically, they use collaborations until 2017 as training edges, those in 2018 as validation edges, and those in 2019 as test edges.
Training. All GNNs use the same learning rate strategy: an initial learning rate is set to 1×1031\times 10^{-3}, the reduce factor is 0.50.5, the patience value is 10, and the stopping learning rate is 1×1051\times 10^{-5}.
Performance measure. We use the evaluator provided by OGB (Hu et al., 2020), which aims to measure a model’s ability to predict future collaboration relationships given past collaborations. Specifically, it ranks each true collaboration among a set of 100 000100\,000 randomly-sampled negative collaborations, and counts the ratio of positive edges that are ranked at KK-th place or above (Hits@K). The value K=50K=50 is used, as this gives the best value for statistically separating the performance of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}.
Number of layers. 3 MPNN layers are used for every model.
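Stated compactly, a positive edge counts as a hit if its score exceeds that of the KK-th highest-scoring negative edge; a minimal numpy sketch of this measure (names ours; OGB ships its own evaluator):

    import numpy as np

    def hits_at_k(pos_scores, neg_scores, k=50):
        # Fraction of positive edges scored above the k-th best negative.
        if len(neg_scores) < k:
            return 1.0
        threshold = np.sort(neg_scores)[-k]
        return float(np.mean(pos_scores > threshold))

    rng = np.random.default_rng(0)
    pos = rng.uniform(0.5, 1.0, size=1000)     # scores of true edges
    neg = rng.uniform(0.0, 1.0, size=100000)   # 100 000 sampled negatives
    print(hits_at_k(pos, neg, k=50))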

Table 8: Full Results (Hits @50) for all selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the COLLAB data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 50,32±\pm0,55 51,36±\pm1,30 49,81±\pm1,56 50,33±\pm0,68 51,00±\pm2,54
{K3}\{K_{3}\} 52,87±\pm0,87 53,57±\pm0,89 50,18±\pm1,38 51,10±\pm0,38 51,57±\pm0,68
{K4}\{K_{4}\} 51,33±\pm1,42 52,84±\pm1,32 51,76±\pm1,38 51,13±\pm1,60 49,43±\pm1,85
{K5}\{K_{5}\} 52,41±\pm0,89 54,60±\pm1,01 50,94±\pm1,30 51,39±\pm1,23 50,31±\pm1,59
{K3,K4}\{K_{3},K_{4}\} 52,68±\pm1,82 53,49±\pm1,35 50,88±\pm1,73 50,97±\pm0,68 51,36±\pm0,92
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 51,81±\pm1,17 54,32±\pm1,02 49,94±\pm0,23 51,01±\pm1,00 51,11±\pm1,06
Table 9: Total number of model parameters for all selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the COLLAB data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 25 992 40 479 39 751 26 487 27 440
{K3}\{K_{3}\} 26 049 40 553 39 804 26 525 27 475
{K4}\{K_{4}\} 26 049 40 553 39 804 26 525 27 475
{K5}\{K_{5}\} 26 049 40 553 39 804 26 525 27 475
{K3,K4}\{K_{3},K_{4}\} 26 106 40 627 39 857 26 563 27 510
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 26 163 40 701 39 910 26 601 27 545
Table 10: Average training times and number of epochs for all selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the COLLAB data set.

Pattern set \mathcal{F}, then per model Time/#Epochs: 𝖦𝖠𝖳\mathsf{GAT}, 𝖦𝖢𝖭\mathsf{GCN}, 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet}, 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage}, 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 0,81/167 0,85/141 1,62/190 12,05/115,67 2,22/167
{K3}\{K_{3}\} 0,67/165 0,90/153 1,70/184 12,10/67,00 2,48/186
{K4}\{K_{4}\} 1,06/188 0,95/160 2,16/188 12,04/113,50 1,26/188
{K5}\{K_{5}\} 0,50/167 1,13/165 1,04/193 12,05/124,00 1,82/174
{K3,K4}\{K_{3},K_{4}\} 1,20/189 0,86/128 2,15/189 12,05/113,25 1,51/183
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 0,44/149 0,90/134 0,98/186 12,05/124,00 1,84/177

E.2.3 Vertex classification with PATTERN and CLUSTER

Finally, also used in Dwivedi et al. (2020) are the PATTERN and CLUSTER graph datasets, generated with the Stochastic Block Model (SBM) (Abbe, 2018), which is widely used to model communities in social networks by modulating the intra- and extra-community connections, thereby controlling the difficulty of the task. An SBM is a random graph model which assigns communities to vertices as follows: any two vertices are connected with probability pp if they belong to the same community, and with probability qq if they belong to different communities (the value of qq acts as the noise level).
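Such graphs can be generated directly, e.g., with networkx; a hedged sketch (parameters chosen to mirror the CLUSTER description below, not the benchmark’s actual generator):

    import networkx as nx
    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [int(s) for s in rng.integers(5, 36, size=6)]  # six communities
    p, q = 0.55, 0.25
    probs = [[p if i == j else q for j in range(6)] for i in range(6)]
    G = nx.stochastic_block_model(sizes, probs, seed=0)
    print(G.number_of_nodes(), G.number_of_edges())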

For the PATTERN dataset, the goal of the vertex classification problem is the detection of a certain pattern PP embedded in a larger graph GG. Each graph GG consists of 55 communities with sizes randomly selected in [5,35][5,35]. The SBM parameters for each community are p=0.5p=0.5 and q=0.35q=0.35, and the vertex features of GG are generated uniformly at random from a vocabulary of size 33, i.e., {0,1,2}\{0,1,2\}. Randomly, 100100 patterns PP composed of 2020 vertices with intra-probability pP=0.5p_{P}=0.5 and extra-probability qP=0.5q_{P}=0.5 are generated (i.e., 50%50\% of the vertices in PP are connected to GG). The vertex features of PP are also generated randomly using values in {0,1,2}\{0,1,2\}. The graphs consist of 4444 to 188188 vertices. The output vertex labels have value 11 if the vertex belongs to PP and value 0 if it belongs to GG.

For the CLUSTER dataset, the goal of the vertex classification is the detection of the cluster to which a vertex belongs. Here, six SBM clusters are generated with sizes randomly selected in [5,35][5,35] and probabilities p=0.55p=0.55 and q=0.25q=0.25. The graphs consist of 4040 to 190190 vertices. Each vertex can take an initial feature value in {0,1,2,,6}\{0,1,2,\ldots,6\}. If the value is ii, then the vertex belongs to class i1i-1; if the value is 0, then the class of the vertex is unknown and needs to be inferred. Only one labelled vertex is randomly assigned to each community, and most vertex features are thus set to 0. The output vertex labels are defined as the community/cluster class labels.

The following are taken from Dwivedi et al. (2020):

Splitting. The PATTERN dataset has 10 00010\,000 train, 2 0002\,000 validation and 2 0002\,000 test graphs. The CLUSTER dataset has 10 00010\,000 train, 1 0001\,000 validation and 1 0001\,000 test graphs. We save the generated splits and use the same sets in all models for a fair comparison.
Training. For all 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, the initial learning rate is set to 1×1031\times 10^{-3}, the reduce factor is 0.50.5, the patience value is 1010, and the stopping learning rate is 1×1051\times 10^{-5}.
Performance measure. The performance measure is the average vertex-level accuracy weighted with respect to the class sizes.
Number of layers. 16 MPNN layers are used for every model.

Table 11: Full results of the weighted accuracy for selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the CLUSTER data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 70,86±\pm0,06 70,64±\pm0,39 71,15±\pm0,33 72,25±\pm0,52 74,28±\pm0,15
{K3}\{K_{3}\} 71,60±\pm0,15 64,88±\pm4,16 72,21±\pm0,19 72,97±\pm0,23 74,14±\pm0,12
{K4}\{K_{4}\} 71,40±\pm0,24 60,64±\pm2,93 72,14±\pm0,19 72,57±\pm0,19 74,16±\pm0,24
{K5}\{K_{5}\} 71,26±\pm0,39 66,60±\pm1,47 72,34±\pm0,09 72,60±\pm0,24 74,23±\pm0,07
{K3,K4}\{K_{3},K_{4}\} 71,80±\pm0,28 50,94±\pm22,98 72,32±\pm0,27 73,03±\pm0,25 74,17±\pm0,13
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 71,63±\pm0,26 63,03±\pm3,72 72,32±\pm0,36 72,65±\pm0,13 74,03±\pm0,19
Table 12: Total number of model parameters for all selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the CLUSTER data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 395 396 362 849 399 373 386 835 406 755
{K3}\{K_{3}\} 395 396 362 849 399 373 386 835 406 755
{K4}\{K_{4}\} 395 548 362 995 399 463 386 943 406 825
{K5}\{K_{5}\} 395 700 363 141 399 553 387 051 406 895
{K3,K4}\{K_{3},K_{4}\} 395 700 363 141 399 553 387 051 406 895
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 396 004 363 433 399 733 387 267 407 035
Table 13: Training times (in hours) and number of epochs for all selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the CLUSTER data set.

Pattern set \mathcal{F}, then per model Time/#Epochs: 𝖦𝖠𝖳\mathsf{GAT}, 𝖦𝖢𝖭\mathsf{GCN}, 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet}, 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage}, 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 1,62/109 2,83/117 1,54/125 0,95/101 10,40/92
{K3}\{K_{3}\} 1,52/107 2,67/85 1,72/145 1,08/102 11,01/89
{K4}\{K_{4}\} 1,18/107 1,94/80 1,62/149 0,90/102 10,23/90
{K5}\{K_{5}\} 1,23/106 2,30/84 1,68/143 0,92/99 10,68/91
{K3,K4}\{K_{3},K_{4}\} 1,53/102 1,97/82 1,89/153 0,94/99 10,80/90
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 1,62/105 1,96/82 1,95/157 0,97/100 10,25/91

Table 14: Full results of the weighted accuracy for selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the PATTERN data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 78,83±\pm0,60 71,42±\pm1,38 85,90±\pm0,03 70,78±\pm0,19 86,15±\pm0,08
{K3}\{K_{3}\} 84,34±\pm0,09 61,54±\pm2,20 86,59±\pm0,02 84,75±\pm0,11 85,02±\pm0,20
{K4}\{K_{4}\} 84,43±\pm0,40 63,40±\pm1,55 86,60±\pm0,02 84,51±\pm0,06 85,40±\pm0,28
{K5}\{K_{5}\} 83,47±\pm0,11 64,18±\pm3,88 86,57±\pm0,02 83,73±\pm0,10 85,63±\pm0,22
{K3,K4}\{K_{3},K_{4}\} 85,44±\pm0,24 81,29±\pm2,82 86,58±\pm0,02 85,85±\pm0,13 85,80±\pm0,20
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 85,50±\pm0,23 82,49±\pm0,48 86,63±\pm0,03 85,88±\pm0,15 85,56±\pm0,33
Table 15: Total number of model parameters for selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the PATTERN data set.
Pattern set \mathcal{F} 𝖦𝖠𝖳\mathsf{GAT} 𝖦𝖢𝖭\mathsf{GCN} 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet} 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage} 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 394 632 362 117 398 921 386 291 406 403
{K3}\{K_{3}\} 394 784 362 263 399 011 386 399 406 473
{K4}\{K_{4}\} 394 784 362 263 399 011 386 399 406 473
{K5}\{K_{5}\} 394 784 362 263 399 011 386 399 406 473
{K3,K4}\{K_{3},K_{4}\} 394 936 362 409 399 101 386 507 406 543
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 395 088 362 555 399 191 386 615 406 613
Table 16: Training times (in hours) and number of epochs for selected pattern combinations and 𝖦𝖭𝖭s\mathsf{GNN}\text{s} on the PATTERN data set.

Pattern set \mathcal{F}, then per model Time/Epochs: 𝖦𝖠𝖳\mathsf{GAT}, 𝖦𝖢𝖭\mathsf{GCN}, 𝖬𝗈𝖭𝖾𝗍\mathsf{MoNet}, 𝖦𝗋𝖺𝗉𝗁𝖲𝖺𝗀𝖾\mathsf{GraphSage}, 𝖦𝖺𝗍𝖾𝖽𝖦𝖢𝖭E,PE\mathsf{GatedGCN}_{E,PE}
None 1,96/87 3,41/102 1,68/116 0,77/103 10,32/101
{K3}\{K_{3}\} 0,97/97 2,58/80 1,42/107 0,69/105 9,12/95
{K4}\{K_{4}\} 0,90/90 2,68/80 1,46/106 0,67/95 9,47/94
{K5}\{K_{5}\} 0,89/95 2,36/80 1,26/100 0,58/98 9,14/99
{K3,K4}\{K_{3},K_{4}\} 2,11/91 3,62/98 1,68/108 0,86/97 9,50/87
{K3,K4,K5}\{K_{3},K_{4},K_{5}\} 1,02/91 3,26/94 1,48/109 0,76/102 8,84/88