Graph Neural Networks with Local Graph Parameters
Abstract
Various recent proposals increase the distinguishing power of Graph Neural Networks (GNNs) by propagating features between k-tuples of vertices. The distinguishing power of these "higher-order" GNNs is known to be bounded by the k-dimensional Weisfeiler-Leman (k-WL) test, yet their memory requirements limit their applicability. Other proposals infuse GNNs with local higher-order graph structural information from the start, hereby inheriting the desirable memory requirement from GNNs at the cost of a one-time, possibly non-linear, preprocessing step. We propose local graph parameter enabled GNNs as a framework for studying the latter kind of approaches and precisely characterize their distinguishing power, in terms of a variant of the 1-WL test, and in terms of the graph structural properties that they can take into account. Local graph parameters can be added to any GNN architecture and are cheap to compute. In terms of expressive power, our proposal lies in the middle of GNNs and their higher-order counterparts. Further, we propose several techniques to aid in choosing the right local graph parameters. Our results connect with deep results in finite model theory and finite variable logics. Our experimental evaluation shows that adding local graph parameters often has a positive effect for a variety of GNNs, datasets and graph learning tasks.
1 Introduction
Context.
Graph neural networks (GNNs) (Merkwirth & Lengauer, 2005; Scarselli et al., 2009), and their important class of Message Passing Neural Networks (MPNNs) (Gilmer et al., 2017), are among the most popular methods for graph learning tasks. Such GNNs use an iterative message passing scheme, based on the adjacency structure of the underlying graph, to compute vertex (and graph) embeddings in some real Euclidean space. The expressive (or discriminative) power of MPNNs is, however, rather limited (Xu et al., 2019; Morris et al., 2019). Indeed, MPNNs will always identically embed two vertices (graphs) when these vertices (graphs) cannot be distinguished by the one-dimensional Weisfeiler-Leman (1-WL) algorithm. Two graphs G and H and vertices v and w that cannot be distinguished by 1-WL (and thus by any MPNN) are shown in Fig. 1. The expressive power of 1-WL is well-understood (Cai et al., 1992; Arvind et al., 2020; Dell et al., 2018): basically, MPNNs can only use tree-based structural information in the graphs to distinguish vertices. As a consequence, no MPNN can detect that vertex v in Fig. 1 is part of a 3-clique, whereas w is not. Similarly, no MPNN can detect that v is part of a 4-cycle, whereas w is not. Further limitations of MPNNs in terms of graph properties can be found, e.g., in Fürer (2017); Arvind et al. (2020); Chen et al. (2020); Tahmasebi & Jegelka (2020).
To remedy the weak expressive power of MPNNs, so-called higher-order GNNs were proposed (Morris et al., 2019; Maron et al., 2019b; Morris et al., 2020), whose expressive power is measured in terms of the k-dimensional Weisfeiler-Leman procedures (k-WL) (Maron et al., 2019a; Chen et al., 2019a; Azizian & Lelarge, 2021; Geerts, 2020; Sato, 2020; Damke et al., 2020). In a nutshell, k-WL operates on k-tuples of vertices and allows one to distinguish vertices (graphs) based on structural information related to graphs of treewidth k (Dvorak, 2010; Dell et al., 2018). By definition, these tests form a hierarchy of increasing distinguishing power. As an example, 2-WL can detect that vertex v in Fig. 1 belongs to a 3-clique or a 4-cycle, since both have treewidth two. While more expressive than MPNNs, the GNNs based on k-WL require operations on all k-tuples in each iteration, where n is the number of vertices, hereby hampering their applicability.
A more practical approach is to extend the expressive power of MPNNs whilst preserving their cost in each iteration. Various such extensions (Kipf & Welling, 2017; Chen et al., 2019a; Li et al., 2019; Ishiguro et al., 2020; Bouritsas et al., 2020; Geerts et al., 2020) achieve this by infusing MPNNs with local graph structural information from the start. That is, the iterative message passing scheme of MPNNs is run on vertex labels that contain quantitative information about local graph structures. It is easy to see that such architectures can go beyond the 1-WL test: for example, adding triangle counts to the initial vertex labels suffices to distinguish the vertices v and w and the graphs G and H in Fig. 1. Moreover, the cost is a single preprocessing step to count local graph parameters, thus maintaining the per-iteration cost of the MPNNs. While there are some partial results showing that local graph parameters increase expressive power (Bouritsas et al., 2020; Li et al., 2019), their precise expressive power and relationship to higher-order GNNs was unknown, and there is little guidance on which local parameters help and which do not. The main contribution of this paper is a precise characterization of the expressive power of MPNNs with local graph parameters and of their relationship to the hierarchy of higher-order GNNs.
Our contributions.
In order to nicely formalize local graph parameters, we propose to extend vertex labels with homomorphism counts of small graph patterns. (We recall that homomorphisms are edge-preserving mappings between vertex sets.) More precisely, given graphs P and G, and vertices r in P and v in G, we add the number of homomorphisms from P to G that map r to v, denoted by hom(P^r, G^v), to the initial features of v. Such counts are both meaningful and efficiently computable. Indeed, homomorphism counts are known to measure the similarity of vertices and graphs (Lovász, 1967; Grohe, 2020a), and serve as a basis for the efficient computation of a number of other important graph parameters, e.g., subgraph and induced subgraph counts (Curticapean et al., 2017; Zhang et al., 2020). Furthermore, homomorphism counts underlie characterisations of the expressive power of MPNNs and higher-order GNNs. As an example, two vertices v and w in graphs G and H, respectively, are indistinguishable by 1-WL, and hence by MPNNs, precisely when hom(T^r, G^v) = hom(T^r, H^w) for every rooted tree T^r (Dvorak, 2010; Dell et al., 2018).
Concretely, we propose F-MPNNs, where F is a set of (rooted) graph patterns, obtained by (i) first allowing a pre-processing step that labels each vertex v of a graph G with the vector of homomorphism counts (hom(P^r, G^v))_{P^r ∈ F}, and (ii) then running an MPNN on this labelling. As such, we can turn any MPNN into an F-MPNN by simply augmenting the initial vertex embedding. Furthermore, several recently proposed extensions of MPNNs fit in this approach, including MPNNs extended with information about vertex degrees (Kipf & Welling, 2017), walk counts (Chen et al., 2019a), tree-based counts (Ishiguro et al., 2020) and subgraph counts (Bouritsas et al., 2020). Hence, F-MPNNs can also be regarded as a unifying theoretical formalism.
Our main results can be summarised as follows:
1. We precisely characterise the expressive power of F-MPNNs by means of an extension of 1-WL, denoted by F-WL. To do so, we use F-pattern trees, which are obtained from standard trees by joining an arbitrary number of copies of the patterns in F to each of their vertices. Our result states that vertices v and w in graphs G and H, respectively, are indistinguishable by F-WL, and hence by F-MPNNs, precisely when hom(T^r, G^v) = hom(T^r, H^w) for every rooted F-pattern tree T^r. This characterisation gracefully extends the characterisation for standard MPNNs mentioned earlier, which is recovered by setting F = ∅. Furthermore, F-MPNNs provide insights into the expressive power of existing extensions, most notably the Graph Structure Networks of Bouritsas et al. (2020).
2. We compare F-MPNNs to higher-order GNNs, which are characterized in terms of the k-WL test. On the one hand, while F-MPNNs strictly increase the expressive power of the 1-WL test, for any finite set F of patterns there is a k such that k-WL can distinguish graphs which F-MPNNs cannot. On the other hand, for each k there are patterns P such that {P}-MPNNs can distinguish graphs which k-WL cannot.
3. We deal with the technically challenging problem of pattern selection, comparing F-MPNNs based on the patterns included in F. We prove two partial results: one establishing when a pattern P in F is redundant, based on whether or not P is the join of other patterns in F, and another indicating when a pattern P does add expressive power, based on the treewidth of P compared to the treewidths of the other patterns in F.
4. Our theoretical results are complemented by an experimental study in which we show that, for various GNN architectures, datasets and graph learning tasks, all part of the recent benchmark by Dwivedi et al. (2020), augmenting initial features with homomorphism counts of graph patterns often has a positive effect, and computing these counts incurs little to no overhead.
As such, we believe that F-MPNNs not only provide an elegant theoretical framework for understanding local graph parameter enabled GNNs, but are also a valuable alternative to higher-order GNNs as a way to increase the expressive power of MPNNs. In addition, and as will be explained in Section 2, F-MPNNs provide a unifying framework for understanding the expressive power of several other existing extensions of MPNNs. Proofs of our results and further details on the relationship to existing approaches and experiments can be found in the appendix.
Related Work.
Works related to the distinguishing power of the 1-WL test and its higher-order variants are cited throughout the paper. Beyond distinguishability, GNNs have been analyzed in terms of universality and generalization properties (Maron et al., 2019c; Keriven & Peyré, 2019; Chen et al., 2019b; Garg et al., 2020), local distributed algorithms (Sato et al., 2019; Loukas, 2020), randomness in features (Sato et al., 2020; Abboud et al., 2020) and local context matrix features (Vignac et al., 2020). Other extensions of GNNs are surveyed, e.g., in Wu et al. (2021); Zhou et al. (2018) and Chami et al. (2021). Related are the Graph Homomorphism Convolutions of NT & Maehara (2020), which apply standard classifiers directly to representations of vertices by homomorphism counts. Finally, our approach is reminiscent of graph representations by means of graphlet kernels (Shervashidze et al., 2009), but at the level of vertices.
2 Local Graph Parameter Enabled GNNs
In this section we introduce MPNNs with local graph parameters. We start with preliminary concepts.
Graphs.
We consider undirected vertex-labelled graphs G = (V, E, χ), with V the set of vertices, E the set of edges, and χ a mapping assigning a label to each vertex in V. The set of neighbours of a vertex v is denoted by N_G(v). A rooted graph is a graph in which one of its vertices is declared to be its root. We denote a rooted graph by P^r, where r is the root, and depict rooted graphs as graphs in which the root is a blackened vertex. Given graphs G and H, a homomorphism h from G to H is a mapping h: V_G → V_H such that (i) h(v)h(w) ∈ E_H for every edge vw ∈ E_G, and (ii) χ_G(v) = χ_H(h(v)) for every v ∈ V_G. For rooted graphs P^r and G^v, a homomorphism must additionally map r to v. We denote by hom(G, H) the number of homomorphisms from G to H, and similarly for rooted graphs. For simplicity of exposition we focus on vertex-labelled undirected graphs, but all our results extend to edge-labelled directed graphs.
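As a concrete (if naive) illustration of these definitions, the following brute-force sketch counts homomorphisms between small unlabelled graphs stored as adjacency dicts; the function names and encoding are ours, and the enumeration is exponential in the pattern size, so it is meant only for tiny patterns.

```python
from itertools import product

def hom_count(P, G, root=None, image=None):
    """Count homomorphisms from pattern P to graph G (adjacency dicts).

    If root/image are given, only maps sending P's root to `image` are
    counted, i.e. the rooted count hom(P^root, G^image) used as a feature.
    """
    pv = sorted(P)
    count = 0
    for assignment in product(sorted(G), repeat=len(pv)):
        m = dict(zip(pv, assignment))
        if root is not None and m[root] != image:
            continue
        # every edge of P must be mapped onto an edge of G
        if all(m[b] in G[m[a]] for a in P for b in P[a]):
            count += 1
    return count

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}               # the 3-clique K_3
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}  # the 4-clique
print(hom_count(triangle, K4, root=0, image=0))  # 6
print(hom_count(triangle, K4))                   # 24
```

Note that homomorphisms need not be injective in general, which is why hom counts of cliques come with the ordering factor (here 3! = 6 per triangle image).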
Message passing neural networks.
The basic architecture for MPNNs (Gilmer et al., 2017), and the one used in recent studies on GNN expressiveness (Morris et al., 2019; Xu et al., 2019; Barceló et al., 2020), consists of a sequence of rounds that update the feature vector of every vertex in the graph by combining its current feature vector with the result of an aggregation over the feature vectors of its neighbours. Formally, for a graph G, let x_v^(t) denote the feature vector computed for vertex v by an MPNN in round t. The initial feature vector x_v^(0) is a one-hot encoding of its label χ(v). This feature vector is iteratively updated in a number of rounds. In particular, in round t > 0,

x_v^(t) := Upd( x_v^(t-1), Agg({{ x_u^(t-1) : u ∈ N_G(v) }}) ),

where Agg and Upd are an aggregating and an update function, respectively. Thus, the feature vectors of all neighbours of v are combined by the aggregating function into a single vector, and this vector is then used together with x_v^(t-1) to produce x_v^(t) by applying the update function.
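The update above can be sketched in a few lines of NumPy; the sum-aggregation, the ReLU, and the two weight matrices W_self and W_neigh are illustrative assumptions (one common instantiation), not the paper's specific architecture.

```python
import numpy as np

def mpnn_round(adj, X, W_self, W_neigh):
    """One round: x_v <- ReLU(W_self^T x_v + W_neigh^T (sum of neighbours))."""
    # sum-aggregation over each vertex's neighbour feature vectors
    agg = np.stack([X[list(nbrs)].sum(axis=0) for nbrs in adj])
    return np.maximum(0.0, X @ W_self + agg @ W_neigh)

# a path on three vertices with one-hot initial features
adj = [[1], [0, 2], [1]]
X0 = np.eye(3)
X1 = mpnn_round(adj, X0, np.eye(3), np.eye(3))
```

With identity weights, each vertex's new feature is simply its own one-hot vector plus the sum of its neighbours' vectors.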
MPNNs with local graph parameters.
The GNNs studied in this paper leverage the power of MPNNs by enhancing the initial features of vertices with local graph parameters that are beyond their classification power. To illustrate the idea, consider the graphs G and H in Fig. 1. As mentioned, these graphs cannot be distinguished by the 1-WL test, and therefore cannot be distinguished by the broad class of MPNNs (see e.g. (Xu et al., 2019; Morris et al., 2019)). If we allow a pre-processing stage, however, in which the initial labelling of every vertex v is extended with the number of (homomorphic images of) 3-cliques in which v participates (indicated by the numbers between brackets in Fig. 1), then clearly vertices v and w (and the graphs G and H) can be distinguished based on this extra structural information. In fact, the initial labelling alone already suffices for this purpose.
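The effect is easy to reproduce on a classic 1-WL-indistinguishable pair, two disjoint triangles versus a 6-cycle (our own example, not necessarily the pair of Fig. 1): rooted triangle counts separate vertices that 1-WL cannot.

```python
def rooted_triangle_count(adj, v):
    # homomorphisms from a rooted K_3 sending the root to v: ordered pairs
    # (a, b) of neighbours of v that are themselves adjacent
    return sum(1 for a in adj[v] for b in adj[v] if b in adj[a])

two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

# Every vertex in both graphs is 2-regular, so 1-WL labels all twelve
# vertices identically, yet the rooted triangle counts differ:
print(rooted_triangle_count(two_triangles, 0))  # 2
print(rooted_triangle_count(hexagon, 0))        # 0
```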
Let F be a set of (rooted) graphs, which we refer to as patterns, say F = {P_1^{r_1}, …, P_ℓ^{r_ℓ}}. Then F-enabled MPNNs, or just F-MPNNs, are defined in the same way as MPNNs, with the crucial difference that now the initial feature vector of a vertex is a one-hot encoding of the label of the vertex together with all the homomorphism counts of patterns in F. Formally, an F-MPNN labels each vertex v in graph G with feature vectors x_v^(t), inductively defined as follows: the initial feature vector is

x_v^(0) := ( onehot(χ(v)), hom(P_1^{r_1}, G^v), …, hom(P_ℓ^{r_ℓ}, G^v) ),

and for rounds t > 0 the feature vectors are updated just as for standard MPNNs.

We note that standard MPNNs are F-MPNNs with F = ∅. As for MPNNs, we can equip F-MPNNs with a Readout function that aggregates all final feature vectors into a single feature vector in order to classify or distinguish graphs.
We emphasise that any MPNN architecture can be turned into an F-MPNN by a simple homomorphism-counting preprocessing step. As such, we propose a generic plug-in for a large class of GNN architectures. Better still, homomorphism counts of small graph patterns can be efficiently computed even on large datasets (Zhang et al., 2020), and they form the basis for counting (induced) subgraphs and other notions of subgraphs (Curticapean et al., 2017). Despite the simplicity of F-MPNNs, we will show that they can substantially increase the power of MPNNs by varying F, at only a one-time preprocessing cost.
F-MPNNs as a unifying framework.
An important aspect of F-MPNNs is that they allow a principled analysis of the power of existing extensions of MPNNs. For example, taking F to consist of a single rooted edge suffices to capture degree-aware MPNNs (Geerts et al., 2020), such as the Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017), which use the degrees of vertices; taking F to consist of rooted paths of bounded length suffices to model the walk counts used in Chen et al. (2019a); and taking F to be the set of labelled trees of depth one precisely corresponds to using the 1-WL labelling obtained after one round, as in Ishiguro et al. (2020). Furthermore, {C_k}-MPNNs, where C_k denotes the cycle of length k, correspond to the extension proposed in Section 4 of Li et al. (2019).
In addition, F-MPNNs can also capture the Graph Structure Networks (GSNs) of Bouritsas et al. (2020), which use subgraph isomorphism counts of graph patterns. We recall that an isomorphism from G to H is a bijective homomorphism from G to H whose inverse is also a homomorphism from H to G; when G and H are rooted graphs, isomorphisms must preserve the roots as well. Now, in a GSN, the feature vector of each vertex v is augmented with the number of subgraphs containing v that are isomorphic to a rooted pattern P^r, for rooted patterns P^r in a set F of patterns (the use of orbits in Bouritsas et al. (2020) is ignored here but explained in the appendix), and this is followed by the execution of an MPNN, just as for our F-MPNNs. It now remains to observe that subgraph isomorphism counts can be computed in terms of homomorphism counts, and vice versa (Curticapean et al., 2017). That is, GSNs can be viewed as F-MPNNs, and thus our results for F-MPNNs carry over to GSNs. We adopt homomorphism counts instead of subgraph isomorphism counts because homomorphism counts underlie existing characterizations of the expressive power of MPNNs, and because homomorphism counts are in general more efficient to compute. Also, Curticapean et al. (2017) indicate that all common graph counts are interchangeable in terms of expressive power.
3 Expressive Power of F-MPNNs
Recall that the standard 1-WL test (Weisfeiler & Lehman, 1968; Grohe, 2017) iteratively constructs a labelling of the vertices in a graph as follows. In round t, for each vertex v the algorithm first collects the label of v and the multiset of labels of all of its neighbours after round t−1, and then it hashes this aggregated multiset of labels into a new label for v. The initial label of v is its vertex label χ(v). As shown in Xu et al. (2019) and Morris et al. (2019), the 1-WL test provides a bound on the classification power of MPNNs: if two vertices or two graphs are indistinguishable by the 1-WL test, then they will not be distinguished by any MPNN.
In turn, the expressive power of the 1-WL test, and thus of MPNNs, can be characterised in terms of homomorphism counts of trees (Dvorak, 2010; Dell et al., 2018). This result can be seen as a characterisation of the expressiveness of the 1-WL test in terms of a particular infinite-dimensional graph kernel: the one defined by the number of homomorphisms from every tree T into the underlying graph G.
In this section we show that both characterisations extend in an elegant way to the setting of F-MPNNs, confirming that F-MPNNs are not just a useful, but also a well-behaved, generalisation of standard MPNNs.
3.1 Characterisation in terms of F-WL
We bound the expressive power of F-MPNNs in terms of what we call the F-WL test. Formally, the F-WL test extends 1-WL in the same way as F-MPNNs extend standard MPNNs: by including homomorphism counts of patterns in F in the initial labelling. That is, let F = {P_1^{r_1}, …, P_ℓ^{r_ℓ}}. The F-WL test is a vertex labelling algorithm that iteratively computes a label lab^(t)(v) for each vertex v of a graph G, defined as follows:

lab^(0)(v) := ( χ(v), hom(P_1^{r_1}, G^v), …, hom(P_ℓ^{r_ℓ}, G^v) ), and, for t > 0,
lab^(t)(v) := Hash( lab^(t-1)(v), {{ lab^(t-1)(u) : u ∈ N_G(v) }} ).

The F-WL test stops in round t when no new pairs of vertices are identified by means of lab^(t), that is, when for any two vertices v and w of G, lab^(t-1)(v) = lab^(t-1)(w) implies lab^(t)(v) = lab^(t)(w). Notice that ∅-WL is just the 1-WL test.
We can use the F-WL test to compare vertices of the same graph or of different graphs. We say that the F-WL test cannot distinguish vertices v and w if their final labels are the same, and that the F-WL test cannot distinguish graphs G and H if the multiset of labels computed for G is the same as that for H.
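Both tests are instances of colour refinement and can be sketched in a few lines; in this sketch (our own encoding) the init argument plays the role of the F-WL initial labelling, so init=None gives plain 1-WL, while passing each vertex's pattern homomorphism counts gives F-WL. To compare two different graphs rigorously, run the refinement on their disjoint union.

```python
def wl_colors(adj, init=None, rounds=None):
    """Colour refinement: init=None is 1-WL; init = F-pattern counts is F-WL."""
    colors = dict(init) if init is not None else {v: 0 for v in adj}
    for _ in range(rounds if rounds is not None else len(adj)):
        # refine: pair each colour with the sorted multiset of neighbour colours
        new = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        # canonicalise refined labels back to small integers ("hashing")
        canon = {lab: i for i, lab in enumerate(sorted(set(new.values())))}
        new = {v: canon[new[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return colors  # refinement only splits classes: partition is stable
        colors = new
    return colors

two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

On these two 2-regular graphs, plain 1-WL assigns every vertex the same colour, whereas initialising with triangle counts separates them immediately.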
Similarly as for MPNNs and the 1-WL test (Xu et al., 2019; Morris et al., 2019), we obtain that the F-WL test provides an upper bound for the expressive power of F-MPNNs.
Proposition 1.
If two vertices of a graph cannot be distinguished by the F-WL test, then they cannot be distinguished by any F-MPNN either. Moreover, if two graphs cannot be distinguished by the F-WL test, then they cannot be distinguished by any F-MPNN either.
We can also construct F-MPNNs that mimic the F-WL test: simply adding local parameters from a set F of patterns to the GIN architecture of Xu et al. (2019) results in an F-MPNN that classifies vertices and graphs exactly as the F-WL test does.
3.2 Characterisation in terms of F-pattern trees
At the core of several results about the 1-WL test lies a characterisation linking the test with homomorphism counts of (rooted) trees (Dvorak, 2010; Dell et al., 2018). In view of the connection to MPNNs, it tells us that MPNNs only use quantitative tree-based structural information from the underlying graphs. We next extend this characterisation to F-WL by using homomorphism counts of so-called F-pattern trees. In view of the connection with F-MPNNs (Proposition 1), this reveals that F-MPNNs can use quantitative information about richer graph structures than MPNNs.
To define F-pattern trees we need the graph join operator ⋆. Given two rooted graphs P^r and Q^s, the join P^r ⋆ Q^s is obtained by taking the disjoint union of P and Q, followed by identifying s with r; the root of the join is r. Further, if G is a graph and Q^s is a rooted graph, then joining a vertex v in G with Q^s results in the disjoint union of G and Q, where s is identified with v.
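The join is straightforward to implement on adjacency-dict graphs; the following sketch (our own encoding, with vertices tagged by their graph of origin) shows, for instance, that joining two rooted triangles yields a five-vertex graph whose root has degree four.

```python
def join(P, rP, Q, rQ):
    """Join of rooted graphs P^rP and Q^rQ: disjoint union, roots identified."""
    g = {('P', v): {('P', u) for u in P[v]} for v in P}
    relabel = lambda v: ('P', rP) if v == rQ else ('Q', v)
    for v in Q:  # add Q's edges, merging Q's root into P's root
        g.setdefault(relabel(v), set()).update(relabel(u) for u in Q[v])
    return g, ('P', rP)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
glued, root = join(triangle, 0, triangle, 0)  # two triangles glued at roots
print(len(glued), len(glued[root]))           # 5 4
```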
Let F be a set of patterns. An F-pattern tree T is obtained from a standard rooted tree, called the backbone of T, by joining every vertex of the backbone with any number of copies of patterns from F. We define the depth of an F-pattern tree as the depth of its backbone. [Figure omitted: examples of F-pattern trees, in which the grey vertices form the backbones.] Standard trees are also F-pattern trees (simply join no patterns at all).
We next use F-pattern trees to characterise the expressive power of F-WL and thus, by Proposition 1, of F-MPNNs.
Theorem 1.
For any finite collection F of patterns, vertices v and w in graphs G and H, respectively, are indistinguishable by the F-WL test if and only if hom(T^r, G^v) = hom(T^r, H^w) for every rooted F-pattern tree T^r. Similarly, G and H are indistinguishable by the F-WL test if and only if hom(T, G) = hom(T, H) for every (unrooted) F-pattern tree T.
The proof of this theorem, which can be found in the appendix, requires extending techniques from Grohe (2020a, b) that were used to characterise the expressiveness of 1-WL in terms of homomorphism counts of trees.
In fact, we can make the above theorem more precise. When F-WL is run for t rounds, only F-pattern trees of depth at most t are required. This tells us that increasing the number of rounds of F-WL results in more complicated structural information being taken into account. For example, consider the two graphs G and H and vertices v and w shown in Fig. 2, and let F = {K_3}. By definition, F-WL cannot distinguish v from w based on the initial labelling. If run for one round, Theorem 1 implies that F-WL cannot distinguish v from w if and only if hom(T^r, G^v) = hom(T^r, H^w) for every F-pattern tree T^r of depth at most one. It is readily verified that these counts differ for some such pattern tree, and thus F-WL distinguishes v from w after one round. Similarly, G and H can be distinguished by F-WL after one round. We observe that v and w are indistinguishable by 1-WL. Hence, {K_3}-MPNNs are more expressive than MPNNs.
Importantly, Theorem 1 discloses the boundaries of F-MPNNs. To illustrate this for some specific instances of F-MPNNs mentioned earlier, the expressive power of degree-based MPNNs (Kipf & Welling, 2017; Geerts et al., 2020) is captured by edge-pattern trees, and walk-count MPNNs (Chen et al., 2019a) are captured by path-pattern trees. These pattern trees are just trees, since joining paths to trees only results in bigger trees. Thus, Theorem 1 tells us that all these extensions are still bounded by 1-WL (albeit needing fewer rounds). In contrast, and beyond 1-WL, {C_k}-pattern trees capture cycle counts (Li et al., 2019), and for GSNs (Bouritsas et al., 2020), which use subgraph isomorphism counts of patterns in F, the expressive power is captured by F'-pattern trees, where F' consists of all surjective homomorphic images of patterns in F (Curticapean et al., 2017).
4 A Comparison with the k-WL test
We propose F-MPNNs as an alternative and efficient way to extend the expressive power of MPNNs (and thus of the 1-WL test), compared to the computationally intensive higher-order GNNs based on the k-WL test (Morris et al., 2019, 2020; Maron et al., 2019a). In this section we situate F-WL in the k-WL hierarchy. The definition of k-WL is deferred to the appendix.
We have seen that {K_3}-WL can distinguish graphs that 1-WL cannot: it suffices to consider F = {K_3} for the 3-clique K_3. In order to generalise this observation we need some notation. Let F_1 and F_2 be two sets of patterns and consider an F_1-MPNN M_1 and an F_2-MPNN M_2. We say that M_1 is upper bounded in expressive power by M_2 if, for any graph G, whenever M_2 cannot distinguish vertices v and w (just as for the F-WL test, an MPNN cannot distinguish two vertices if the label it computes for both of them is the same), then neither can M_1. A similar notion is in place for pairs of graphs: if M_2 cannot distinguish graphs G and H, then neither can M_1.
More generally, let 𝒞_1 be a class of F_1-MPNNs and 𝒞_2 a class of F_2-MPNNs. We say that the class 𝒞_1 is upper bounded in expressive power by 𝒞_2 if every MPNN in 𝒞_1 is upper bounded in expressive power by some MPNN in 𝒞_2 (which may depend on the chosen MPNN in 𝒞_1). When 𝒞_1 is upper bounded by 𝒞_2 and vice versa, 𝒞_1 and 𝒞_2 are said to have the same expressive power. A class 𝒞_2 is more expressive than a class 𝒞_1 when 𝒞_1 is upper bounded in expressive power by 𝒞_2, but there exist graphs that can be distinguished by an MPNN in 𝒞_2 and not by any MPNN in 𝒞_1.
Finally, we use the notion of treewidth of a graph, which measures how tree-like a graph is. For example, trees have treewidth one, cycles have treewidth two, and the k-clique K_k has treewidth k−1 (for k ≥ 2). We define this standard notion in the appendix and only note that we define the treewidth of a rooted pattern P^r as the treewidth of its unrooted version P.
Our first result is a consequence of the characterisation of k-WL in terms of homomorphism counts of graphs of treewidth at most k (Dvorak, 2010; Dell et al., 2018).
Proposition 2.
For each finite set F of patterns, the expressive power of F-WL is bounded by k-WL, where k is the largest treewidth of a pattern in F.
For example, since the treewidth of K_3 is two, {K_3}-WL is bounded by 2-WL. Similarly, {C_k}-WL is bounded in expressive power by 2-WL.
Our second result tells us how to increase the expressive power of F-WL beyond a given level of the hierarchy. A pattern P is a core if every homomorphism from P to itself is injective. For example, every clique and every cycle of odd length is a core.
Theorem 2.
Let F be a finite set of patterns. If F contains a pattern P which is a core and has treewidth k, then there exist graphs that can be distinguished by F-WL but not by (k−1)-WL.
In other words, for such F, F-WL is not bounded by (k−1)-WL. For example, since K_3 is a core of treewidth two, {K_3}-WL is not bounded in expressive power by 1-WL. More generally, {K_{k+1}}-WL is not bounded by (k−1)-WL. The proof of Theorem 2 is based on extending deep techniques developed in finite model theory that have been used to understand the expressive power of finite variable logics (Atserias et al., 2007; Bova & Chen, 2019). This result is stronger than the one underlying the strictness of the k-WL hierarchy (Otto, 2017), which states that k-WL is strictly more expressive than (k−1)-WL. Indeed, Otto (2017) only shows the existence of a pattern P of treewidth k such that {P}-WL is not bounded by (k−1)-WL. In Theorem 2 we provide an explicit recipe for finding such a pattern P: it can be taken to be any core of treewidth k.
In summary, we have shown that for each k there is a set F of patterns such that (i) F-WL can distinguish graphs which cannot be distinguished by (k−1)-WL, yet (ii) F-WL cannot distinguish more graphs than k-WL. This begs the question of whether there is a finite set F such that F-WL is equivalent in expressive power to k-WL. We answer this negatively.
Proposition 3.
For any k ≥ 2, there does not exist a finite set F of patterns such that F-WL is equivalent in expressive power to k-WL.
In view of the connection between F-MPNNs and GSNs mentioned earlier, we thus show that no GSN can match the power of k-WL, which was a question left open in Bouritsas et al. (2020). We remark that if we allow F to consist of all (infinitely many) patterns of treewidth at most k, then F-WL is equivalent in expressive power to k-WL (Dvorak, 2010; Dell et al., 2018).
5 When Do Patterns Extend Expressiveness?
Patterns are not learned, but must be passed as input to the F-MPNN together with the graph structure. Thus, knowing which patterns work well, and which do not, is of key importance for the power of the resulting F-MPNNs. This is a difficult question to answer in general, since which patterns work well is clearly application-dependent. From a theoretical point of view, however, we can still look into interesting questions related to the problem of which patterns to choose. One such question, and the one studied in this section, is when a pattern adds expressive power over the ones already selected. More formally, we study the following problem: given a finite set F of patterns, when does adding a new pattern P to F extend the expressive power of the F-WL test?
To answer this question in the positive, we need to find two graphs G and H, show that they are indistinguishable by the F-WL test, and show that they can be distinguished by the (F ∪ {P})-WL test. As an example of this technique, we show that longer cycles always add expressive power. We use C_k to represent the cycle of length k.
Proposition 4.
For any k ≥ 3, {C_3, …, C_{k+1}}-WL is more expressive than {C_3, …, C_k}-WL.
We also observe that, by Proposition 2, {C_3, …, C_k}-WL is bounded by 2-WL for any k, because cycles have treewidth two. It is often the case, however, that finding such graphs G and H, and proving that they are indistinguishable, can be rather challenging. Instead, in this section we provide two techniques that can be used to partially answer the question posed above by looking only at properties of the sets of patterns. Our first result establishes when a pattern does not add expressive power to a given set of patterns; the second, when it does.
5.1 Detecting when patterns are superfluous
Our first result is a simple recipe for choosing local features: instead of choosing complex patterns that are the joins of smaller patterns, one should opt for the smaller patterns.
Proposition 5.
Let P be a pattern that is the join of two smaller patterns P_1 and P_2. Then for any set F of patterns, we have that (F ∪ {P})-WL is upper bounded by (F ∪ {P_1, P_2})-WL.
Stated differently, this means that adding to F any pattern which is the join of two patterns already in F does not add expressive power.
Thus, instead of using, for example, the "bowtie" pattern (the join of two rooted triangles), one should prefer to use the triangle itself. This result is in line with other advantages of smaller patterns: their homomorphism counts are easier to compute, and, since they are less specific, they should tend to produce less over-fitting.
5.2 Detecting when patterns add expressiveness
Joining patterns into new patterns does not give extra expressive power, but what about patterns which are not joins? We next provide a useful recipe for detecting when a pattern does add expressive power. We recall that the core of a graph is its unique (up to isomorphism) induced subgraph that is a core.
Theorem 3.
Let F be a finite set of patterns and let P be a pattern whose core has treewidth k. Then (F ∪ {P})-WL is more expressive than F-WL if every pattern P' in F satisfies one of the following conditions: (i) P' has treewidth smaller than k; or (ii) P' does not map homomorphically to P.
As an example, {K_k, K_{k+1}}-WL is more expressive than {K_k}-WL for any k ≥ 2, because of the first condition. Similarly, {K_k, C_ℓ}-WL is more expressive than {K_k}-WL for odd cycles C_ℓ of length at least five. Indeed, such cycles are cores, and no clique K_k with k ≥ 3 maps homomorphically to C_ℓ.
6 Experiments
We next showcase that GNN architectures benefit when homomorphism counts of patterns are added as additional vertex features. For patterns where homomorphism and subgraph isomorphism counts differ (e.g., cycles) we compare with GSNs (Bouritsas et al., 2020). We use the benchmark for GNNs by Dwivedi et al. (2020), as it offers a broad choice of models, datasets and graph learning tasks.
Selected GNNs.
We select the best-performing architectures from Dwivedi et al. (2020): Graph Attention Networks (GATs) (Velickovic et al., 2018), Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), Gaussian Mixture Model networks (MoNet) (Monti et al., 2017) and GatedGCN (Bresson & Laurent, 2017). We leave out various linear architectures such as GIN (Xu et al., 2019), as they were shown to perform poorly on the benchmark.
Learning tasks and datasets.
As in Dwivedi et al. (2020) we consider (i) graph regression and the ZINC dataset (Irwin et al., 2012a; Dwivedi et al., 2020); (ii) vertex classification and the PATTERN and CLUSTER datasets (Dwivedi et al., 2020); and (iii) link prediction and the COLLAB dataset (Hu et al., 2020). We omit graph classification: for this task, the graph datasets from Dwivedi et al. (2020) originate from image data and hence vertex neighborhoods carry little information.
Patterns.
We extend the initial features of vertices with homomorphism counts of cycles C_k up to a fixed length, when molecular data (ZINC) is concerned, and with homomorphism counts of k-cliques K_k for small k, when social or collaboration data (PATTERN, CLUSTER, COLLAB) is concerned. We use the z-score of the logarithms of the homomorphism counts to make them standard-normally distributed and comparable to other features. Section 5 tells us that all these patterns increase expressive power (Theorem 3 and Proposition 4) and are "minimal" in the sense that they are not joins of smaller patterns (Proposition 5). Similar pattern choices were used in Bouritsas et al. (2020). We use DISC (Zhang et al., 2020), a tool specifically built to obtain homomorphism counts on large datasets (we thank the authors for providing us with an executable). Each model is trained and tested independently for different combinations of patterns.
Higher-order GNNs.
We do not compare to higher-order GNNs since this was already done by Dwivedi et al. (2020). They included Ring-GNNs and 3WL-GNNs in their experiments, and these were outperformed by the "linear" architectures we selected. Although the increased expressive power of higher-order GNNs may be beneficial for learning, scalability and learning issues (e.g., loss divergence) hamper their applicability (Dwivedi et al., 2020). Our approach thus outperforms higher-order GNNs with respect to this benchmark.
Methodology.
Graphs were split into training and test sets as instructed by Dwivedi et al. (2020), and all reported numbers correspond to the test set. The reported performance is the average over four runs with different random seeds for the respective combinations of pattern set F, model and dataset. Training times were comparable to the baseline of training models without any augmented features. (Code to reproduce our experiments is available at https://github.com/LGP-GNN-2021/LGP-GNN.) All models for ZINC, PATTERN and COLLAB were trained on a GeForce GTX 1080 Ti GPU; for CLUSTER, a Tesla V100-SXM3-32GB GPU was used.
Next we summarize our results for each learning task separately.
Graph regression.
The first task of the benchmark is the prediction of the solubility of molecules in the ZINC dataset (Irwin et al., 2012a; Dwivedi et al., 2020), a dataset of about graphs of small size, each consisting of one particular molecule. The results in Table 1 show that each of our models indeed improves by adding homomorphism counts of cycles, and the best result is obtained by considering all cycles. were applied to the ZINC dataset as well (Bouritsas et al., 2020). In Table 1 we also report results using subgraph isomorphism counts (as in ): homomorphism counts generally provide better results than subgraph isomorphism counts. Our best result (framed in Table 1) is competitive with the value reported in Bouritsas et al. (2020). Looking at the full results, we see that some cycles are much more important than others. Table 2 shows which cycles have the greatest impact on the worst-performing baseline, . Remarkably, adding homomorphism counts makes the model competitive with the best performers of the benchmark.
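For cycles, per-vertex homomorphism counts are in principle easy to obtain: a homomorphism from a cycle of length k rooted at vertex u is exactly a closed walk of length k starting and ending at u, so the count is the u-th diagonal entry of A^k. A minimal sketch (our own illustration, not the DISC tool used in the experiments):

```python
import numpy as np

def rooted_cycle_hom_counts(A, max_len):
    """Homomorphism counts of rooted cycles C_3, ..., C_max_len at each vertex.

    A homomorphism from a k-cycle rooted at u is a closed walk of
    length k at u, so the count is (A^k)_{uu}.
    Returns an (n, max_len - 2) integer array, one column per cycle length.
    """
    n = A.shape[0]
    counts = np.zeros((n, max_len - 2), dtype=np.int64)
    Ak = A.copy()
    for k in range(2, max_len + 1):
        Ak = Ak @ A                  # now Ak = A^k
        if k >= 3:
            counts[:, k - 3] = np.diag(Ak)
    return counts
```

Note that homomorphisms need not be injective, so these counts include degenerate walks; this is precisely the difference with subgraph isomorphism counts discussed above.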
Model | MAE (base) | MAE (hom) | MAE (iso)
---|---|---|---
 | 0.47±0.02 | 0.22±0.01 | 0.24±0.01
 | 0.35±0.01 | 0.20±0.01 | 0.22±0.01
 | 0.44±0.01 | 0.24±0.01 | 0.24±0.01
 | 0.25±0.01 | 0.19±0.01 | 0.16±0.01
 | 0.34±0.05 | 0.1353±0.01 | 0.1357±0.01
Set | MAE
---|---
None | 0.47±0.02
 | 0.45±0.01
 | 0.34±0.02
 | 0.31±0.01
 | 0.28±0.01
 | 0.23±0.01
 | 0.22±0.01
Vertex classification.
The next task in the benchmark corresponds to vertex classification. Here we analyze two datasets, PATTERN and CLUSTER (Dwivedi et al., 2020), both containing over artificially generated graphs resembling social networks or communities. The task is to predict whether a vertex belongs to a particular cluster or pattern, and all results are measured using the accuracy of the classifier. Here too, our results show that homomorphism counts, this time of cliques, tend to improve the accuracy of our models. Indeed, for the PATTERN dataset we see an improvement in all models but (Table 3), and three models are improved in the CLUSTER dataset (reported in the appendix). Once again, the best performer in this task is a model that uses homomorphism counts. We remark that for cliques, homomorphism counts coincide with subgraph isomorphism counts (up to a constant factor), so our extensions behave like .
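The remark that clique homomorphism counts coincide with subgraph isomorphism counts up to a constant factor (the k! automorphisms of the k-clique, since every homomorphism from a clique is injective) can be checked by brute force on small graphs; the helper names below are our own:

```python
from itertools import permutations, combinations

def clique_hom_count(adj, k):
    """Homomorphisms of the k-clique into the graph: ordered k-tuples of
    pairwise-adjacent vertices (all homs from a clique are injective)."""
    n = len(adj)
    return sum(
        all(adj[t[i]][t[j]] for i in range(k) for j in range(k) if i != j)
        for t in permutations(range(n), k)
    )

def clique_sub_count(adj, k):
    """Number of k-cliques occurring as (unordered) subgraphs."""
    return sum(
        all(adj[u][v] for u, v in combinations(c, 2))
        for c in combinations(range(len(adj)), k)
    )
```

On any graph, `clique_hom_count(adj, k)` equals `factorial(k)` times `clique_sub_count(adj, k)`, which is why hom-count and subgraph-count features are interchangeable for cliques after normalization.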
Model + best | Accuracy baseline | Accuracy best
---|---|---
 | 78.83±0.60 | 85.50±0.23
 | 71.42±1.38 | 82.49±0.48
 | 70.78±0.19 | 85.85±0.15
 | 85.90±0.03 | 86.63±0.03
 | 86.15±0.08 | 86.15±0.08
Link prediction.
In our final task we consider a single graph, COLLAB (Hu et al., 2020), with over vertices, containing information about the collaborators in an academic network, and the task at hand is to predict future collaboration. The metric used in the benchmark is the Hits@50 evaluator (Hu et al., 2020). Here, positive collaborations are ranked among randomly sampled negative collaborations, and the metric is the ratio of positive edges that are ranked at place 50 or above. Once again, homomorphism counts of cliques improve the performance of all models, see Table 4. An interesting observation is that this time the best set of features (cliques) does depend on the model, although the best model uses all cliques again.
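The metric just described can be sketched as follows; this mirrors the behaviour of the OGB Hits@K evaluator but is our own simplified reimplementation, not the official one:

```python
import numpy as np

def hits_at_k(pos_scores, neg_scores, k=50):
    """Fraction of positive edges ranked above the k-th best negative.

    A positive edge counts as a hit if its predicted score exceeds the
    k-th highest score among the sampled negative edges.
    """
    neg_scores = np.asarray(neg_scores)
    if len(neg_scores) < k:
        return 1.0                           # every positive trivially ranks in the top k
    threshold = np.sort(neg_scores)[-k]      # k-th highest negative score
    return float(np.mean(np.asarray(pos_scores) > threshold))
```

Because the negatives are randomly sampled, the metric is a noisy estimate, which is consistent with the seed-averaged numbers reported in Table 4.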
Model + best | Hits@50 baseline | Hits@50 best
---|---|---
 | 50.32±0.55 | 52.87±0.87
 | 51.35±1.30 | 54.60±1.01
 | 50.33±0.68 | 51.39±1.23
 | 49.81±1.56 | 51.76±1.38
 | 51.00±2.54 | 51.57±0.68
Remarks.
The best performers in each task use homomorphism counts, in accordance with our theoretical results, showing that such counts do extend the power of . Homomorphism counts are also cheap to compute: for COLLAB, the largest graph in our experiments, the homomorphism counts of all patterns we used, for all vertices, could be computed by DISC (Zhang et al., 2020) in less than 3 minutes. One important remark is that selecting the best set of features remains a challenging endeavor. Our theoretical results help us streamline this search, but for now it is still an exploratory task. In our experiments we first looked at adding each pattern individually, and then tried combinations of those that showed the best improvements. This feature selection strategy incurs considerable cost, both computational and environmental, and needs further investigation.
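The two-stage exploratory strategy just described, scoring each pattern individually and then combining the best performers, can be sketched as follows; `train_and_eval` is a hypothetical user-supplied callback (train a model with the given feature set and return a validation score), not part of any real API:

```python
def select_patterns(patterns, train_and_eval, top=3):
    """Two-stage exploratory search over pattern features.

    Stage 1: score each pattern on its own.
    Stage 2: try the combination of the `top` best-scoring patterns,
    keeping it only if it beats the best single pattern.
    `train_and_eval(feature_set)` returns a score (higher is better).
    """
    solo = {p: train_and_eval([p]) for p in patterns}
    best = sorted(solo, key=solo.get, reverse=True)[:top]
    if train_and_eval(best) >= max(solo.values()):
        return best
    return [max(solo, key=solo.get)]
```

Each callback invocation is a full training run, which is exactly why this search incurs the computational and environmental cost noted above.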
7 Conclusion
We propose local graph parameter enabled as an efficient way to increase the expressive power of . The take-away message is that enriching features with homomorphism counts of small patterns is a promising add-on to any architecture, with little to no overhead. Regarding future work, the problem of which parameters to choose deserves further study. In particular, we plan to provide a complete characterisation of when adding a new pattern to adds expressive power to the -test.
References
- Abbe (2018) Abbe, E. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018. URL http://jmlr.org/papers/v18/16-480.html.
- Abboud et al. (2020) Abboud, R., Ceylan, İ. İ., Grohe, M., and Lukasiewicz, T. The surprising power of graph neural networks with random node initialization. arXiv, 2020. URL https://arxiv.org/abs/2010.01179.
- Arvind et al. (2020) Arvind, V., Fuhlbrück, F., Köbler, J., and Verbitsky, O. On Weisfeiler-Leman invariance: Subgraph counts and related graph properties. Journal of Computer and System Sciences, 113:42 – 59, 2020. URL https://doi.org/10.1016/j.jcss.2020.04.003.
- Atserias et al. (2007) Atserias, A., Bulatov, A. A., and Dalmau, V. On the power of -consistency. In Proceedings of the 34th International Colloquium on Automata, Languages and Programming, volume 4596 of Lecture Notes in Computer Science, pp. 279–290, 2007. URL https://doi.org/10.1007/978-3-540-73420-8_26.
- Azizian & Lelarge (2021) Azizian, W. and Lelarge, M. Expressive power of invariant and equivariant graph neural networks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=lxHgXYN4bwl.
- Barceló et al. (2020) Barceló, P., Kostylev, E. V., Monet, M., Pérez, J., Reutter, J., and Silva, J. P. The logical expressiveness of graph neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=r1lZ7AEKvB.
- Belkin & Niyogi (2003) Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003. URL https://ieeexplore.ieee.org/document/6789755.
- Bouritsas et al. (2020) Bouritsas, G., Frasca, F., Zafeiriou, S., and Bronstein, M. M. Improving graph neural network expressivity via subgraph isomorphism counting. arXiv, 2020. URL https://arxiv.org/abs/2006.09252.
- Bova & Chen (2019) Bova, S. and Chen, H. How many variables are needed to express an existential positive query? Theory Comput. Syst., 63(7):1573–1594, 2019. URL https://doi.org/10.1007/s00224-018-9884-z.
- Bresson & Laurent (2017) Bresson, X. and Laurent, T. Residual gated graph convnets. arXiv, 2017. URL https://arxiv.org/abs/1711.07553.
- Cai et al. (1992) Cai, J., Fürer, M., and Immerman, N. An optimal lower bound on the number of variables for graph identifications. Combinatorica, 12(4):389–410, 1992. URL https://doi.org/10.1007/BF01305232.
- Chami et al. (2021) Chami, I., Abu-El-Haija, S., Perozzi, B., Ré, C., and Murphy, K. Machine learning on graphs: A model and comprehensive taxonomy. arXiv, 2021. URL https://arxiv.org/abs/2005.03675.
- Chen et al. (2019a) Chen, T., Bian, S., and Sun, Y. Are powerful graph neural nets necessary? A dissection on graph classification. arXiv, 2019a. URL https://arxiv.org/abs/1905.04579.
- Chen et al. (2019b) Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, pp. 15894–15902, 2019b. URL https://proceedings.neurips.cc/paper/2019/file/71ee911dd06428a96c143a0b135041a4-Paper.pdf.
- Chen et al. (2020) Chen, Z., Chen, L., Villar, S., and Bruna, J. Can graph neural networks count substructures? In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/2002.04025.
- Curticapean et al. (2017) Curticapean, R., Dell, H., and Marx, D. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Symposium on Theory of Computing, pp. 210––223, 2017. URL http://dx.doi.org/10.1145/3055399.3055502.
- Damke et al. (2020) Damke, C., Melnikov, V., and Hüllermeier, E. A novel higher-order Weisfeiler-Lehman graph convolution. arXiv, 2020. URL https://arxiv.org/abs/2007.00346.
- Dell et al. (2018) Dell, H., Grohe, M., and Rattan, G. Lovász meets Weisfeiler and Leman. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, volume 107, pp. 40:1–40:14, 2018. URL https://doi.org/10.4230/LIPIcs.ICALP.2018.40.
- Dvorak (2010) Dvorak, Z. On recognizing graphs by numbers of homomorphisms. Journal of Graph Theory, 64(4):330–342, 2010. URL https://doi.org/10.1002/jgt.20461.
- Dwivedi et al. (2020) Dwivedi, V. P., Joshi, C. K., Laurent, T., Bengio, Y., and Bresson, X. Benchmarking graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2003.00982.
- Fürer (2017) Fürer, M. On the combinatorial power of the Weisfeiler-Lehman algorithm. In Proceedings of the 10th International Conference on Algorithms and Complexity, volume 10236 of Lecture Notes in Computer Science, pp. 260–271, 2017. URL https://doi.org/10.1007/978-3-319-57586-5_22.
- Garg et al. (2020) Garg, V. K., Jegelka, S., and Jaakkola, T. S. Generalization and representational limits of graph neural networks. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pp. 3419–3430, 2020. URL http://proceedings.mlr.press/v119/garg20c.html.
- Geerts (2020) Geerts, F. The expressive power of th-order invariant graph networks. arXiv, 2020. URL https://arxiv.org/abs/2007.12035.
- Geerts et al. (2020) Geerts, F., Mazowiecki, F., and Pérez, G. A. Let’s agree to degree: Comparing graph convolutional networks in the message-passing framework. arXiv, 2020. URL https://arxiv.org/abs/2004.02593.
- Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp. 1263–1272, 2017. URL http://proceedings.mlr.press/v70/gilmer17a/gilmer17a.pdf.
- Grohe (2017) Grohe, M. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781139028868.
- Grohe (2020a) Grohe, M. word2vec, node2vec, graph2vec, x2vec: Towards a theory of vector embeddings of structured data. In Proceedings of the 39th Symposium on Principles of Database Systems, pp. 1–16, 2020a. URL https://doi.org/10.1145/3375395.3387641.
- Grohe (2020b) Grohe, M. Counting bounded tree depth homomorphisms. In Proceedings of the 35th Symposium on Logic in Computer Science, pp. 507–520, 2020b. URL https://doi.org/10.1145/3373718.3394739.
- Hamilton et al. (2017) Hamilton, W. L., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, volume 30, pp. 1024–1034, 2017.
- Hella (1996) Hella, L. Logical hierarchies in PTIME. Inf. Comput., 129(1):1–19, 1996. URL https://doi.org/10.1006/inco.1996.0070.
- Hu et al. (2020) Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/2005.00687.
- Irwin et al. (2012a) Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S., and Coleman, R. G. Zinc: A free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7):1757–1768, 2012a. URL https://doi.org/10.1021/ci3001277.
- Ishiguro et al. (2020) Ishiguro, K., Oono, K., and Hayashi, K. Weisfeiler-Lehman embedding for molecular graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2006.06909.
- Keriven & Peyré (2019) Keriven, N. and Peyré, G. Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, volume 32, pp. 7092–7101, 2019. URL https://proceedings.neurips.cc/paper/2019/file/ea9268cb43f55d1d12380fb6ea5bf572-Paper.pdf.
- Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
- Lacoste et al. (2019) Lacoste, A., Luccioni, A., Schmidt, V., and Dandres, T. Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:1910.09700, 2019.
- Li et al. (2019) Li, M. L., Dong, M., Zhou, J., and Rush, A. M. A hierarchy of graph neural networks based on learnable local features. arXiv, 2019. URL https://arxiv.org/abs/1911.05256.
- Loukas (2020) Loukas, A. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1l2bp4YwS.
- Lovász (1967) Lovász, L. Operations with structures. Acta Mathematica, 18:321–328, 1967. URL https://doi.org/10.1007/BF02280291.
- Maron et al. (2019a) Maron, H., Ben-Hamu, H., Serviansky, H., and Lipman, Y. Provably powerful graph networks. In Advances in Neural Information Processing Systems, volume 32, pp. 2153–2164, 2019a. URL http://papers.nips.cc/paper/8488-provably-powerful-graph-networks.
- Maron et al. (2019b) Maron, H., Ben-Hamu, H., Shamir, N., and Lipman, Y. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b. URL https://openreview.net/forum?id=Syx72jC9tm.
- Maron et al. (2019c) Maron, H., Fetaya, E., Segol, N., and Lipman, Y. On the universality of invariant networks. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pp. 4363–4371, 2019c. URL http://proceedings.mlr.press/v97/maron19a.html.
- Merkwirth & Lengauer (2005) Merkwirth, C. and Lengauer, T. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model., 45(5):1159–1168, 2005. URL https://pubs.acs.org/doi/pdf/10.1021/ci049613b.
- Monti et al. (2017) Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5115–5124, 2017. URL https://openaccess.thecvf.com/content_cvpr_2017/papers/Monti_Geometric_Deep_Learning_CVPR_2017_paper.pdf.
- Morris et al. (2019) Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp. 4602–4609, 2019. URL https://doi.org/10.1609/aaai.v33i01.33014602.
- Morris et al. (2020) Morris, C., Rattan, G., and Mutzel, P. Weisfeiler and leman go sparse: Towards scalable higher-order graph embeddings. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/1904.01543.
- NT & Maehara (2020) NT, H. and Maehara, T. Graph homomorphism convolution. arXiv, 2020. URL https://arxiv.org/abs/2005.01214.
- Otto (2017) Otto, M. Bounded Variable Logics and Counting: A Study in Finite Models, volume 9 of Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781316716878.
- Sato (2020) Sato, R. A survey on the expressive power of graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2003.04078.
- Sato et al. (2019) Sato, R., Yamada, M., and Kashima, H. Approximation ratios of graph neural networks for combinatorial problems. In Advances in Neural Information Processing Systems, volume 32, pp. 4083–4092, 2019. URL https://papers.nips.cc/paper/2019/file/635440afdfc39fe37995fed127d7df4f-Paper.pdf.
- Sato et al. (2020) Sato, R., Yamada, M., and Kashima, H. Random features strengthen graph neural networks. arXiv, 2020. URL https://arxiv.org/abs/2002.03155.
- Scarselli et al. (2009) Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009. URL https://doi.org/10.1109/TNN.2008.2005605.
- Shervashidze et al. (2009) Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., and Borgwardt, K. Efficient graphlet kernels for large graph comparison. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, volume 5, pp. 488–495, 2009. URL http://proceedings.mlr.press/v5/shervashidze09a.html.
- Tahmasebi & Jegelka (2020) Tahmasebi, B. and Jegelka, S. Counting substructures with higher-order graph neural networks: Possibility and impossibility results. arXiv, 2020. URL https://arxiv.org/abs/2012.03174.
- Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
- Vignac et al. (2020) Vignac, C., Loukas, A., and Frossard, P. Building powerful and equivariant graph neural networks with structural message-passing. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/a32d7eeaae19821fd9ce317f3ce952a7-Abstract.html.
- Weisfeiler & Lehman (1968) Weisfeiler, B. J. and Lehman, A. A. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya, 2(9):12–16, 1968. URL https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
- Wu et al. (2021) Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021. URL https://doi.org/10.1109/TNNLS.2020.2978386.
- Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
- Zhang et al. (2020) Zhang, H., Yu, J. X., Zhang, Y., Zhao, K., and Cheng, H. Distributed subgraph counting: A general approach. Proc. VLDB Endow., 13(11):2493–2507, 2020. URL http://www.vldb.org/pvldb/vol13/p2493-zhang.pdf.
- Zhou et al. (2018) Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., and Sun, M. Graph neural networks: A review of methods and applications. ArXiv, 2018. URL http://arxiv.org/abs/1812.08434.
Appendix
Appendix A Proofs of Section 3
We use the following notions. Let and be graphs, , , and . The vertices and are said to be indistinguishable by in round , denoted by , iff . Similarly, and are said to be indistinguishable by in round , denoted by , iff . Along the same lines, and are indistinguishable by an - , denoted by , iff . Similarly, and are said to be indistinguishable by in round , denoted by , iff .
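The round-based indistinguishability relations above are defined via colour refinement. For concreteness, a minimal sketch of the one-dimensional (vertex-level) refinement, with our own function name; two vertices are indistinguishable in round t exactly when they receive the same colour after t iterations:

```python
def wl_refine(adj, labels, rounds):
    """One-dimensional Weisfeiler-Leman colour refinement: each round,
    hash a vertex's colour together with the multiset of its
    neighbours' colours, then compress colours to small integers."""
    colors = list(labels)
    for _ in range(rounds):
        new = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(len(adj))
        ]
        # canonical compression keeps colours comparable across rounds
        table = {c: i for i, c in enumerate(sorted(set(new)))}
        colors = [table[c] for c in new]
    return colors
```

On a path with three vertices and uniform initial labels, one round already separates the middle vertex from the endpoints, while the two endpoints stay indistinguishable forever.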
A.1 Proof of Proposition 1
We show that the class of - is upper bounded in expressive power by . The proof is analogous to the proof in Morris et al. (2019) showing that are bounded by .
We show a stronger result by upper bounding - by -test, layer by layer. More precisely, we show that for every - , graphs and , vertices , , and ,
-
(1)
; and
-
(2)
.
Clearly, these imply that - are bounded in expressive power by , both when vertex and graph distinguishability are concerned.
Proof of implication (1). We show this implication by induction on the number of rounds.
Base case. We first assume . In other words, and thus, and for every we have . By definition, is a one-hot encoding of combined with for , for every , graph and vertex . Since these agree with the labelling and homomorphism counts for vertex in graph , we also have that , as desired.
Inductive step. We next assume . By the definition of this is equivalent to and . By the induction hypothesis, this implies and there exists a bijection such that for every , and every - . In other words, and for every . By the definition of - this implies that is equal to and hence also, after applying , . That is, , as desired.
Proof of implication (2). The implication now easily follows. Indeed, is equivalent to . In other words, there exists a bijection such that for every . We have just shown that this implies for every and for every - . Hence, , or , as desired. ∎
A.2 Proof of Theorem 1
We show that for any finite collection of patterns, graphs and , vertices and , and :
(1) |
for every -pattern tree of depth at most . Similarly,
(2) |
for every (unrooted) -pattern tree of depth at most .
For a given set of patterns and , we denote by the graph pattern of the form , that is, we join copies of , copies of and so on.
Proof of equivalence (1). The proof is by induction on the number of rounds .
We first consider the implication for every -pattern tree of depth at most .
Base case. Let us first consider the base case, that is, . In other words, we consider -pattern trees consisting of a single root adorned with a pattern for some . We note that due to the properties of the graph join operator:
(3) |
Since, , we know that for some and for all . This implies that the product in (3) is equal to
as desired.
Inductive step. Suppose next that we know that the implication holds for . We assume now and consider an -pattern tree of depth at most . Assume that in the backbone of , the root has children , and denote by the -pattern trees in rooted at . Furthermore, we denote by the -pattern tree obtained from by attaching to ; has root . Let be the pattern in associated with . The following equalities are readily verified:
(4) |
Recall now that we assume and thus, in particular, . Hence, by induction, for every -pattern tree of depth . In particular, this holds for and hence
Furthermore, implies that there exists a bijection such that for every . By induction, for every there thus exists a unique such that for every -pattern tree of depth at most . In particular, for every there exists a such that
for each of the sub-trees in . Hence, (4) is equal to
which in turn is equal to , as desired.
We next consider the other direction, that is, we show that when holds for every -pattern tree of depth at most , then holds. This is again verified by induction on . This direction is more complicated and is similar to techniques used in Grohe (2020b). In our induction hypothesis we further include that a finite number of -pattern trees suffices to infer for graphs and and vertices and .
Base case. Let us consider the base case first. We need to show that and for every , since this implies .
We first observe that for every -pattern tree of depth , implies that and must be assigned the same label, say , by and , respectively.
Indeed, if we take to consist of a single root labeled with (and thus is associated with the pattern ), then will be one if and zero otherwise. This implies that .
Next, we show that for every . It suffices to consider the -pattern tree consisting of a root joined with a single copy of .
We observe that we only need a finite number of -pattern trees to infer . Indeed, suppose that and assign labels , then we need single vertex trees with no patterns attached and root labeled with one of these labels. In addition, we need one -pattern tree for each pattern and each label . That is, we need -pattern trees of depth .
Inductive step. We now assume that the implication holds for and consider . That is, we assume that if holds for every -pattern tree of depth at most , then holds. Furthermore, we assume that only a finite number of -pattern trees of depth at most suffice to infer .
So, for , let us assume that holds for every -pattern tree of depth at most . We need to show and that we can again assume that a finite number of -pattern trees of depth at most suffice to infer .
By definition of , we can, equivalently, show that and that there exists a bijection such that for every . That holds, is by induction, since for every -pattern tree of depth at most and thus also for every -pattern tree of depth at most . We may thus focus on showing the existence of the bijection .
Let , and . We know, by induction and the proof of the previous implication, that if and only if for each . Denote by the equivalence class on induced by . Furthermore, define and let and for and , for each . If we can show that for each , then this implies the existence of the desired bijection.
Let be the -pattern tree of depth at most obtained by attaching to a new root vertex labeled with . We may assume that and both have label , since their homomorphism counts agree on the single-root trees with labels from . The root vertex is not joined with any (or alternatively it is joined with ). It will be convenient to denote the root of by instead of . Then for each :
where and denote arbitrary vertices in and , respectively. Let us denote by and observe that this is equal to . Hence, we know that for each :
Let us call a set compatible if all roots in , for , have the same label. Consider a vector and define its support as . We say that is compatible if its support is. For such a compatible we now define to be the -pattern tree with root labeled with , with one child which is joined with (and inheriting the label from) the following -pattern tree of depth :
In other words, we simply join together powers of the ’s that have roots with the same label. Then for every compatible :
where, as before, and denote arbitrary vertices in and , respectively. Hence, for any compatible :
We now continue in the same way as in the proof of Lemma 4.2 in Grohe (2020b). We repeat the argument here for completeness. Define for each . We assume, for the sake of contradiction, that there exists a such that . We choose such a for which is inclusion-wise maximal.
We first rule out that . Indeed, suppose that . This implies that . Now observe that and are mutually distinct for all , . Indeed, if they were equal then this would imply that . Hence, for any . We note that for all by the maximality of . Hence, , contradicting our assumption. Hence, .
Consider . For each , consider the truncated vector . We note that , for , all have positive entries and are mutually distinct. Lemma 4.1 in Grohe (2020b) implies that we can find a vector (with non-zero entries) such that the numbers for are mutually distinct as well. We next consider with if and otherwise. Then by definition of , also for are mutually distinct.
We next note that for every , and if we define to be the -matrix such that then this will be an invertible matrix (Vandermonde). We use this invertibility to show that .
Let and . If we inspect the th entry of , then this is equal to
We want to reduce the above expression to
To see that this holds, we verify that when then . Indeed, take an such that . Then, contains the factor with . Hence, .
Now, by the maximality of , for all with we have and thus
Since , we thus also have that
Since this holds for all , we have and by the invertibility of , . In particular, since , contradicting our assumption.
As a consequence, for all and thus we have our desired bijection.
It remains to verify that we only need a finite number of -pattern trees to conclude that for all . In fact, the above proof indicates that we just need to check the test for each root label , and the identities for the finite number of pattern trees used to define the matrix .
Proof of equivalence (2). This equivalence is shown just like the proof of Theorem 4.4 in Grohe (2020a), with the additional techniques from Lemma 4.2 in Grohe (2020b).
We first show that implies for unrooted -pattern trees of depth at most .
Assume that for . For and , define if and only if for all -pattern trees of depth at most . Let be the -equivalence classes and for each , let and . Suppose that . This implies that for every .
Let be an unrooted -pattern tree of depth at most , let be any vertex on the backbone of , and let be the rooted -pattern tree obtained from by declaring as its root. By definition, for , any , are all the same number, only dependent on . Hence,
where and are arbitrary vertices in and , respectively, and where we used that and . Since this holds for any unrooted -pattern tree of depth at most , we have shown the desired implication.
We next check the other direction. That is, we assume that holds for any unrooted -pattern tree of depth at most and verify that .
For to hold for , and , we earlier showed that this corresponds to checking whether for a finite number of rooted -pattern trees . By definition of the ’s, for is well-defined (independent of the choice of ). For the rooted ’s we denote by its unrooted version. As before,
We next show that for . In fact, this is shown in precisely the same way as in our previous characterisation and based on Lemma 4.2 in Grohe (2020b). That is, we again consider trees obtained by joining copies of the ’s, to obtain, for compatible ,
It now suffices to repeat the same argument as before (details omitted). ∎
Appendix B Proofs of Section 4
B.1 Additional details of standard concepts
Core and treewidth.
A graph is a core if all homomorphisms from to itself are injective. The treewidth of a graph is a measure of how much resembles a tree. This is defined in terms of the tree decompositions of , which are pairs , for a tree and a mapping that associates each vertex of with a set , satisfying the following:
-
•
The union of , for , is equal to ;
-
•
The set is connected, for all ; and
-
•
For each there is with .
The width of is . The treewidth of is the minimum width of its tree decompositions. For instance, trees have treewidth one, cycles have treewidth two, and the -clique has treewidth (for ).
If is a pattern, then its treewidth is defined as the treewidth of the graph . Similarly, is a core if is.
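The three conditions of a tree decomposition can be verified mechanically. The following sketch (our own helper, with bags given as a dict from tree nodes to vertex sets) checks vertex coverage, edge coverage, and connectivity of each vertex's bags, and computes the width:

```python
def is_tree_decomposition(n_vertices, edges, tree_edges, bags):
    """Check the tree-decomposition conditions for a graph on
    {0, ..., n_vertices-1} with the given edge list; `tree_edges` are
    the edges of the decomposition tree, `bags[t]` its vertex sets."""
    nodes = list(bags)
    # condition: every graph vertex occurs in some bag
    if set().union(*bags.values()) != set(range(n_vertices)):
        return False
    # condition: every graph edge is contained in some bag
    if any(not any({u, v} <= b for b in bags.values()) for u, v in edges):
        return False
    # condition: for each vertex, the tree nodes containing it are connected
    for v in range(n_vertices):
        hit = {t for t in nodes if v in bags[t]}
        induced = [(a, b) for a, b in tree_edges if a in hit and b in hit]
        # a sub-forest of a tree on |hit| nodes is connected iff it has
        # exactly |hit| - 1 edges
        if hit and len(induced) != len(hit) - 1:
            return False
    return True

def width(bags):
    return max(len(b) for b in bags.values()) - 1
```

For example, a path on three vertices admits the two-bag decomposition {0,1}, {1,2} of width one, matching the claim that trees have treewidth one.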
.
A partial isomorphism from a graph to a graph is a set such that all satisfy the equivalences , , and . We may view as a bijective mapping from a subset to a subset of that is an isomorphism from the induced subgraph to the induced subgraph . The isomorphism type of a -tuple is a label in some alphabet such that if and only if is a partial isomorphism from to .
Let and . The -dimensional Weisfeiler-Leman algorithm () computes a sequence of labellings from . We denote by the label assigned to the -tuple in round . The initial labelling assigns to each -tuple its isomorphism type . Then, for round ,
where is the multiset
As observed in Dell et al. (2018), if holds, then we can omit the entry from the tuples in , because all the information it contains is also contained in the entries of these tuples. Also, in the sense that if and only if for all . The algorithm is run until the labelling stabilises, i.e., for all , if and only if . We say that distinguishes two graphs and if the multisets of labels of all -tuples of vertices in and , respectively, do not coincide. Similar notions are in place for distinguishing -tuples, and for distinguishing graphs (or vertices) based on labels computed by a given number of rounds.
We remark that the algorithm given here is sometimes referred to as the “folklore” version of the -dimensional Weisfeiler-Leman algorithm. It is known that indistinguishability of graphs by is equivalent to indistinguishability by sentences in the -variable fragment of first-order logic with counting (Cai et al., 1992), and to for every graph of treewidth (Dvorak, 2010; Dell et al., 2018).
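For concreteness, here is a minimal sketch of the folklore version for dimension two, with colours on ordered vertex pairs; this is our own illustration of the update rule above, not an optimized implementation:

```python
def folklore_2wl(adj, rounds):
    """Folklore 2-WL: colours live on ordered vertex pairs; each round a
    pair (u, v) is refined by the multiset, over all vertices w, of the
    pair of colours it currently shares with w."""
    n = len(adj)
    V = range(n)
    # initial colour = isomorphism type of the pair (u, v)
    col = {(u, v): (u == v, bool(adj[u][v])) for u in V for v in V}
    for _ in range(rounds):
        new = {
            (u, v): (col[u, v],
                     tuple(sorted((col[u, w], col[w, v]) for w in V)))
            for u in V for v in V
        }
        # compress colours to small integers after each round
        table = {c: i for i, c in enumerate(sorted(set(new.values())))}
        col = {p: table[c] for p, c in new.items()}
    return col
```

Run on the disjoint union of a 6-cycle and two triangles (which 1-WL cannot tell apart), the diagonal colours separate the two components, reflecting the fact that 2-WL can count triangles.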
B.2 Proof of Proposition 2
We show that for each finite set of patterns, the expressive power of is bounded by , where is the largest treewidth of a pattern in .
We first recall the following characterisation of -equivalence (Dvorak, 2010; Dell et al., 2018). For any two graphs and ,
for every graph of treewidth at most . On the other hand, we know from Theorem 1 that
for every -pattern tree . Hence, we may conclude that
if we can show that any -pattern tree has treewidth at most .
Suppose that is the maximal treewidth of a pattern in . To conclude the proof, we verify that the treewidth of any -pattern tree is bounded by .
Lemma 1.
If is the maximal treewidth of a pattern in , then the treewidth of any -pattern tree is bounded by .
Proof.
The proof is by induction on the number of patterns joined at any leaf of . Clearly, if no patterns are joined, then is simply a tree and its treewidth is . Otherwise, consider a -pattern tree whose treewidth is at most , and a pattern of treewidth that is to be joined at vertex of . By the induction hypothesis, there is a decomposition for witnessing its bounded treewidth, that is,
1. The union of all , for , is equal to ;
2. The set is connected, for all ;
3. For each there is with ; and
4. The size of each set is at most .
Likewise, by assumption, for pattern we have such a tree decomposition, say .
Now consider any vertex of the decomposition of such that contains vertex in to which is to be joined at its root. We can create a joint tree decomposition for the join of and (at node ) by merging and with an edge from vertex in to the root of (recall is a tree by definition). It is readily verified that this decomposition maintains all necessary properties. Indeed, condition 1 is clearly satisfied since and combined cover all vertices of the join of with . Furthermore, since the only node shared by and is the join node, and we merge and by putting an edge from node in to the root of , connectivity of is guaranteed and condition 2 is satisfied. Moreover, since the operation of joining and does not create any extra edges, condition 3 is immediately verified, and so is condition 4, because we do not create any new vertices, neither in nor in , and we already know that and are bounded by . ∎
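The four conditions used in this proof are mechanical to verify on concrete instances. Below is a small checker of ours (an illustrative sketch, not part of the paper's artefacts) that tests whether candidate bags and a tree over them form a valid tree decomposition, and reports its width.

```python
def is_tree_decomposition(graph_edges, bags, tree_edges):
    """Check the tree-decomposition conditions for `bags` (node -> set of
    graph vertices) connected by `tree_edges` (edges over bag nodes)."""
    covered = set().union(*bags.values())
    # Condition 1: every graph vertex occurs in some bag.
    if {u for e in graph_edges for u in e} - covered:
        return False
    # Condition 3: every graph edge is contained in some bag.
    if any(not any({u, v} <= bag for bag in bags.values())
           for u, v in graph_edges):
        return False
    # Condition 2: for each vertex, the bags containing it form a subtree.
    for x in covered:
        nodes = {t for t, bag in bags.items() if x in bag}
        start = next(iter(nodes))
        reached, frontier = {start}, [start]
        while frontier:
            t = frontier.pop()
            for a, b in tree_edges:
                for s, o in ((a, b), (b, a)):
                    if s == t and o in nodes and o not in reached:
                        reached.add(o)
                        frontier.append(o)
        if reached != nodes:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the 4-cycle with bags {0, 1, 2} and {0, 2, 3} joined by a single tree edge, the checker accepts and reports width 2, in line with cycles having treewidth two.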
B.3 Proof of Theorem 2
We show that if contains a pattern which is a core and has treewidth , then is not bounded by . In other words, we construct two graphs and that can be distinguished by but not by . It suffices to find such graphs that can be distinguished by but not by . The proof relies on the characterisation of indistinguishability in terms of the -variable fragment of first order logic with counting and of -pebble bijective games in particular (Cai et al., 1992; Hella, 1996). More precisely, if and only if no sentence in can distinguish from . In other words, for any sentence in , if and only if . We denote indistinguishability by by . We heavily rely on the constructions used in Atserias et al. (2007) and Bova & Chen (2019). In fact, we show that the graphs and constructed in those works suffice for our purpose, by extending their strategy for the -pebble game to -pebble bijective games.
Construction of the graphs and .
Let be a pattern in which is a core and has treewidth . For a vertex , we gather all its edges in . Let be one of the vertices in .
For , as vertex set we take vertices of the form with and . We require that
For , as vertex set we take vertices of the form with and . We require that , for all . We observe that and have the same number of vertices.
The edge sets and of and , respectively, are defined as follows: and are adjacent if and only if and furthermore,
It is known that (here it is used that is a core), and that and are indistinguishable by means of sentences in the -variable fragment of first order logic (Atserias et al., 2007; Bova & Chen, 2019). To show our theorem, we thus need to verify that as well. Indeed, if this holds, then yet . Theorem 1 implies that for to hold, , which we know not to be true. Hence, , as desired.
Showing -indistinguishability of and .
We next show that the graphs and are indistinguishable by sentences in . This will be shown by verifying that the Duplicator has a winning strategy for the -pebble bijective game on and (Hella, 1996).
The -pebble bijective game. We recall that the -pebble bijective game is played between two players, the Spoiler and the Duplicator, each placing at most pebbles on the vertices of and , respectively. The game is played in a number of rounds. The pebbles placed after round are typically represented by a partial function . When is defined, say, , this means that the Spoiler places the th pebble on vertex and the Duplicator places the th pebble on . Initially, no pebbles are placed on and and hence is undefined everywhere.
Then in round , the game proceeds as follows:
1. The Spoiler selects a pebble in . All other already placed pebbles are kept on the same vertices. We define for all , .
2. The Duplicator responds by choosing a bijection . This bijection should be consistent with the pebbles in the restriction of to . That is, for every , , if then .
3. Next, the Spoiler selects an element .
4. The Duplicator defines . Hence, after this round, the th pebble is placed on by the Spoiler and on by the Duplicator.
Let be the elements in for which is defined. For denote by the pair of vertices on which the th pebble is placed. The Duplicator wins round if the mapping is a partial isomorphism between and . More precisely, it should hold that for all edges if and only if . In this case, the game continues to the next round. Infinite games are won by the Duplicator. A winning strategy consists of defining a bijection in Step 2 of each round, allowing the game to continue, regardless of on which vertex the Spoiler places a pebble in Step 3.
Winning strategy. We will now provide a winning strategy for the -pebble bijective game on our constructed graphs and . We recall that and have the same number of vertices, so a bijection between and exists. We show how the Duplicator can select a “good” bijection in Step 2 of the game, by induction on the number of rounds.
To state our induction hypothesis, we first recall some notions and properties from Atserias et al. (2007) and Bova & Chen (2019).
Let be a walk in and let be an edge in . Then, denotes the number of occurrences of the edge in the walk. More precisely, if is a walk in of length , then
Furthermore, for a subset , we define
where is an arbitrary bramble of of order . A bramble is a set of connected subsets of such that for any two elements and in , either , or there exists a vertex and such that . The order of a bramble is the minimum size of a hitting set for . It is known that has treewidth if and only if it has a bramble of order . In what follows, we let be any such bramble.
Lemma 2 (Lemma 14 in Bova & Chen (2019)).
For any , let be vertices in . Let be a walk in from to . For all , let be defined by
for all . Then, the mapping , for all , is a partial isomorphism from to .∎
We use this lemma to show that the bijection (to be defined shortly) selected by the Duplicator induces a partial isomorphism between and on the pebbled vertices.
We can now state our induction hypothesis: In each round , there exists a bijection which is
(a) consistent with the pebbles in the restriction of to (recall, pebble is selected by the Spoiler in Step 1); and
(b) if for , then there exists a walk in , from to , such that
where for every . In other words, on the vertices in pebbled by , the bijection is, by the previous Lemma, a partial isomorphism from to .
If this holds, then the strategy for the Duplicator is selecting that bijection in each round.
Verification of the induction hypothesis. We assume that the special vertex in has at least two neighbours. Such a vertex exists since otherwise consists of a single edge while we assume to be of treewidth at least two.
Base case. For the base case (), we define two walks: and , where and are neighbours of . We define with if , and with .
The mapping is a bijection from to . We note that it suffices to show that is injective since and contain the same number of vertices. Since whenever , we can focus on comparing and with . This implies that for at least one edge . Clearly, this implies that . In fact, this holds for any walk and thus in particular for and . We further observe that is consistent simply because no pebbles have been placed yet. For the same reason we can take the walk to be either or .
Inductive case. Assume that the induction hypothesis holds for round and consider round . Let for . By induction, there exists a bijection such that and furthermore, for every , for some walk from to .
Assume that the Spoiler selects in Step 1 in round . We define the Duplicator’s bijection for round , as follows. Recall that is the vertex in which the walk ends.
• For all such that , we define where for each :
• For all , we will extend with a walk so that it ends in a vertex different from . Suppose that such that . We want to find an such that . We can then take to be a vertex in and since and are both connected, and either have a vertex in common or an edge between them, we can let be a walk from to entirely in and . Now, such an exists since otherwise would be a hitting set for of size at most . We know, however, that any hitting set must be of size because of the treewidth assumption for . We now define the bijection as where for each :
This concludes the definition of . We need to verify a couple of things: (i) is a bijection; (ii) is consistent with all pebbles in except for the “unpebbled” one ; and (iii) it induces a partial isomorphism on the pebbled vertices.
(i) is a bijection. Since and are of the same size, it suffices to show that is an injection. Clearly, whenever . We can thus focus on and with . Then, and differ in at least one edge and for this edge:
for any walk . In particular, this holds for both walks used in the definition of : , used when , and used when . Hence, is indeed a bijection.
(ii) is consistent. For each with , let . Now, by induction, ended in a vertex distinct from any of these ’s and thus none of these ’s is equal to . This implies that with . But this is precisely how placed its pebbles, by induction. Hence, and thus is consistent.
(iii) induces a partial isomorphism. After the Spoiler picked an element , we now know that for all . We recall that is defined in two possible ways, using two distinct walks: , for vertices in not involving , or, otherwise, using the walk , for vertices in involving .
Hence, when all ’s for are distinct from , then with and we can simply take the new walk to be . Then, Lemma 2 implies that the mapping , for is a partial isomorphism from to , as desired.
Otherwise, we know that for but . That is, the Spoiler places the th pebble on a vertex of the form in . We now have that is defined in two ways for the pebbled elements using the two distinct walks. We next show that can be used for both types of pebbled elements in , those of the form with and . For the last type this is obvious since we defined in terms of . For the former type, we note that and for . If we take an edge , then because lies entirely in and . As a consequence, for with , for all :
Then, Lemma 2 implies that the mapping , for is a partial isomorphism from to , because we can use the same walk for all pebbled vertices.
B.4 Proof of Proposition 3
We show that no finite set of patterns suffices for to be equivalent to , for , in terms of expressive power. The proof is by contradiction. That is, suppose that there exists a set such that for any two graphs and . In particular, and thus also , since the -test is upper bounded by any -test for . We argue that no finite set exists satisfying .
Let denote the maximum number of vertices of any pattern in . (Strictly speaking, we can use the diameter of any pattern in instead, but it is easier to convey the proof simply by taking the number of vertices.) Furthermore, consider graphs and , where is the disjoint union of copies of the cycle , and is the union of copies of the cycle . Note that and have the same number of vertices.
We observe that any homomorphism from a pattern in to or , for vertices and , maps to either a copy of (for ) or a copy of (for ). Furthermore, any such homomorphism maps into a subgraph of or consisting of at most vertices. There is, however, a unique (up to isomorphism) such subgraph on vertices in and : it is a path of length . This implies that for any and . Since the argument holds for any pattern in , all vertices in and will have the same homomorphism counts for patterns in . Furthermore, since both and are regular graphs (each vertex has degree two), this implies that cannot distinguish between and . This is formalised in the following lemma. We recall that a -regular graph is a graph in which every vertex has degree .
Lemma 3.
For any set of patterns and any two -regular (unlabelled) graphs and such that for , , and holds, .
Proof.
The lemma is readily verified by induction on the number of rounds of . We show a stronger result: for any , , and , from which follows. By our Theorem 1, it suffices to show that for -pattern trees of depth at most . Let . For the base case, let be a join pattern for some . Then,
since for any . Then, for the inductive case, assume that for any -pattern tree of depth at most , , and , and consider an -pattern tree of depth . Let be the -pattern trees of depth at most rooted at the children of in the backbone of . As before, let be the pattern joined at in . Then,
where we used that and both consist of vertices (regularity), that by the induction hypothesis all vertices have the same homomorphism counts for -pattern trees of depth at most , and where and are taken to be arbitrary vertices in and , respectively. ∎
Hence, since and are -regular and satisfy the conditions of the lemma, we may indeed infer that . We note, however, that . Indeed, from Dvorak (2010) and Dell et al. (2018) we know that implies that for any graph of treewidth at most two. In particular, implies that for all cycles . We now conclude by observing that by construction. We have thus found two graphs that cannot be distinguished by , but that can be distinguished by , contradicting our assumption that .
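The kind of separation used in this proof can be checked numerically on a toy instance: a 6-cycle and two disjoint triangles are both 2-regular (hence equivalent under colour refinement), yet their triangle homomorphism counts differ. A brute-force counter, feasible only for small patterns (an illustrative sketch of ours, not the paper's code):

```python
from itertools import product

def hom(pattern_edges, pattern_size, adj):
    """Brute-force hom count: maps from pattern vertices 0..pattern_size-1
    to graph vertices sending every pattern edge to a graph edge."""
    return sum(
        1
        for f in product(list(adj), repeat=pattern_size)
        if all(f[v] in adj[f[u]] for u, v in pattern_edges)
    )

triangle = [(0, 1), (1, 2), (2, 0)]
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
c6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

Here `hom(triangle, 3, two_triangles)` is 12 while `hom(triangle, 3, c6)` is 0 (a 6-cycle is bipartite), so homomorphism counts separate two graphs that 1-WL cannot.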
Appendix C Proofs of Section 5
C.1 Proof of Proposition 4
We show that for any , is more expressive than . More precisely, we construct two graphs and such that and cannot be distinguished by , but they can be distinguished by .
The proof is analogous to the proof of Proposition 3. Indeed, it suffices to let consist of disjoint copies of and to consist of disjoint copies of . Then, as observed in the proof of Proposition 3, and will be indistinguishable by simply because each pattern has at most vertices. Yet, by construction, and thus and are distinguishable (already by the initial labelling) by .
C.2 Proof of Proposition 5
Let be a pattern that is the join of two smaller patterns. We show that for any set of patterns, we have that is upper bounded by . That is, for every two graphs and , implies . By definition, is equivalent to . In other words, with every we can associate a unique such that . We show, by induction on , that this implies that . This suffices to conclude that and thus .
Base case. We show that implies that with every we can associate a unique satisfying . Indeed, as already observed, implies that with every we can associate a unique such that . This in turn implies that , which implies that and and for every . As a consequence, from properties of the graph join operators, since :
and thus also .
Inductive case. We assume that implies , and want to show that it also implies . We again use the fact that we can associate with every a unique vertex such that . In particular, this implies that and . From the definition of the -test, it must also be the case that the multisets and are equal, i.e., we can find a one-to-one correspondence between neighbours of in and neighbours of in that have the same label. From the induction hypothesis we then have that and also that the multisets and are equal, which implies, by the definition of the -test, that , as was to be shown.
C.3 Proof of Theorem 3
We show that , where is a pattern whose core has treewidth , is more expressive than if every pattern satisfies one of the following conditions: (i) has treewidth ; or (ii) does not map homomorphically to .
Let denote the (rooted) core of , in which the root of is any vertex which is the image of the root of in a homomorphism from to . By assumption, has treewidth .
Clearly, is upper bounded by . Thus, all we need for the proof is to find two graphs that are indistinguishable by but are in fact distinguished by .
Those two graphs are, in fact, the graphs and constructed for (of treewidth ) in the proof of Theorem 2. From that proof, we know that:
(a) and ; and
(b) .
We note that (a) immediately implies that and can be distinguished by . In fact, they are distinguished in already by the initial labelling in round . We next show that and are indistinguishable by .
Let us first present a small structural result that helps us deal with patterns in satisfying the second condition of the Theorem.
Lemma 4.
If a rooted pattern does not map homomorphically to , then
Proof.
We use the following property of graphs and , which can be directly observed from their construction (and was already noted in Atserias et al. (2007) and Bova & Chen (2019)). Define and by setting as their root any vertex , for the root of . Then there is a homomorphism from to , and there is a homomorphism from to .
Now, any homomorphism from to can be extended to a homomorphism from to : we compose with the homomorphism mentioned above from to , which by definition again maps homomorphically to . Since by definition we have that does not map to , cannot exist. The proof for is analogous. ∎
Now, let be the set of patterns obtained by removing from all patterns which do not map homomorphically to . By Lemma 4, we have that and are distinguished by the -test if and only if they are distinguished by .
But all patterns in must have treewidth less than , and by (b) and are indistinguishable by . Proposition 2 then implies that and are indistinguishable by , as desired.
Appendix D Connections to related work
We here provide more details on how - connect to from the literature that also augment the initial labelling.
Vertex degrees.
We first consider so-called degree-aware (Geerts et al., 2020) in which the message functions of the may depend on the vertex degrees. The Graph Convolution Networks (s) (Kipf & Welling, 2017) are an example of such . Degree-aware are known to be equivalent, in terms of expressive power, to standard in which the initial labelling is extended with vertex degrees (Geerts et al., 2020). Translated to our setting, we can simply let since is equal to the degree of vertex in . When considering graphs without an initial vertex labelling (or a uniform labelling which assigns every vertex the same label), our characterisation (Theorem 1) implies that if and only if for every -pattern tree of depth at most . This in turn is equivalent to for every (standard) tree of depth at most . Indeed, -pattern trees of depth at most are simply trees of depth . Combining this with the characterisation of by Dvorak (2010) and Dell et al. (2018), we thus have for unlabelled graphs that if and only if . So, by considering - one gains one round of computation compared to standard . To lift this to labelled graphs, instead of one has to include labelled versions of the single edge pattern, in order to count the number of neighbours of each specific label for every vertex. This is done, e.g., by Ishiguro et al. (2020), who use the labelling obtained after the first round to augment the initial vertex labelling. This indeed corresponds to adding as a feature for every labelled tree of depth one. This results in if and only if for labelled graphs.
Walk counts.
The Graph Feature Networks by Chen et al. (2019a) can be regarded as a generalisation of the previous approach. Instead of simply adding vertex degrees, the numbers of walks of certain lengths emanating from each vertex are added. Translated to our setting, this corresponds to considering -, where denotes a rooted path of length . For unlabelled graphs, our characterisation (Theorem 1) implies that is upper bounded by , simply because every -pattern tree of depth is a standard tree of depth at most .
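Concretely, the homomorphism count of a rooted path of length ℓ at a vertex v equals the number of ℓ-step walks starting at v, i.e. the v-th entry of Aℓ·1. A minimal sketch of ours (not the Graph Feature Networks code):

```python
def walk_counts(adj, length):
    """Number of walks of the given length starting at each vertex; this equals
    the homomorphism count of a rooted path of that length at each vertex."""
    counts = {v: 1 for v in adj}  # a single walk of length 0 per vertex
    for _ in range(length):
        # One step of A * counts: sum the counts over each vertex's neighbours.
        counts = {v: sum(counts[u] for u in adj[v]) for v in adj}
    return counts
```

In a 2-regular graph every vertex has 2^ℓ walks of length ℓ, so these features alone cannot separate, e.g., disjoint unions of cycles of different lengths, consistent with the upper bound stated above.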
Cycles.
Li et al. (2019) extend by varying the notion of neighbourhood over which is aggregated. One particular instance corresponds to an aggregation of features, weighted by the number of cycles of a certain length in each vertex (see discussion at the end of Section 4 in Li et al. (2019)). Translated to our setting, this corresponds to considering - where denotes the cycle of length . As mentioned in the main body of the paper, these extend and result in architectures bounded by (Proposition 4). This is in line with Theorem 3 from Li et al. (2019) stating that their framework strictly extends and thus .
Isomorphism counts.
Another, albeit similar, approach to add structural information to the initial labelling is taken in the Graph Substructure Networks paper by Bouritsas et al. (2020). The idea there is to extend the initial features with information about how often a vertex appears in a subgraph of which is isomorphic to . More precisely, Bouritsas et al. (2020) consider a connected unlabelled graph as pattern and partition its vertex set orbit-wise. That is, where denotes the number of orbits of . Here, whenever there is an automorphism in mapping to . Next, they consider all distinct subgraphs in which are isomorphic to , denoted by for . We write when using a specific isomorphism . Then for each orbit partition and vertex , they define:
That is, the number of subgraphs in that can be isomorphically mapped to are counted, provided that this can be done by an isomorphism which maps vertex in (and thus ) to one of the vertices in the th orbit partition of the pattern. A similar notion is proposed for edges, which we will not consider here. Similar to our extended features, the initial features of each vertex are then augmented with for some set of patterns. Standard are executed on these augmented initial features. We refer to Bouritsas et al. (2020) for more details.
We can view the above approach as an instance of our framework. Indeed, given a pattern in , for each orbit partition, we replace by a different rooted version , where is a vertex in . Which vertex in the orbit under consideration is selected as root is not important (because they are equivalent by definition of orbit). We then see that the standard notion of subgraph isomorphism counting directly translates to the quantity used in Bouritsas et al. (2020):
It thus remains to express in terms of homomorphism counts. This, however, follows from Curticapean et al. (2017) in which it is shown that can be computed by a linear combination of where ranges over all graphs on which can be mapped by means of a surjective homomorphism. For a given , the finite set of such patterns is called the spasm of in Curticapean et al. (2017) and can be easily computed.
In summary, given the set of patterns in Bouritsas et al. (2020), we first replace every by its rooted versions , for , and then expand the resulting set of rooted patterns, by the spasms of each of these patterns. Let us denote by the final set of rooted patterns. It now follows that for provides sufficient information to extract and thus also for every and orbit part . As a consequence, the from Bouritsas et al. (2020) are bounded by - and thus . Conversely, given an - one can, again using results by Curticapean et al. (2017), define a set of patterns, such that the subgraph isomorphism counts of patterns in can be used to compute the homomorphism counts of patterns in . Hence, - are upper bounded by the considered in Bouritsas et al. (2020) using patterns in . This is in agreement with Curticapean et al. (2017) in which it is shown that homomorphism counts, subgraph isomorphism counts and other notions of pattern counts are all interchangeable. Nevertheless, by using homomorphism counts one can gracefully extend known results about and , as we have shown in the paper, and add little overhead.
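In the simplest case the spasm expansion is trivial: a triangle has no pair of non-adjacent vertices to merge, so its spasm is the singleton containing the triangle itself and sub(K3, G) = hom(K3, G)/|Aut(K3)| = hom(K3, G)/6. A brute-force check of this identity (our illustrative sketch, not code from any of the cited works):

```python
from itertools import product

def hom_triangle(adj):
    """Brute-force hom(K3, G): ordered vertex triples forming a triangle."""
    return sum(
        1
        for a, b, c in product(adj, repeat=3)
        if b in adj[a] and c in adj[b] and a in adj[c]
    )

def triangle_count(adj):
    # The spasm of K3 is {K3}, so the subgraph isomorphism count is
    # hom(K3, G) / |Aut(K3)| = hom(K3, G) / 6.
    return hom_triangle(adj) // 6

k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
```

K4 contains four triangles and hom(K3, K4) = 24 = 4 · 6, confirming the identity on this instance; for patterns with non-adjacent vertices the full linear combination over the spasm from Curticapean et al. (2017) is needed.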
Appendix E Additional experimental information
E.1 Experimental setup
One of the crucial questions when studying the effect of adding structural information to the initial vertex labels is whether these additional labels enhance the performance of graph neural networks. In order to reduce the effect of specific implementation details of and choice of hyper-parameters, we start from the implementations and choices made in the benchmark by Dwivedi et al. (2020) (the original implementations can be found at https://github.com/graphdeeplearning/benchmarking-gnns) and only change the initial vertex labels, while leaving the themselves unchanged. This ensures that we only measure the effect of augmenting initial features with homomorphism counts. We will use the from the benchmark, without extended features, as our baselines. For the same reasons, we use datasets proposed in the benchmark for their ability to statistically separate the performance of . All other parameters are taken as in Dwivedi et al. (2020) and we refer to that paper for more details.
Selected
Dwivedi et al. (2020) divide the benchmarked into two classes: the and the “theoretically designed” -. The first class is found to perform stronger and train faster. Hence, we chose to include the following five models from the benchmark:
• Graph Attention Network () as described in Velickovic et al. (2018);
• Graph Convolutional Network () as described in Kipf & Welling (2017);
• as described in Hamilton et al. (2017);
• Mixed Model Convolutional Networks () as described in Monti et al. (2017);
• as described in Bresson & Laurent (2017).
For we used the version in which positional encoding (Belkin & Niyogi, 2003) is added to the vertex features, as it is empirically shown to be the strongest performing version of this model for the selected datasets (Dwivedi et al., 2020). We denote this version by , referring to the presence of edge features and this positional encoding. Details, background and a mathematical formalisation of the message passing layers of these models can be found in the supplementary material of Dwivedi et al. (2020).
As explained in the experimental section of the main paper, we enhance the vertex features with the log-normalised counts of the chosen patterns in every vertex of every graph of every dataset. The first layers of some models of Dwivedi et al. (2020) are adapted to accommodate this variation in input size. All other layers were left identical to their original implementation as provided by Dwivedi et al. (2020).
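The preprocessing step amounts to a one-time concatenation of features. A minimal sketch under assumptions of ours (dict-based features and log(1 + count) as the log-normalisation; the benchmark code differs in names and data structures):

```python
import math

def augment_features(features, hom_counts):
    """Concatenate log-normalised pattern counts onto each vertex's features.
    `features`: vertex -> list of floats; `hom_counts`: vertex -> list of
    counts, one per chosen pattern."""
    return {v: feats + [math.log(1.0 + c) for c in hom_counts[v]]
            for v, feats in features.items()}
```

The augmented vectors are then fed to the unmodified message passing layers, so any performance difference is attributable to the added counts.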
Hardware, compute and resources
All models for ZINC, PATTERN and COLLAB were trained on a GeForce GTX 1080 Ti GPU; for CLUSTER a Tesla V100-SXM3-32GB GPU was used. Tables 7, 10, 13 and 16 report the training times for all combinations of models and additional feature sets. A rough estimate of the CO2 emissions based on the total computing times of reported experiments ( hours GeForce GTX 1080, hours Tesla V100-SXM3-32GB), the computing times of not-included experiments ( hours GeForce GTX 1080, hours Tesla V100-SXM3-32GB), the GPU types (GeForce GTX 1080, Tesla V100-SXM3-32GB) and the geographical location of our cluster results in a carbon emission of kg CO2 equivalent. This estimation was conducted using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
E.2 Graph learning tasks
We here report the full results of our experimental evaluation for graph regression (Section E.2.1), link prediction (Section E.2.2) and vertex classification (Section E.2.3) as considered in Dwivedi et al. (2020). More precisely, a full listing of the patterns and combinations used and the obtained results for the test sets can be found in Tables 5, 8, 11 and 14. Average training time (in hours) and the number of epochs are reported in Tables 7, 10, 13 and 16. Finally, the total number of model parameters are reported in Tables 6, 9, 12 and 15. All averages and standard deviations are over 4 runs with different random seeds. The main take-aways from these results are included in the main paper.
E.2.1 Graph regression with the ZINC dataset
Just as in Dwivedi et al. (2020), we use a subset (K) of the ZINC molecular graphs (K) dataset (Irwin et al., 2012b) to regress a molecular property known as the constrained solubility. For each molecular graph, the vertex features are the types of heavy atoms and the edge features are the types of bonds between them. The following settings are taken from Dwivedi et al. (2020):
Splitting. ZINC has train, validation and test graphs.
Training. (Here and in the next tasks we are using the parameters used in the code accompanying Dwivedi et al. (2020); in the paper, slightly different parameters are used.) For the learning rate strategy, an initial learning rate is set to , the reduce factor is , the stopping learning rate is , the patience value is 25, and the maximal training time is set to hours.
Performance Measure. The performance measure is the mean absolute error (MAE) between the predicted and the ground truth constrained solubility for each molecular graph.
Number of layers. 16 MPNN layers are used for every model.
Pattern set | | | | |
---|---|---|---|---|---
None | 0.47±0.02 | 0.35±0.01 | 0.25±0.01 | 0.44±0.01 | 0.34±0.05
 | 0.45±0.01 | 0.36±0.01 | 0.25±0.00 | 0.44±0.00 | 0.30±0.01
 | 0.34±0.02 | 0.29±0.02 | 0.26±0.01 | 0.30±0.01 | 0.27±0.06
 | 0.44±0.02 | 0.34±0.02 | 0.23±0.01 | 0.42±0.01 | 0.27±0.03
 | 0.31±0.00 | 0.27±0.02 | 0.25±0.01 | 0.30±0.01 | 0.26±0.09
 | 0.33±0.01 | 0.27±0.01 | 0.24±0.02 | 0.32±0.01 | 0.23±0.03
 | 0.28±0.01 | 0.26±0.01 | 0.23±0.01 | 0.28±0.01 | 0.20±0.03
 | 0.24±0.00 | 0.21±0.00 | 0.20±0.00 | 0.25±0.01 | 0.16±0.02
 | 0.23±0.00 | 0.21±0.00 | 0.20±0.01 | 0.26±0.02 | 0.18±0.02
(hom) | 0.22±0.01 | 0.20±0.00 | 0.19±0.00 | 0.2376±0.01 | 0.1352±0.01
(iso) | 0.24±0.01 | 0.22±0.01 | 0.16±0.01 | 0.2408±0.01 | 0.1357±0.01
Pattern set | |||||
---|---|---|---|---|---|
None | 358 273 | 360 742 | 388 963 | 401 148 | 408 135 |
358 417 | 360 887 | 389 071 | 401 238 | 408 205 | |
358 417 | 360 887 | 389 071 | 401 238 | 408 205 | |
358 417 | 360 887 | 389 071 | 401 238 | 408 205 | |
358 417 | 360 887 | 389 071 | 401 238 | 408 205 | |
358 561 | 361 032 | 389 179 | 401 328 | 408 275 | |
358 561 | 361 032 | 389 179 | 401 328 | 408 275 | |
358 705 | 361 177 | 389 287 | 401 418 | 408 345 | |
358 849 | 361 322 | 389 395 | 401 508 | 408 415 | |
(hom) | 359 425 | 361 902 | 389 827 | 401 868 | 408 695 |
(iso) | 359 425 | 361 902 | 389 827 | 401 868 | 408 695 |
Pattern set | Time | Epochs | Time | Epochs | Time | Epochs | Time | Epochs | Time | Epochs
---|---|---|---|---|---|---|---|---|---|---
None | 2.40 | 377 | 10.99 | 463 | 2.46 | 420 | 1.53 | 345 | 12.08 | 136
 | 2.88 | 444 | 12.03 | 363 | 2.03 | 500 | 0.91 | 298 | 12.07 | 148
 | 2.30 | 351 | 11.36 | 324 | 2.31 | 396 | 1.70 | 382 | 12.06 | 139
 | 2.42 | 375 | 12.03 | 333 | 1.70 | 444 | 1.06 | 370 | 12.06 | 202
 | 2.40 | 369 | 9.98 | 421 | 2.58 | 446 | 1.25 | 288 | 12.08 | 136
 | 2.98 | 461 | 12.03 | 332 | 2.56 | 458 | 1.41 | 321 | 12.09 | 132
 | 2.76 | 422 | 12.04 | 319 | 2.67 | 464 | 1.53 | 356 | 12.06 | 137
 | 2.45 | 381 | 10.13 | 419 | 1.67 | 463 | 1.04 | 382 | 12.04 | 229
 | 2.65 | 408 | 10.38 | 420 | 2.09 | 503 | 1.26 | 364 | 12.08 | 135
(hom) | 2.65 | 428 | 12.03 | 350 | 2.76 | 478 | 1.48 | 363 | 12.06 | 175
(iso) | 2.78 | 497 | 11.72 | 419 | 2.63 | 547 | 1.58 | 440 | 11.62 | 148
E.2.2 Link Prediction with the Collab dataset
Another dataset used in Dwivedi et al. (2020) is COLLAB, a link prediction dataset proposed by the Open Graph Benchmark (OGB) (Hu et al., 2020) corresponding to a collaboration network between approximately K scientists, indexed by Microsoft Academic Graph. Vertices represent scientists and edges denote collaborations between them. For vertex features, OGB provides -dimensional vectors, obtained by averaging the word embeddings of a scientist’s papers. The year and number of co-authored papers in a given year are concatenated to form edge features. The graph can also be viewed as a dynamic multi-graph, since two vertices may have multiple temporal edges between them if they collaborate over multiple years. The following settings are taken from Dwivedi et al. (2020):
Splitting. We use the real-life training, validation and test edge splits provided by OGB. Specifically,
they use collaborations until 2017 as training edges, those in 2018 as validation edges, and those in
2019 as test edges.
Training. All GNNs use the same learning rate strategy: an initial learning rate is set to ,
the reduce factor is , the patience value is 10, and the stopping learning rate is .
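The strategy above is a standard reduce-on-plateau schedule with a stopping threshold on the learning rate. A minimal sketch of its bookkeeping (the rate, factor, and threshold values below are illustrative, since only the patience of 10 is given in the text):

```python
def lr_schedule(val_losses, init_lr, factor, patience, stop_lr):
    """Replay the plateau-based schedule: multiply the learning rate by
    `factor` after `patience` epochs without improvement of the
    validation loss, and stop once the rate drops below `stop_lr`."""
    lr, best, bad_epochs = init_lr, float("inf"), 0
    history = []  # learning rate actually used at each epoch
    for loss in val_losses:
        if lr < stop_lr:  # stopping learning rate reached
            break
        history.append(lr)
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # plateau: reduce the rate
                lr, bad_epochs = lr * factor, 0
    return history

# Illustrative run: the loss plateaus after epoch 3, so the rate halves
# once the patience of 2 epochs is exhausted.
rates = lr_schedule([3, 2, 1, 1, 1, 1, 1],
                    init_lr=1e-3, factor=0.5, patience=2, stop_lr=1e-5)
```

In practice this corresponds to PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` combined with an early-stopping check on the optimizer's current rate.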
Performance Measure. We use the evaluator provided by OGB (Hu et al., 2020), which aims to measure a model’s
ability to predict future collaboration relationships given past collaborations. Specifically, they rank
each true collaboration among a set of randomly sampled negative collaborations, and count
the ratio of positive edges that are ranked at the -th place or above (Hits@K). The value of is used, as this gives the best
value for statistically separating the performance of the models.
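The Hits@K metric described above can be sketched as follows; the scores and the value of k in the example are illustrative (the dataset's actual K is not reproduced here):

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges whose predicted score ranks at place k
    or above among the negative-edge scores (higher score = better)."""
    neg_sorted = sorted(neg_scores, reverse=True)
    # A positive edge is a "hit" when it beats the k-th highest negative score.
    threshold = neg_sorted[k - 1] if k <= len(neg_sorted) else float("-inf")
    return sum(1 for s in pos_scores if s > threshold) / len(pos_scores)

# Toy example: three of the four positives outrank the 2nd-best negative.
print(hits_at_k([0.9, 0.8, 0.7, 0.1], [0.6, 0.5, 0.4, 0.3], k=2))  # 0.75
```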
Number of layers 3 MPNN layers are used for every model.
Pattern set | |||||
---|---|---|---|---|---|
None | 50,32 ± 0,55 | 51,36 ± 1,30 | 49,81 ± 1,56 | 50,33 ± 0,68 | 51,00 ± 2,54 |
52,87 ± 0,87 | 53,57 ± 0,89 | 50,18 ± 1,38 | 51,10 ± 0,38 | 51,57 ± 0,68 | |
51,33 ± 1,42 | 52,84 ± 1,32 | 51,76 ± 1,38 | 51,13 ± 1,60 | 49,43 ± 1,85 | |
52,41 ± 0,89 | 54,60 ± 1,01 | 50,94 ± 1,30 | 51,39 ± 1,23 | 50,31 ± 1,59 | |
52,68 ± 1,82 | 53,49 ± 1,35 | 50,88 ± 1,73 | 50,97 ± 0,68 | 51,36 ± 0,92 | |
51,81 ± 1,17 | 54,32 ± 1,02 | 49,94 ± 0,23 | 51,01 ± 1,00 | 51,11 ± 1,06
Pattern set | |||||
---|---|---|---|---|---|
None | 25 992 | 40 479 | 39 751 | 26 487 | 27 440 |
26 049 | 40 553 | 39 804 | 26 525 | 27 475 | |
26 049 | 40 553 | 39 804 | 26 525 | 27 475 | |
26 049 | 40 553 | 39 804 | 26 525 | 27 475 | |
26 106 | 40 627 | 39 857 | 26 563 | 27 510 | |
26 163 | 40 701 | 39 910 | 26 601 | 27 545 |
Model:

Pattern set | Time | #Epochs | Time | #Epochs | Time | #Epochs | Time | #Epochs | Time | #Epochs
---|---|---|---|---|---|---|---|---|---|---
None | 0,81 | 167 | 0,85 | 141 | 1,62 | 190 | 12,05 | 115,67 | 2,22 | 167
| 0,67 | 165 | 0,90 | 153 | 1,70 | 184 | 12,10 | 67,00 | 2,48 | 186
| 1,06 | 188 | 0,95 | 160 | 2,16 | 188 | 12,04 | 113,50 | 1,26 | 188
| 0,50 | 167 | 1,13 | 165 | 1,04 | 193 | 12,05 | 124,00 | 1,82 | 174
| 1,20 | 189 | 0,86 | 128 | 2,15 | 189 | 12,05 | 113,25 | 1,51 | 183
| 0,44 | 149 | 0,90 | 134 | 0,98 | 186 | 12,05 | 124,00 | 1,84 | 177
E.2.3 Vertex classification with PATTERN and CLUSTER
Finally, also used in Dwivedi et al. (2020) are the PATTERN and CLUSTER graph datasets, generated with the Stochastic Block Model (SBM) (Abbe, 2018), which is widely used to model communities in social networks by modulating the intra- and extra-community connections, thereby controlling the difficulty of the task. An SBM is a random graph model that assigns a community to each vertex and connects vertices as follows: two vertices are connected with probability if they belong to the same community, and with probability if they belong to different communities (the value of acts as the noise level).
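A minimal sketch of SBM sampling as defined above (the community sizes and probabilities in the example are illustrative, not the datasets' actual parameters):

```python
import random

def sbm(community_sizes, p, q, seed=0):
    """Sample a stochastic block model graph: vertices in the same
    community are joined with probability p, vertices in different
    communities with probability q (q acts as the noise level)."""
    rng = random.Random(seed)
    # community[v] = index of the community that vertex v belongs to
    community = [c for c, size in enumerate(community_sizes)
                 for _ in range(size)]
    n = len(community)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p if community[u] == community[v] else q)]
    return community, edges

# Two communities of 5 vertices: dense inside (p=0.9), sparse across (q=0.1).
community, edges = sbm([5, 5], p=0.9, q=0.1)
```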
For the PATTERN dataset, the goal of the vertex classification problem is the detection of a certain pattern embedded in a larger graph . The graphs in consist of communities with sizes randomly selected between . The parameters of the SBM for each community are , , and the vertex features in are generated using a uniform random distribution with a vocabulary of size , i.e., . Randomly, patterns composed of vertices with intra-probability and extra-probability are generated (i.e., of the vertices in are connected to ). The vertex features for are also generated randomly using values in . The graphs consist of - vertices. The output vertex labels have value if the vertex belongs to and value if it belongs to .
For the CLUSTER dataset, the goal of the vertex classification task is to detect which cluster a vertex belongs to. Here, six SBM clusters are generated with sizes randomly selected between and probabilities and . The graphs consist of - vertices. Each vertex can take an initial feature value in the range . If the value is , then the vertex belongs to class . If the value is , then the class of the vertex is unknown and needs to be inferred. Only one labelled vertex is randomly assigned to each community, and most vertex features are set to . The output vertex labels are defined as the community/cluster class labels.
The following are taken from Dwivedi et al. (2020):
Splitting. The PATTERN dataset has train, validation and test graphs. The CLUSTER
dataset has train, validation and test graphs. We save the generated splits and use the
same sets in all models for fair comparison.
Training. For all , the initial learning rate is set to , the reduce
factor is , the patience value is , and the stopping learning rate is
.
Performance Measure. The performance measure is the average vertex-level accuracy weighted with
respect to the class sizes.
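Assuming this measure is the macro average of per-class accuracies (each class contributes equally regardless of how many vertices it has), a minimal sketch:

```python
def weighted_accuracy(y_true, y_pred):
    """Vertex-level accuracy averaged over classes, so every class
    contributes equally regardless of its size."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        members = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in members if y_pred[i] == c)
        per_class.append(correct / len(members))
    return sum(per_class) / len(per_class)

# Class 0 has accuracy 2/3 and class 1 has accuracy 1, so the score is 5/6.
print(weighted_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
```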
Number of layers 16 MPNN layers are used for every model.
Pattern set | |||||
---|---|---|---|---|---|
None | 70,86 ± 0,06 | 70,64 ± 0,39 | 71,15 ± 0,33 | 72,25 ± 0,52 | 74,28 ± 0,15 |
71,60 ± 0,15 | 64,88 ± 4,16 | 72,21 ± 0,19 | 72,97 ± 0,23 | 74,14 ± 0,12 | |
71,40 ± 0,24 | 60,64 ± 2,93 | 72,14 ± 0,19 | 72,57 ± 0,19 | 74,16 ± 0,24 | |
71,26 ± 0,39 | 66,60 ± 1,47 | 72,34 ± 0,09 | 72,60 ± 0,24 | 74,23 ± 0,07 | |
71,80 ± 0,28 | 50,94 ± 22,98 | 72,32 ± 0,27 | 73,03 ± 0,25 | 74,17 ± 0,13 | |
71,63 ± 0,26 | 63,03 ± 3,72 | 72,32 ± 0,36 | 72,65 ± 0,13 | 74,03 ± 0,19
Pattern set | |||||
---|---|---|---|---|---|
None | 395 396 | 362 849 | 399 373 | 386 835 | 406 755 |
395 396 | 362 849 | 399 373 | 386 835 | 406 755 | |
395 548 | 362 995 | 399 463 | 386 943 | 406 825 | |
395 700 | 363 141 | 399 553 | 387 051 | 406 895 | |
395 700 | 363 141 | 399 553 | 387 051 | 406 895 | |
396 004 | 363 433 | 399 733 | 387 267 | 407 035 |
Model: GAT

Pattern set | Time | #Epochs | Time | #Epochs | Time | #Epochs | Time | #Epochs | Time | #Epochs
---|---|---|---|---|---|---|---|---|---|---
None | 1,62 | 109 | 2,83 | 117 | 1,54 | 125 | 0,95 | 101 | 10,40 | 92
| 1,52 | 107 | 2,67 | 85 | 1,72 | 145 | 1,08 | 102 | 11,01 | 89
| 1,18 | 107 | 1,94 | 80 | 1,62 | 149 | 0,90 | 102 | 10,23 | 90
| 1,23 | 106 | 2,30 | 84 | 1,68 | 143 | 0,92 | 99 | 10,68 | 91
| 1,53 | 102 | 1,97 | 82 | 1,89 | 153 | 0,94 | 99 | 10,80 | 90
| 1,62 | 105 | 1,96 | 82 | 1,95 | 157 | 0,97 | 100 | 10,25 | 91
Pattern set | |||||
---|---|---|---|---|---|
None | 78,83 ± 0,60 | 71,42 ± 1,38 | 85,90 ± 0,03 | 70,78 ± 0,19 | 86,15 ± 0,08 |
84,34 ± 0,09 | 61,54 ± 2,20 | 86,59 ± 0,02 | 84,75 ± 0,11 | 85,02 ± 0,20 | |
84,43 ± 0,40 | 63,40 ± 1,55 | 86,60 ± 0,02 | 84,51 ± 0,06 | 85,40 ± 0,28 | |
83,47 ± 0,11 | 64,18 ± 3,88 | 86,57 ± 0,02 | 83,73 ± 0,10 | 85,63 ± 0,22 | |
85,44 ± 0,24 | 81,29 ± 2,82 | 86,58 ± 0,02 | 85,85 ± 0,13 | 85,80 ± 0,20 | |
85,50 ± 0,23 | 82,49 ± 0,48 | 86,63 ± 0,03 | 85,88 ± 0,15 | 85,56 ± 0,33
Pattern set | |||||
---|---|---|---|---|---|
None | 394 632 | 362 117 | 398 921 | 386 291 | 406 403 |
394 784 | 362 263 | 399 011 | 386 399 | 406 473 | |
394 784 | 362 263 | 399 011 | 386 399 | 406 473 | |
394 784 | 362 263 | 399 011 | 386 399 | 406 473 | |
394 936 | 362 409 | 399 101 | 386 507 | 406 543 | |
395 088 | 362 555 | 399 191 | 386 615 | 406 613 |
Model: GAT

Pattern set | Time | Epochs | Time | Epochs | Time | Epochs | Time | Epochs | Time | Epochs
---|---|---|---|---|---|---|---|---|---|---
None | 1,96 | 87 | 3,41 | 102 | 1,68 | 116 | 0,77 | 103 | 10,32 | 101
| 0,97 | 97 | 2,58 | 80 | 1,42 | 107 | 0,69 | 105 | 9,12 | 95
| 0,90 | 90 | 2,68 | 80 | 1,46 | 106 | 0,67 | 95 | 9,47 | 94
| 0,89 | 95 | 2,36 | 80 | 1,26 | 100 | 0,58 | 98 | 9,14 | 99
| 2,11 | 91 | 3,62 | 98 | 1,68 | 108 | 0,86 | 97 | 9,50 | 87
| 1,02 | 91 | 3,26 | 94 | 1,48 | 109 | 0,76 | 102 | 8,84 | 88