
A Practical, Progressively-Expressive GNN

Lingxiao Zhao
Carnegie Mellon University
[email protected]
Louis Härtel
RWTH Aachen University
[email protected]
Neil Shah
Snap Inc.
[email protected]
Leman Akoglu
Carnegie Mellon University
[email protected]
Abstract

Message passing neural networks (MPNNs) have become a dominant flavor of graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable limitations; namely, they are at most as powerful as the 1-dimensional Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism testing framework. To this end, researchers have drawn inspiration from the $k$-WL hierarchy to develop more expressive GNNs. However, current $k$-WL-equivalent GNNs are not practical for even small values of $k$, as $k$-WL becomes combinatorially more complex as $k$ grows. At the same time, several works have found great empirical success in graph learning tasks without highly expressive models, implying that chasing expressiveness with a “coarse-grained ruler” of expressivity like $k$-WL is often unneeded in practical tasks. To truly understand the expressiveness-complexity tradeoff, one desires a more “fine-grained ruler,” which can more gradually increase expressiveness. Our work puts forth such a proposal: namely, we first propose the $(k,c)(\leq)$-SetWL hierarchy with greatly reduced complexity from $k$-WL, achieved by moving from $k$-tuples of nodes to sets with ${\leq}k$ nodes defined over ${\leq}c$ connected components in the induced original graph. We show favorable theoretical results for this model in relation to $k$-WL, and concretize it via $(k,c)(\leq)$-SetGNN, which is as expressive as $(k,c)(\leq)$-SetWL. Our model is practical and progressively-expressive, increasing in power with $k$ and $c$. We demonstrate effectiveness on several benchmark datasets, achieving several state-of-the-art results with runtime and memory usage applicable to practical graphs. We open source our implementation at https://github.com/LingxiaoShawn/KCSetGNN.

1 Introduction

In recent years, graph neural networks (GNNs) have gained considerable attention [58, 60] for their ability to tackle various node-level [35, 56], link-level [66, 51] and graph-level [9, 40] learning tasks, given their ability to learn rich representations for complex graph-structured data. The common template for designing GNNs follows the message passing paradigm; these so-called message-passing neural networks (MPNNs) are built by stacking layers which encompass feature transformation and aggregation operations over the input graph [39, 25]. Despite their advantages, MPNNs have several limitations including oversmoothing [67, 12, 47], oversquashing [2, 55], inability to distinguish node identities [63] and positions [62], and limited expressive power [61].

Since Xu et al.’s [61] seminal work showed that MPNNs are at most as powerful as the first-order Weisfeiler-Leman (1-WL) test in the graph isomorphism (GI) testing framework, there have been several follow-up works on improving the understanding of GNN expressiveness [3, 15]. In response, the community proposed many GNN models to overcome such limitations [52, 4, 68]. Several of these aim to design powerful higher-order GNNs which are increasingly expressive [43, 41, 33, 9], drawing inspiration from the $k$-WL hierarchy [54].

A key limitation of such higher-order models that reach beyond 1-WL is their poor scalability; in fact, these models can only be applied to small graphs with small $k$ in practice [40, 41, 43] due to the combinatorial growth in complexity. On the other hand, it remains an open question whether practical graph learning tasks indeed need such complex and extremely expressive GNN models. Historically, Babai et al. [5] showed that almost all graphs on $n$ nodes can be distinguished by 1-WL. In other contexts like node classification, researchers have encountered superior generalization performance with graph-augmented multi-layer perceptron (GA-MLP) models [50, 37] compared to MPNNs, despite the former being strictly less expressive than the latter [13]. Considering that increased model complexity has negative implications in terms of overfitting and generalization [29], it is worth re-evaluating continued efforts to pursue maximally expressive GNNs in practice. Ideally, one could study these models’ generalization performance on various real-world tasks by increasing $k$ (expressiveness) in the $k$-WL hierarchy. Yet, given the impracticality of this too-coarse “ruler” of expressiveness, which becomes infeasible beyond $k>3$, one desires a more fine-grained “ruler”, that is, a new hierarchy whose expressiveness grows more gradually. Such a hierarchy could enable us to gradually build more expressive models which admit improved scaling, and avoid unnecessary leaps in model complexity for tasks which do not require them, guarding against overfitting.

Figure 1: Main steps of $(k,c)(\leq)$-SetGNN. Given a graph and $(k,c)$, we build the $(k,c)$-bipartite super-graph (middle) containing sets with up to $k$ nodes and $c$ connected components in the induced subgraph, on which a base GNN assigns initial “colors”. Bidirectional bipartite GNN layers with forward-backward message passing learn set embeddings, which are pooled into the graph embedding. The size of the super-graph, and accordingly its expressiveness, grows progressively with increasing $k$ and $c$. The 2-bipartite message passing generalizes normal GNNs, edge GNNs, and line graphs (see Appendix A.12).

Present Work. In this paper, we propose such a hierarchy, and an associated practical, progressively-expressive GNN model, called $(k,c)(\leq)$-SetGNN, whose expressiveness can be modulated through $k$ and $c$. In a nutshell, we take inspiration from $k$-WL, yet achieve practicality through three design ideas which simplify key bottlenecks of scaling $k$-WL. First, we move away from the $k$-tuples of the original $k$-WL to $k$-multisets (unordered), and then to $k(\leq)$-sets. We demonstrate that these steps drastically reduce the number of nodes in the $k$-WL super-graph while retaining significant expressiveness (Secs. 3.2-3.3). Second, by considering the underlying sparsity of an input graph, we reduce scope to $(k,c)(\leq)$-sets that consist of ${\leq}k$ nodes whose induced subgraph is comprised of at most $c\leq k$ connected components. This also yields a massive reduction in the number of nodes while improving practicality; i.e., small values of $c$ on sparse graphs allow one to increase $k$ well beyond 3 in practice (Sec. 3.4). Third, we design a super-graph architecture that consists of a sequence of $k{-}1$ bipartite graphs over which we learn embeddings for our $(k,c)(\leq)$-sets using bidirectional message passing. These embeddings can be pooled to yield a final graph embedding (Secs. 4.1-4.2). We also speed up initializing “colors” for the $(k,c)(\leq)$-sets for $c>1$, substantially reducing computation and memory (Sec. 4.3). Fig. 1 overviews our proposed framework. Experimentally, our $(k,c)(\leq)$-SetGNN outperforms existing state-of-the-art GNNs on simulated expressiveness tasks as well as real-world graph-level tasks, achieving new bests on substructure counting and ZINC-12K, respectively. We show that generalization performance reflects increasing expressiveness by $k$ and $c$. Our proposed scalable designs allow us to train models with e.g. $k{=}10$ or $c{=}3$ with practical running time and memory requirements.

2 Related Work

MPNN limitations. Given the understanding that MPNNs have expressiveness upper-bounded by 1-WL [61], several researchers investigated what else MPNNs cannot learn. To this end, Chen et al. [15] showed that MPNNs cannot count induced connected substructures of 3 or more nodes, and, along with Arvind et al. [3], showed that MPNNs can only count star-shaped patterns. Loukas [38] further proved several results regarding decision problems on graphs (e.g. subgraph verification, cycle detection), finding that MPNNs cannot solve these problems unless strict conditions of depth and width are met. Moreover, Garg et al. [22] showed that many standard MPNNs cannot compute properties such as longest or shortest cycles, diameters, or the existence of cliques.

Improving expressivity. Several works aim to improve the expressiveness limitations of MPNNs. One approach is to inject features into the MPNN aggregation, motivated by Loukas [38], who showed that MPNNs can be universal approximators when nodes are sufficiently distinguishable. Sato et al. [53] show that injecting random features can better enable MPNNs to solve problems like minimum dominating set and maximum matching. You et al. [63] inject cycle counts as node features, motivated by the limitation that MPNNs cannot count cycles. Others proposed utilizing subgraph isomorphism counting to empower MPNNs with substructure information they cannot learn [10, 9]. Earlier work by You et al. [62] adds positional features to distinguish nodes which naïve MPNN embeddings would not. Recently, several works also propose utilizing subgraph aggregation; Zhang et al. [65], Zhao et al. [68] and Bevilacqua et al. [8] propose subgraph variants of the WL test which are no less powerful than 3-WL. A recent follow-up work [21] shows that rooted-subgraph-based extensions with 1-WL as the kernel are bounded by 3-WL. The community is also exploring several distinct avenues for overcoming these limitations: Murphy et al. [46] propose a relational pooling mechanism which sums over all permutations of a permutation-sensitive function to achieve above-1-WL expressiveness. Balcilar et al. [6] generalize spatial and spectral MPNNs and show that instances of spectral MPNNs are more powerful than 1-WL. Azizian et al. [4] and Geerts et al. [24] unify expressivity and approximation-ability results for existing GNNs.

$k$-WL-inspired GNNs. The $k$-WL test captures higher-order interactions in graph data by considering all $k$-tuples, i.e., size-$k$ ordered multisets defined over the set of nodes. While highly expressive, it does not scale to practical graphs beyond a very small $k$ ($k=3$ pushes the practical limit even for small graphs). However, several works propose designs which can achieve $k$-WL expressivity in theory and strive to make them practical. Maron et al. [41] propose a general class of invariant graph networks, $k$-IGNs, with exactly $k$-WL expressivity [23], but these are not scalable. Morris et al. have several works on $k$-WL and its variants, such as $k$-LWL [42], $k$-GNN [43], and $\delta$-$k$-WL-GNN [44]. Both $k$-LWL and $k$-GNN use a variant of $k$-WL that only considers $k$-sets; they are strictly weaker than $k$-WL but much more practical. In our paper we argue that $k(\leq)$-sets should be used with additional designs to retain the best expressivity while remaining practical. The $\delta$-$k$-WL-GNN works with $k$-tuples but sparsifies the connections among tuples by considering locality. Qian et al. [48] extend the $k$-tuple approach to subgraphs, but with the same scalability bottleneck as $k$-WL. A recent concurrent work, SpeqNet [45], proposes to reduce the number of tuples by restricting the number of connected components of the tuples’ induced subgraphs; this idea is independently explored in our paper in Sec. 3.4. None of the four variants proposed by Morris et al. is realized beyond $k=3$ in their experiments, while we manage to reach $k=10$. Interestingly, a recent work [34] links graph transformers with $k$-IGNs, which have better (or equal) expressivity, but is still not scalable.

3 A practical progressively-expressive isomorphism test: $(k,c)(\leq)$-SetWL

We motivate our method $(k,c)(\leq)$-SetGNN from the GI testing perspective by introducing $(k,c)(\leq)$-SetWL. We first introduce notation and background on $k$-WL (Sec. 3.1). Next, we show how to increase the practicality of $k$-WL without sacrificing much of its expressivity, by removing node ordering (Sec. 3.2), removing node repetitions (Sec. 3.3), and leveraging graph sparsity (Sec. 3.4). We then give a complexity analysis (Sec. 3.5). We close the section by extending the idea to $k$-FWL, which is as expressive as $(k{+}1)$-WL (Sec. 3.6). All proofs are included in Appendix A.4.

Notation: Let $G=(V(G),E(G),l_G)$ be an undirected, colored graph with nodes $V(G)$, edges $E(G)$, and a color-labeling function $l_G:V(G)\rightarrow C$, where $C=\{c_1,\ldots,c_d\}$ denotes a set of $d$ distinct colors. Let $[n]=\{1,2,3,\ldots,n\}$. Let $(\cdot)$ denote a tuple (ordered multiset), $\{\!\{\cdot\}\!\}$ denote a multiset (a set allowing repetition), and $\{\cdot\}$ denote a set. We define $\vec{v}=(v_1,\ldots,v_k)$ as a $k$-tuple, $\tilde{v}=\{\!\{v_1,\ldots,v_k\}\!\}$ as a $k$-multiset, and $\hat{v}=\{v_1,\ldots,v_k\}$ as a $k$-set. Let $\vec{v}[x/i]=(v_1,\ldots,v_{i-1},x,v_{i+1},\ldots,v_k)$ denote replacing the $i$-th element of $\vec{v}$ with $x$; $\tilde{v}[x/i]$ and $\hat{v}[x/i]$ are defined analogously (multisets and sets are assumed to be represented in a canonical ordering). When $v_i\in V(G)$, let $G[\vec{v}]$, $G[\tilde{v}]$, $G[\hat{v}]$ denote the subgraph of $G$ induced by the nodes in $\vec{v},\tilde{v},\hat{v}$ respectively (keeping repetitions). An isomorphism between two graphs $G$ and $G'$ is a bijection $p:V(G)\rightarrow V(G')$ satisfying $(u,v)\in E(G)\Longleftrightarrow(p(u),p(v))\in E(G')$ for all $u,v\in V(G)$, and $l_G(u)=l_{G'}(p(u))$ for all $u\in V(G)$. Two graphs $G,G'$ are isomorphic, denoted $G\cong G'$, if there exists an isomorphism between them.
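To fix intuition, here is a tiny sketch (ours, not from the paper) of the replacement operation $\vec{v}[x/i]$ for tuples and multisets, using sorted order as the canonical ordering assumed above:

```python
# Sketch of v[x/i] for a k-tuple and a k-multiset; positions i are 1-based,
# and multisets use sorted order as the canonical ordering.
def tuple_replace(v, x, i):
    """v[x/i] for a k-tuple v: position i is fixed, order is preserved."""
    return v[:i - 1] + (x,) + v[i:]

def multiset_replace(v, x, i):
    """v[x/i] for a k-multiset: replace the i-th element in canonical order."""
    w = sorted(v)
    w[i - 1] = x
    return tuple(sorted(w))

print(tuple_replace((3, 1, 1), 7, 2))     # (3, 7, 1)
print(multiset_replace((3, 1, 1), 7, 2))  # (1, 3, 7): 2nd-smallest replaced
```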

3.1 Preliminaries: the $k$-Weisfeiler-Leman ($k$-WL) Graph Isomorphism Test

The 1-dimensional Weisfeiler-Leman (1-WL) test, also known as the color refinement algorithm [49], is a widely celebrated approach to (approximately) test GI. Although extremely fast and effective for most graphs (1-WL provides canonical forms for all but an $n^{-1/7}$ fraction of $n$-vertex graphs [5]), it fails to distinguish members of many important graph families, such as regular graphs. The more powerful $k$-WL algorithm first appeared in [57], and extends the coloring of vertices (or 1-tuples) to that of $k$-tuples. $k$-WL is progressively expressive with increasing $k$, and can distinguish any finite set of graphs given a sufficiently large $k$. $k$-WL has many interesting connections to logic, games, and linear equations [11, 28]. Another variant of $k$-WL is $k$-FWL (Folklore WL), where $(k{+}1)$-WL is equally expressive as $k$-FWL [11, 26]. Our work focuses on $k$-WL.

$k$-WL iteratively recolors all $n^k$ $k$-tuples defined on a graph $G$ with $n$ nodes. At iteration 0, each $k$-tuple $\vec{v}=(v_1,\ldots,v_k)\in V(G)^k$ is initialized with a color given by its atomic type $at_k(G,\vec{v})$. Assuming $G$ has $d$ colors, $at_k(G,\vec{v})\in\{0,1\}^{2\binom{k}{2}+kd}$ is an ordered vector encoding. The first $\binom{k}{2}$ entries indicate whether $v_i=v_j$, for all $1\leq i<j\leq k$, which is the node-repetition information. The second $\binom{k}{2}$ entries indicate whether $(v_i,v_j)\in E(G)$. The last $kd$ entries one-hot encode the initial color of $v_i$, for all $1\leq i\leq k$. Importantly, $at_k(G,\vec{v})=at_k(G',\vec{v}')$ if and only if $v_i\mapsto v'_i$ is an isomorphism from $G[\vec{v}]$ to $G'[\vec{v}']$. Let $wl_k^{(t)}(G,\vec{v})$ denote the color of $k$-tuple $\vec{v}$ on graph $G$ at the $t$-th iteration of $k$-WL, with colors initialized as $wl_k^{(0)}(G,\vec{v})=at_k(G,\vec{v})$.

At the $t$-th iteration, $k$-WL updates the color of each $\vec{v}\in V(G)^k$ according to

$$wl_k^{(t+1)}(G,\vec{v})=\text{HASH}\Big(wl_k^{(t)}(G,\vec{v}),\ \{\!\{wl_k^{(t)}(G,\vec{v}[x/1])\,|\,x\in V(G)\}\!\},\ \ldots,\ \{\!\{wl_k^{(t)}(G,\vec{v}[x/k])\,|\,x\in V(G)\}\!\}\Big) \quad (1)$$

Let $gwl_k^{(t)}(G)$ denote the encoding of $G$ at the $t$-th iteration of $k$-WL. Then,

$$gwl_k^{(t)}(G)=\text{HASH}\Big(\{\!\{wl_k^{(t)}(G,\vec{v})\,|\,\vec{v}\in V(G)^k\}\!\}\Big) \quad (2)$$

Two $k$-tuples $\vec{v}$ and $\vec{u}$ are connected by an edge if $|\{i\in[k]\,|\,v_i=u_i\}|=k-1$, or informally, if they share $k{-}1$ entries. $k$-WL thus defines a super-graph $S_{k\text{-wl}}(G)$ whose nodes are all $k$-tuples in $G$, with edges defined as above. Eq. (1) defines the rule of color refinement on $S_{k\text{-wl}}(G)$. Intuitively, $k$-WL is akin to (but more powerful than, as it orders subgroups of neighbors) running the 1-WL algorithm on the super-graph $S_{k\text{-wl}}(G)$. As $t\rightarrow\infty$, the color $wl_k^{(t+1)}(G,\vec{v})$ converges to a stable value, denoted $wl_k^{(\infty)}(G,\vec{v})$, with the corresponding stable graph color denoted $gwl_k^{(\infty)}(G)$. For two non-isomorphic graphs $G,G'$, $k$-WL can successfully distinguish them if $gwl_k^{(\infty)}(G)\neq gwl_k^{(\infty)}(G')$. The expressivity of $wl_k^{(t)}$ can be exactly characterized by first-order logic with counting quantifiers: let $\mathbf{C}_k^t$ denote all first-order formulas with at most $k$ variables and $t$-depth counting quantifiers; then $wl_k^{(t)}(G,\vec{v})=wl_k^{(t)}(G',\vec{v}')\Longleftrightarrow\forall\phi\in\mathbf{C}_k^t,\ \phi(G,\vec{v})=\phi(G',\vec{v}')$. Additionally, there is a $t$-step bijective pebble game that is equivalent to $t$-iteration $k$-WL in expressivity; see Appendix A.2 for the pebble game characterization of $k$-WL.

Despite its power, $k$-WL uses all $n^k$ tuples and has $O(kn^k)$ complexity at each iteration.
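For concreteness, the following is a brute-force sketch (ours, not from the paper) of Eqs. (1)-(2); Python's built-in `hash` stands in for an injective HASH, `edges` is assumed to contain both directions of each undirected edge, and only tiny graphs are feasible since all $n^k$ tuples are materialized:

```python
from itertools import product

def atomic_type(edges, colors, v):
    """at_k(G, v): node-repetition pattern, induced adjacency, entry colors."""
    k = len(v)
    rep = tuple(v[i] == v[j] for i in range(k) for j in range(i + 1, k))
    adj = tuple((v[i], v[j]) in edges for i in range(k) for j in range(i + 1, k))
    return (rep, adj, tuple(colors[x] for x in v))

def kwl(n, edges, colors, k, iters):
    """Refine the colors of all n^k tuples for `iters` rounds (Eq. (1))."""
    V = range(n)
    wl = {v: atomic_type(edges, colors, v) for v in product(V, repeat=k)}
    for _ in range(iters):
        wl = {v: hash((wl[v],
                       tuple(  # for each position i, the multiset over v[x/i]
                           tuple(sorted(hash(wl[v[:i] + (x,) + v[i + 1:]]) for x in V))
                           for i in range(k))))
              for v in wl}
    # Graph-level encoding (Eq. (2)): the multiset of all tuple colors.
    return sorted(wl.values())
```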

3.2 From $k$-WL to $k$-MultisetWL: Removing Ordering

Our first proposed adaptation to $k$-WL is to remove the ordering in each $k$-tuple, i.e., changing $k$-tuples to $k$-multisets. This reduces the number of supernodes to consider by a factor of up to $k!$.

Let $\tilde{v}=\{\!\{v_1,\ldots,v_k\}\!\}$ be the multiset corresponding to tuple $\vec{v}=(v_1,\ldots,v_k)$. We introduce a canonical order function on $G$, $o_G:V(G)\rightarrow[n]$, and a corresponding indexing function $o_G^{-1}(\tilde{v},i)$, which returns the $i$-th element of $v_1,\ldots,v_k$ sorted according to $o_G$. Let $mwl_k^{(t)}(G,\tilde{v})$ denote the color of the $k$-multiset $\tilde{v}$ at the $t$-th iteration of $k$-MultisetWL, formally defined next.

At $t=0$, we initialize the color of $\tilde{v}$ as $mwl_k^{(0)}(G,\tilde{v})=\text{HASH}\big(\{\!\{at_k(G,p(\tilde{v}))\,|\,p\in\text{perm}[k]\}\!\}\big)$, where $\text{perm}[k]$ denotes the set of all permutation mappings of $k$ elements (this function should also account for repeated elements; we omit this for clarity of presentation). It can be shown that $mwl_k^{(0)}(G,\tilde{v})=mwl_k^{(0)}(G',\tilde{v}')$ if and only if $G[\tilde{v}]$ and $G'[\tilde{v}']$ are isomorphic.

At the $t$-th iteration, $k$-MultisetWL updates the color of every $k$-multiset by

$$mwl_k^{(t+1)}(G,\tilde{v})=\text{HASH}\Big(mwl_k^{(t)}(G,\tilde{v}),\ \{\!\{\ \{\!\{mwl_k^{(t)}(G,\tilde{v}[x/o_G^{-1}(\tilde{v},1)])\,|\,x\in V(G)\}\!\},\ \ldots,\ \{\!\{mwl_k^{(t)}(G,\tilde{v}[x/o_G^{-1}(\tilde{v},k)])\,|\,x\in V(G)\}\!\}\ \}\!\}\Big) \quad (3)$$

where $\tilde{v}[x/o_G^{-1}(\tilde{v},i)]$ denotes replacing the $i$-th (ordered by $o_G$) element of the multiset with $x\in V(G)$. Let $S_{k\text{-mwl}}(G)$ denote the super-graph defined by $k$-MultisetWL. Similar to Eq. (2), the graph-level encoding is $gmwl_k^{(t)}(G)=\text{HASH}\big(\{\!\{mwl_k^{(t)}(G,\tilde{v})\,|\,\tilde{v}\in V(S_{k\text{-mwl}}(G))\}\!\}\big)$.

Interestingly, although $k$-MultisetWL has significantly fewer node groups than $k$-WL, we show that it is no less powerful than $(k{-}1)$-WL in distinguishing graphs, while being upper bounded by $k$-WL in distinguishing both node groups and graphs.

Theorem 1.

Let $k\geq 1$ and define $wl_k^{(t)}(G,\tilde{v}):=\{\!\{wl_k^{(t)}(G,p(\tilde{v}))\,|\,p\in\text{perm}[k]\}\!\}$. For all $t\in\mathbb{N}$ and all graphs $G,G'$: $k$-MultisetWL is upper bounded by $k$-WL in distinguishing multisets $G,\tilde{v}$ and $G',\tilde{v}'$ at the $t$-th iteration, i.e., $wl_k^{(t)}(G,\tilde{v})=wl_k^{(t)}(G',\tilde{v}')\Longrightarrow mwl_k^{(t)}(G,\tilde{v})=mwl_k^{(t)}(G',\tilde{v}')$.

Theorem 2.

$k$-MultisetWL is no less powerful than $(k{-}1)$-WL in distinguishing graphs: for any $k\geq 3$ there exist graphs that can be distinguished by $k$-MultisetWL but not by $(k{-}1)$-WL.

Theorem 2 is proved using a variant of a series of CFI [11] graphs which cannot be distinguished by $k$-WL. This theorem shows that $k$-MultisetWL is indeed very powerful, and that finding counterexamples of $k$-WL-distinguishable graphs that cannot be distinguished by $k$-MultisetWL is very hard. Hence we conjecture that $k$-MultisetWL may have the same expressivity as $k$-WL in distinguishing undirected graphs, with an attempted proof in Appendix A.4.10. Additionally, the theorem also implies that $k$-MultisetWL is strictly more powerful than $(k{-}1)$-MultisetWL.

We next give a pebble game characterization of $k$-MultisetWL, named the doubly bijective $k$-pebble game, presented in Appendix A.3. The game is used in the proof of Theorem 2.

Theorem 3.

$k$-MultisetWL has the same expressivity as the doubly bijective $k$-pebble game.

3.3 From $k$-MultisetWL to $k(\leq)$-SetWL: Removing Repetition

Next, we propose further removing repetition inside any $k$-multiset, i.e., transforming $k$-multisets to $k(\leq)$-sets. We assume elements of a $k$-multiset $\tilde{v}$ and a $k(\leq)$-set $\hat{v}$ in $G$ are sorted based on the ordering function $o_G$, and omit $o_G$ for clarity. Let $s(\cdot)$ transform a multiset to a set by removing repeats, and let $r(\cdot)$ return a tuple with the number of repeats for each distinct element of a multiset. Specifically, let $\hat{v}=s(\tilde{v})$ and $\hat{n}=r(\tilde{v})$; then $m:=|\hat{v}|=|\hat{n}|\in[k]$ denotes the number of distinct elements in the $k$-multiset $\tilde{v}$, and for all $i\in[m]$, $\hat{v}_i$ is the $i$-th distinct element with $\hat{n}_i$ repetitions. Clearly there is an injective mapping between $\tilde{v}$ and $(\hat{v},\hat{n})$; let $f$ be the inverse mapping such that $\tilde{v}=f(s(\tilde{v}),r(\tilde{v}))$. Equivalently, each $m$-set $\hat{v}$ can be mapped to a multiset of $k$-multisets: $\hat{v}\leftrightarrow\{\!\{\tilde{v}=f(\hat{v},\hat{n})\,|\,\sum_{i=1}^m \hat{n}_i=k,\ \hat{n}_i\geq 1\ \forall i\}\!\}$. Based on this relationship, we extend the connections among $k$-multisets to $k(\leq)$-sets: given $m_1,m_2\in[k]$, an $m_1$-set $\hat{v}$ is connected to an $m_2$-set $\hat{u}$ if and only if there exist $\hat{n}_v,\hat{n}_u$ such that $f(\hat{v},\hat{n}_v)$ is connected with $f(\hat{u},\hat{n}_u)$ in $k$-MultisetWL. Let $S_{k\text{-swl}}(G)$ denote the super-graph defined on $G$ by $k(\leq)$-SetWL. It can be shown that this connectivity condition is equivalent to either (1) $(|m_1-m_2|=1)\land(|\hat{v}\cap\hat{u}|=\min(m_1,m_2))$ or (2) $(m_1=m_2)\land(|\hat{v}\cap\hat{u}|=m_1-1)$ being true. Notice that $S_{k\text{-swl}}(G)$ contains a sequence of $k{-}1$ bipartite graphs, each reflecting the connections between the $(m{-}1)$-sets and the $m$-sets. It also contains $k{-}1$ subgraphs, i.e., the connections among the $m$-sets for $m=2,\ldots,k$. Later we show that these $k{-}1$ subgraphs can be ignored without affecting $k(\leq)$-SetWL.
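The mappings $s(\cdot)$, $r(\cdot)$, $f(\cdot,\cdot)$ are simple to realize; below is a small sketch (ours), with a helper enumerating the repetition vectors $\hat{n}$ (there are $\binom{k-1}{m-1}$ of them) that map one $m$-set back to $k$-multisets:

```python
from collections import Counter

def s(multiset):
    """s(.): drop repeats, keep canonical (sorted) order."""
    return tuple(sorted(set(multiset)))

def r(multiset):
    """r(.): repetition count of each distinct element, in canonical order."""
    cnt = Counter(multiset)
    return tuple(cnt[x] for x in sorted(cnt))

def f(set_v, n_hat):
    """Inverse mapping: rebuild the multiset from (set, repetition counts)."""
    return tuple(sorted(x for x, c in zip(set_v, n_hat) for _ in range(c)))

def repetition_vectors(k, m):
    """All n_hat with m positive entries summing to k: C(k-1, m-1) of them."""
    if m == 1:
        yield (k,)
        return
    for first in range(1, k - m + 2):
        for rest in repetition_vectors(k - first, m - 1):
            yield (first,) + rest

v = (1, 1, 2, 5)                             # a 4-multiset
assert f(s(v), r(v)) == v                    # (s, r) and f are mutually inverse
print(len(list(repetition_vectors(4, 2))))   # 3 = C(3, 1)
```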

Let $swl_k^{(t)}(G,\hat{v})$ denote the color of an $m$-set $\hat{v}$ at the $t$-th iteration of $k(\leq)$-SetWL, which we now formally define. At $t=0$, we initialize the color of an $m$-set $\hat{v}$ ($m\in[k]$) as:

$$swl_k^{(0)}(G,\hat{v})=\text{HASH}\big(\{\!\{mwl_k^{(0)}(G,f(\hat{v},\hat{n}))\,|\,\hat{n}_1+\ldots+\hat{n}_m=k,\ \hat{n}_i\geq 1\ \forall i\}\!\}\big) \quad (4)$$

Clearly $swl_k^{(0)}(G,\hat{v})=swl_k^{(0)}(G',\hat{v}')$ if and only if $G[\hat{v}]$ and $G'[\hat{v}']$ are isomorphic. At the $t$-th iteration, $k(\leq)$-SetWL updates the color of every $m$-set $\hat{v}$ by

$$swl_k^{(t+1)}(G,\hat{v})=\text{HASH}\Big(swl_k^{(t)}(G,\hat{v}),\ \{\!\{swl_k^{(t)}(G,\hat{v}\cup\{x\})\,|\,x\in V(G)\setminus\hat{v}\}\!\},\ \{\!\{swl_k^{(t)}(G,\hat{v}\setminus x)\,|\,x\in\hat{v}\}\!\},\ \{\!\{\ \{\!\{swl_k^{(t)}(G,\hat{v}[x/o_G^{-1}(\hat{v},1)])\,|\,x\in V(G)\setminus\hat{v}\}\!\},\ \ldots,\ \{\!\{swl_k^{(t)}(G,\hat{v}[x/o_G^{-1}(\hat{v},m)])\,|\,x\in V(G)\setminus\hat{v}\}\!\}\ \}\!\}\Big) \quad (5)$$

Notice that when $m=1$ and $m=k$, the third and second parts of the hashing input are empty multisets, respectively. Similar to Eq. (2), we formulate the graph-level encoding as $gswl_k^{(t)}(G)=\text{HASH}\big(\{\!\{swl_k^{(t)}(G,\hat{v})\,|\,\hat{v}\in V(S_{k\text{-swl}}(G))\}\!\}\big)$.

To characterize the relationship between their expressivities, we first extend $k$-MultisetWL to sets by defining the color of an $m$-set $\hat{v}$ under $k$-MultisetWL as $mwl_k^{(t)}(G,\hat{v}):=\{\!\{mwl_k^{(t)}(G,f(\hat{v},\hat{n}))\,|\,\sum_{i=1}^m \hat{n}_i=k,\ \hat{n}_i\geq 1\ \forall i\}\!\}$. We prove that $k$-MultisetWL is at least as expressive as $k(\leq)$-SetWL in terms of separating node sets and graphs.

Theorem 4.

Let $k\geq 1$; then for all $t\in\mathbb{N}$ and all graphs $G,G'$: $mwl_k^{(t)}(G,\hat{v})=mwl_k^{(t)}(G',\hat{v}')\Longrightarrow swl_k^{(t)}(G,\hat{v})=swl_k^{(t)}(G',\hat{v}')$.

We also conjecture that $k$-MultisetWL and $k(\leq)$-SetWL could be equally expressive, and leave this to future work. As a single $m$-set corresponds to $\binom{k-1}{m-1}$ $k$-multisets, moving from $k$-MultisetWL to $k(\leq)$-SetWL further reduces the computational cost greatly.

3.4 From $k(\leq)$-SetWL to $(k,c)(\leq)$-SetWL: Accounting for Sparsity

Notice that for two arbitrary graphs $G$ and $G'$ with an equal number of nodes, the number of $k(\leq)$-sets and the connections among them in $k(\leq)$-SetWL are exactly the same, regardless of whether the graphs are dense or sparse. We next propose to account for the sparsity of a graph $G$ to further reduce the complexity of $k(\leq)$-SetWL. As the graph structure is encoded inside every $m$-set, when the graph becomes sparser, there are more sparse $m$-sets with a potentially large number of disconnected components. Based on the hypothesis that the subgraph induced by a set (of nodes) with fewer disconnected components naturally contains more structural information, we propose to restrict the $k(\leq)$-sets to $(k,c)(\leq)$-sets: all sets with at most $k$ nodes and at most $c$ connected components in the induced subgraph. Let $S_{k,c\text{-swl}}(G)$ denote the super-graph defined by $(k,c)(\leq)$-SetWL; then $S_{k,c\text{-swl}}(G)=S_{k\text{-swl}}(G)[\{\hat{v}\,|\,\#\text{components}(G[\hat{v}])\leq c\}]$, which is an induced subgraph of the super-graph defined by $k(\leq)$-SetWL. Fortunately, $S_{k,c\text{-swl}}(G)$ can be efficiently and recursively constructed from $S_{(k-1,c)\text{-swl}}(G)$; we include the construction algorithm in Appendix A.5. $(k,c)(\leq)$-SetWL is defined analogously to $k(\leq)$-SetWL (Eq. (4) and Eq. (5)), while removing all colors of sets that do not exist in $S_{k,c\text{-swl}}(G)$.
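To illustrate the object being constructed (though not the exact recursive algorithm of Appendix A.5), here is a brute-force sketch (ours, using networkx) that enumerates the $(k,c)(\leq)$-sets by growing valid sets one node at a time:

```python
import networkx as nx

def kc_sets(G, k, c):
    """All node sets with <= k nodes whose induced subgraph has <= c connected
    components. Every valid m-set contains a valid (m-1)-set (drop a non-cut
    vertex of some component), so growing one node at a time is complete."""
    frontier = {frozenset([v]) for v in G.nodes}   # 1-sets: one component each
    found = set(frontier)
    for _ in range(k - 1):
        nxt = set()
        for S in frontier:
            for v in set(G.nodes) - S:
                T = S | {v}
                if nx.number_connected_components(G.subgraph(T)) <= c:
                    nxt.add(frozenset(T))
        found |= nxt
        frontier = nxt
    return found

# On a 6-cycle with k=3, c=1: 6 nodes + 6 edges + 6 connected triples.
print(len(kc_sets(nx.cycle_graph(6), 3, 1)))  # 18
```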

$(k,c)(\leq)$-SetWL is progressively expressive with increasing $k$ and $c$, and when $c=k$, $(k,c)(\leq)$-SetWL becomes identical to $k(\leq)$-SetWL, as all $k(\leq)$-sets are then considered. Let $swl_{k,c}^{(t)}(G,\hat{v})$ denote the color of a $(k,c)(\leq)$-set $\hat{v}$ at the $t$-th iteration of $(k,c)(\leq)$-SetWL; then $gswl_{k,c}^{(t)}(G)=\text{HASH}\big(\{\!\{swl_{k,c}^{(t)}(G,\hat{v})\,|\,\hat{v}\in V(S_{k,c\text{-swl}}(G))\}\!\}\big)$.

Theorem 5.

Let $k\geq 1$; then for all $t\in\mathbb{N}$ and all graphs $G,G'$:

  • (1) when $1\leq c_1<c_2\leq k$, if $G,G'$ cannot be distinguished by $(k,c_2)(\leq)$-SetWL, they cannot be distinguished by $(k,c_1)(\leq)$-SetWL;

  • (2) when $k_1<k_2$, for all $c\leq k_1$, if $G,G'$ cannot be distinguished by $(k_2,c)(\leq)$-SetWL, they cannot be distinguished by $(k_1,c)(\leq)$-SetWL.

3.5 Complexity Analysis

All color refinement algorithms described above run on a super-graph; thus, their complexity at each iteration is linear in the number of supernodes and edges of the super-graph. Instead of using big-$\mathcal{O}$ notation, which ignores constant factors, we compare the exact numbers of supernodes and edges. Let $G$ be the input graph with $n$ nodes and average degree $d$. For $k$-WL, there are $n^k$ supernodes and each has $n\cdot k$ neighbors; hence $S_{k\text{-wl}}(G)$ has $n^k$ supernodes and $kn^{k+1}/2$ edges. For $m\in[k]$, there are $\binom{n}{m}$ $m$-sets and each connects to $m$ $(m{-}1)$-sets. So $S_{k\text{-swl}}$ has $\sum_{i=1}^k \binom{n}{i}\leq\binom{n}{k}\frac{n-k+1}{n-2k+1}$ supernodes and $\sum_{i=2}^k i\binom{n}{i}=n\sum_{i=1}^{k-1}\binom{n-1}{i}\leq n\binom{n-1}{k-1}\frac{n-k+1}{n-2k+2}$ edges (derivation in Appendix). Here we ignore edges within $m$-sets for any $m\in[k]$, as they can be reconstructed from the bipartite connections between $(m{-}1)$-sets and $m$-sets, described in detail in Sec. 4.1. Consider, e.g., $n=30,k=5$; we get $\frac{|V(S_{k\text{-wl}}(G))|}{|V(S_{k\text{-swl}}(G))|}=139$ and $\frac{|E(S_{k\text{-wl}}(G))|}{|E(S_{k\text{-swl}}(G))|}=2182$. Directly analyzing the savings from restricting the number of components is not possible without assuming a graph family. In Sec. 5.3 we measure the scalability of $(k,c)(\leq)$-SetGNN with different numbers of components directly on sparse graphs, where $(k,c)(\leq)$-SetWL shares similar scalability.
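The $n{=}30$, $k{=}5$ ratios above can be verified directly (a quick numeric check, ours):

```python
from math import comb

n, k = 30, 5
V_wl, E_wl = n**k, k * n**(k + 1) // 2                # k-WL supernodes / edges
V_set = sum(comb(n, i) for i in range(1, k + 1))      # all m-sets, m = 1..k
E_set = sum(i * comb(n, i) for i in range(2, k + 1))  # bipartite edges
print(round(V_wl / V_set))   # 139: supernode ratio
print(round(E_wl / E_set))   # 2182: edge ratio
```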

3.6 Set version of $k$-FWL

$k$-FWL is a stronger GI algorithm with the same expressivity as $(k{+}1)$-WL [26]. In this section we demonstrate how to likewise extend the set treatment to $k$-FWL, obtaining $k(\leq)$-SetFWL. Let $fwl_k^{(t)}(G,\vec{v})$ denote the color of $k$-tuple $\vec{v}$ at the $t$-th iteration of $k$-FWL. $k$-FWL is initialized the same as $k$-WL, i.e., $fwl_k^{(0)}(G,\vec{v})=wl_k^{(0)}(G,\vec{v})$. At the $t$-th iteration, $k$-FWL updates colors with

$$fwl_k^{(t+1)}(G,\vec{v})=\text{HASH}\Big(fwl_k^{(t)}(G,\vec{v}),\ \{\!\{\big(fwl_k^{(t)}(G,\vec{v}[x/1]),\ldots,fwl_k^{(t)}(G,\vec{v}[x/k])\big)\,|\,x\in V(G)\}\!\}\Big) \quad (6)$$

Let $sfwl_k^{(t)}(G,\hat{v})$ denote the color of an $m$-set $\hat{v}$ at the $t$-th iteration of $k(\leq)$-SetFWL. At the $t$-th iteration it updates with

$$sfwl_k^{(t+1)}(G,\hat{v})=\text{HASH}\Big(sfwl_k^{(t)}(G,\hat{v}),\ \{\!\{\ \{\!\{sfwl_k^{(t)}(G,\hat{v}[x/1]),\ldots,sfwl_k^{(t)}(G,\hat{v}[x/m])\}\!\}\,|\,x\in V(G)\}\!\}\Big) \quad (7)$$

$k(\leq)$-SetFWL should have better expressivity than $k(\leq)$-SetWL. We show in the next section that $k(\leq)$-SetWL can be further improved with less computation through an intermediate step, while this is nontrivial for $k(\leq)$-SetFWL. We leave the study of $k(\leq)$-SetFWL to future work.

4 A practical progressively-expressive GNN: $(k,c)(\leq)$-SetGNN

In this section we transform $(k,c)(\leq)$-SetWL into a GNN model by replacing the HASH function in Eq. (5) with a combination of MLP and DeepSet [64], given that they are universal function approximators for vectors and sets, respectively [31, 64]. After the transformation, we propose two additional improvements (Sec. 4.2 and Sec. 4.3) to further improve scalability. We work with vector-attributed graphs: let $G=(V(G),E(G),X)$ be an undirected graph with node features $\mathbf{x}_i\in\mathbb{R}^d$ for all $i\in V(G)$.

4.1 From $(k,c)(\leq)$-SetWL to $(k,c)(\leq)$-SetGNN

$(k,c)(\leq)$-SetWL defines a super-graph $S_{k,c\text{-swl}}$ which aligns with Eq. (5). We first rewrite Eq. (5) to reflect its connection to $S_{k,c\text{-swl}}$. For a supernode $\hat{v}$ in $S_{k,c\text{-swl}}$, let $\mathcal{N}^G_{\text{left}}(\hat{v})=\{\hat{u}\,|\,\hat{u}\in S_{k,c\text{-swl}},\ \hat{u}\leftrightarrow\hat{v}\text{ and }|\hat{u}|=|\hat{v}|-1\}$ and $\mathcal{N}^G_{\text{right}}(\hat{v})=\{\hat{u}\,|\,\hat{u}\in S_{k,c\text{-swl}},\ \hat{u}\leftrightarrow\hat{v}\text{ and }|\hat{u}|=|\hat{v}|+1\}$. Then we can rewrite Eq. (5) for $(k,c)(\leq)$-SetWL as

$$swl_{k,c}^{(t+1)}(G,\hat{v})=\Big(swl_{k,c}^{(t)}(G,\hat{v}),\ swl_{k,c}^{(t+\frac{1}{2})}(G,\hat{v}),\ \{\!\{swl_{k,c}^{(t)}(G,\hat{u})\,|\,\hat{u}\in\mathcal{N}^G_{\text{left}}(\hat{v})\}\!\},\ \{\!\{swl_{k,c}^{(t+\frac{1}{2})}(G,\hat{u})\,|\,\hat{u}\in\mathcal{N}^G_{\text{left}}(\hat{v})\}\!\}\Big) \quad (8)$$

where $swl_{k,c}^{(t+\frac{1}{2})}(G,\hat{v}):=\{\!\{swl_{k,c}^{(t)}(G,\hat{u})\,|\,\hat{u}\in\mathcal{N}^G_{\text{right}}(\hat{v})\}\!\}$. Notice that we omit HASH and apply it implicitly. Eq. (8) essentially splits the computation of Eq. (5) into two steps and avoids repeated computation by caching the explicit $t{+}\frac{1}{2}$ step. It also implies that the connections among $m$-sets for any $m\in[k]$ can be reconstructed from the bipartite graph between $m$-sets and $(m{-}1)$-sets.

Next we formulate $(k,c)(\leq)$-SetGNN formally. Let $h^{(t)}(\hat{v})\in\mathbb{R}^{d_t}$ denote the vector representation of supernode $\hat{v}$ at the $t$-th iteration of $(k,c)(\leq)$-SetGNN. For any input graph $G$, the representations of supernodes are initialized by

$$h^{(0)}(\hat{v})=\text{BaseGNN}(G[\hat{v}]) \quad (9)$$

where BaseGNN can be any GNN model. Theoretically, BaseGNN should be chosen to encode non-isomorphic induced subgraphs distinctly, mimicking HASH. Based on empirical tests of several GNNs on all 11,117 possible 8-node non-isomorphic graphs [6], and given that GIN [61] is simple and fast with a nearly $100\%$ separation rate, we use GIN as our BaseGNN in experiments.

$(k,c)(\leq)$-SetGNN iteratively updates the representations of all supernodes by

$$h^{(t+\frac{1}{2})}(\hat{v})=\sum_{\hat{u}\in\mathcal{N}^G_{\text{right}}(\hat{v})}\text{MLP}^{(t+\frac{1}{2})}\big(h^{(t)}(\hat{u})\big) \quad (10)$$
$$h^{(t+1)}(\hat{v})=\text{MLP}^{(t)}\Big(h^{(t)}(\hat{v}),\ h^{(t+\frac{1}{2})}(\hat{v}),\ \sum_{\hat{u}\in\mathcal{N}^G_{\text{left}}(\hat{v})}\text{MLP}^{(t)}_A\big(h^{(t)}(\hat{u})\big),\ \sum_{\hat{u}\in\mathcal{N}^G_{\text{left}}(\hat{v})}\text{MLP}^{(t)}_B\big(h^{(t+\frac{1}{2})}(\hat{u})\big)\Big) \quad (11)$$

Then, after $T$ iterations, we compute the graph-level encoding as

$$h^{(T)}(G)=\text{POOL}\big(\{\!\{h^{(T)}(\hat{v})\,|\,\hat{v}\in V(S_{k,c\text{-swl}})\}\!\}\big) \quad (12)$$

where POOL can be chosen as summation. We visualize these steps in Figure 1. Under mild conditions, $(k,c)(\leq)$-SetGNN has the same expressivity as $(k,c)(\leq)$-SetWL.
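To make Eqs. (10)-(12) concrete, below is a minimal PyTorch sketch (ours, not the released implementation) of one layer over the precomputed bipartite super-graph; the tensor layout and names such as `right_edges` are our assumptions:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))

class SetGNNLayer(nn.Module):
    """One (k,c)(<=)-SetGNN layer (Eqs. (10)-(11)) with sum aggregation."""
    def __init__(self, d):
        super().__init__()
        self.f_half = mlp(d, d)                 # MLP^{(t+1/2)} in Eq. (10)
        self.f_A, self.f_B = mlp(d, d), mlp(d, d)
        self.f_out = mlp(4 * d, d)              # outer MLP^{(t)} in Eq. (11)

    def forward(self, h, right_edges):
        # h: [num_supernodes, d]. right_edges: [2, E] index pairs (small, large)
        # where |large set| = |small set| + 1, over all k-1 bipartite blocks.
        small, large = right_edges
        h_half = torch.zeros_like(h)            # Eq. (10): right -> left
        h_half.index_add_(0, small, self.f_half(h[large]))
        agg_A = torch.zeros_like(h)             # Eq. (11): left -> right
        agg_A.index_add_(0, large, self.f_A(h[small]))
        agg_B = torch.zeros_like(h)
        agg_B.index_add_(0, large, self.f_B(h_half[small]))
        return self.f_out(torch.cat([h, h_half, agg_A, agg_B], dim=-1))

# Graph encoding (Eq. (12)): sum-pool the final supernode embeddings,
# e.g. h_G = h.sum(dim=0) after T layers.
```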

Theorem 6.

When (i) BaseGNN can distinguish any non-isomorphic graphs with at most $k$ nodes, (ii) all MLPs have sufficient depth and width, and (iii) POOL is an injective function, then for any $t\in\mathbb{N}$, the $t$-layer $(k,c)(\leq)$-SetGNN is as expressive as $(k,c)(\leq)$-SetWL at the $t$-th iteration.

The following facts can be derived easily from Theorem 5 and Theorem 6.

Corollary 6.1.

$(k,c)(\leq)$-SetGNN is progressively-expressive with increasing $k$ and $c$, that is:

  • (1) when $c_1>c_2$, $(k,c_1)(\leq)$-SetGNN is more expressive than $(k,c_2)(\leq)$-SetGNN, and

  • (2) when $k_1>k_2$, $(k_1,c)(\leq)$-SetGNN is more expressive than $(k_2,c)(\leq)$-SetGNN.

4.2 Bidirectional Sequential Message Passing

The $t$-th layer of $(k,c)(\leq)$-SetGNN (Eq. (10) and Eq. (11)) essentially propagates information back and forth on the super-graph $S_{k,c\text{-swl}}(G)$, which is a sequence of $k{-}1$ bipartite graphs (see the middle of Figure 1), in parallel for all supernodes. We propose to change this to bidirectional blockwise sequential message passing, which we call $(k,c)(\leq)$-SetGNN$^*$, defined as follows.

$$\text{for }m=k{-}1\text{ down to }1,\ \forall m\text{-set }\hat{v}:\quad h^{(t+\frac{1}{2})}(\hat{v})=\text{MLP}_{m,1}^{(t)}\Big(h^{(t)}(\hat{v}),\sum_{\hat{u}\in\mathcal{N}^G_{\text{right}}(\hat{v})}\text{MLP}_{m,2}^{(t)}\big(h^{(t+\frac{1}{2})}(\hat{u})\big)\Big) \quad (13)$$
$$\text{for }m=2\text{ to }k,\ \forall m\text{-set }\hat{v}:\quad h^{(t+1)}(\hat{v})=\text{MLP}_{m,1}^{(t+\frac{1}{2})}\Big(h^{(t+\frac{1}{2})}(\hat{v}),\sum_{\hat{u}\in\mathcal{N}^G_{\text{left}}(\hat{v})}\text{MLP}_{m,2}^{(t+\frac{1}{2})}\big(h^{(t+1)}(\hat{u})\big)\Big) \quad (14)$$

Notice that $(k,c)(\leq)$-SetGNN$^*$ has lower memory usage: $(k,c)(\leq)$-SetGNN loads the complete super-graph (all $k{-}1$ bipartite graphs) at once, while $(k,c)(\leq)$-SetGNN$^*$ loads one of the $k{-}1$ bipartite graphs at a time, which is beneficial on limited-size GPUs. What is more, for a small, finite $t$ it is even more expressive than $(k,c)(\leq)$-SetGNN. We provide implementations of both parallel and sequential message passing in the official GitHub repository, but only report the performance of $(k,c)(\leq)$-SetGNN$^*$ given its efficiency and better expressivity.
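A sketch (ours) of one bidirectional sequential sweep (Eqs. (13)-(14)); `bipartite[m]` is an assumed per-block edge structure, and `down_blocks[m]`/`up_blocks[m]` are assumed per-size two-input MLPs that concatenate their arguments:

```python
import torch

def block_aggregate(src_h, edges, n_out, to_small):
    """Sum-pool messages across one bipartite block; edges = (small_idx, large_idx)."""
    small, large = edges
    out = torch.zeros(n_out, src_h.size(-1))
    if to_small:                                   # messages land on m-sets
        return out.index_add_(0, small, src_h[large])
    return out.index_add_(0, large, src_h[small])  # messages land on (m+1)-sets

def setgnn_star_layer(h, bipartite, k, down_blocks, up_blocks):
    """h[m]: embeddings of all m-sets. One backward then one forward sweep,
    each block reusing the just-updated neighboring block (Eqs. (13)-(14))."""
    h_half = {k: h[k]}                             # k-sets have no right neighbors
    for m in range(k - 1, 0, -1):                  # Eq. (13): k-1 down to 1
        msg = block_aggregate(h_half[m + 1], bipartite[m], len(h[m]), True)
        h_half[m] = down_blocks[m](h[m], msg)
    h_new = {1: h_half[1]}
    for m in range(2, k + 1):                      # Eq. (14): 2 up to k
        msg = block_aggregate(h_new[m - 1], bipartite[m - 1], len(h[m]), False)
        h_new[m] = up_blocks[m](h_half[m], msg)
    return h_new
```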

Theorem 7.

For any $t\in\mathbb{N}$, the $t$-layer $(k,c)(\leq)$-SetGNN$^*$ is more expressive than the $t$-layer $(k,c)(\leq)$-SetGNN. As $t\to\infty$, $(k,c)(\leq)$-SetGNN$^*$ is as expressive as $(k,c)(\leq)$-SetGNN.

4.3 Improving Supernode Initialization

Next we describe how to improve the supernode initialization (Eq. (9)) and substantially reduce computational and memory overhead for $c>1$, without losing any expressivity. We exploit the fact that a graph with $c$ components can be viewed as a set of $c$ connected components. Formally,

Theorem 8.

Let $G$ be a graph with $c$ connected components $C_1,\ldots,C_c$, and $G'$ be a graph also with $c$ connected components $C'_1,\ldots,C'_c$; then $G$ and $G'$ are isomorphic if and only if there exists $p:[c]\rightarrow[c]$ such that for all $i\in[c]$, $C_i$ and $C'_{p(i)}$ are isomorphic.

The theorem implies that we only need to apply Eq. (9) to supernodes with a single component; for any $c$-component set $\hat{v}$ with components $\hat{u}_1,\ldots,\hat{u}_c$, we can obtain its encoding as $h^{(0)}(\hat{v})=\text{DeepSet}(\{h^{(0)}(\hat{u}_1),\ldots,h^{(0)}(\hat{u}_c)\})$ without passing its induced subgraph to BaseGNN. This eliminates the heavy computation of passing a large number of induced subgraphs through the BaseGNN. We give the algorithm for building the connections between $\hat{v}$ and its single components in Appendix A.6.
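A sketch (ours) of this initialization: BaseGNN embeds only single-component supernodes, and a DeepSet composes the rest; `comp_index` is an assumed lookup from each multi-component supernode to the row ids of its component supernodes:

```python
import torch
import torch.nn as nn

class DeepSet(nn.Module):
    """Permutation-invariant set encoder: rho(sum_i phi(x_i))."""
    def __init__(self, d):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(d, d), nn.ReLU())

    def forward(self, xs):                        # xs: [num_components, d]
        return self.rho(self.phi(xs).sum(dim=0))  # sum makes it order-invariant

def init_supernodes(single_h, comp_index, deepset):
    """single_h: [num_single_component_sets, d] from BaseGNN (Eq. (9));
    comp_index: dict supernode_id -> list of component row ids in single_h."""
    return {v: deepset(single_h[torch.tensor(ids)]) for v, ids in comp_index.items()}
```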

5 Experiments

We design experiments to answer the following questions. Q1. Performance: How does $(k,c)(\leq)$-SetGNN compare to SOTA expressive GNNs? Q2. Varying $k$ and $c$: How does the progressively increasing expressiveness reflect on generalization performance? Q3. Computational requirements: Is $(k,c)(\leq)$-SetGNN feasible on practical graphs w.r.t. running time and memory usage?

5.1 Setup

Datasets.  To inspect the expressive power, we use four different types of simulation datasets: 1) EXP [1] contains 600 pairs of graphs on which 1- and 2-WL fail, which we split such that the two graphs in each pair are assigned to different classes; 2) SR25 [6] has 15 strongly regular graphs (on which 3-WL fails) with 25 nodes each, which we transform into a 15-way classification task; 3) substructure counting (i.e. triangle, tailed triangle, star and 4-cycle) tasks on a random graph dataset [15]; 4) graph property regression (i.e. connectedness, diameter, radius) tasks on a random graph dataset [17]. We also evaluate performance on two real-world graph learning tasks: 5) ZINC-12K [19] and 6) QM9 [59] for molecular property regression. See Table 6 in Appendix A.7 for detailed dataset statistics.

Baselines.  We use GCN [36], GIN [61], PNA [17], PPGN [40], PF-GNN [18], and GNN-AK [68] as baselines on the simulation datasets. On ZINC-12K we also reference CIN [9] directly from the literature. Most baseline results are taken from [68]. Finally, we compare to GINE [32] on QM9. Hyperparameters and model configurations are described in Appendix A.8.

5.2 Results

Theorem 2 shows that $k$-MultisetWL is able to distinguish CFI($k$) graphs. This also holds for $k(\leq)$-SetWL, as the proof does not use repetitions. We implemented the construction of CFI($k$) graphs for any $k$. Empirically, we found that $(k,c)(\leq)$-SetGNN is able to distinguish CFI($k$) for $k=3,4,5$ ($k\geq 6$ runs out of memory). This empirically verifies its theoretical expressivity.

Table 1 shows the performance results on all the simulation datasets. Models with low expressivity such as GCN, GIN and PNA underperform on EXP and SR25, while the 3-WL-equivalent PPGN excels only on EXP. Notably, $(k,c)(\leq)$-SetGNN achieves 100% discriminating power on EXP with any $k>2$ and $c>1$, and on SR25 with any $k>3$ and $c>0$. $(k,c)(\leq)$-SetGNN also outperforms the baselines on all substructure counting tasks, as well as on two out of three graph property prediction tasks (except Radius) with a significant gap, for relatively small values of $k$ and $c$.

Table 1: Simulation data performances. For $(k,c)(\leq)$-SetGNN, the $(k,c)$ values that achieve the reported performance are shown in parentheses. (ACC: accuracy, MA[S]E: mean abs.[sq.] error)
Method EXP (ACC) SR25 (ACC) Counting Substructures (MAE) Graph Properties ($\log_{10}$(MSE))
Triangle Tailed Tri. Star 4-Cycle IsConnected Diameter Radius
GCN 50% 6.67% 0.4186 0.3248 0.1798 0.2822 -1.7057 -2.4705 -3.9316
GIN 50% 6.67% 0.3569 0.2373 0.0224 0.2185 -1.9239 -3.3079 -4.7584
PNA 50% 6.67% 0.3532 0.2648 0.1278 0.2430 -1.9395 -3.4382 -4.9470
PPGN 100% 6.67% 0.0089 0.0096 0.0148 0.0090 -1.9804 -3.6147 -5.0878
GIN-AK+ 100% 6.67% 0.0123 0.0112 0.0150 0.0126 -2.7513 -3.9687 -5.1846
PNA-AK+ 100% 6.67% 0.0118 0.0138 0.0166 0.0132 -2.6189 -3.9011 -5.2026
$(k,c)(\leq)$-SetGNN 100% 100% 0.0073 0.0075 0.0134 0.0075 -5.4667 -4.0800 -5.1603
(≥3, ≥2) (≥4, ≥1) (3, 2) (4, 1) (3, 2) (4, 1) (4, 2) (4, 1) (2, 2)
Table 2: Train and Test performances on substructure counting tasks by varying $k$ and $c$. Notice the orders-of-magnitude drop in Test MAE between bolded entries per task.
Counting Substructures (MAE)
Triangle Tailed Tri. Star 4-Cycle
k c Train Test Train Test Train Test Train Test
2 1 0.9941 ± 0.2623 1.1409 ± 0.1224 1.1506 ± 0.2542 0.8695 ± 0.0781 1.5348 ± 2.0697 2.3454 ± 0.8198 1.2159 ± 0.0292 0.8361 ± 0.1171
3 1 0.0311 ± 0.0025 0.0088 ± 0.0001 0.0303 ± 0.0108 0.0085 ± 0.0018 0.0559 ± 0.0019 0.0151 ± 0.0006 0.1351 ± 0.0058 0.1893 ± 0.0030
4 1 0.0321 ± 0.0008 0.0151 ± 0.0074 0.0307 ± 0.0085 0.0075 ± 0.0012 0.0687 ± 0.0104 0.0339 ± 0.0009 0.0349 ± 0.0007 0.0075 ± 0.0002
5 1 0.0302 ± 0.0070 0.0208 ± 0.0042 0.0553 ± 0.0009 0.0189 ± 0.0024 0.0565 ± 0.0078 0.0263 ± 0.0023 0.0377 ± 0.0057 0.0175 ± 0.0036
6 1 0.0344 ± 0.0024 0.0247 ± 0.0085 0.0357 ± 0.0017 0.0171 ± 0.0000 0.0560 ± 0.0000 0.0168 ± 0.0022 0.0356 ± 0.0014 0.0163 ± 0.0064
2 2 0.3452 ± 0.0329 0.4029 ± 0.0053 0.2723 ± 0.0157 0.2898 ± 0.0055 0.0466 ± 0.0025 0.0242 ± 0.0006 0.2369 ± 0.0123 0.2512 ± 0.0029
3 2 0.0234 ± 0.0030 0.0073 ± 0.0009 0.0296 ± 0.0074 0.0100 ± 0.0009 0.0640 ± 0.0003 0.0134 ± 0.0006 0.0484 ± 0.0135 0.0194 ± 0.0065
4 2 0.0587 ± 0.0356 0.0131 ± 0.0010 0.0438 ± 0.0140 0.0094 ± 0.0002 0.0488 ± 0.0008 0.0209 ± 0.0063 0.0464 ± 0.0037 0.0110 ± 0.0020

In Table 2, we show the train and test MAEs on substructure counting tasks for individual values of $k$ and $c$. As expected, performances improve with increasing $k$ when $c$ is fixed, and vice versa. Notably, orders-of-magnitude improvements in test error occur moving from $k{=}2$ to $3$ for the triangle tasks as well as the star task, while a similarly large drop is obtained at $k{=}4$ for the 4-cycle task; this is expected, as triangles and 4-cycles have 3 and 4 nodes respectively. Similar observations hold for the graph property tasks as well (see Appendix A.9).

Table 3: SetGNN achieves SOTA on ZINC-12K.
Method MAE
GatedGCN 0.363 ± 0.009
GCN 0.321 ± 0.009
PNA 0.188 ± 0.004
DGN 0.168 ± 0.003
GIN 0.163 ± 0.004
GINE 0.157 ± 0.004
HIMP 0.151 ± 0.006
PNA 0.140 ± 0.006
GSN 0.115 ± 0.012
PF-GNN 0.122 ± 0.010
GIN-AK+ 0.080 ± 0.001
CIN 0.079 ± 0.006
$(k,c)(\leq)$-SetGNN 0.0750 ± 0.0027

In addition to the simulation datasets, we evaluate our $(k,c)(\leq)$-SetGNN on real-world data; Table 3 shows our performance on ZINC-12K. Our method achieves a new state-of-the-art performance, with a mean absolute error (MAE) of 0.0750, using $k{=}5$ and $c{=}2$.

In Table 4 we show the test and validation MAE along with the training loss for varying $k$ and $c$ on ZINC-12K. For a fixed $c$, validation MAE and training loss both first decay and later increase with increasing $k$, potentially owing to the difficulty of fitting with too many sets (i.e. supernodes) and edges in the super-graph.

Similar results are shown for QM9 in Table 5. For comparison we also show GINE performances using both 4 and 6 layers, both of which are significantly worse than $(k,c)(\leq)$-SetGNN.

Table 4: $(k,c)(\leq)$-SetGNN performances on ZINC-12K by varying ($k$,$c$). Test MAE at lowest Val. MAE, and lowest Test MAE.
$k$ $c$ Train loss Val. MAE Test MAE
2 1 0.1381 ± 0.0240 0.2429 ± 0.0071 0.2345 ± 0.0131
3 1 0.1172 ± 0.0063 0.2298 ± 0.0060 0.2252 ± 0.0030
4 1 0.0693 ± 0.0111 0.1645 ± 0.0052 0.1636 ± 0.0052
5 1 0.0643 ± 0.0019 0.1593 ± 0.0051 0.1447 ± 0.0013
6 1 0.0519 ± 0.0064 0.0994 ± 0.0093 0.0843 ± 0.0048
7 1 0.0543 ± 0.0048 0.0965 ± 0.0061 0.0747 ± 0.0022
8 1 0.0564 ± 0.0152 0.0961 ± 0.0043 0.0732 ± 0.0037
9 1 0.0817 ± 0.0274 0.0909 ± 0.0094 0.0824 ± 0.0056
10 1 0.0894 ± 0.0266 0.1060 ± 0.0157 0.0950 ± 0.0102
2 2 0.1783 ± 0.0602 0.2913 ± 0.0102 0.2948 ± 0.0210
3 2 0.0640 ± 0.0072 0.1668 ± 0.0078 0.1391 ± 0.0102
4 2 0.0499 ± 0.0043 0.1029 ± 0.0033 0.0836 ± 0.0010
5 2 0.0483 ± 0.0017 0.0899 ± 0.0056 0.0750 ± 0.0027
6 2 0.0530 ± 0.0064 0.0927 ± 0.0050 0.0737 ± 0.0006
7 2 0.0547 ± 0.0036 0.0984 ± 0.0047 0.0784 ± 0.0043
3 3 0.0798 ± 0.0062 0.1881 ± 0.0076 0.1722 ± 0.0086
4 3 0.0565 ± 0.0059 0.1121 ± 0.0066 0.0869 ± 0.0026
5 3 0.0671 ± 0.0156 0.1091 ± 0.0097 0.0920 ± 0.0054
Table 5: $(k,c)(\leq)$-SetGNN performances on QM9 by varying ($k$,$c$). Test MAE at lowest Val. MAE, and lowest Test MAE. All variances are ${\leq}0.002$ and thus omitted.
$k$ $c$ Train loss Val. MAE Test MAE
2 1 0.0376 ± 0.0005 0.0387 ± 0.0007 0.0389 ± 0.0008
3 1 0.0308 ± 0.0010 0.0386 ± 0.0017 0.0379 ± 0.0010
4 1 0.0338 ± 0.0003 0.0371 ± 0.0005 0.0370 ± 0.0006
5 1 0.0299 ± 0.0017 0.0343 ± 0.0008 0.0341 ± 0.0009
6 1 0.0226 ± 0.0004 0.0296 ± 0.0007 0.0293 ± 0.0007
7 1 0.0208 ± 0.0005 0.0289 ± 0.0007 0.0269 ± 0.0003
2 2 0.0367 ± 0.0007 0.0398 ± 0.0004 0.0398 ± 0.0004
3 2 0.0282 ± 0.0013 0.0358 ± 0.0009 0.0356 ± 0.0007
4 2 0.0219 ± 0.0004 0.0280 ± 0.0008 0.0278 ± 0.0008
5 2 0.0175 ± 0.0003 0.0267± 0.0005 0.0251± 0.0006
3 3 0.0391 ± 0.0107 0.0428 ± 0.0057 0.0425 ± 0.0052
4 3 0.0219 ± 0.0011 0.0301 ± 0.0010 0.0286 ± 0.0004
GINE ($L{=}4$) 0.0507 ± 0.0014 0.0478 ± 0.0003 0.0479 ± 0.0004
GINE ($L{=}6$) 0.0440 ± 0.0009 0.0440 ± 0.0009 0.0451 ± 0.0009

5.3 Computational requirements

(a) Memory usage. (b) Training time.
Figure 2: $(k,c)(\leq)$-SetGNN’s footprint scales practically with both $k$ and $c$ in memory (a) and running time (b) – results on ZINC-12K.

We next investigate how increasing $k$ and $c$ changes the computational footprint of $(k,c)(\leq)$-SetGNN in practice. Fig. 2 shows that increasing these parameters expectedly increases both memory consumption (in MB, in (a)) and runtime (in seconds per epoch, in (b)). Notably, since larger $k$ and $c$ increase our model’s expressivity, suitable choices of $k$ and $c$ allow us to practically realize these increasingly expressive models on commodity hardware. With conservative values of $c$ (e.g. $c{=}1$), we are able to pass messages between sets of $k$ (e.g. 10) nodes, far beyond what $k$-WL-style higher-order models can achieve ($k\leq 3$).

6 Conclusion

Our work is motivated by the impracticality of higher-order GNN models based on the $k$-WL hierarchy, which makes it challenging to study how much expressiveness real-world tasks truly necessitate. To this end, we proposed $(k,c)(\leq)$-SetWL, a more practical and progressively-expressive hierarchy with theoretical connections to $k$-WL and drastically lowered complexity. We also designed and implemented a practical model, $(k,c)(\leq)$-SetGNN($^*$), whose expressiveness is gradually increased by larger $k$ and $c$. Our model achieves strong performance, including several new best results on graph-level tasks like ZINC-12K and expressiveness-relevant tasks like substructure counting, while being practically trainable on commodity hardware.

References

  • Abboud et al. [2021] R. Abboud, İ. İ. Ceylan, M. Grohe, and T. Lukasiewicz. The surprising power of graph neural networks with random node initialization. In Proceedings of the Thirtieth International Joint Conference on Artifical Intelligence (IJCAI), 2021.
  • Alon and Yahav [2021] U. Alon and E. Yahav. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=i80OPhOCVH2.
  • Arvind et al. [2020] V. Arvind, F. Fuhlbrück, J. Köbler, and O. Verbitsky. On weisfeiler-leman invariance: Subgraph counts and related graph properties. Journal of Computer and System Sciences, 113:42–59, 2020.
  • Azizian and Lelarge [2020] W. Azizian and M. Lelarge. Expressive power of invariant and equivariant graph neural networks. arXiv preprint arXiv:2006.15646, 2020.
  • Babai et al. [1980] L. Babai, P. Erdos, and S. M. Selkow. Random graph isomorphism. SIAM Journal on Computing, 9(3):628–635, 1980.
  • Balcilar et al. [2021] M. Balcilar, P. Héroux, B. Gauzere, P. Vasseur, S. Adam, and P. Honeine. Breaking the limits of message passing graph neural networks. In International Conference on Machine Learning, pages 599–608. PMLR, 2021.
  • Battaglia et al. [2018] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
  • Bevilacqua et al. [2021] B. Bevilacqua, F. Frasca, D. Lim, B. Srinivasan, C. Cai, G. Balamurugan, M. M. Bronstein, and H. Maron. Equivariant subgraph aggregation networks. arXiv preprint arXiv:2110.02910, 2021.
  • Bodnar et al. [2021] C. Bodnar, F. Frasca, N. Otter, Y. Wang, P. Lio, G. F. Montufar, and M. Bronstein. Weisfeiler and lehman go cellular: Cw networks. Advances in Neural Information Processing Systems, 34:2625–2640, 2021.
  • Bouritsas et al. [2022] G. Bouritsas, F. Frasca, S. P. Zafeiriou, and M. Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • Cai et al. [1992] J.-Y. Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, 1992.
  • Chen et al. [2020a] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3438–3445, 2020a.
  • Chen et al. [2020b] L. Chen, Z. Chen, and J. Bruna. On graph neural networks versus graph-augmented mlps. arXiv preprint arXiv:2010.15116, 2020b.
  • Chen et al. [2019] Z. Chen, L. Li, and J. Bruna. Supervised community detection with line graph neural networks. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=H1g0Z3A9Fm.
  • Chen et al. [2020c] Z. Chen, L. Chen, S. Villar, and J. Bruna. Can graph neural networks count substructures? Advances in neural information processing systems, 33:10383–10395, 2020c.
  • Choudhary and DeCost [2021] K. Choudhary and B. DeCost. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 7(1):1–8, 2021.
  • Corso et al. [2020] G. Corso, L. Cavalleri, D. Beaini, P. Liò, and P. Veličković. Principal neighbourhood aggregation for graph nets. In Advances in Neural Information Processing Systems, volume 33, pages 13260–13271, 2020.
  • Dupty et al. [2022] M. H. Dupty, Y. Dong, and W. S. Lee. PF-GNN: Differentiable particle filtering based approximation of universal graph representations. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=oh4TirnfSem.
  • Dwivedi et al. [2020] V. P. Dwivedi, C. K. Joshi, T. Laurent, Y. Bengio, and X. Bresson. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982, 2020.
  • Fey and Lenssen [2019] M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
  • Frasca et al. [2022] F. Frasca, B. Bevilacqua, M. M. Bronstein, and H. Maron. Understanding and extending subgraph gnns by rethinking their symmetries. arXiv preprint arXiv:2206.11140, 2022.
  • Garg et al. [2020] V. Garg, S. Jegelka, and T. Jaakkola. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning, pages 3419–3430. PMLR, 2020.
  • Geerts [2020] F. Geerts. The expressive power of kth-order invariant graph networks. arXiv preprint arXiv:2007.12035, 2020.
  • Geerts and Reutter [2022] F. Geerts and J. L. Reutter. Expressiveness and approximation properties of graph neural networks. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=wIzUeM3TAU.
  • Gilmer et al. [2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
  • Grohe [2021] M. Grohe. The logic of graph neural networks. In 2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–17. IEEE, 2021.
  • Grohe [2022] M. Grohe. Multiset WL. Personal communication, 2022.
  • Grohe and Otto [2015] M. Grohe and M. Otto. Pebble games and linear equations. The Journal of Symbolic Logic, 80(3):797–844, 2015.
  • Hawkins [2004] D. M. Hawkins. The problem of overfitting. Journal of chemical information and computer sciences, 44(1):1–12, 2004.
  • Hella [1996] L. Hella. Logical hierarchies in ptime. Information and Computation, 129(1):1–19, 1996.
  • Hornik et al. [1989] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
  • Hu et al. [2020] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec. Strategies for pre-training graph neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HJlWWJSFDH.
  • Keriven and Peyré [2019] N. Keriven and G. Peyré. Universal invariant and equivariant graph neural networks. Advances in Neural Information Processing Systems, 32, 2019.
  • Kim et al. [2022] J. Kim, T. D. Nguyen, S. Min, S. Cho, M. Lee, H. Lee, and S. Hong. Pure transformers are powerful graph learners. arXiv, abs/2207.02505, 2022. URL https://arxiv.org/abs/2207.02505.
  • Kipf and Welling [2016] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Kipf and Welling [2017] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Klicpera et al. [2018] J. Klicpera, A. Bojchevski, and S. Günnemann. Predict then propagate: Graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997, 2018.
  • Loukas [2020] A. Loukas. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1l2bp4YwS.
  • Ma et al. [2021] Y. Ma, X. Liu, T. Zhao, Y. Liu, J. Tang, and N. Shah. A unified view on graph neural networks as graph signal denoising. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1202–1211, 2021.
  • Maron et al. [2019a] H. Maron, H. Ben-Hamu, H. Serviansky, and Y. Lipman. Provably powerful graph networks. Advances in neural information processing systems, 32, 2019a.
  • Maron et al. [2019b] H. Maron, E. Fetaya, N. Segol, and Y. Lipman. On the universality of invariant networks. In International conference on machine learning, pages 4363–4371. PMLR, 2019b.
  • Morris et al. [2017] C. Morris, K. Kersting, and P. Mutzel. Glocalized weisfeiler-lehman graph kernels: Global-local feature maps of graphs. In 2017 IEEE International Conference on Data Mining (ICDM), pages 327–336. IEEE, 2017.
  • Morris et al. [2019] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4602–4609, 2019.
  • Morris et al. [2020] C. Morris, G. Rattan, and P. Mutzel. Weisfeiler and leman go sparse: Towards scalable higher-order graph embeddings. Advances in Neural Information Processing Systems, 33:21824–21840, 2020.
  • Morris et al. [2022] C. Morris, G. Rattan, S. Kiefer, and S. Ravanbakhsh. Speqnets: Sparsity-aware permutation-equivariant graph networks. arXiv preprint arXiv:2203.13913, 2022.
  • Murphy et al. [2019] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro. Relational pooling for graph representations. In International Conference on Machine Learning, pages 4663–4673. PMLR, 2019.
  • Oono and Suzuki [2019] K. Oono and T. Suzuki. Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947, 2019.
  • Qian et al. [2022] C. Qian, G. Rattan, F. Geerts, C. Morris, and M. Niepert. Ordered subgraph aggregation networks. arXiv preprint arXiv:2206.11168, 2022.
  • Read and Corneil [1977] R. C. Read and D. G. Corneil. The graph isomorphism disease. Journal of graph theory, 1(4):339–363, 1977.
  • Rossi et al. [2020] E. Rossi, F. Frasca, B. Chamberlain, D. Eynard, M. Bronstein, and F. Monti. Sign: Scalable inception graph neural networks. arXiv preprint arXiv:2004.11198, 2020.
  • Sankar et al. [2021] A. Sankar, Y. Liu, J. Yu, and N. Shah. Graph neural networks for friend ranking in large-scale social platforms. In Proceedings of the Web Conference 2021, pages 2535–2546, 2021.
  • Sato [2020] R. Sato. A survey on the expressive power of graph neural networks. CoRR, abs/2003.04078, 2020. URL http://dblp.uni-trier.de/db/journals/corr/corr2003.html#abs-2003-04078.
  • Sato et al. [2021] R. Sato, M. Yamada, and H. Kashima. Random features strengthen graph neural networks. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), pages 333–341. SIAM, 2021.
  • Shervashidze et al. [2011] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.
  • Topping et al. [2021] J. Topping, F. Di Giovanni, B. P. Chamberlain, X. Dong, and M. M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature. arXiv preprint arXiv:2111.14522, 2021.
  • Veličković et al. [2017] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • Weisfeiler [1976] B. Weisfeiler. On construction and identification of graphs. In LECTURE NOTES IN MATHEMATICS. Citeseer, 1976.
  • Wu et al. [2020a] S. Wu, F. Sun, W. Zhang, X. Xie, and B. Cui. Graph neural networks in recommender systems: a survey. ACM Computing Surveys (CSUR), 2020a.
  • Wu et al. [2018] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
  • Wu et al. [2020b] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1):4–24, 2020b.
  • Xu et al. [2019] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How Powerful are Graph Neural Networks? In Proceedings of the 7th International Conference on Learning Representations, ICLR ’19, pages 1–17, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
  • You et al. [2019] J. You, R. Ying, and J. Leskovec. Position-aware graph neural networks. In International Conference on Machine Learning, pages 7134–7143. PMLR, 2019.
  • You et al. [2021] J. You, J. Gomes-Selman, R. Ying, and J. Leskovec. Identity-aware graph neural networks. arXiv preprint arXiv:2101.10320, 2021.
  • Zaheer et al. [2017] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. Advances in neural information processing systems, 30, 2017.
  • Zhang and Li [2021] M. Zhang and P. Li. Nested graph neural networks. Advances in Neural Information Processing Systems, 34:15734–15747, 2021.
  • Zhang et al. [2020] M. Zhang, P. Li, Y. Xia, K. Wang, and L. Jin. Revisiting graph neural networks for link prediction. 2020.
  • Zhao and Akoglu [2020] L. Zhao and L. Akoglu. Pairnorm: Tackling oversmoothing in {gnn}s. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rkecl1rtwB.
  • Zhao et al. [2022] L. Zhao, W. Jin, L. Akoglu, and N. Shah. From stars to subgraphs: Uplifting any gnn with local structure awareness. ICLR, 2022.

Appendix A Appendix

A.1 Visualization of the Supergraph Connection of 2-SetWL

Figure 3: Comparison between 2-SetWL and 2-WL in their supergraph connection.

A.2 Bijective k-Pebble Game for k-WL

The pebble game characterization of $k$-FWL appeared in [11]; we use the pebble game defined in [28] for $k$-WL. Let $G,G'$ be graphs and $\overrightarrow{\boldsymbol{v}}_{0},\overrightarrow{\boldsymbol{v}}'_{0}$ be $k$-tuples. The bijective $k$-pebble game $\text{BP}_{k}(G,\overrightarrow{\boldsymbol{v}}_{0},G',\overrightarrow{\boldsymbol{v}}'_{0})$ is defined as follows. The game is played by two players, Player I and Player II. It proceeds in rounds, starting from the initial position $(\overrightarrow{\boldsymbol{v}}_{0},\overrightarrow{\boldsymbol{v}}'_{0})$ and continuing to a new position $(\overrightarrow{\boldsymbol{v}}_{t},\overrightarrow{\boldsymbol{v}}'_{t})$ as long as, after $t$ rounds, $\overrightarrow{\boldsymbol{v}}_{t}[i]\mapsto\overrightarrow{\boldsymbol{v}}'_{t}[i]$ defines an isomorphism between $G[\overrightarrow{\boldsymbol{v}}_{t}]$ and $G'[\overrightarrow{\boldsymbol{v}}'_{t}]$. If the game has not ended after $t$ rounds, Player II wins the $t$-step $\text{BP}_{k}(G,\overrightarrow{\boldsymbol{v}}_{0},G',\overrightarrow{\boldsymbol{v}}'_{0})$; otherwise Player I wins.

The $t$-th round is played as follows.

  1. Player I picks up the $i$-th pair of pebbles, for some $i\in[k]$.

  2. Player II chooses a bijection $f:V(G)\to V(G')$.

  3. Player I chooses $x\in V(G)$.

  4. The new position is $(\overrightarrow{\boldsymbol{v}}_{t}[x/i],\overrightarrow{\boldsymbol{v}}'_{t}[f(x)/i])$. If $G[\overrightarrow{\boldsymbol{v}}_{t}[x/i]]$ and $G'[\overrightarrow{\boldsymbol{v}}'_{t}[f(x)/i]]$ are still isomorphic, the game continues; otherwise the game ends and Player II loses.

The game $\text{BP}_{k}(G,\overrightarrow{\boldsymbol{v}}_{0},G',\overrightarrow{\boldsymbol{v}}'_{0})$ has the same expressivity as $k$-WL in distinguishing $(G,\overrightarrow{\boldsymbol{v}}_{0})$ and $(G',\overrightarrow{\boldsymbol{v}}'_{0})$; a minimal sketch of one round appears below.
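The following minimal sketch (plain Python with networkx; the helper names are ours, not from [28]) implements one round of $\text{BP}_{k}$: the substitution $\overrightarrow{\boldsymbol{v}}_{t}[x/i]$ and the check that $\overrightarrow{\boldsymbol{v}}[i]\mapsto\overrightarrow{\boldsymbol{v}}'[i]$ defines an isomorphism of the pebbled induced subgraphs.

import networkx as nx

def substitute(tup, x, i):
    # Return the k-tuple tup with its i-th entry replaced by x.
    t = list(tup)
    t[i] = x
    return tuple(t)

def position_ok(G, v, H, w):
    # Does v[i] -> w[i] define an isomorphism between G[v] and H[w]?
    for a, b in zip(v, w):
        for c, d in zip(v, w):
            if (a == c) != (b == d):                  # equalities must agree
                return False
            if G.has_edge(a, c) != H.has_edge(b, d):  # adjacencies must agree
                return False
    return True

# One round: Player I picks i, Player II a bijection f, Player I picks x.
G = H = nx.cycle_graph(6)
v, w, i, x = (0, 1, 2), (3, 4, 5), 0, 5
f = {u: (u + 3) % 6 for u in G}  # rotation by 3 is an automorphism of C6
print(position_ok(G, substitute(v, x, i), H, substitute(w, f[x], i)))  # True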

Theorem 9.

$k$-WL cannot distinguish $(G,\overrightarrow{\boldsymbol{v}}_{0})$ and $(G',\overrightarrow{\boldsymbol{v}}'_{0})$ at step $t$, i.e., $\bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}}_{0})=\bm{wl}_{k}^{(t)}(G',\overrightarrow{\boldsymbol{v}}'_{0})$, if and only if Player II has a winning strategy for the $t$-step $\text{BP}_{k}(G,\overrightarrow{\boldsymbol{v}}_{0},G',\overrightarrow{\boldsymbol{v}}'_{0})$.

There is an extension of the pebble game, $\text{BP}_{k}(G,G')$, without a specified starting position. At the beginning of the game, Player II is first asked to provide a bijection $g:V(G)^{k}\to V(G')^{k}$; Player I then chooses $\overrightarrow{\boldsymbol{v}}_{0}\in V(G)^{k}$, and the two players play $\text{BP}_{k}(G,\overrightarrow{\boldsymbol{v}}_{0},G',g(\overrightarrow{\boldsymbol{v}}_{0}))$.

Theorem 10.

$k$-WL cannot distinguish $G$ and $G'$ at step $t$, i.e., $\bm{gwl}_{k}^{(t)}(G)=\bm{gwl}_{k}^{(t)}(G')$, if and only if Player II has a winning strategy for the $t$-step $\text{BP}_{k}(G,G')$.

The proofs of Theorem 9 and Theorem 10 can be found in Hella's work [30].

A.3 Doubly Bijective k-Pebble Game for k-MultisetWL

Let $G,G'$ be graphs and $\tilde{\boldsymbol{v}}_{0},\tilde{\boldsymbol{v}}'_{0}$ be $k$-multisets. We adapt the $\text{BP}_{k}$ game to $k$-MultisetWL and call the result the doubly bijective $k$-pebble game, $\text{DBP}_{k}(G,\tilde{\boldsymbol{v}}_{0},G',\tilde{\boldsymbol{v}}'_{0})$. A similar version of the pebble game for $k$-MultisetWL was suggested by Grohe [27].

$\text{DBP}_{k}(G,\tilde{\boldsymbol{v}}_{0},G',\tilde{\boldsymbol{v}}'_{0})$ is defined as follows. The game starts at position $(\tilde{\boldsymbol{v}}_{0},\tilde{\boldsymbol{v}}'_{0})$.

Let the current position be $(\tilde{\boldsymbol{v}}_{t},\tilde{\boldsymbol{v}}'_{t})$. The $t$-th round is played as follows.

  1. Player II chooses a bijection $h:\tilde{\boldsymbol{v}}_{t}\to\tilde{\boldsymbol{v}}'_{t}$.

  2. Player I chooses $y\in\tilde{\boldsymbol{v}}_{t}$.

  3. Player II chooses a bijection $f:V(G)\to V(G')$.

  4. Player I chooses $x\in V(G)$.

  5. The new position is $(\tilde{\boldsymbol{v}}_{t}[x/\text{idx}_{\tilde{\boldsymbol{v}}_{t}}(y)],\tilde{\boldsymbol{v}}'_{t}[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'_{t}}(h(y))])$ and the game continues if $G[\tilde{\boldsymbol{v}}_{t}[x/\text{idx}_{\tilde{\boldsymbol{v}}_{t}}(y)]]$ and $G'[\tilde{\boldsymbol{v}}'_{t}[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'_{t}}(h(y))]]$ are isomorphic. Otherwise the game ends and Player II loses.

To extend the game to the setting with no starting position, we define the corresponding game $\text{DBP}_{k}(G,G')$. Similar to $\text{BP}_{k}(G,G')$, at the beginning Player II is asked to choose a bijection $g:\text{Multiset}(V(G)^{k})\to\text{Multiset}(V(G')^{k})$; Player I then picks a $\tilde{\boldsymbol{v}}_{0}$ and the game $\text{DBP}_{k}(G,\tilde{\boldsymbol{v}}_{0},G',g(\tilde{\boldsymbol{v}}_{0}))$ starts.

Theorem 3.

$k$-MultisetWL has the same expressivity as the doubly bijective $k$-pebble game.

Proof.

Specifically, given graph $G$ with $k$-multiset $\tilde{\boldsymbol{v}}$ and graph $G'$ with $k$-multiset $\tilde{\boldsymbol{v}}'$, we prove that $\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}')$ $\Longleftrightarrow$ Player II has a winning strategy for the $t$-step game $\text{DBP}_{k}(G,\tilde{\boldsymbol{v}},G',\tilde{\boldsymbol{v}}')$.

We prove this by induction on $t$. When $t=0$ the statement clearly holds, as $\bm{mwl}_{k}^{(0)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(0)}(G',\tilde{\boldsymbol{v}}')$ is equivalent to $G[\tilde{\boldsymbol{v}}]$ and $G'[\tilde{\boldsymbol{v}}']$ being isomorphic, which implies that Player II can start the game without losing. Now assume the statement holds for all $t\leq l$. For step $t=l+1$, we first prove the left $\Longrightarrow$ right direction.

Suppose $\bm{mwl}_{k}^{(l+1)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(l+1)}(G',\tilde{\boldsymbol{v}}')$. By Eq. (3) this is equivalent to:
(1) $\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}')$;
(2) $\exists$ a bijection $h:\tilde{\boldsymbol{v}}\to\tilde{\boldsymbol{v}}'$ such that $\forall y\in\tilde{\boldsymbol{v}}$, $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}'[x/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])\mid x\in V(G')\}\mskip-5.0mu\}$.

Now start the $\text{DBP}_{k}$ game at position $(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')$. By (2) there exists an $h$ satisfying (2); in the first round, Player II picks this $h$ as the bijection. Next, Player I chooses some $y\in\tilde{\boldsymbol{v}}$. According to (2), for this $h$ and $y$ we have $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}'[x/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])\mid x\in V(G')\}\mskip-5.0mu\}$. This implies that there exists a bijection $f:V(G)\to V(G')$ such that $\forall x\in V(G)$, $\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])=\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}'[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])$; let Player II pick this $f$. Player I now chooses some $x\in V(G)$. Equality of the $l$-step colors implies equality of the $0$-step colors, hence $G[\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)]]$ and $G'[\tilde{\boldsymbol{v}}'[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))]]$ are isomorphic, so the game does not end. The next round starts at position $(\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)],\tilde{\boldsymbol{v}}'[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])$, whose $l$-step colors agree. By the inductive hypothesis, Player II has a strategy to play $\text{DBP}_{k}$ from this new position for $l$ rounds; hence Player II has a winning strategy to play $\text{DBP}_{k}$ from the original position $(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')$ for $l+1$ rounds.

Next we prove right $\Longrightarrow$ left by showing that $\bm{mwl}_{k}^{(l+1)}(G,\tilde{\boldsymbol{v}})\neq\bm{mwl}_{k}^{(l+1)}(G',\tilde{\boldsymbol{v}}')$ $\Longrightarrow$ Player I has a winning strategy for the $(l+1)$-step game $\text{DBP}_{k}(G,\tilde{\boldsymbol{v}},G',\tilde{\boldsymbol{v}}')$.

$\bm{mwl}_{k}^{(l+1)}(G,\tilde{\boldsymbol{v}})\neq\bm{mwl}_{k}^{(l+1)}(G',\tilde{\boldsymbol{v}}')$ implies:
(1) $\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}})\neq\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}')$;
or (2) for any bijection $h:\tilde{\boldsymbol{v}}\to\tilde{\boldsymbol{v}}'$, $\exists y\in\tilde{\boldsymbol{v}}$ such that $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\mid x\in V(G)\}\mskip-5.0mu\}\neq\{\mskip-5.0mu\{\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}'[x/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])\mid x\in V(G')\}\mskip-5.0mu\}$.

If (1) holds, then by induction Player I has a winning strategy within $l$ steps, hence also within $l+1$ steps. If (2) holds, then in the first round, after Player II picks a bijection $h$, Player I can choose the specific $y\in\tilde{\boldsymbol{v}}$ for which the two multisets in (2) differ. Then no matter which bijection $f:V(G)\to V(G')$ Player II chooses, Player I can always choose an $x\in V(G)$ such that $\bm{mwl}_{k}^{(l)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\neq\bm{mwl}_{k}^{(l)}(G',\tilde{\boldsymbol{v}}'[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])$. By induction, Player I has a winning strategy within $l$ steps for the $\text{DBP}_{k}$ game starting at position $(\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)],\tilde{\boldsymbol{v}}'[f(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(h(y))])$. Hence even if Player I does not win in the first round, they can still win within $l+1$ rounds.

Combining both directions, the equivalence between $t$-step $\text{DBP}_{k}$ and $t$-step $k$-MultisetWL holds for any $t$ and any $k$-multisets. ∎

Theorem 11.

$k$-MultisetWL cannot distinguish $G$ and $G'$ at step $t$, i.e., $\bm{gmwl}_{k}^{(t)}(G)=\bm{gmwl}_{k}^{(t)}(G')$, if and only if Player II has a winning strategy for the $t$-step $\text{DBP}_{k}(G,G')$.

Proof.

The proof is straightforward using the argument inside the proof of Theorem 3. We only need to show that pooling all $k$-multiset representations is equivalent to Player II providing a bijection $g:\text{Multiset}(V(G)^{k})\to\text{Multiset}(V(G')^{k})$ in the first step of the game; we omit the details given their simplicity. ∎

A.4 Proofs of Theorems

A.4.1 Bound of Summation of Binomial Coefficients

Derivation of:

$$\sum_{i=1}^{k}\binom{n}{i}\leq\binom{n}{k}\,\frac{n-k+1}{n-2k+1} \qquad (15)$$

Proof.

$$\frac{\sum_{i=1}^{k}\binom{n}{i}}{\binom{n}{k}}=\frac{\binom{n}{k}+\binom{n}{k-1}+\binom{n}{k-2}+\cdots}{\binom{n}{k}}=1+\frac{k}{n-k+1}+\frac{k(k-1)}{(n-k+1)(n-k+2)}+\cdots\leq 1+\frac{k}{n-k+1}+\Big(\frac{k}{n-k+1}\Big)^{2}+\cdots\leq\frac{n-k+1}{n-2k+1}.$$

The final step bounds the series by the geometric series with ratio $\frac{k}{n-k+1}$, which sums to $\frac{n-k+1}{n-2k+1}$ when $n>2k-1$. ∎
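The bound is easy to sanity-check numerically; the snippet below (plain Python) does so for a few $(n,k)$ pairs, assuming $n>2k-1$ so the right-hand side is positive.

from math import comb

for n, k in [(20, 5), (50, 10), (100, 3)]:
    lhs = sum(comb(n, i) for i in range(1, k + 1))
    rhs = comb(n, k) * (n - k + 1) / (n - 2 * k + 1)
    assert lhs <= rhs, (n, k)  # Eq. (15) holds on these instances
    print(n, k, lhs, round(rhs, 1))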

A.4.2 Proof of Theorem 1

Theorem 1.

Let $k\geq 1$ and $\bm{wl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}):=\{\mskip-5.0mu\{\bm{wl}_{k}^{(t)}(G,p(\tilde{\boldsymbol{v}}))\mid p\in\text{perm}[k]\}\mskip-5.0mu\}$. For all $t\in\mathbb{N}$ and all graphs $G,G'$: $k$-MultisetWL is upper bounded by $k$-WL in distinguishing multisets $(G,\tilde{\boldsymbol{v}})$ and $(G',\tilde{\boldsymbol{v}}')$ at the $t$-th iteration, i.e., $\bm{wl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}')\Longrightarrow\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}')$.

Proof.

By induction on $t$. When $t=0$ the statement clearly holds, as both sides are equivalent to $G[\tilde{\boldsymbol{v}}]$ and $G'[\tilde{\boldsymbol{v}}']$ being isomorphic. Assume the statement is true up to step $t$. For the $t+1$ case, by definition the left side is equivalent to $\{\mskip-5.0mu\{\bm{wl}_{k}^{(t+1)}(G,p(\tilde{\boldsymbol{v}}))\mid p\in\text{perm}[k]\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{wl}_{k}^{(t+1)}(G',p(\tilde{\boldsymbol{v}}'))\mid p\in\text{perm}[k]\}\mskip-5.0mu\}$. Let $\overrightarrow{\boldsymbol{v}}$ be the ordered version of $\tilde{\boldsymbol{v}}$ following a canonical ordering over $G$; then there exists a bijective mapping $f$ between $\tilde{\boldsymbol{v}}$ and $\tilde{\boldsymbol{v}}'$ such that $\bm{wl}_{k}^{(t+1)}(G,\overrightarrow{\boldsymbol{v}})=\bm{wl}_{k}^{(t+1)}(G',f(\overrightarrow{\boldsymbol{v}}))$, where $f(\overrightarrow{\boldsymbol{v}}):=(f(\overrightarrow{\boldsymbol{v}}_{1}),\dots,f(\overrightarrow{\boldsymbol{v}}_{k}))$. By Theorem 5.2 of [11], for any $t$, $\bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G',f(\overrightarrow{\boldsymbol{v}}))$ is equivalent to Player II having a winning strategy for the $t$-step pebble game with initial configuration $(G,\overrightarrow{\boldsymbol{v}})$ and $(G',f(\overrightarrow{\boldsymbol{v}}))$ (please refer to [11] for the description of the pebble game). Notice that applying any permutation to the pebble game's initial configuration does not change whether Player II has a winning strategy; hence $\forall p\in\text{perm}[k]$, $\bm{wl}_{k}^{(t+1)}(G,p(\overrightarrow{\boldsymbol{v}}))=\bm{wl}_{k}^{(t+1)}(G',p(f(\overrightarrow{\boldsymbol{v}})))$. Now applying Eq. (1), we get (1) $\bm{wl}_{k}^{(t)}(G,p(\overrightarrow{\boldsymbol{v}}))=\bm{wl}_{k}^{(t)}(G',p(f(\overrightarrow{\boldsymbol{v}})))$, and (2) $\forall i\in[k]$, $\{\mskip-5.0mu\{\bm{wl}_{k}^{(t)}(G,p(\overrightarrow{\boldsymbol{v}}[x/i]))\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{wl}_{k}^{(t)}(G',p(f(\overrightarrow{\boldsymbol{v}})[x/i]))\mid x\in V(G')\}\mskip-5.0mu\}$. We rewrite (2) as: $\forall i\in[k]$, $\exists$ a bijection $h_{i}:V(G)\to V(G')$ such that $\forall x\in V(G)$, $\bm{wl}_{k}^{(t)}(G,p(\overrightarrow{\boldsymbol{v}}[x/i]))=\bm{wl}_{k}^{(t)}(G',p(f(\overrightarrow{\boldsymbol{v}})[h_{i}(x)/i]))$. As (1) and (2) hold for any $p\in\text{perm}[k]$, applying the induction hypothesis to both gives (a) $\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}')$, and (b) $\forall i\in[k]$, $\exists$ a bijection $h_{i}:V(G)\to V(G')$ such that $\forall x\in V(G)$, $\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/i])=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}'[h_{i}(x)/g(i)])$, where $g:[k]\to[k]$ is the index mapping corresponding to $f$.
We rewrite (b) as: $\exists g:[k]\to[k]$ such that $\forall i\in[k]$, $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/i])\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}'[x/g(i)])\mid x\in V(G')\}\mskip-5.0mu\}$. Combining (a) and (b) and using Eq. (3), we obtain $\bm{mwl}_{k}^{(t+1)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t+1)}(G',\tilde{\boldsymbol{v}}')$. Thus the statement holds for any $t$. ∎

A.4.3 Proof of Theorem 2

CFI Graphs and Their Properties

Cai, Fürer and Immerman [11] designed a construction of pairs of graphs CFI$(k)$ such that for any $k$, $k$-WL cannot distinguish the pair CFI$(k)$. We use the CFI construction to prove the theorem, in the variant proposed in Grohe et al.'s work [28]. Let $\mathcal{K}$ be a complete graph with $k$ nodes. We first construct an enlarged graph $\mathcal{X}(\mathcal{K})$ from the base graph $\mathcal{K}$ by mapping each node and edge to a group of vertices and connecting all $(|V(\mathcal{K})|+|E(\mathcal{K})|)$ groups of vertices following certain rules; we say "node" for the base graph and "vertex" for the enlarged graph to distinguish the two. Let $vw$ denote the edge connecting nodes $v$ and $w$. For a node $v\in V(\mathcal{K})$, let $E(v):=\{vw\mid vw\in E(\mathcal{K})\}$ denote the set of all edges adjacent to $v$ in $\mathcal{K}$.

For every node $v\in V(\mathcal{K})$, we map $v$ to a group of vertices $S_{v}:=\{v^{X}\mid X\subseteq E(v)\}$ of size $|S_{v}|=2^{\deg(v)}=2^{k-1}$. For every edge $e\in E(\mathcal{K})$, the construction maps $e$ to two vertices $S_{e}:=\{e^{0},e^{1}\}$. Hence there are $2|E(\mathcal{K})|+|V(\mathcal{K})|\cdot 2^{k-1}=k(k-1)+k\,2^{k-1}$ vertices in the enlarged graph $\mathcal{X}(\mathcal{K})$, with $V(\mathcal{X}(\mathcal{K}))=(\cup_{v\in V(\mathcal{K})}S_{v})\cup(\cup_{e\in E(\mathcal{K})}S_{e})$. Let $V^{*}:=\cup_{v\in V(\mathcal{K})}S_{v}$ and $E^{*}:=\cup_{e\in E(\mathcal{K})}S_{e}$.

The edges of $\mathcal{X}(\mathcal{K})$ are defined as follows:

$$E(\mathcal{X}(\mathcal{K})):=\{v^{X}e^{1}\mid v\in V(\mathcal{K}),X\subseteq E(v),e\in X\}\ \cup\ \{v^{X}e^{0}\mid v\in V(\mathcal{K}),X\subseteq E(v),e\in E(v)\setminus X\}\ \cup\ \{e^{0}e^{1}\mid e\in E(\mathcal{K})\}.$$

Moreover, we color the vertices such that all vertices in $S_{v}$ have color $v$ for every $v\in V(\mathcal{K})$, and all vertices in $S_{e}$ have color $e$ for every $e\in E(\mathcal{K})$. See Figure 4 (top right) for a visualization of the transformation from the base graph $\mathcal{K}$ to $\mathcal{X}(\mathcal{K})$ with $k=3$.

Figure 4: Visualization of the CFI$(k)$ construction for $k=3$.

There are several important properties of the automorphisms of $\mathcal{X}(\mathcal{K})$. Let $h\in\text{Aut}(\mathcal{X}(\mathcal{K}))$ be an automorphism; then:

  1. $h(S_{v})=S_{v}$ and $h(S_{e})=S_{e}$ for all $v\in V(\mathcal{K})$ and $e\in E(\mathcal{K})$.

  2. For every subset $F\subseteq E(\mathcal{K})$, there is exactly one automorphism $h_{F}$ that flips precisely all edges in $F$, i.e., $h_{F}(e^{0})=e^{1}$ and $h_{F}(e^{1})=e^{0}$ if and only if $e\in F$. More specifically,

    • $h_{F}(e^{i})=e^{1-i}$ for all $e\in F$;

    • $h_{F}(e^{i})=e^{i}$ for all $e\notin F$;

    • $h_{F}(v^{X})=v^{Y}$ for all $v\in V(\mathcal{K})$, $X\subseteq E(v)$,
      where $Y:=X\triangle\big{(}F\cap E(v)\big{)}=\Big{(}X\setminus\big{(}F\cap E(v)\big{)}\Big{)}\cup\Big{(}\big{(}F\cap E(v)\big{)}\setminus X\Big{)}$.

Proof.

These properties are not hard to prove. Property 1 holds by the coloring of the vertices in $\mathcal{X}(\mathcal{K})$. For the second property: since $h_{F}$ flips precisely all edges in $F$, we have $h_{F}(e^{i})=e^{1-i}$ for all $e\in F$ and $h_{F}(e^{i})=e^{i}$ for all $e\notin F$. Now write $h_{F}(v^{X})=v^{Y}$ for all $v\in V(\mathcal{K})$, $X\subseteq E(v)$; we need to determine $Y$. Focus on a node $v$ without loss of generality, and partition $E(v)$ into two parts: $E(v)\cap F$ and $E(v)\setminus F$. For any $e\in E(v)\cap F$ we know $h_{F}(e^{i})=e^{1-i}$. Let $v\leftrightarrow w$ denote that $v$ and $w$ are connected in $\mathcal{X}(\mathcal{K})$. Then for all $X\subseteq E(v)$: $e\in X\Longleftrightarrow e^{0}\leftrightarrow v^{X}\Longleftrightarrow h_{F}(e^{0})\leftrightarrow h_{F}(v^{X})\Longleftrightarrow e^{1}\leftrightarrow v^{Y}\Longleftrightarrow e\notin Y$, and similarly $e\notin X\Longleftrightarrow e\in Y$. Hence for all $e\in E(v)\cap F$ and all $X\subseteq E(v)$, $e\in X\triangle Y$, which implies $E(v)\cap F\subseteq X\triangle Y$. Following the same logic, for all $e\in E(v)\setminus F$ and all $X\subseteq E(v)$, $e\in E(v)\setminus(X\triangle Y)$, i.e., $E(v)\setminus F\subseteq E(v)\setminus(X\triangle Y)$, which implies $E(v)\cap F\supseteq X\triangle Y$ since $X\triangle Y\subseteq E(v)$. Combining both directions, for all $v\in V(\mathcal{K})$ and $X\subseteq E(v)$, $E(v)\cap F=X\triangle Y$. Hence $Y=X\triangle(X\triangle Y)=X\triangle\big{(}F\cap E(v)\big{)}$. ∎

The proof also yields another important property: for all $v$ and $X\subseteq E(v)$, $X\triangle Y=E(v)\cap F$ is constant over the input $X\subseteq E(v)$; a small self-check follows.
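The symmetric-difference rule can be checked mechanically; the small self-check below (plain Python, with illustrative edge names of ours) verifies that with $Y=X\triangle(F\cap E(v))$, membership of $e$ in $X$ flips exactly when $e\in F$, matching how $h_{F}$ exchanges $e^{0}$ and $e^{1}$.

Ev = {"e1", "e2", "e3"}   # E(v) for some node v
F = {"e2", "e5"}          # the edges flipped by h_F
for X in [set(), {"e1"}, {"e1", "e2"}, set(Ev)]:
    Y = X ^ (F & Ev)      # Y = X symmetric-difference (F intersect E(v))
    for e in Ev:
        # v^X -- e^{[e in X]} maps consistently iff the flip matches F
        assert ((e in X) != (e in Y)) == (e in F)
print("h_F flip rule consistent")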

We are now ready to construct non-isomorphic variants of the enlarged graph $\mathcal{X}(\mathcal{K})$. Let $T$ be a subset of $V(\mathcal{K})$. We define an induced subgraph $\mathcal{X}_{T}(\mathcal{K})$ of $\mathcal{X}(\mathcal{K})$ via the new node groups

$$S_{v}^{T}:=\begin{cases}\{v^{X}\in S_{v}\mid |X|\equiv 0\bmod 2\}&\text{if }v\notin T\\ \{v^{X}\in S_{v}\mid |X|\equiv 1\bmod 2\}&\text{if }v\in T\end{cases}$$

The induced subgraph is then defined as $\mathcal{X}_{T}(\mathcal{K}):=\mathcal{X}(\mathcal{K})[(\cup_{v\in V(\mathcal{K})}S^{T}_{v})\cup E^{*}]$. Figure 4 shows $\mathcal{X}_{\emptyset}(\mathcal{K})$ in the bottom left (labeled $\mathcal{X}_{0}(\mathcal{K})$) and $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$ in the bottom right (labeled $\mathcal{X}_{1}(\mathcal{K})$).

Lemma 1.

For all $T,U\subseteq V(\mathcal{K})$, $\mathcal{X}_{T}(\mathcal{K})\cong\mathcal{X}_{U}(\mathcal{K})$ if and only if $|T|\equiv|U|\bmod 2$. Moreover, if they are isomorphic, one isomorphism between $\mathcal{X}_{T}(\mathcal{K})$ and $\mathcal{X}_{U}(\mathcal{K})$ is $h_{F}$ with $F=E(\mathcal{K}[T\triangle U])$, where $E(\mathcal{K}[T\triangle U])$ denotes the set of all edges $\{v_{i},v_{j}\}\subseteq T\triangle U$.

Notice that $h_{F}$ is an automorphism of $\mathcal{X}(\mathcal{K})$, but restricting its domain to $(\cup_{v\in V(\mathcal{K})}S^{T}_{v})\cup E^{*}$ makes it an isomorphism between $\mathcal{X}_{T}(\mathcal{K})$ and $\mathcal{X}_{U}(\mathcal{K})$.

The proof of the lemma can be found in [11] and [28]. With this lemma we know that $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$ and $\mathcal{X}_{\emptyset}(\mathcal{K})$ are not isomorphic; a small computational check is sketched below. In the next part we show that $\mathcal{X}_{\emptyset}(\mathcal{K})$ and $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$ cannot be distinguished by $(k-1)$-WL but can be distinguished by $k$-MultisetWL, thus proving Theorem 2.
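As a concrete check of Lemma 1, the sketch below (assuming networkx; builder and names are ours, not from [11] or [28]) constructs the colored graphs $\mathcal{X}_{T}(\mathcal{K})$ for $k=3$ directly via the parity filter and runs an exact color-preserving isomorphism test: the parity of $|T|$ decides isomorphism.

from itertools import chain, combinations
import networkx as nx

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def build_cfi(k, T):
    K = nx.complete_graph(k)                      # base graph K on k nodes
    G = nx.Graph()
    for e in K.edges():
        e = tuple(sorted(e))
        for i in (0, 1):                          # S_e = {e^0, e^1}
            G.add_node(("E", e, i), color=("e", e))
        G.add_edge(("E", e, 0), ("E", e, 1))      # the edge e^0 -- e^1
    for v in K.nodes():
        Ev = [tuple(sorted((v, w))) for w in K.neighbors(v)]
        for X in powerset(Ev):
            if len(X) % 2 != (1 if v in T else 0):
                continue                          # parity filter defining X_T
            G.add_node(("V", v, frozenset(X)), color=("v", v))
            for e in Ev:                          # v^X -- e^1 if e in X else e^0
                G.add_edge(("V", v, frozenset(X)), ("E", e, 1 if e in X else 0))
    return G

match = lambda a, b: a["color"] == b["color"]     # respect the vertex colors
G0, G1, G2 = build_cfi(3, set()), build_cfi(3, {0}), build_cfi(3, {0, 1})
print(nx.is_isomorphic(G0, G1, node_match=match))  # False: |T| parities differ
print(nx.is_isomorphic(G0, G2, node_match=match))  # True: both |T| even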

Main Proof

Theorem 2.

$k$-MultisetWL is no less powerful than $(k-1)$-WL in distinguishing graphs: for $k\geq 3$ there is a pair of graphs that can be distinguished by $k$-MultisetWL but not by $(k-1)$-WL.

Proof.

To prove the theorem, we show that for any $k\geq 3$, $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$ and $\mathcal{X}_{\emptyset}(\mathcal{K})$, defined previously, are two non-isomorphic graphs that can be distinguished by $k$-MultisetWL but not by $(k-1)$-WL. It is well known that these two graphs cannot be distinguished by $(k-1)$-WL; see Theorem 5.17 in [28] for a proof. To show that they can be distinguished by $k$-MultisetWL, by Theorem 11 it suffices to show that Player I has a winning strategy for the doubly bijective $k$-pebble game $\text{DBP}_{k}(\mathcal{X}_{\emptyset}(\mathcal{K}),\mathcal{X}_{\{v_{1}\}}(\mathcal{K}))$.

At the start of the game $\text{DBP}_{k}(\mathcal{X}_{\emptyset}(\mathcal{K}),\mathcal{X}_{\{v_{1}\}}(\mathcal{K}))$, Player II is asked to provide a bijection from all $k$-multisets of $\mathcal{X}_{\emptyset}(\mathcal{K})$ to all $k$-multisets of $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$. Whichever bijection Player II chooses, Player I can pick $\tilde{\boldsymbol{v}}_{0}=\{\mskip-5.0mu\{v_{1}^{\emptyset},v_{2}^{\emptyset},\dots,v_{k}^{\emptyset}\}\mskip-5.0mu\}$, with corresponding position $\tilde{\boldsymbol{v}}'_{0}=\{\mskip-5.0mu\{v_{1}^{X_{1}},v_{2}^{X_{2}},\dots,v_{k}^{X_{k}}\}\mskip-5.0mu\}$. A position $(\tilde{\boldsymbol{x}},\tilde{\boldsymbol{y}}):=(\{\mskip-5.0mu\{x_{1},\dots,x_{k}\}\mskip-5.0mu\},\{\mskip-5.0mu\{y_{1},\dots,y_{k}\}\mskip-5.0mu\})$ is called consistent if there exists an $F\subseteq E(\mathcal{K})$ such that $h_{F}(\tilde{\boldsymbol{x}})=\{\mskip-5.0mu\{h_{F}(x_{1}),\dots,h_{F}(x_{k})\}\mskip-5.0mu\}=\tilde{\boldsymbol{y}}$. One can show that Player I can easily win the game $\text{DBP}_{k}(\mathcal{X}_{\emptyset}(\mathcal{K}),\mathcal{X}_{\{v_{1}\}}(\mathcal{K}))$ after one additional step if the current position $(\tilde{\boldsymbol{x}},\tilde{\boldsymbol{y}})$ is not consistent, even if $\mathcal{X}_{\emptyset}(\mathcal{K})[\tilde{\boldsymbol{x}}]$ and $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})[\tilde{\boldsymbol{y}}]$ are isomorphic [28].

We claim that for any $k$-multiset $\tilde{\boldsymbol{v}}'_{0}=\{\mskip-5.0mu\{v_{1}^{X_{1}},v_{2}^{X_{2}},\dots,v_{k}^{X_{k}}\}\mskip-5.0mu\}$ the position $(\tilde{\boldsymbol{v}}_{0},\tilde{\boldsymbol{v}}'_{0})$ cannot be consistent, and thus Player II loses $\text{DBP}_{k}(\mathcal{X}_{\emptyset}(\mathcal{K}),\mathcal{X}_{\{v_{1}\}}(\mathcal{K}))$ after the first round. Assume there exists a subset $F\subseteq E(\mathcal{K})$ such that $h_{F}(\tilde{\boldsymbol{v}}_{0})=\tilde{\boldsymbol{v}}'_{0}$. By property 1 of the automorphisms of $\mathcal{X}(\mathcal{K})$ we have $h_{F}(S_{v_{1}})=S_{v_{1}}$, and since $\tilde{\boldsymbol{v}}_{0}$ contains exactly one vertex of each color, it also holds that $h_{F}(v_{1}^{\emptyset})=v_{1}^{X_{1}}$. It follows from property 2 that $X_{1}=\emptyset\triangle\big{(}F\cap E(v_{1})\big{)}=F\cap E(v_{1})$. By the definition of $\mathcal{X}_{\{v_{1}\}}(\mathcal{K})$, $|X_{1}|\equiv 1\bmod 2$, so $F$ flips an odd number of neighbors of $v_{1}$, i.e., $|F\cap E(v_{1})|\equiv 1\bmod 2$. Similarly, $h_{F}(S_{v_{i}})=S_{v_{i}}$ and $|X_{i}|\equiv 0\bmod 2$ for all $i\geq 2$, hence $|F\cap E(v_{i})|\equiv 0\bmod 2$. A simple handshake argument then gives an $m\geq 0$ such that

$$2m=\sum_{v\in V(\mathcal{K})}|F\cap E(v)|=\sum_{i\in[k]\setminus\{1\}}|F\cap E(v_{i})|+|F\cap E(v_{1})|\equiv 1\bmod 2,$$

which is a contradiction. Hence Player I has a winning strategy for $\text{DBP}_{k}(\mathcal{X}_{\emptyset}(\mathcal{K}),\mathcal{X}_{\{v_{1}\}}(\mathcal{K}))$. ∎
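The handshake contradiction can also be verified by brute force for small $k$: the snippet below (plain Python) enumerates all $F\subseteq E(\mathcal{K})$ and confirms that none has $|F\cap E(v_{1})|$ odd while every $|F\cap E(v_{i})|$, $i\geq 2$, is even.

from itertools import combinations

for k in (3, 4):
    nodes = range(k)
    edges = list(combinations(nodes, 2))  # E(K) for the complete graph
    for bits in range(2 ** len(edges)):
        F = {e for j, e in enumerate(edges) if bits >> j & 1}
        degF = [sum(v in e for e in F) for v in nodes]  # |F ∩ E(v)| per node
        assert not (degF[0] % 2 == 1 and all(d % 2 == 0 for d in degF[1:]))
    print("no consistent F exists for k =", k)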

A.4.4 Proof of Theorem 3

See Appendix A.3.

A.4.5 Proof of Theorem 4

Theorem 4.

Let $k\geq 1$. Then for all $t\in\mathbb{N}$ and all graphs $G,G'$: $\bm{mwl}_{k}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}')\Longrightarrow\bm{swl}_{k}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}')$.

Proof.

Notation: let $s(\cdot)$ transform a multiset into a set by removing repeats, and let $r(\cdot)$ return a tuple with the number of repeats of each distinct element in a multiset. Let $f$ be the inverse mapping such that $\tilde{\boldsymbol{v}}=f(s(\tilde{\boldsymbol{v}}),r(\tilde{\boldsymbol{v}}))$.
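A tiny illustration of this notation (plain Python, representing multisets as sorted tuples; the helper names mirror the symbols above):

from collections import Counter

def s(m):  # drop repeats
    return tuple(sorted(set(m)))

def r(m):  # multiplicity of each distinct element, in the order of s(m)
    return tuple(Counter(m)[x] for x in s(m))

def f(keys, counts):  # inverse: rebuild the multiset from s(.) and r(.)
    return tuple(x for x, n in zip(keys, counts) for _ in range(n))

m = (1, 1, 2, 3, 3, 3)
assert f(s(m), r(m)) == m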

Define $F^{(t+1)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}'):=\{\text{injective }f:\tilde{\boldsymbol{v}}\to\tilde{\boldsymbol{v}}'\mid f\in F^{(t)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')$, and $\forall y\in\tilde{\boldsymbol{v}}$, $\exists h_{y}:V(G)\to V(G')$, $\forall x$, $f\in F^{(t)}(G,G',\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)],\tilde{\boldsymbol{v}}'[h_{y}(x)/\text{idx}_{\tilde{\boldsymbol{v}}'}(f(y))])\}$. Let $F^{(0)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}'):=\{f\mid f\text{ is an isomorphism from }G[\tilde{\boldsymbol{v}}]\text{ to }G'[\tilde{\boldsymbol{v}}']\}$.

Lemma 2.

For all $t$: $\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{v}}')$ $\Longleftrightarrow$ $\forall h\in F^{(t)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')$ and $\forall\hat{\boldsymbol{n}}$ with $\sum_{i=1}^{m}\hat{\boldsymbol{n}}_{i}=k$ and $\hat{\boldsymbol{n}}_{i}\geq 1$ for all $i$: $\bm{mwl}_{k}^{(t)}(G,f(s(\tilde{\boldsymbol{v}}),\hat{\boldsymbol{n}}))=\bm{mwl}_{k}^{(t)}(G',f(h(s(\tilde{\boldsymbol{v}})),\hat{\boldsymbol{n}}))$.

When $t=0$, the statement is true by the definition of $\bm{swl}_{k}^{(0)}$. Now hypothesize that the statement is true for all steps $\leq t$. For the $t+1$ case, the left side implies that there exist a $\tilde{\boldsymbol{v}}$ with $s(\tilde{\boldsymbol{v}})=\hat{\boldsymbol{v}}$ and a $\tilde{\boldsymbol{v}}'$ with $s(\tilde{\boldsymbol{v}}')=\hat{\boldsymbol{v}}'$ such that $\bm{mwl}_{k}^{(t+1)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t+1)}(G',\tilde{\boldsymbol{v}}')$. Moreover, for a mapping $h\in F^{(t+1)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')$, we have for all $\hat{\boldsymbol{n}}$ with $\sum_{i=1}^{m}\hat{\boldsymbol{n}}_{i}=k$ and $\hat{\boldsymbol{n}}_{i}\geq 1$: $\bm{mwl}_{k}^{(t+1)}(G,f(s(\tilde{\boldsymbol{v}}),\hat{\boldsymbol{n}}))=\bm{mwl}_{k}^{(t+1)}(G',f(h(s(\tilde{\boldsymbol{v}})),\hat{\boldsymbol{n}}))$.

For a specific $\hat{\boldsymbol{n}}$, define $\tilde{\boldsymbol{u}}=f(s(\tilde{\boldsymbol{v}}),\hat{\boldsymbol{n}})$ and $\tilde{\boldsymbol{u}}'=f(h(s(\tilde{\boldsymbol{v}})),\hat{\boldsymbol{n}})=h(\tilde{\boldsymbol{u}})$. By Eq. (3):
(1) $\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{u}})=\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{u}}')$;
(2) $\forall f\in F^{(t+1)}(G,G',\tilde{\boldsymbol{u}},\tilde{\boldsymbol{u}}')$, $\forall y\in\tilde{\boldsymbol{u}}$, $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{u}}[x/\text{idx}_{\tilde{\boldsymbol{u}}}(y)])\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G',\tilde{\boldsymbol{u}}'[x/\text{idx}_{\tilde{\boldsymbol{u}}'}(f(y))])\mid x\in V(G')\}\mskip-5.0mu\}$.

Since $h\in F^{(t+1)}(G,G',\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}')=F^{(t+1)}(G,G',\tilde{\boldsymbol{u}},\tilde{\boldsymbol{u}}')$, we may instantiate (2) with $f=h$:
(2′) $\forall y\in\tilde{\boldsymbol{u}}$, $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{u}}[x/\text{idx}_{\tilde{\boldsymbol{u}}}(y)])\mid x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G',h(\tilde{\boldsymbol{u}})[x/\text{idx}_{\tilde{\boldsymbol{u}}'}(h(y))])\mid x\in V(G')\}\mskip-5.0mu\}$,
which can be split into two parts:
(2a) $\forall y\in\tilde{\boldsymbol{u}}$, $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{u}}[x/\text{idx}_{\tilde{\boldsymbol{u}}}(y)])\mid x\in s(\tilde{\boldsymbol{u}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G',h(\tilde{\boldsymbol{u}})[x/\text{idx}_{\tilde{\boldsymbol{u}}'}(h(y))])\mid x\in s(h(\tilde{\boldsymbol{u}}))\}\mskip-5.0mu\}$; and
(2b) $\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{u}}[x/\text{idx}_{\tilde{\boldsymbol{u}}}(y)])\mid x\in V(G)\setminus s(\tilde{\boldsymbol{u}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G',h(\tilde{\boldsymbol{u}})[x/\text{idx}_{\tilde{\boldsymbol{u}}'}(h(y))])\mid x\in V(G')\setminus s(h(\tilde{\boldsymbol{u}}))\}\mskip-5.0mu\}$.

Combining Lemma 2 and the induction hypothesis, and knowing that $s(\tilde{\boldsymbol{u}})=\hat{\boldsymbol{v}}$ and $s(\tilde{\boldsymbol{u}}')=\hat{\boldsymbol{v}}'$, we get:
(a) $\bm{swl}_{k}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}')$;
(b) $\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G,\hat{\boldsymbol{v}}\setminus\{y\})\mid y\in\hat{\boldsymbol{v}}\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}'\setminus\{h(y)\})\mid y\in\hat{\boldsymbol{v}}\}\mskip-5.0mu\}$, derived by picking $\hat{\boldsymbol{n}}$ with $\hat{\boldsymbol{n}}[y]=1$, so that $\tilde{\boldsymbol{u}}$ contains only one $y$ and $\tilde{\boldsymbol{u}}'$ only one $h(y)$;
(c) $\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G,\hat{\boldsymbol{v}}\cup\{x\})\mid x\in V(G)\setminus\hat{\boldsymbol{v}}\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}'\cup\{x\})\mid x\in V(G')\setminus\hat{\boldsymbol{v}}'\}\mskip-5.0mu\}$, derived by picking $\hat{\boldsymbol{n}}$ with $\hat{\boldsymbol{n}}[y]>1$;
(d) $\forall y\in\hat{\boldsymbol{v}}$, $\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G,\hat{\boldsymbol{v}}\setminus\{y\}\cup\{x\})\mid x\in V(G)\setminus\hat{\boldsymbol{v}}\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}_{k}^{(t)}(G',\hat{\boldsymbol{v}}'\setminus\{h(y)\}\cup\{x\})\mid x\in V(G')\setminus\hat{\boldsymbol{v}}'\}\mskip-5.0mu\}$, derived by picking $\hat{\boldsymbol{n}}$ with $\hat{\boldsymbol{n}}[y]=1$, so that $\tilde{\boldsymbol{u}}$ contains only one $y$ and $\tilde{\boldsymbol{u}}'$ only one $h(y)$.

Now combining (a), (b), (c), (d) and using Eq. (5), we get $\bm{swl}_{k}^{(t+1)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k}^{(t+1)}(G',\hat{\boldsymbol{v}}')$, which proves the statement. ∎

A.4.6 Proof of Theorem 5

Theorem 5.

Let $k\geq 1$. Then for all $t\in\mathbb{N}$ and all graphs $G,G'$:

  • (1) when $1\leq c_{1}<c_{2}\leq k$, if $G,G'$ cannot be distinguished by $(k,c_{2})(\leq)$-SetWL, then they cannot be distinguished by $(k,c_{1})(\leq)$-SetWL;

  • (2) when $k_{1}<k_{2}$, for all $c\leq k_{1}$, if $G,G'$ cannot be distinguished by $(k_{2},c)(\leq)$-SetWL, then they cannot be distinguished by $(k_{1},c)(\leq)$-SetWL.

Proof.

We prove (2) first; the proof of (1) follows the same argument. To aid understanding, we first restate the update rule of $(k,c)(\leq)$-SetWL, with a small sketch of one refinement step given after the equations.

$$\bm{swl}_{k,c}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{v}})=\{\mskip-5.0mu\{\bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{u}})\mid\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k,c,\text{right}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} \qquad (23)$$

$$\bm{swl}_{k,c}^{(t+1)}(G,\hat{\boldsymbol{v}})=\Big{(}\bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{v}}),\ \bm{swl}_{k,c}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{v}}),\ \{\mskip-5.0mu\{\bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{u}})\mid\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k,c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\},\ \{\mskip-5.0mu\{\bm{swl}_{k,c}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{u}})\mid\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k,c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}\Big{)} \qquad (24,25)$$
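A compact sketch (plain Python) of one refinement step of Eqs. (23)–(25): colors are compressed with hash, multisets are encoded as sorted tuples, and left/right are assumed precomputed supergraph adjacencies (subset and superset neighbors); this is an illustration of the update rule, not the released implementation.

def refine(color, left, right):
    # Eq. (23): half-step color from right (superset) neighbors.
    half = {S: hash((color[S], tuple(sorted(color[U] for U in right[S]))))
            for S in color}
    # Eqs. (24)-(25): full-step color from the old color, the half-step
    # color, and both multisets over left (subset) neighbors.
    return {S: hash((color[S], half[S],
                     tuple(sorted(color[U] for U in left[S])),
                     tuple(sorted(half[U] for U in left[S]))))
            for S in color}

# Toy supergraph: the set {1,2} has {1} and {2} as left neighbors.
color = {"1": 0, "2": 0, "12": 1}
left = {"1": [], "2": [], "12": ["1", "2"]}
right = {"1": ["12"], "2": ["12"], "12": []}
print(refine(color, left, right))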
Lemma 3.

For $k_{1}<k_{2}$, for any $t$ and any $\hat{\boldsymbol{v}},\hat{\boldsymbol{v}}'$ with at most $k_{1}$ nodes: $\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{2},c}^{(t)}(G',\hat{\boldsymbol{v}}')\Longrightarrow\bm{swl}_{k_{1},c}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{1},c}^{(t)}(G',\hat{\boldsymbol{v}}')$.

Proof of Lemma 3.

We prove this by induction. As the color initialization of $(k_{1},c)(\leq)$-SetWL and $(k_{2},c)(\leq)$-SetWL is the same, the statement holds for $t=0$. Assume it holds for all steps $\leq t$. For $t+1$, $\bm{swl}_{k_{2},c}^{(t+1)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{2},c}^{(t+1)}(G',\hat{\boldsymbol{v}}')$ implies:

(1) $\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{2},c}^{(t)}(G',\hat{\boldsymbol{v}}')$;
(2) $\bm{swl}_{k_{2},c}^{(t+1/2)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{2},c}^{(t+1/2)}(G',\hat{\boldsymbol{v}}')$;
(3) $\{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{u}})\mid\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G',\hat{\boldsymbol{u}}')\mid\hat{\boldsymbol{u}}'\in\mathcal{N}^{G'}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}}')\}\mskip-5.0mu\}$;
(4) $\{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t+1/2)}(G,\hat{\boldsymbol{u}})\mid\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t+1/2)}(G',\hat{\boldsymbol{u}}')\mid\hat{\boldsymbol{u}}'\in\mathcal{N}^{G'}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}}')\}\mskip-5.0mu\}$.

(1) + induction hypothesis \Longrightarrow (a) 𝒔𝒘𝒍k1,c(t)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t)}(G,\hat{\boldsymbol{v}}) = 𝒔𝒘𝒍k1,c(t)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})
(2) has two situations: |𝒗^|=|𝒗^|<k1|\hat{\boldsymbol{v}}|=|\hat{\boldsymbol{v}}^{\prime}|<k_{1} and |𝒗^|=|𝒗^|=k1|\hat{\boldsymbol{v}}|=|\hat{\boldsymbol{v}}^{\prime}|=k_{1}. For the first situation, 𝒩k2,c,rightG(𝒗^)=𝒩k1,c,rightG(𝒗^)\mathcal{N}^{G}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}})=\mathcal{N}^{G}_{k_{1},c,\text{right}}(\hat{\boldsymbol{v}}) and 𝒩k2,c,rightG(𝒗^)=𝒩k1,c,rightG(𝒗^)\mathcal{N}^{G^{\prime}}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}}^{\prime})=\mathcal{N}^{G^{\prime}}_{k_{1},c,\text{right}}(\hat{\boldsymbol{v}}^{\prime}), \exists a bijective mapping bb between 𝒩k2,c,rightG(𝒗^)\mathcal{N}^{G^{\prime}}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}}^{\prime}) and 𝒩k2,c,rightG(𝒗^)\mathcal{N}^{G}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}}) such that 𝒖^𝒩k2,c,rightG(𝒗^)\forall\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}}), 𝒔𝒘𝒍k2,c(t)(G,𝒖^)\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{u}}) = 𝒔𝒘𝒍k2,c(t)(G,b(𝒖^))\bm{swl}_{k_{2},c}^{(t)}(G^{\prime},b(\hat{\boldsymbol{u}})) and by induction 𝒔𝒘𝒍k1,c(t)(G,𝒖^)\bm{swl}_{k_{1},c}^{(t)}(G,\hat{\boldsymbol{u}}) = 𝒔𝒘𝒍k1,c(t)(G,b(𝒖^))\bm{swl}_{k_{1},c}^{(t)}(G^{\prime},b(\hat{\boldsymbol{u}})), hence 𝒔𝒘𝒍k1,c(t+1/2)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t+1/2)}(G,\hat{\boldsymbol{v}}) = 𝒔𝒘𝒍k1,c(t+1/2)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}). For the second situation, 𝒩k2,c,rightG(𝒗^)\mathcal{N}^{G^{\prime}}_{k_{2},c,\text{right}}(\hat{\boldsymbol{v}}^{\prime}) elements while 𝒩k1,c,rightG(𝒗^)=\mathcal{N}^{G^{\prime}}_{k_{1},c,\text{right}}(\hat{\boldsymbol{v}}^{\prime})=\emptyset, then clearly (b) 𝒔𝒘𝒍k1,c(t+1/2)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t+1/2)}(G,\hat{\boldsymbol{v}}) = 𝒔𝒘𝒍k1,c(t+1/2)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) (both are empty multisets).
(3) follows from the same argument as the first case of (2); with the induction hypothesis we obtain (c) \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{1},c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{k_{1},c,\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}
(4) implies that there exists a bijective mapping b between \mathcal{N}^{G}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}}) and \mathcal{N}^{G^{\prime}}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}}^{\prime}) such that for any \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{2},c,\text{left}}(\hat{\boldsymbol{v}}), \bm{swl}_{k_{2},c}^{(t+1/2)}(G,\hat{\boldsymbol{u}}) = \bm{swl}_{k_{2},c}^{(t+1/2)}(G^{\prime},b(\hat{\boldsymbol{u}})). Using the same argument as in (2) together with the induction hypothesis, this implies that for any \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{1},c,\text{left}}(\hat{\boldsymbol{v}}), \bm{swl}_{k_{1},c}^{(t+1/2)}(G,\hat{\boldsymbol{u}}) = \bm{swl}_{k_{1},c}^{(t+1/2)}(G^{\prime},b(\hat{\boldsymbol{u}})). Hence we get (d) \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t+1/2)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{k_{1},c,\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{k_{1},c,\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}.

Combining (a) (b) (c) (d) we get 𝒔𝒘𝒍k1,c(t+1)(G,𝒗^)=𝒔𝒘𝒍k1,c(t+1)(G,𝒗^)\bm{swl}_{k_{1},c}^{(t+1)}(G,\hat{\boldsymbol{v}})=\bm{swl}_{k_{1},c}^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}). ∎

With Lemma 3 we are ready to prove (2) in Theorem 3. When two graphs G and G^{\prime} cannot be distinguished by (k_{2},c)(\leq)-SetWL, we have \{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{v}})\ |\ \hat{\boldsymbol{v}}\in V(S_{k_{2},c\text{-swl}}(G))\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})\ |\ \hat{\boldsymbol{v}}^{\prime}\in V(S_{k_{2},c\text{-swl}}(G^{\prime}))\}\mskip-5.0mu\}. When two sets \hat{\boldsymbol{v}} and \hat{\boldsymbol{u}} differ in k (their number of nodes) or in cc (the number of components of their induced subgraphs), their colors cannot be the same (they already differ at t=0). Hence the above equality is equivalent to: \forall k\leq k_{2}, cc\leq c, \{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G,\hat{\boldsymbol{v}})\ |\ \hat{\boldsymbol{v}}\in V_{k,cc}(S_{k_{2},c\text{-swl}}(G))\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{swl}_{k_{2},c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})\ |\ \hat{\boldsymbol{v}}^{\prime}\in V_{k,cc}(S_{k_{2},c\text{-swl}}(G^{\prime}))\}\mskip-5.0mu\}. With Lemma 3 we then get that \forall k\leq k_{1}, cc\leq c, \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t)}(G,\hat{\boldsymbol{v}})\ |\ \hat{\boldsymbol{v}}\in V_{k,cc}(S_{k_{1},c\text{-swl}}(G))\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{swl}_{k_{1},c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})\ |\ \hat{\boldsymbol{v}}^{\prime}\in V_{k,cc}(S_{k_{1},c\text{-swl}}(G^{\prime}))\}\mskip-5.0mu\}. Hence the two graphs G and G^{\prime} cannot be distinguished by (k_{1},c)(\leq)-SetWL either. ∎

A.4.7 Proof of Theorem 6

Theorem 6.

When (i) the BaseGNN can distinguish any non-isomorphic graphs with at most k nodes, (ii) all MLPs have sufficient depth and width, and (iii) POOL is an injective function, then for any t\in\mathbb{N}, the t-layer (k,c)(\leq)-SetGNN is as expressive as (k,c)(\leq)-SetWL at the t-th iteration.

Proof.

For any G,\hat{\boldsymbol{v}}, let \bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{v}}) denote the color of \hat{\boldsymbol{v}} at the t-th iteration of (k,c)(\leq)-SetWL, and let h_{k,c}^{(t)}(G,\hat{\boldsymbol{v}}) denote the embedding of \hat{\boldsymbol{v}} at the t-th layer of (k,c)(\leq)-SetGNN. We prove the theorem by showing that \bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{v}}) = \bm{swl}_{k,c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) \Longleftrightarrow h_{k,c}^{(t)}(G,\hat{\boldsymbol{v}}) = h_{k,c}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}). We drop the subscript k,c where no confusion arises.

For easier reference, recall the update rule at iteration t of (k,c)(\leq)-SetWL:

𝒔𝒘𝒍(t+12)(G,𝒗^)=\displaystyle\bm{swl}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{v}})= {{𝒔𝒘𝒍k,c(t)(G,𝒖^)|𝒖^𝒩rightG(𝒗^)}}\displaystyle\{\mskip-5.0mu\{\bm{swl}_{k,c}^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} (26)
𝒔𝒘𝒍(t+1)(G,𝒗^)=\displaystyle\bm{swl}^{(t+1)}(G,\hat{\boldsymbol{v}})= (𝒔𝒘𝒍(t)(G,𝒗^),𝒔𝒘𝒍(t+12)(G,𝒗^),\displaystyle\bigg{(}\bm{swl}^{(t)}(G,\hat{\boldsymbol{v}}),\bm{swl}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{v}}), (27)
{{𝒔𝒘𝒍(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}},{{𝒔𝒘𝒍(t+12)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}})\displaystyle\{\mskip-5.0mu\{\bm{swl}^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\},\{\mskip-5.0mu\{\bm{swl}^{(t+\frac{1}{2})}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}\bigg{)} (28)

The corresponding update rule of (k,c)(\leq)-SetGNN is

h(t+12)(𝒗^)=\displaystyle h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}})= 𝒖^𝒩rightG(𝒗^)MLP(t+12)(h(t)(𝒖^))\displaystyle\sum_{\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}^{(t+\frac{1}{2})}(h^{(t)}(\hat{\boldsymbol{u}})) (29)
h(t+1)(𝒗^)=\displaystyle h^{(t+1)}(\hat{\boldsymbol{v}})= MLP(t)(h(t)(𝒗^),h(t+12)(𝒗^),\displaystyle\text{MLP}^{(t)}\Big{(}h^{(t)}(\hat{\boldsymbol{v}}),h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}}), (30)
𝒖^𝒩leftG(𝒗^)MLPA(t)(h(t)(𝒖^)),𝒖^𝒩leftG(𝒗^)MLPB(t)(h(t+12)(𝒖^)))\displaystyle\sum_{\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})}\text{MLP}^{(t)}_{A}(h^{(t)}(\hat{\boldsymbol{u}})),\sum_{\hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})}\text{MLP}^{(t)}_{B}(h^{(t+\frac{1}{2})}(\hat{\boldsymbol{u}}))\Big{)} (31)
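To make the update concrete, the following is a minimal PyTorch-style sketch of Eqs.(29)-(31). It is an illustration under our own naming (SetGNNLayer, left_nbrs, right_nbrs are assumptions, not the released implementation), with the supergraph given as per-set index lists of left (subset) and right (superset) neighbors.

import torch
import torch.nn as nn

def mlp(dim):
    # Two-layer MLP used for every transform in this sketch.
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

class SetGNNLayer(nn.Module):
    # One (k,c)(<=)-SetGNN layer following Eqs.(29)-(31): a half step that
    # pools right (superset) neighbors, then a full step combining the set's
    # own state, its half-step state, and two sums over left (subset) neighbors.
    def __init__(self, dim):
        super().__init__()
        self.mlp_half = mlp(dim)  # MLP^{(t+1/2)} in Eq.(29)
        self.mlp_a = mlp(dim)     # MLP_A^{(t)} in Eq.(31)
        self.mlp_b = mlp(dim)     # MLP_B^{(t)} in Eq.(31)
        self.mlp_out = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))  # MLP^{(t)} in Eq.(31)

    def forward(self, h, left_nbrs, right_nbrs):
        # h: [num_sets, dim]; left_nbrs/right_nbrs: lists of LongTensors
        # holding each set's neighbor indices in the supergraph.
        def pool(f, states, nbrs):
            return torch.stack([f(states[nb]).sum(0) if len(nb)
                                else torch.zeros_like(h[0]) for nb in nbrs])
        h_half = pool(self.mlp_half, h, right_nbrs)          # Eq.(29)
        agg_a = pool(self.mlp_a, h, left_nbrs)
        agg_b = pool(self.mlp_b, h_half, left_nbrs)
        return self.mlp_out(torch.cat([h, h_half, agg_a, agg_b], dim=-1))  # Eq.(31)

The per-set loop mirrors the multiset sums in the equations literally; an efficient implementation would batch these sums with sparse matrix products instead.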

At t=0, given a powerful enough BaseGNN satisfying condition (i), h^{(0)}(G,\hat{\boldsymbol{v}}) = h^{(0)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) \Longleftrightarrow G[\hat{\boldsymbol{v}}] and G^{\prime}[\hat{\boldsymbol{v}}^{\prime}] are isomorphic \Longleftrightarrow \bm{swl}^{(0)}(G,\hat{\boldsymbol{v}}) = \bm{swl}^{(0)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}).

Now assume the claim \bm{swl}^{(t)}(G,\hat{\boldsymbol{v}}) = \bm{swl}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) \Longleftrightarrow h^{(t)}(G,\hat{\boldsymbol{v}}) = h^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) holds for all iterations up to t (for any \hat{\boldsymbol{v}} and \hat{\boldsymbol{v}}^{\prime}). We prove it holds for iteration t+1. We first prove the forward direction. \bm{swl}^{(t+1)}(G,\hat{\boldsymbol{v}}) = \bm{swl}^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) implies that

(1) 𝒔𝒘𝒍(t)(G,𝒗^)\bm{swl}^{(t)}(G,\hat{\boldsymbol{v}}) = 𝒔𝒘𝒍(t)(G,𝒗^)\bm{swl}^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})
(2) 𝒔𝒘𝒍(t+1/2)(G,𝒗^)\bm{swl}^{(t+1/2)}(G,\hat{\boldsymbol{v}}) = 𝒔𝒘𝒍(t+1/2)(G,𝒗^)\bm{swl}^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})
(3) {{𝒔𝒘𝒍(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{\bm{swl}^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{𝒔𝒘𝒍(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{\bm{swl}^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}
(4) {{𝒔𝒘𝒍(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{\bm{swl}^{(t+1/2)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{𝒔𝒘𝒍(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{\bm{swl}^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}

(1) and (3) can be directly translated into relationships between h using the inductive hypothesis. Formally, we have
(a) h(t)(G,𝒗^)h^{(t)}(G,\hat{\boldsymbol{v}}) = h(t)(G,𝒗^)h^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) and
(c) {{h(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}

(2) and (4) need one additional step. Notice that (2) is equivalent to \{\mskip-5.0mu\{\bm{swl}^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{swl}^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{right}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}, and by the inductive hypothesis we know \{\mskip-5.0mu\{h^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}=\{\mskip-5.0mu\{h^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{right}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}. As Eq.(29) applies an MLP with summation, which is invariant to ordering, to the multiset \{\mskip-5.0mu\{h^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\}, the same multiset leads to the same output. Hence we know
(b) h(t+1/2)(G,𝒗^)h^{(t+1/2)}(G,\hat{\boldsymbol{v}}) = h(t+1/2)(G,𝒗^)h^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}).

For (4), we know that there is a bijective mapping g between \mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}}) and \mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime}) such that \bm{swl}^{(t+1/2)}(G,\hat{\boldsymbol{u}})=\bm{swl}^{(t+1/2)}(G^{\prime},g(\hat{\boldsymbol{u}})). Then, using the same argument as (2) \Longrightarrow (b), we get h^{(t+1/2)}(G,\hat{\boldsymbol{u}})=h^{(t+1/2)}(G^{\prime},g(\hat{\boldsymbol{u}})) for any \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}}), which is equivalent to
(d) {{h(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t+1/2)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}.

Combining (a)(b)(c)(d) and using the permutation invariance of the MLP with summation, we can derive that h^{(t+1)}(G,\hat{\boldsymbol{v}}) = h^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}).

Now for the backward direction. We first recall the key property of an MLP with summation, from DeepSets [64] and Lemma 5 of GIN [61].

Lemma 4 (Lemma 5 of [61]).

Assume \mathcal{X} is countable. There exists a function f:\mathcal{X}\rightarrow\mathbb{R}^{n} so that h(X)=\sum_{x\in X}f(x) is unique for each multiset X\subset\mathcal{X} of bounded size.
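As a concrete (hypothetical) instance of such a function f: take elements from the nonnegative integers and multisets with at most N elements. Then f(x)=(N+1)^{-x} turns the sum into a base-(N+1) digit-count encoding, so digits never carry and distinct multisets get distinct sums. A minimal self-checking sketch:

from collections import Counter
from fractions import Fraction

N = 5  # bound on the multiset size

def f(x):
    # Element x becomes a "digit" in base N+1; with at most N copies of
    # any element, digits never carry, so the encoding is injective.
    return Fraction(1, (N + 1) ** x)

def multiset_sum(ms):
    return sum(f(x) for x in ms)

a, b = [1, 2, 2, 7], [1, 2, 7, 7]
assert Counter(a) != Counter(b)
assert multiset_sum(a) != multiset_sum(b)             # distinct multisets differ
assert multiset_sum([2, 1, 7, 2]) == multiset_sum(a)  # order is irrelevant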

Using this lemma, we know that given enough depth and width, there exists an MLP such that \sum_{x\in X}\text{MLP}(x)=\sum_{y\in Y}\text{MLP}(y)\Longleftrightarrow X=Y, i.e., the two multisets X and Y are equal. Now, from h^{(t+1)}(G,\hat{\boldsymbol{v}}) = h^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) and Eq.(31), we know there exist MLPs inside Eq.(29) and Eq.(31) such that

(1) h(t)(G,𝒗^)h^{(t)}(G,\hat{\boldsymbol{v}}) = h(t)(G,𝒗^)h^{(t)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})
(2) h(t+1/2)(G,𝒗^)h^{(t+1/2)}(G,\hat{\boldsymbol{v}}) = h(t+1/2)(G,𝒗^)h^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime})
(3) {{h(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}
(4) {{h(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t+1/2)}(G,\hat{\boldsymbol{u}})\ |\ \hat{\boldsymbol{u}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t+1/2)(G,𝒖^)|𝒖^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t+1/2)}(G^{\prime},\hat{\boldsymbol{u}}^{\prime})\ |\ \hat{\boldsymbol{u}}^{\prime}\in\mathcal{N}^{G^{\prime}}_{\text{left}}(\hat{\boldsymbol{v}}^{\prime})\}\mskip-5.0mu\}

where (3) and (4) are derived using the lemma above. Hence, following the same argument as in the forward direction, we can prove that \bm{swl}^{(t+1)}(G,\hat{\boldsymbol{v}}) = \bm{swl}^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}).

Combining the forward and backward directions, we have \bm{swl}^{(t+1)}(G,\hat{\boldsymbol{v}}) = \bm{swl}^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}) \Longleftrightarrow h^{(t+1)}(G,\hat{\boldsymbol{v}}) = h^{(t+1)}(G^{\prime},\hat{\boldsymbol{v}}^{\prime}). Hence, by induction, the statement holds for any step t and any \hat{\boldsymbol{v}},\hat{\boldsymbol{v}}^{\prime}, which shows that (k,c)(\leq)-SetGNN and (k,c)(\leq)-SetWL have the same expressivity. ∎

A.4.8 Proof of Theorem 7

Theorem 7.

For any t\in\mathbb{N}, the t-layer (k,c)(\leq)-SetGNN^{*} is more expressive than the t-layer (k,c)(\leq)-SetGNN. As t\to\infty, (k,c)(\leq)-SetGNN^{*} is as expressive as (k,c)(\leq)-SetGNN.

Proof.

We first prove that the t-layer (k,c)(\leq)-SetGNN^{*} is more expressive than the t-layer (k,c)(\leq)-SetGNN, by showing that if an m-set \hat{\boldsymbol{u}} has the same representation as another m-set \hat{\boldsymbol{v}} in the t-layer (k,c)(\leq)-SetGNN^{*}, then they also have the same representation in the t-layer (k,c)(\leq)-SetGNN. Let h denote representations inside (k,c)(\leq)-SetGNN^{*}, and g representations inside (k,c)(\leq)-SetGNN.

To simplify the proof, we first simplify the formulations of (k,c)(\leq)-SetGNN^{*} in Eq.(13) and Eq.(14) by removing unnecessary superscripts and subscripts of the MLPs, without introducing ambiguity. We then add another superscript to the embeddings to indicate the step inside the one-layer bidirectional propagation; notice that one layer of bidirectional propagation contains 2k-2 intermediate sequential steps. The proof assumes that all MLPs are injective. We rewrite Eq.(13) and Eq.(14) as follows:

m-set 𝒗^,h(t+12)(𝒗^):=h(t+12,km)(𝒗^)=MLP(h(t)(𝒗^),𝒘^𝒩rightG(𝒗^)MLP(h(t+12,km1)(𝒘^)))\displaystyle\forall m\text{-set }\hat{\boldsymbol{v}},h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}}):=h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}})=\text{MLP}\Big{(}h^{(t)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}(h^{{(t+\frac{1}{2},k-m-1)}}(\hat{\boldsymbol{w}}))\Big{)} (32)
m-set 𝒗^,h(t+1)(𝒗^):=h(t+1,m1)(𝒗^)=MLP(h(t+12,km)(𝒗^),𝒘^𝒩leftG(𝒗^)MLP(h(t+1,m2)(𝒘^)))\displaystyle\forall m\text{-set }\hat{\boldsymbol{v}},h^{(t+1)}(\hat{\boldsymbol{v}}):=h^{(t+1,m-1)}(\hat{\boldsymbol{v}})=\text{MLP}\Big{(}h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})}\text{MLP}(h^{{(t+1,m-2)}}(\hat{\boldsymbol{w}}))\Big{)} (33)

where the boundary cases are h^{(t+\frac{1}{2},0)}(\hat{\boldsymbol{u}})=h^{(t)}(\hat{\boldsymbol{u}}) and h^{(t+1,0)}(\hat{\boldsymbol{u}})=h^{(t+\frac{1}{2})}(\hat{\boldsymbol{u}}). The notation h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}}):=h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}}) indicates that this representation is computed at the (k-m)-th step of layer t+\frac{1}{2}.

We prove the theorem by induction on t. Specifically, we want to prove that for any t, \hat{\boldsymbol{v}}, \hat{\boldsymbol{u}}, h^{(t)}(\hat{\boldsymbol{u}})=h^{(t)}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t)}(\hat{\boldsymbol{u}})=g^{(t)}(\hat{\boldsymbol{v}}) and h^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=h^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=g^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}}). The base case is easy to verify, as the initialization steps of the two models are defined identically.

Now assume that for all t\leq l we have, for any \hat{\boldsymbol{v}}, \hat{\boldsymbol{u}}, h^{(t)}(\hat{\boldsymbol{u}})=h^{(t)}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t)}(\hat{\boldsymbol{u}})=g^{(t)}(\hat{\boldsymbol{v}}) and h^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=h^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=g^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}}). We first prove that for t=l+1, h^{(l+\frac{1}{2})}(\hat{\boldsymbol{u}})=h^{(l+\frac{1}{2})}(\hat{\boldsymbol{v}})\Longrightarrow g^{(l+\frac{1}{2})}(\hat{\boldsymbol{u}})=g^{(l+\frac{1}{2})}(\hat{\boldsymbol{v}}).

First, Eq.(32) can be rewritten as

\displaystyle h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}}):=h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}})=\text{MLP}\Big{(}h^{(t)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}(h^{{(t+\frac{1}{2},k-m-1)}}(\hat{\boldsymbol{w}}))\Big{)} (34)
=MLP(h(t)(𝒗^),𝒘^𝒩rightG(𝒗^)MLP(MLP(h(t)(𝒘^)),𝒑^𝒩rightG(𝒘^)MLP(h(t+12,km2)(𝒑^))))\displaystyle=\text{MLP}\Big{(}h^{(t)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}(\text{MLP}(h^{(t)}(\hat{\boldsymbol{w}})),\sum_{\hat{\boldsymbol{p}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{w}})}\text{MLP}(h^{{(t+\frac{1}{2},k-m-2)}}(\hat{\boldsymbol{p}})))\Big{)} (35)

Then h^{(l+\frac{1}{2})}(\hat{\boldsymbol{v}})=h^{(l+\frac{1}{2})}(\hat{\boldsymbol{u}})\Longrightarrow\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}(h^{(l)}(\hat{\boldsymbol{w}}))=\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{u}})}\text{MLP}(h^{(l)}(\hat{\boldsymbol{w}})). Hence we can find a bijective mapping f between \mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}}) and \mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{u}}) such that \text{MLP}(h^{(l)}(\hat{\boldsymbol{w}}))=\text{MLP}(h^{(l)}(f(\hat{\boldsymbol{w}}))). Then, by the inductive hypothesis, \text{MLP}(g^{(l)}(\hat{\boldsymbol{w}}))=\text{MLP}(g^{(l)}(f(\hat{\boldsymbol{w}}))) for all \hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}}). This implies that \sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{v}})}\text{MLP}(g^{(l)}(\hat{\boldsymbol{w}}))=\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{right}}(\hat{\boldsymbol{u}})}\text{MLP}(g^{(l)}(\hat{\boldsymbol{w}})), or equivalently g^{(l+\frac{1}{2})}(\hat{\boldsymbol{v}})=g^{(l+\frac{1}{2})}(\hat{\boldsymbol{u}}).

Now assume that for all t\leq l we have, for any \hat{\boldsymbol{v}}, \hat{\boldsymbol{u}}, h^{(t)}(\hat{\boldsymbol{u}})=h^{(t)}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t)}(\hat{\boldsymbol{u}})=g^{(t)}(\hat{\boldsymbol{v}}), and for all t\leq l+1, h^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=h^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=g^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}}). We prove that for t=l+1, h^{(l+1)}(\hat{\boldsymbol{u}})=h^{(l+1)}(\hat{\boldsymbol{v}})\Longrightarrow g^{(l+1)}(\hat{\boldsymbol{u}})=g^{(l+1)}(\hat{\boldsymbol{v}}).

We first rewrite Eq.(33) as

h(t+1)(𝒗^):=h(t+1,m1)(𝒗^)=MLP(h(t+12,km)(𝒗^),𝒘^𝒩leftG(𝒗^)MLP(h(t+1,m2)(𝒘^)))\displaystyle h^{(t+1)}(\hat{\boldsymbol{v}}):=h^{(t+1,m-1)}(\hat{\boldsymbol{v}})=\text{MLP}\Big{(}h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})}\text{MLP}(h^{{(t+1,m-2)}}(\hat{\boldsymbol{w}}))\Big{)} (36)
=MLP(h(t+12,km)(𝒗^),𝒘^𝒩leftG(𝒗^)MLP(MLP(h(t+12,km+1)(𝒘^),𝒑^𝒩leftG(𝒘^)MLP(h(t+1,m3)(𝒑^)))))\displaystyle=\text{MLP}\Bigg{(}h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}}),\sum_{\hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})}\text{MLP}\Big{(}\text{MLP}\big{(}h^{(t+\frac{1}{2},k-m+1)}(\hat{\boldsymbol{w}}),\sum_{\hat{\boldsymbol{p}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{w}})}\text{MLP}(h^{{(t+1,m-3)}}(\hat{\boldsymbol{p}}))\big{)}\Big{)}\Bigg{)} (37)

Then h(l+1)(𝒖^)=h(l+1)(𝒗^)h^{(l+1)}(\hat{\boldsymbol{u}})=h^{(l+1)}(\hat{\boldsymbol{v}}) implies

1) h(t+12,km)(𝒗^)=h(t+12,km)(𝒖^)h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{v}})=h^{(t+\frac{1}{2},k-m)}(\hat{\boldsymbol{u}}) or equivalently h(t+12)(𝒗^)=h(t+12)(𝒖^)h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}})=h^{(t+\frac{1}{2})}(\hat{\boldsymbol{u}})
2) {{h(t+12)(𝒘^)|𝒘^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t+\frac{1}{2})}(\hat{\boldsymbol{w}})\ |\ \hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t+12)(𝒘^)|𝒘^𝒩leftG(𝒖^)}}\{\mskip-5.0mu\{h^{(t+\frac{1}{2})}(\hat{\boldsymbol{w}})\ |\ \hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{u}})\}\mskip-5.0mu\}

Also by Eq.32 we know that h(t+12)(𝒗^)=h(t+12)(𝒖^)h^{(t+\frac{1}{2})}(\hat{\boldsymbol{v}})=h^{(t+\frac{1}{2})}(\hat{\boldsymbol{u}}) implies

3) h(t)(𝒗^)=h(t)(𝒖^)h^{(t)}(\hat{\boldsymbol{v}})=h^{(t)}(\hat{\boldsymbol{u}}).

Combining 2) and 3), we know

4) {{h(t)(𝒘^)|𝒘^𝒩leftG(𝒗^)}}\{\mskip-5.0mu\{h^{(t)}(\hat{\boldsymbol{w}})\ |\ \hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{v}})\}\mskip-5.0mu\} = {{h(t)(𝒘^)|𝒘^𝒩leftG(𝒖^)}}\{\mskip-5.0mu\{h^{(t)}(\hat{\boldsymbol{w}})\ |\ \hat{\boldsymbol{w}}\in\mathcal{N}^{G}_{\text{left}}(\hat{\boldsymbol{u}})\}\mskip-5.0mu\}

Now, using our inductive hypothesis, we can transform 1) 2) 3) 4) by replacing h with g. Then, based on Eq.(11), we know g^{(l+1)}(\hat{\boldsymbol{u}})=g^{(l+1)}(\hat{\boldsymbol{v}}).

Combining the two proved inductive steps and applying them alternately, we know that for any t, \hat{\boldsymbol{v}}, \hat{\boldsymbol{u}}, h^{(t)}(\hat{\boldsymbol{u}})=h^{(t)}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t)}(\hat{\boldsymbol{u}})=g^{(t)}(\hat{\boldsymbol{v}}) and h^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=h^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}})\Longrightarrow g^{(t-\frac{1}{2})}(\hat{\boldsymbol{u}})=g^{(t-\frac{1}{2})}(\hat{\boldsymbol{v}}). Hence the t-layer (k,c)(\leq)-SetGNN^{*} is more expressive than the t-layer (k,c)(\leq)-SetGNN.

Now consider t\to\infty. All representations become stable, with h^{(t)}(\hat{\boldsymbol{v}})=h^{(t)}(\hat{\boldsymbol{u}})\Longleftrightarrow h^{(t+1)}(\hat{\boldsymbol{v}})=h^{(t+1)}(\hat{\boldsymbol{u}}) and h^{(t-1/2)}(\hat{\boldsymbol{v}})=h^{(t-1/2)}(\hat{\boldsymbol{u}})\Longleftrightarrow h^{(t+1/2)}(\hat{\boldsymbol{v}})=h^{(t+1/2)}(\hat{\boldsymbol{u}}). Notice that for a single set \hat{\boldsymbol{v}}, the information used from its neighbors is the same in both (k,c)(\leq)-SetGNN^{*} and (k,c)(\leq)-SetGNN. Hence the equilibrium equations for set \hat{\boldsymbol{v}} are the same in both models, and they end up with the same stable representations. ∎

A.4.9 Proof of Theorem 8

Theorem 8.

Let G be a graph with c connected components C_{1},...,C_{c}, and G^{\prime} be a graph also with c connected components C^{\prime}_{1},...,C^{\prime}_{c}. Then G and G^{\prime} are isomorphic if and only if there exists a permutation p:[c]\rightarrow[c] such that \forall i\in[c], C_{i} and C^{\prime}_{p(i)} are isomorphic.

Proof.

Right \Longrightarrow Left. Let h_{i}:V(C_{i})\rightarrow V(C^{\prime}_{p(i)}) be an isomorphism from C_{i} to C^{\prime}_{p(i)} for i\in[c]. Then we can create a new mapping h:V(G)\rightarrow V(G^{\prime}) such that for any x\in V(G), it first locates the component containing x, say C_{i}, and then applies h_{i} to x. Clearly h is an isomorphism between G and G^{\prime}, hence G and G^{\prime} are isomorphic.

Left \Longrightarrow Right. If G and G^{\prime} are isomorphic, then there exists an isomorphism h from G to G^{\prime}. For a specific component C_{i} of G, h maps V(C_{i}) to a node set inside V(G^{\prime}); call it B_{i}. For i\not=j, B_{i}\cap B_{j}=\emptyset and there are no edges e\in E(G^{\prime}) between B_{i} and B_{j}; otherwise there would be a corresponding edge (apply h^{-1}) between V(C_{i}) and V(C_{j}), which is impossible. Hence \{G^{\prime}[B_{i}]\}_{i=1}^{c} are pairwise-disconnected subgraphs. As h also preserves adjacency, for every i, C_{i} and G^{\prime}[B_{i}] are isomorphic. This proves the right side. ∎

A.4.10 Conjecture that k-MWL is equivalent to k-WL

Proof of t,wlk(t)(G,v~)=wlk(t)(G,v~)mwlk(t)(G,v~)=mwlk(t)(G,v~)\forall t,\bm{wl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime})\Longleftarrow\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime})

Proof.

We first take a closer look at the condition of 𝒎𝒘𝒍k(t+1)(G,𝒗~)=𝒎𝒘𝒍k(t+1)(G,𝒗~)\bm{mwl}_{k}^{(t+1)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t+1)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}). By Eq.(3) we know this is equivalent to
(1) \bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}) and (2) \bigg{\{}\mskip-10.0mu\bigg{\{}\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/o^{-1}_{G}(\tilde{\boldsymbol{v}},i)])\big{|}x\in V(G)\}\mskip-5.0mu\}\ |\ i=1,...,k\bigg{\}}\mskip-10.0mu\bigg{\}} = \bigg{\{}\mskip-10.0mu\bigg{\{}\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}[x/o^{-1}_{G^{\prime}}(\tilde{\boldsymbol{v}}^{\prime},i)])\big{|}x\in V(G^{\prime})\}\mskip-5.0mu\}\ |\ i=1,...,k\bigg{\}}\mskip-10.0mu\bigg{\}}. (2) can be rewritten as \bigg{\{}\mskip-10.0mu\bigg{\{}\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\big{|}x\in V(G)\}\mskip-5.0mu\}\ |\ y\in\tilde{\boldsymbol{v}}\bigg{\}}\mskip-10.0mu\bigg{\}} = \bigg{\{}\mskip-10.0mu\bigg{\{}\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}[x/\text{idx}_{\tilde{\boldsymbol{v}}^{\prime}}(y)])\big{|}x\in V(G^{\prime})\}\mskip-5.0mu\}\ |\ y\in\tilde{\boldsymbol{v}}^{\prime}\bigg{\}}\mskip-10.0mu\bigg{\}}, which is equivalent to (3) there exists a bijective f:V(G)\rightarrow V(G^{\prime}) such that \forall y\in\tilde{\boldsymbol{v}}, \{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])\big{|}x\in V(G)\}\mskip-5.0mu\}=\{\mskip-5.0mu\{\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}[x/\text{idx}_{\tilde{\boldsymbol{v}}^{\prime}}(f(y))])\big{|}x\in V(G^{\prime})\}\mskip-5.0mu\}.

Define F(t+1)(G,G,𝒗~,𝒗~):={ injective f:𝒗~𝒗~|fF(t)(G,G,𝒗~,𝒗~), AND ,y𝒗~,hy:V(G)V(G),x,fF(t)(G,G,𝒗~[x/idx𝒗~(y)],𝒗~[hy(x)/idx𝒗~(f(y))])}F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}):=\{\text{ injective }f:\tilde{\boldsymbol{v}}\rightarrow\tilde{\boldsymbol{v}}^{\prime}\ |\ f\in F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}),\text{ AND },\forall y\in\tilde{\boldsymbol{v}},\exists h_{y}:V(G)\rightarrow V(G^{\prime}),\forall x,f\in F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)],\tilde{\boldsymbol{v}}^{\prime}[h_{y}(x)/\text{idx}_{\tilde{\boldsymbol{v}}^{\prime}}(f(y))])\}. Let F(0)(G,G,𝒗~,𝒗~):={f|f is an isomorphism from G[𝒗~] to G[𝒗~]}F^{(0)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}):=\{f\ |\ f\text{ is an isomorphism from }G[\tilde{\boldsymbol{v}}]\text{ to }G^{\prime}[\tilde{\boldsymbol{v}}^{\prime}]\}.

Lemma 5.

t\forall t, 𝐦𝐰𝐥k(t)(G,𝐯~)=𝐦𝐰𝐥k(t)(G,𝐯~)\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}) if and only if F(t)(G,G,𝐯~,𝐯~)F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime})\not=\emptyset.

Notice that this lemma is only conjectured to be true; it still requires a proof.

Lemma 6.

\forall t,\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime})\Longrightarrow\forall f\in F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}),\bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G^{\prime},f(\overrightarrow{\boldsymbol{v}}))

Proof of Lemma 6: We prove it by induction on t. When t=0, F^{(0)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}) contains all isomorphisms between G[\tilde{\boldsymbol{v}}] and G^{\prime}[\tilde{\boldsymbol{v}}^{\prime}], hence the right side holds. Assume the statement holds for all iterations up to t. For the t+1 case, the left side implies (1) \bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}) and (2) F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime})\not=\emptyset, \forall f\in F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}), \forall y\in\tilde{\boldsymbol{v}}, \exists h_{y}:V(G)\rightarrow V(G^{\prime}), \forall x\in V(G), \bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)])=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}[h_{y}(x)/\text{idx}_{\tilde{\boldsymbol{v}}^{\prime}}(f(y))]). By the induction hypothesis, (1) and F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime})\subseteq F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}) imply (a) \forall f\in F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}),\bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G^{\prime},f(\overrightarrow{\boldsymbol{v}})); (2) together with \forall y\in\tilde{\boldsymbol{v}}, f\in F^{(t)}(G,G^{\prime},\tilde{\boldsymbol{v}}[x/\text{idx}_{\tilde{\boldsymbol{v}}}(y)],\tilde{\boldsymbol{v}}^{\prime}[h_{y}(x)/\text{idx}_{\tilde{\boldsymbol{v}}^{\prime}}(f(y))]) implies (b) \exists h, \forall x\in V(G), \bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}}[x/\text{idx}_{\overrightarrow{\boldsymbol{v}}}(y)])=\bm{wl}_{k}^{(t)}(G^{\prime},f(\overrightarrow{\boldsymbol{v}})[h(x)/\text{idx}_{f(\overrightarrow{\boldsymbol{v}})}(f(y))]), which can be rewritten as \forall y\in\tilde{\boldsymbol{v}}, \{\mskip-5.0mu\{\bm{wl}_{k}^{(t)}(G,\overrightarrow{\boldsymbol{v}}[x/\text{idx}_{\overrightarrow{\boldsymbol{v}}}(y)]\ |\ x\in V(G))\}\mskip-5.0mu\} = \{\mskip-5.0mu\{\bm{wl}_{k}^{(t)}(G^{\prime},f(\overrightarrow{\boldsymbol{v}})[x/\text{idx}_{f(\overrightarrow{\boldsymbol{v}})}(f(y))]\ |\ x\in V(G^{\prime}))\}\mskip-5.0mu\}. Now, applying Eq.(1) with (a) and (b), we can derive that \forall f\in F^{(t+1)}(G,G^{\prime},\tilde{\boldsymbol{v}},\tilde{\boldsymbol{v}}^{\prime}), \bm{wl}_{k}^{(t+1)}(G,\overrightarrow{\boldsymbol{v}})=\bm{wl}_{k}^{(t+1)}(G^{\prime},f(\overrightarrow{\boldsymbol{v}})). Hence the statement holds for any t.

Using Lemma 6 and the conclusion (\forall p\in\text{perm}[k], \bm{wl}_{k}^{(t+1)}(G,p(\overrightarrow{\boldsymbol{v}}))=\bm{wl}_{k}^{(t+1)}(G^{\prime},p(f(\overrightarrow{\boldsymbol{v}})))) proved in the first part of the proof of Theorem 1, we have proved \forall t,\bm{mwl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{mwl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime})\Longrightarrow\bm{wl}_{k}^{(t)}(G,\tilde{\boldsymbol{v}})=\bm{wl}_{k}^{(t)}(G^{\prime},\tilde{\boldsymbol{v}}^{\prime}). ∎

A.5 Algorithm of extracting (k,c)()(k,c)(\leq) sets and constructing the super-graph of (k,c)()(k,c)(\leq)-SetWL

Here we present the algorithm for constructing the supernodes ((t+1,c)(\leq) sets) from the (t,c)(\leq) sets, together with the bipartite graph between the (t,c)(\leq) sets and the (t+1,c)(\leq) sets. Note that the algorithm below is pseudocode and uses for-loops for ease of presentation; it can easily be parallelized (removing the for-loops) using matrix operations.

Algorithm 1 Constructing supernodes (t+1,c)()(t+1,c)(\leq) sets and the bipartite graph between tt and t+1t+1
Input: graph G, the list O_{t,\leq c} containing all supernodes with t nodes and at most c components, and the list N_{t,\leq c} with N_{t,\leq c}[i] being the number of components of G[O_{t,\leq c}[i]].
Output: O_{t+1,\leq c}, N_{t+1,\leq c}, and B_{t} containing the edges between O_{t,\leq c} and O_{t+1,\leq c}.
Ot+1,c,Nt+1,c,Bt[],[],[]O_{t+1,\leq c},N_{t+1,\leq c},B_{t}\leftarrow[],[],[]
for all  𝒔^,n\hat{\boldsymbol{s}},n in zip( Ot,c,Nt,cO_{t,\leq c},N_{t,\leq c}do
     𝒩1\mathcal{N}_{1}\leftarrow 11-st hop neighbors of G[𝒔^]G[\hat{\boldsymbol{s}}]
     𝒩>1\mathcal{N}_{>1}\leftarrow (V(G)𝒔^)𝒩1(V(G)\setminus\hat{\boldsymbol{s}})\setminus\mathcal{N}_{1}
     for all mm in 𝒩1\mathcal{N}_{1} do \triangleright Number of components doesn’t change.
         𝒒^𝒔^+{m}\hat{\boldsymbol{q}}\leftarrow\hat{\boldsymbol{s}}+\{m\}
         Append 𝒒^\hat{\boldsymbol{q}} into Ot+1,cO_{t+1,\leq c}, nn into Nt+1,cN_{t+1,\leq c}
         Append edge (𝒔^,𝒒^)(\hat{\boldsymbol{s}},\hat{\boldsymbol{q}}) into BtB_{t} \triangleright We change the edge to (𝒔^,𝒒^,m)(\hat{\boldsymbol{s}},\hat{\boldsymbol{q}},m) for Algorithm.2
     end for
     if n < c then \triangleright Creating an additional component.
         for all mm in 𝒩>1\mathcal{N}_{>1} do
              𝒒^𝒔^+{m}\hat{\boldsymbol{q}}\leftarrow\hat{\boldsymbol{s}}+\{m\}
              Append 𝒒^\hat{\boldsymbol{q}} into Ot+1,cO_{t+1,\leq c}, n+1n+1 into Nt+1,cN_{t+1,\leq c}
              Append edge (𝒔^,𝒒^)(\hat{\boldsymbol{s}},\hat{\boldsymbol{q}}) into BtB_{t} \triangleright We change the edge to (𝒔^,𝒒^,m)(\hat{\boldsymbol{s}},\hat{\boldsymbol{q}},m) for Algorithm.2
         end for
     end if
end for
Remove repeated elements inside Ot+1,cO_{t+1,\leq c}, with corresponding mask MM
Apply mask MM to Nt+1,cN_{t+1,\leq c}
return Ot+1,cO_{t+1,\leq c}, Nt+1,cN_{t+1,\leq c}, BtB_{t}

To get the full supergraph S_{k,c\text{-swl}}, we apply Algorithm 1 k-1 times sequentially; a sketch of one such application is given below.
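The following is a plain-Python sketch of one application of Algorithm 1, mirroring the pseudocode above. Names such as grow_supernodes and the dict-based supergraph encoding are our own assumptions; the released code replaces these loops with matrix operations.

def grow_supernodes(adj, level, c_max):
    # One application of Algorithm 1.
    # adj:   dict mapping each node of G to the set of its neighbors.
    # level: dict mapping each supernode with t nodes (a frozenset) to the
    #        number of connected components of its induced subgraph.
    # Returns the (t+1)-node supernodes with their component counts, and the
    # bipartite edges (s, q, m) between the two levels.
    next_level, edges = {}, []
    for s, n in level.items():
        hop1 = set().union(*(adj[v] for v in s)) - s  # N_1: 1-hop neighbors of G[s]
        far = set(adj) - s - hop1                     # N_{>1}: all remaining nodes
        candidates = [(m, n) for m in hop1]           # component count stays at n
        if n < c_max:                                 # room for one more component
            candidates += [(m, n + 1) for m in far]
        for m, n_q in candidates:
            q = s | {m}
            next_level.setdefault(q, n_q)             # dedup repeated supernodes
            edges.append((s, q, m))
    return next_level, edges

Starting from level = {frozenset({v}): 1 for v in adj} and applying grow_supernodes k-1 times yields all (k,c)(\leq) sets together with the bipartite edges between consecutive levels.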

A.6 Algorithm of connecting supernodes to their connected components

In Sec. 4.3 we proved that for a supernode whose induced subgraph has multiple components (call it a multi-component supernode), the color initialization can be greatly sped up if we know the supernode corresponding to each of its components. Hence we need to build the bipartite graph (call it the components graph) between single-component supernodes and multi-component supernodes. We present an algorithm that constructs these connections sequentially: given the connections between single-component supernodes and all multi-component supernodes with at most t nodes, it builds the connections between single-component supernodes and all multi-component supernodes with at most t+1 nodes. To build the full bipartite graph for the (k,c)(\leq) sets, we run the algorithm k-1 times sequentially.

Algorithm 2 Constructing the bipartite graph between all single-component supernodes and all multi-component supernodes with at most t+1 nodes
Input: \{B_{i}\}_{i=1}^{t}, \{O_{i,\leq c}\}_{i=1}^{t+1}, \{N_{i,\leq c}\}_{i=1}^{t+1} from Algorithm 1 (with the modified edge format (\hat{\boldsymbol{s}},\hat{\boldsymbol{q}},m) noted in its comments), and a dictionary D_{\leq t} whose keys are the multi-component supernodes with at most t nodes and whose values are the lists of their corresponding single-component supernodes.
Output: D_{\leq t+1}
Dt+1DtD_{\leq t+1}\leftarrow D_{\leq t}
for all 𝒒^\hat{\boldsymbol{q}} in Ot+1,cO_{t+1,\leq c} do
     Get all edges T=\{(\hat{\boldsymbol{s}}_{i},\hat{\boldsymbol{q}},m_{i})\} in B_{t} that contain \hat{\boldsymbol{q}}, and let j\leftarrow\arg\max_{i}N_{t,\leq c}[\hat{\boldsymbol{s}}_{i}]
     if  N_{t,\leq c}[\hat{\boldsymbol{s}}_{j}]==N_{t+1,\leq c}[\hat{\boldsymbol{q}}] then \triangleright Number of components does not increase
         Dt+1[𝒒^][]D_{\leq t+1}[\hat{\boldsymbol{q}}]\leftarrow[]
         for all 𝒑^\hat{\boldsymbol{p}} in Dt[𝒔^j]D_{\leq t}[\hat{\boldsymbol{s}}_{j}] do
              Assume k=|𝒑^|k=|\hat{\boldsymbol{p}}|, i.e. the number of nodes inside.
              Traverse BkB_{k} to find the supernode (𝒑^,mj)(\hat{\boldsymbol{p}},m_{j})
              if Found it and Nk+1,c[(𝒑^,mj)]==1N_{k+1,\leq c}[(\hat{\boldsymbol{p}},m_{j})]==1 then
                  Append (𝒑^,mj)(\hat{\boldsymbol{p}},m_{j}) to Dt+1[𝒒^]D_{\leq t+1}[\hat{\boldsymbol{q}}]
              else
                  Append 𝒑^\hat{\boldsymbol{p}} to Dt+1[𝒒^]D_{\leq t+1}[\hat{\boldsymbol{q}}]
              end if
         end for
     else\triangleright Nt,c[𝒔^j]=Nt+1,c[𝒒^]1N_{t,\leq c}[\hat{\boldsymbol{s}}_{j}]=N_{t+1,\leq c}[\hat{\boldsymbol{q}}]-1
         Dt+1[𝒒^]Dt[𝒔^j]D_{\leq t+1}[\hat{\boldsymbol{q}}]\leftarrow D_{\leq t}[\hat{\boldsymbol{s}}_{j}] + [mj][m_{j}]
     end if
end for
return Dt+1D_{\leq t+1}

For clarity of presentation we use for-loops inside Algorithm 2, while in practice we use matrix operations to eliminate them. A naive cross-check of its output is sketched below.
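As a sanity check for Algorithm 2, the components graph can also be computed naively by decomposing each multi-component supernode directly with a BFS. A self-contained sketch (with hypothetical names of our own) follows:

from collections import deque

def connected_components(adj, nodes):
    # BFS decomposition of the subgraph of G induced on `nodes`.
    nodes, comps = set(nodes), []
    while nodes:
        seed = nodes.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            for w in adj[queue.popleft()] & nodes:
                nodes.remove(w)
                comp.add(w)
                queue.append(w)
        comps.append(frozenset(comp))
    return comps

def components_graph(adj, supernodes):
    # Map every multi-component supernode to its single-component parts
    # (the dictionary D that Algorithm 2 builds incrementally).
    out = {}
    for s in supernodes:
        comps = connected_components(adj, s)
        if len(comps) > 1:
            out[s] = comps
    return out

This direct decomposition redoes a BFS per supernode, which is exactly the cost Algorithm 2 avoids by reusing the connections already built for smaller supernodes.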

A.7 Dataset Details

Table 6: Dataset statistics.
Dataset Task # Cls./Tasks # Graphs Avg. # Nodes Avg. # Edges
EXP [1] Distinguish 1-WL failed graphs 2 1200 44.4 110.2
SR25 [6] Distinguish 3-WL failed graphs 15 15 25 300
CountingSub. [15] Regress num. of substructures 4 1500 / 1000 / 2500 18.8 62.6
GraphProp. [17] Regress global graph properties 3 5120 / 640 / 1280 19.5 101.1
ZINC-12K [19] Regress molecular property 1 10000 / 1000 / 1000 23.1 49.8
QM9 [59] Regress molecular properties 19* 130831 18.0 37.3
*We use the version of the dataset from PyG [20], and it contains 19 tasks.

A.8 Experimental Setup

Due to limited time and resources, we heavily restrict the hyperparameter search and fix most hyperparameters to the same values across all models and baselines to ensure a fair comparison. This means the performance of (k,c)(\leq)-SetGNN reported in the paper is not its best achievable performance, since we did not tune hyperparameters extensively. Nevertheless, the performance still reflects the theory and designs proposed in the paper, and we leave pushing (k,c)(\leq)-SetGNN to state-of-the-art performance to future work. Concretely, we fix the batch size to 128, the hidden size to 128, the number of layers of the Base GNN to 4, and the number of layers of (k,c)(\leq)-SetGNN (the number of iterations of (k,c)(\leq)-SetWL) to 2 (see the ablation study in Sec. A.10). This hyperparameter configuration is used for all datasets. We ran the baseline GINE over many datasets, tuning its number of layers in [4,6] and keeping all other hyperparameters the same as for (k,c)(\leq)-SetGNN. For all other baselines, we took the performance reported in [68].

For all datasets except QM9, we follow the same configuration used in [68]. For QM9, we use the dataset from PyG and regress all 19 targets simultaneously. To balance the scale of the targets, we preprocess the dataset by standardizing every target to mean 0 and standard deviation 1. The dataset is randomly split into train/validation/test sets with ratio 80%/10%/10% (with a fixed random state so that all runs and models use the same split). Every graph in QM9 contains 3D positional coordinates for all nodes; for all models, we use them to augment the edge features with the absolute difference between the coordinates of the two endpoints of each edge. Notice that our goal is not to achieve SOTA performance on QM9 but mainly to verify our theory and the effectiveness of our designs. A sketch of this preprocessing is given below.
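The sketch below illustrates the target standardization, the fixed-seed 80/10/10 split, and the coordinate-based edge-feature augmentation using PyG's QM9 interface. Exact attribute access may differ across PyG versions, and this is an assumed reconstruction rather than the released pipeline.

import torch
from torch_geometric.datasets import QM9

dataset = QM9(root='data/QM9')

# Standardize all 19 targets to mean 0, std 1.
y = dataset.data.y                                  # shape [num_graphs, 19]
dataset.data.y = (y - y.mean(dim=0)) / y.std(dim=0)

# Fixed-seed random 80%/10%/10% split shared by all runs and models.
n = len(dataset)
perm = torch.randperm(n, generator=torch.Generator().manual_seed(0))
train_idx = perm[:int(0.8 * n)]
val_idx = perm[int(0.8 * n):int(0.9 * n)]
test_idx = perm[int(0.9 * n):]

def augment_edges(data):
    # Append |pos_u - pos_v| (3 dims) to every edge's feature vector.
    src, dst = data.edge_index
    diff = (data.pos[src] - data.pos[dst]).abs()
    data.edge_attr = torch.cat([data.edge_attr, diff], dim=-1)
    return data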

We use Batch Normalization and ReLU activation in all models, and the Adam optimizer with learning rate 0.001 in all experiments. We repeat every experiment three times (with different random initializations) to compute the mean and standard deviation. All experiments are conducted on V100 and RTX-A6000 GPUs.

A.9 Graph Property Dataset Results

Table 7: Train and Test performances of (k,c)()(k,c)(\leq)-SetGNN on regressing graph properties by varying kk and cc.
Regressing Graph Properties (log10\log_{10}(MSE))
Is Connected Diameter Radius
k c Train Test Train Test Train Test
2 1 -4.2266 ± 0.1222 -2.9577 ± 0.1295 -4.0347 ± 0.0468 -3.6322 ± 0.0458 -4.4690 ± 0.0348 -4.9436 ± 0.0277
3 1 -4.2360 ± 0.1854 -3.4631 ± 0.6392 -4.0228 ± 0.1256 -3.7885 ± 0.0589 -4.4762 ± 0.1176 -5.0245 ± 0.0881
4 1 -4.7776 ± 0.0386 -4.9941 ± 0.0913 -4.1396 ± 0.0442 -4.0122 ± 0.0071 -4.2837 ± 0.5880 -4.1528 ± 0.9383
2 2 -4.6623 ± 0.3170 -4.7848 ± 0.3150 -4.0802 ± 0.1654 -3.8962 ± 0.0124 -4.5362 ± 0.2012 -5.1603 ± 0.0610
3 2 -4.2601 ± 0.3192 -4.4547 ± 1.1715 -4.3235 ± 0.3050 -3.9905 ± 0.0799 -4.6766 ± 0.1797 -4.9836 ± 0.0658
4 2 -4.8489 ± 0.1354 -5.4667 ± 0.2125 -4.5033 ± 0.1610 -3.9495 ± 0.3202 -4.4130 ± 0.2686 -4.1432 ± 0.4405

A.10 Ablation Study over Number of Layers

Table 8: Ablation study of (k,c)()(k,c)(\leq)-SetGNN over number of bidirectional message passing layers (L) on ZINC dataset.
L 2 4 6
k c Train Test Train Test Train Test
2 1 0.1381 ± 0.0240 0.2345 ± 0.0131 0.1135 ± 0.0418 0.1921 ± 0.0133 0.0712 ± 0.0015 0.1729 ± 0.0134
3 1 0.1172 ± 0.0063 0.2252 ± 0.0030 0.0792 ± 0.0190 0.1657 ± 0.0035 0.0692 ± 0.0118 0.1679 ± 0.0061
4 1 0.0693 ± 0.0111 0.1636 ± 0.0052 0.0700 ± 0.0085 0.1566 ± 0.0101 0.0768 ± 0.0116 0.1572 ± 0.0051

The table shows that increasing the number of bidirectional message passing layers (t in Eq.(13) and Eq.(14)) generally improves test performance, consistent with the fact that adding layers never decreases expressivity.

A.11 Computational Footprint on QM9

Figure 5: (k,c)(\leq)-SetGNN's computational footprint scales with both k and c in terms of (a) memory usage and (b) runtime. Solid blue, orange and green lines track scaling as k increases, when running (k,c)(\leq)-SetGNN on the QM9 dataset with c=1, 2 and 3 respectively.

A.12 Discussion of k-Bipartite Message Passing

Figure 6: k-bipartite message passing recovers many well-known GNNs: message passing based GNNs [25], edge-enhanced GNNs [7], and line-graph GNNs [14, 16]. The line graph is marked with a red dashed frame.

Interestingly, the k-bipartite bidirectional message passing is very general, and variations of it recover many well-known GNNs. Consider using only the 1-sets and the 2-sets with a single connected component (the red frame inside Figure 6), and initializing their embeddings with the original node and edge representations; this then recovers the following (see the sketch after this list):

  • Message passing based GNNs [25]: use the sequential message passing defined in Eq.(13) and Eq.(14), performing the forward step first and then the backward step for all 1-sets.

  • Line graph based GNNs [14, 16]: use the sequential message passing defined in Eq.(13) and Eq.(14), performing the backward step first and then the forward step for all 2-sets.

  • Relational Graph Networks [7] or edge-enhanced GNNs [16]: perform bidirectional sequential message passing on all 1-sets and 2-sets.
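A minimal sketch of the shared building block, assuming dense 0/1 incidence matrices between the 1-sets (nodes) and the 2-sets (edges); BipartiteStep and its arguments are our own illustrative names, not the released API.

import torch
import torch.nn as nn

class BipartiteStep(nn.Module):
    # One direction of message passing between 1-sets (nodes) and 2-sets
    # (edges), the lowest bipartite level of the supergraph.
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h_dst, h_src, incidence):
        # incidence: dense 0/1 matrix of shape [num_dst, num_src]; it is the
        # node-edge incidence matrix or its transpose, depending on direction.
        return self.mlp(torch.cat([h_dst, incidence @ h_src], dim=-1))

# Running a forward step (1-sets -> 2-sets) and then a backward step
# (2-sets -> 1-sets) updates node states as in an MPNN; the reverse order,
# starting and ending at the 2-sets, updates edge states as in a line-graph GNN.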