
On the Computational Capability of Graph Neural Networks: A Circuit Complexity Bound Perspective

Xiaoyu Li [email protected]. Independent Researcher.    Yingyu Liang [email protected]. The University of Hong Kong. [email protected]. University of Wisconsin-Madison.    Zhenmei Shi [email protected]. University of Wisconsin-Madison.    Zhao Song [email protected]. The Simons Institute for the Theory of Computing at UC Berkeley.    Wei Wang [email protected]. UC Berkeley.    Jiahao Zhang [email protected]. Independent Researcher.

Graph Neural Networks (GNNs) have become the standard approach for learning and reasoning over relational data, leveraging the message-passing mechanism that iteratively propagates node embeddings through graph structures. While GNNs have achieved significant empirical success, their theoretical limitations remain an active area of research. Existing studies primarily focus on characterizing GNN expressiveness through Weisfeiler-Lehman (WL) graph isomorphism tests. In this paper, we take a fundamentally different approach by exploring the computational limitations of GNNs through the lens of circuit complexity. Specifically, we analyze the circuit complexity of common GNN architectures and prove that under constraints of constant-depth layers, linear or sublinear embedding sizes, and polynomial precision, GNNs cannot solve key problems such as graph connectivity and graph isomorphism unless $\mathsf{TC}^0=\mathsf{NC}^1$. These results reveal the intrinsic expressivity limitations of GNNs behind their empirical success and introduce a novel framework for analyzing GNN expressiveness that can be extended to a broader range of GNN models and graph decision problems.

1 Introduction

Graphs are ubiquitous representations for relational data, describing interconnected elements with interactions in domains such as molecules [21, 51], social networks [17, 41], and user-item interactions in recommendation systems [46, 23]. Graph Neural Networks (GNNs) [25, 24, 44] have emerged as a dominant tool for learning expressive representations from graphs, enabling tasks like node property prediction [49, 3], link prediction [54, 58], and graph classification [55, 52]. The key to GNNs’ success lies in the message-passing mechanism [21], which aggregates information from local neighborhoods through a graph convolution matrix, followed by nonlinear transformations to update node representations.

Despite their empirical success, concerns regarding the computational limitations of GNNs are emerging [19, 50]. A fundamental research question arises from the concern:

What computational capabilities do GNNs and their variants possess, and what classes of problems can they provably solve?

Addressing these questions is crucial for understanding GNNs from a principled and theoretically robust perspective, identifying their limitations, and building trust in their deployment for real-world applications.

Previous work has made significant strides in addressing these questions. A notable line of research connects GNN expressiveness to the Weisfeiler-Lehman (WL) graph isomorphism test [52, 31, 32, 57], which iteratively refines color labels on $k$-tuples of nodes (known as $k$-WL) and distinguishes graphs by the resulting color histograms. For example, the Graph Isomorphism Network (GIN) [52], equipped with a summation aggregator and an injective readout function, is as expressive as $1$-WL, while $k$-GNNs [31] achieve expressiveness equivalent to $k$-WL by encoding $k$-node subgraphs as hypernodes in message passing. However, these results focus on graph isomorphism tasks and do not address a broader range of graph query problems. Moreover, the WL framework only provides a specific level of expressiveness without establishing theoretical upper bounds, and it often ignores the interplay between node features and graph topology, which is central to GNNs.

In this paper, we take a fundamentally different approach to analyzing the computational limitations of GNNs by examining their circuit complexity bounds. Circuit complexity [22, 38, 45] is a foundational topic in theoretical computer science, characterizing computational models based on the types of Boolean gates they use and the depth and size of their circuits. Importantly, circuit complexity bounds provably reflect the set of problems solvable within a given model. For instance, models (e.g., Transformers) bounded by the $\mathsf{TC}^0$ complexity class can only solve $\mathsf{TC}^0$ problems (e.g., Dyck language recognition) but cannot solve $\mathsf{NC}^1$-complete problems like arithmetic formula evaluation unless $\mathsf{TC}^0=\mathsf{NC}^1$ [8, 33, 34]. Analogously, if we demonstrate that GNN computations lie within a particular circuit complexity class, we can formally identify problems that GNNs can and cannot solve.

Our approach evaluates the circuit complexity of GNN components, from basic activation functions to the entire graph convolution process. We show that GNNs with a constant number of layers, $\operatorname{poly}(n)$ precision, and embedding size $d=O(n)$ can be approximated by uniform $\mathsf{TC}^0$ circuits. Consequently, unless $\mathsf{TC}^0=\mathsf{NC}^1$, such GNNs cannot solve problems like graph connectivity or graph isomorphism. These findings illuminate the fundamental expressivity limitations of GNNs despite their empirical success and also establish a novel framework for analyzing their expressiveness, which can be seamlessly generalized to more GNN models and decision problems on graphs.

Our contributions are summarized as follows:

  • We prove that a uniform $\mathsf{TC}^0$ circuit family can approximate GNNs with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size (Theorem 4.14).

  • We establish that unless $\mathsf{TC}^0=\mathsf{NC}^1$, graph neural networks with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size cannot solve the graph connectivity problem (Theorem 5.11 and Theorem 5.12).

  • We establish that unless $\mathsf{TC}^0=\mathsf{NC}^1$, graph neural networks with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size cannot solve the graph isomorphism problem (Theorem 5.13).

Roadmap. Section 2 reviews relevant works. Section 3 introduces GNNs and basic concepts from theoretical computer science. Section 4 presents the circuit complexity bounds for GNNs. Section 5 discusses the hardness of specific graph problems. Finally, Section 6 concludes our findings.

2 Related Work

We present related works on the computational limitations of GNNs and existing circuit complexity bounds for neural networks.

2.1 Limitations of Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated impressive performance on graph learning and mining tasks. However, their inherent limitations in solving decision problems on graphs remain an open question. The predominant framework for analyzing GNN limitations is the Weisfeiler-Lehman (WL) hierarchy [29, 40, 57], a well-established tool for assessing GNNs' ability to address the graph isomorphism problem, a candidate NP-intermediate problem that is not known to be solvable in polynomial time [2, 20]. The WL hierarchy leverages the computationally efficient heuristic of color refinement to bound a model's ability to distinguish non-isomorphic graphs.

The expressiveness of message-passing GNNs is bounded by the $1$-WL test [52]. Standard GNN models such as GCN [25], GAT [44], and GIN [52] are either equivalent to or strictly limited by the expressiveness of $1$-WL. To go beyond $1$-WL, high-order GNNs extend message-passing mechanisms to $k$-node subgraphs [31, 32, 30], mimicking the $k$-WL or $k$-FWL (Folklore WL) tests. Models like $k$-GNN and $k$-FGNN match the expressiveness of these higher-order tests, offering stronger guarantees than standard message-passing GNNs. However, the parameter $k$ cannot be scaled to sufficiently large values due to inherent computational and memory constraints.

Another promising line of research involves subgraph GNNs, which aim to address the inherent symmetry of graphs that cannot be distinguished by WL tests. These models transform the original graph into a set of slightly perturbed subgraphs, which are then processed by GNNs [12, 36, 37]. Recent work has shown that for an $n$-node graph, subgraph GNNs operating on subgraphs with $k$ nodes are strictly bounded by $(k+1)$-FWL [37]. Besides, distance-aware GNNs inject distance information, which is overlooked by both message-passing GNNs and $1$-WL, into their architectures. For instance, $k$-hop MPNNs aggregate information from $k$-hop neighbors in each layer and have been shown to be strictly bounded by $2$-FWL [16]. Additionally, the subgraph WL hierarchy demonstrates that distance encoding can be represented by local $2$-WL tests [56].

Despite the widespread use of the WL framework in analyzing GNNs’ computational limitations, it often overlooks the role of node features [7] and focuses exclusively on the graph isomorphism problem, making it insufficiently comprehensive and unsuitable for generalizing to other graph decision problems. A detailed comparison of our work with WL-based GNN expressiveness is provided in Section 5.4.

2.2 Circuit Complexity and Neural Networks

Circuit complexity, a foundational area in theoretical computer science, studies the computational power of Boolean circuit families. Various circuit complexity classes play a significant role in analyzing machine learning models. For instance, the class $\mathsf{AC}^0$ represents problems solvable in parallel using standard Boolean gates, while $\mathsf{TC}^0$ extends this by incorporating threshold or modulo gates. The stronger class $\mathsf{NC}^1$ corresponds to problems solvable by circuits with $O(\log n)$ depth and bounded fan-in [35]. A key result relevant to machine learning is the complexity inclusion $\mathsf{AC}^0\subset\mathsf{TC}^0\subseteq\mathsf{NC}^1$, though whether $\mathsf{TC}^0=\mathsf{NC}^1$ remains an open question [45, 1].

Circuit complexity bounds have been effectively used to analyze the computational power of various neural network architectures. For example, Transformers, including two canonical variants, Average-Head Attention Transformers (AHATs) and SoftMax-Attention Transformers (SMATs), have been studied in this context. Specifically, [35] shows that AHATs can be simulated by non-uniform constant-depth threshold circuits in $\mathsf{TC}^0$, while [26] demonstrates that SMATs can also be simulated in an $\mathsf{L}$-uniform manner within $\mathsf{TC}^0$. A follow-up study [34] unifies these results, concluding that both AHATs and SMATs are approximable by $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ circuits. Beyond standard Transformers, RoPE-based Transformers [39], a widely adopted variant in large language models, have also been analyzed using circuit complexity frameworks [28, 9]. Similarly, the emerging Mamba architecture [18] falls within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ family [10]. Additionally, Hopfield networks, initially introduced as associative memory systems, have also been shown to exhibit $\mathsf{TC}^0$ circuit complexity bounds [27].

Despite the success of circuit complexity in analyzing other neural networks, its application to GNNs is underexplored. While some prior works have attempted to characterize the computational power of GNNs within circuit computation models [5, 14], these efforts are orthogonal to our contributions. A detailed discussion is provided in Section 5.4.

3 Preliminary

This section provides fundamental definitions for this paper. We first introduce some notations. In Section 3.1, we present an in-depth overview of the computation of floating point numbers. In Section 3.2, we review several basic definitions of the Graph Neural Networks (GNNs). In Section 3.3, we present some basic concepts of circuit families and their complexity.

Notations.

For any positive integer $n$, we let $[n]:=\{0,1,2,\dots,n\}$ denote the set of natural numbers from $0$ to $n$. For a vector $x$, $\operatorname{diag}(x)$ denotes the diagonal matrix with entries from $x$. The concatenation operator $\|$ represents combining two vectors or matrices along their last dimension. For example, given $A\in\mathbb{R}^{n\times d_1}$ and $B\in\mathbb{R}^{n\times d_2}$, the result $A\|B$ has shape $\mathbb{R}^{n\times(d_1+d_2)}$. We use $\boldsymbol{1}_n$ to represent the $n$-dimensional all-ones vector. Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an undirected graph with vertex set $\mathcal{V}=\{v_1,\dots,v_n\}$ and edge set $\mathcal{E}=\{e_1,\dots,e_m\}$, where $n$ and $m$ are the numbers of vertices and edges, respectively. The adjacency matrix $A\in\{0,1\}^{n\times n}$ encodes $\mathcal{E}$, with $A_{i,j}=1$ if $(i,j)\in\mathcal{E}$ and $A_{i,j}=0$ otherwise. The Kleene closure of a set $V$, denoted $V^{*}$, is the set of all finite sequences whose elements belong to $V$; specifically, $x\in V^{*}$ means $x$ is a sequence of arbitrary length with elements from $V$. For convenience, we abbreviate floating-point numbers as $\mathsf{FPN}$s.

3.1 Floating Point Numbers

In this subsection, we introduce fundamental definitions of floating-point numbers and their operations. These concepts establish a critical computational framework for implementing GNNs on real-world machines.

Definition 3.1 (Floating Point Numbers ($\mathsf{FPN}$s), Definition 9 in [8]).

A $p$-bit floating-point number ($\mathsf{FPN}$) is represented as a $2$-tuple of binary integers $\langle s,e\rangle$, in which the significand $|s|\in\{0\}\cup[2^{p-1},2^{p})$ and the exponent $e\in[-2^{p},2^{p}-1]$. The value of the $\mathsf{FPN}$ is given by $s\cdot 2^{e}$. Specifically, when $e=2^{p}$, the floating-point number represents positive or negative infinity, depending on the sign of the significand $s$. We denote the set of all $p$-bit $\mathsf{FPN}$s as $\mathbb{F}_{p}$.

Definition 3.2 (Rounding, Definition 9 in [8]).

Let the real number $r\in\mathbb{R}$ be of infinite precision. The closest $p$-bit precision $\mathsf{FPN}$ for $r$ is denoted by $\mathrm{round}_{p}(r)\in\mathbb{F}_{p}$. If two such numbers exist, we denote $\mathrm{round}_{p}(r)$ as the $\mathsf{FPN}$ with an even significand.

Building upon the fundamental concepts introduced above, we present the critical floating-point operations used to compute the outputs of graph neural networks.

Definition 3.3 ($\mathsf{FPN}$ operations, page 5 of [8]).

Let $x,y$ be two integers. We first denote the integer division operation $\mathbin{/\!/}$ as:

x\mathbin{/\!/}y:=\begin{cases}x/y&\text{if $x/y$ is a multiple of $1/4$}\\ x/y+1/8&\text{otherwise.}\end{cases}

Given two $p$-bit $\mathsf{FPN}$s $\left\langle s_{1},e_{1}\right\rangle,\left\langle s_{2},e_{2}\right\rangle\in\mathbb{F}_{p}$, we define the following basic operations:

\left\langle s_{1},e_{1}\right\rangle+\left\langle s_{2},e_{2}\right\rangle:=\begin{cases}\mathrm{round}_{p}(\left\langle s_{1}+s_{2}\mathbin{/\!/}2^{e_{1}-e_{2}},e_{1}\right\rangle)&\text{if $e_{1}\geq e_{2}$}\\ \mathrm{round}_{p}(\left\langle s_{1}\mathbin{/\!/}2^{e_{2}-e_{1}}+s_{2},e_{2}\right\rangle)&\text{if $e_{1}\leq e_{2}$}\end{cases}
\left\langle s_{1},e_{1}\right\rangle\times\left\langle s_{2},e_{2}\right\rangle:=\mathrm{round}_{p}(\left\langle s_{1}s_{2},e_{1}+e_{2}\right\rangle)
\left\langle s_{1},e_{1}\right\rangle\div\left\langle s_{2},e_{2}\right\rangle:=\mathrm{round}_{p}(\left\langle s_{1}\cdot 2^{p-1}\mathbin{/\!/}s_{2},e_{1}-e_{2}-p+1\right\rangle)
\left\langle s_{1},e_{1}\right\rangle\leq\left\langle s_{2},e_{2}\right\rangle\Leftrightarrow\begin{cases}s_{1}\leq s_{2}\mathbin{/\!/}2^{e_{1}-e_{2}}&\text{if $e_{1}\geq e_{2}$}\\ s_{1}\mathbin{/\!/}2^{e_{2}-e_{1}}\leq s_{2}&\text{if $e_{1}\leq e_{2}$.}\end{cases}

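For concreteness, the following Python sketch simulates the rounding rule and the addition operation above at the level of values, using exact rational arithmetic. The helper names (`int_div`, `round_p`, `fpn_add`) are ours, and edge cases such as overflow of the significand after rounding and the infinity exponent are omitted; it is only an illustration of Definitions 3.1-3.3, not a reference implementation.

```python
# A minimal sketch of the p-bit FPN model: numbers are pairs <s, e> with value
# s * 2^e; rounding keeps the closest p-bit significand, with ties broken to even.
from fractions import Fraction

def int_div(x: Fraction, y: Fraction) -> Fraction:
    """The // operation: keep x/y if it is a multiple of 1/4, else add 1/8."""
    q = x / y
    return q if q % Fraction(1, 4) == 0 else q + Fraction(1, 8)

def round_p(value: Fraction, p: int) -> tuple[int, int]:
    """Nearest p-bit FPN <s, e> for an exact value (overflow handling omitted)."""
    if value == 0:
        return 0, 0
    sign = 1 if value > 0 else -1
    mag, e = abs(value), 0
    while mag >= 2 ** p:              # normalize |s| into [2^(p-1), 2^p)
        mag, e = mag / 2, e + 1
    while mag < 2 ** (p - 1):
        mag, e = mag * 2, e - 1
    s, frac = int(mag), mag - int(mag)
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and s % 2 == 1):
        s += 1                        # round half to even
    return sign * s, e

def fpn_add(a: tuple[int, int], b: tuple[int, int], p: int) -> tuple[int, int]:
    """<s1,e1> + <s2,e2> as in Definition 3.3: align exponents with //, then round."""
    (s1, e1), (s2, e2) = a, b
    if e1 >= e2:
        s = Fraction(s1) + int_div(Fraction(s2), Fraction(2) ** (e1 - e2))
        return round_p(s * Fraction(2) ** e1, p)
    s = int_div(Fraction(s1), Fraction(2) ** (e2 - e1)) + Fraction(s2)
    return round_p(s * Fraction(2) ** e2, p)

# Example with p = 4: 1.5 = <12, -3> and 0.25 = <8, -5>.
print(fpn_add((12, -3), (8, -5), p=4))   # -> (14, -3), i.e. 1.75
```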
The basic operations described above can be efficiently computed in parallel using simple hardware implementations in $\mathsf{TC}^{0}$ circuits, as formalized in the following lemma:

Lemma 3.4 (Computing $\mathsf{FPN}$ operations with $\mathsf{TC}^{0}$ circuits, Lemma 10 and Lemma 11 in [8]).

We denote the number of digits by a positive integer $p$. If $p\leq\operatorname{poly}(n)$, then:

  • Basic Operations: The "$+$", "$\times$", "$\div$", and comparison ($\leq$) of two $p$-bit $\mathsf{FPN}$s, as defined in Definition 3.1, can be computed with $O(1)$-depth uniform threshold circuits of $\operatorname{poly}(n)$ size. Let the maximum circuit depth required for these basic operations be $d_{\mathrm{std}}$.

  • Iterated Operations: The product of $n$ $p$-bit $\mathsf{FPN}$s and the sum of $n$ $p$-bit $\mathsf{FPN}$s (with rounding applied after summation) can both be computed with $O(1)$-depth uniform threshold circuits of $\operatorname{poly}(n)$ size. Let the maximum circuit depth required for iterated multiplication be $d_{\otimes}$ and for iterated addition be $d_{\oplus}$.

Beyond these basic floating-point operations, certain specialized floating-point operations are also known to be computable within $\mathsf{TC}^{0}$ circuits, as demonstrated in the following lemmas:

Lemma 3.5 (Computing $\exp$ with $\mathsf{TC}^{0}$ circuits, Lemma 12 in [8]).

Let $x\in\mathbb{F}_{p}$ be a $p$-bit $\mathsf{FPN}$. If $p\leq\operatorname{poly}(n)$, there exists an $O(1)$-depth uniform threshold circuit of size $\operatorname{poly}(n)$ that can compute $\exp(x)$ with relative error less than $2^{-p}$. Let the maximum circuit depth required for approximating $\exp$ be $d_{\mathrm{exp}}$.

Lemma 3.6 (Computing square root with $\mathsf{TC}^{0}$ circuits, Lemma 12 in [8]).

Let $x\in\mathbb{F}_{p}$ be a $p$-bit $\mathsf{FPN}$. If $p\leq\operatorname{poly}(n)$, there exists an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size that can compute $\sqrt{x}$ with relative error less than $2^{-p}$. Let the maximum circuit depth required for approximating square roots be $d_{\mathrm{sqrt}}$.

Lemma 3.7 (Computing matrix multiplication with $\mathsf{TC}^{0}$ circuits, Lemma 4.2 in [9]).

Let the two matrix operands be $A\in\mathbb{F}_{p}^{n_{1}\times n_{2}}$ and $B\in\mathbb{F}_{p}^{n_{2}\times n_{3}}$. If $p\leq\operatorname{poly}(n)$ and $n_{1},n_{2},n_{3}\leq n$, then there exists an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+d_{\oplus})$ that can compute the matrix product $AB$.

3.2 Graph Neural Networks

With the foundational framework of $\mathsf{FPN}$ operations, we now formalize the components of Graph Neural Networks (GNNs) in floating-point representations. This subsection commences by introducing activation functions and the softmax operation, which are fundamental building blocks for GNN layers.

Definition 3.8 (ReLU).

For an embedding matrix $X\in\mathbb{F}_{p}^{n\times d}$, the output of the $\mathsf{ReLU}$ activation function is a matrix $Y\in\mathbb{F}_{p}^{n\times d}$, where each element is defined as $Y_{i,j}:=\max\{0,X_{i,j}\}$ for all $1\leq i\leq n$ and $1\leq j\leq d$.

Definition 3.9 (Leaky ReLU).

For an embedding matrix $X\in\mathbb{F}_{p}^{n\times d}$ and a predefined negative slope $s\in\mathbb{F}_{p}$, the output of the $\mathsf{LeakyReLU}$ function is a matrix $Y\in\mathbb{F}_{p}^{n\times d}$, where each element is defined as $Y_{i,j}:=\max\{0,X_{i,j}\}+s\cdot\min\{0,X_{i,j}\}$ for all $1\leq i\leq n$ and $1\leq j\leq d$.

Definition 3.10 (Softmax).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be an embedding matrix. The row-wise softmax function is defined element-wise as:

\mathsf{Softmax}(X)_{i,j}:=\frac{\exp(X_{i,j})}{\sum_{k=1}^{d}\exp(X_{i,k})},

or equivalently in matrix form as:

\mathsf{Softmax}(X):=\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}\exp(X),

where $\boldsymbol{1}_{d}$ is a $d$-dimensional column vector of ones.

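As a quick sanity check of the matrix form, the small numpy sketch below (real arithmetic rather than the $p$-bit $\mathsf{FPN}$ model, with our own function name) confirms that $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}\exp(X)$ coincides with the usual row-wise softmax.

```python
# Numpy sketch of Definition 3.10 in real arithmetic.
import numpy as np

def softmax_rows(X: np.ndarray) -> np.ndarray:
    E = np.exp(X)                          # entrywise exp(X)
    row_sums = E @ np.ones(X.shape[1])     # exp(X) * 1_d
    return np.diag(1.0 / row_sums) @ E     # diag(exp(X) 1_d)^{-1} exp(X)

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
Y = softmax_rows(X)
assert np.allclose(Y.sum(axis=1), 1.0)     # every row sums to one
```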
These activation functions and softmax operations form the basis of GNN computation. We now introduce the convolution matrices central to the message-passing scheme of GNNs, focusing on three widely used GNN models: GCN [25], GIN [52], and GAT [44].

Definition 3.11 (GCN convolution matrix).

Let $A\in\mathbb{F}_{p}^{n\times n}$ be the adjacency matrix of a graph with $n$ nodes, and let $D\in\mathbb{F}_{p}^{n\times n}$ be the diagonal degree matrix, where $D_{i,i}=\sum_{j=1}^{n}A_{i,j}$. Let $I\in\mathbb{F}_{p}^{n\times n}$ denote the identity matrix. The GCN convolution matrix is defined as:

C_{\mathrm{GCN}}:=(D+I)^{-1/2}(A+I)(D+I)^{-1/2}\in\mathbb{F}_{p}^{n\times n}.
Definition 3.12 (GIN convolution matrix).

Let $A\in\mathbb{F}_{p}^{n\times n}$ be the adjacency matrix, and let $\epsilon\in\mathbb{F}_{p}$ be a constant. The GIN convolution matrix is defined as:

C_{\mathrm{GIN}}:=A+(1+\epsilon)I\in\mathbb{F}_{p}^{n\times n}.
Definition 3.13 (GAT convolution matrix).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be the embedding matrix, and let $W\in\mathbb{F}_{p}^{d\times d}$ and $a\in\mathbb{F}_{p}^{2d}$ denote model weights. The attention weight matrix $E\in\mathbb{F}_{p}^{n\times n}$ is defined as:

E_{i,j}:=\begin{cases}a^{\top}\mathsf{LeakyReLU}(WX_{i}\,\|\,WX_{j}),&A_{i,j}=1,\\ -\infty,&\text{otherwise}.\end{cases}

The GAT convolution matrix is then given by:

C_{\mathrm{GAT}}:=\mathsf{Softmax}(E).

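For concreteness, here is a dense numpy sketch of the three convolution matrices (function names are ours, real arithmetic replaces the $p$-bit $\mathsf{FPN}$ model, and the GAT sketch assumes every node has at least one incident edge so that no softmax row is entirely $-\infty$).

```python
# Dense numpy sketches of C_GCN, C_GIN, and C_GAT (Definitions 3.11-3.13).
import numpy as np

def gcn_conv(A: np.ndarray) -> np.ndarray:
    """C_GCN = (D+I)^{-1/2} (A+I) (D+I)^{-1/2}."""
    n = A.shape[0]
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1) + 1.0))   # (D+I)^{-1/2}
    return d_inv_sqrt @ (A + np.eye(n)) @ d_inv_sqrt

def gin_conv(A: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """C_GIN = A + (1+eps) I."""
    return A + (1.0 + eps) * np.eye(A.shape[0])

def gat_conv(A: np.ndarray, X: np.ndarray, W: np.ndarray, a: np.ndarray,
             slope: float = 0.2) -> np.ndarray:
    """C_GAT = Softmax(E), E_ij = a^T LeakyReLU(W x_i || W x_j) on edges, -inf elsewhere."""
    H = X @ W.T                                     # row i holds W x_i
    L = np.where(H > 0, H, slope * H)               # elementwise LeakyReLU
    d = H.shape[1]
    E = (L @ a[:d])[:, None] + (L @ a[d:])[None, :] # a_1 part + a_2 part
    E = np.where(A > 0, E, -np.inf)                 # mask non-edges
    P = np.exp(E)
    return P / P.sum(axis=1, keepdims=True)         # row-wise softmax
```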
We now unify the three commonly used graph convolution matrices with the basic components above to define a general GNN layer.

Definition 3.14 (One GNN layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be the embedding matrix, let $W\in\mathbb{F}_{p}^{d\times d}$ be the model weights, and let $C\in\mathbb{F}_{p}^{n\times n}$ be an arbitrary convolution matrix (e.g., $C_{\mathrm{GCN}}$, $C_{\mathrm{GIN}}$, $C_{\mathrm{GAT}}$). A single GNN layer is defined as:

\mathsf{GNN}_{i}(X):=\mathsf{ReLU}(CXW).

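A single layer is thus one convolution, one linear map, and one elementwise ReLU; a minimal sketch (our function name, real arithmetic) could look as follows.

```python
# Minimal sketch of one GNN layer (Definition 3.14): ReLU(C X W).
import numpy as np

def gnn_layer(C: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    Z = C @ X @ W                # graph convolution followed by the linear map
    return np.maximum(Z, 0.0)    # elementwise ReLU
```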
By stacking multiple GNN layers, we obtain a multi-layer GNN capable of learning expressive node embeddings. Different prediction tasks, such as node-level, link-level, or graph-level tasks, require graph pooling operations to aggregate information from specific node subsets. We introduce two commonly used $\mathsf{READOUT}$ functions [6] for graph pooling:

Definition 3.15 (Graph average readout layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a node embedding matrix. A graph average readout layer $\mathsf{READOUT}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}^{d}$ selects a subset $B=\{i_{1},i_{2},\dots,i_{|B|}\}\subseteq[n]$ and computes:

\mathsf{READOUT}(X):=\frac{1}{|B|}(X_{i_{1}}+X_{i_{2}}+\dots+X_{i_{|B|}}).
Definition 3.16 (Graph maximum readout layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a node embedding matrix. A graph maximum readout layer $\mathsf{READOUT}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}^{d}$ selects a subset $B=\{i_{1},i_{2},\dots,i_{|B|}\}\subseteq[n]$ and computes each dimension $j\in[d]$ as:

\mathsf{READOUT}(X)_{j}:=\max\{X_{i_{1},j},X_{i_{2},j},\cdots,X_{i_{|B|},j}\}.
Remark 3.17.

Graph readout functions in Definitions 3.15 and 3.16 support decision problems at various levels, including but not limited to node, link, and graph tasks, since one can target specific nodes, edges, or the entire graph by appropriately selecting the subset $B$.

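A small sketch of the two readout layers above over a chosen index subset $B$ (our function and variable names, real arithmetic) follows.

```python
# Sketches of the average and maximum readouts (Definitions 3.15 and 3.16).
import numpy as np

def readout_avg(X: np.ndarray, B: list[int]) -> np.ndarray:
    return X[B].mean(axis=0)     # (1/|B|) * sum of the selected rows

def readout_max(X: np.ndarray, B: list[int]) -> np.ndarray:
    return X[B].max(axis=0)      # dimension-wise maximum over B
```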
Finally, we introduce the MLP prediction head, essential for converting the aggregated embedding into specific predictions:

Definition 3.18 (MLP prediction head).

Let $x\in\mathbb{F}_{p}^{d}$ be a single output embedding. With model weights $W\in\mathbb{F}_{p}^{d\times d}$, $w\in\mathbb{F}_{p}^{d}$, and bias $b\in\mathbb{F}_{p}^{d}$, the MLP prediction head is defined as:

\mathsf{Head}(x):=w^{\top}\mathsf{ReLU}(Wx+b).

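The head is a one-hidden-layer MLP that returns a scalar; a minimal sketch (our function name, real arithmetic):

```python
# Sketch of the MLP prediction head (Definition 3.18): w^T ReLU(W x + b).
import numpy as np

def mlp_head(x: np.ndarray, W: np.ndarray, w: np.ndarray, b: np.ndarray) -> float:
    return float(w @ np.maximum(W @ x + b, 0.0))
```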
Finally, we integrate all the previously defined GNN components to present the complete formulation of a multi-layer GNN.

Definition 3.19 (Multi-layer GNN).

Let $m$ be the number of GNN layers. Let $X\in\mathbb{F}_{p}^{n\times d}$ denote the input feature matrix. For each $i\in\{1,\dots,m\}$, let $\mathsf{GNN}_{i}$ represent the $i$-th GNN layer in Definition 3.14. Let $\mathsf{READOUT}$ be a graph readout function (Definition 3.15 or Definition 3.16), and let $\mathsf{Head}$ be the MLP prediction head from Definition 3.18. The $m$-layer GNN $\mathsf{GNN}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}$ is then defined as:

\mathsf{GNN}(X):=\mathsf{Head}\circ\mathsf{READOUT}\circ\mathsf{GNN}_{m}\circ\mathsf{GNN}_{m-1}\circ\cdots\circ\mathsf{GNN}_{1}(X).

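Putting the pieces together, an $m$-layer GNN composes $m$ layers, one readout, and the head. The illustrative sketch below reuses the helper functions from the sketches above (gcn_conv, gnn_layer, readout_avg, mlp_head); the random graph and weights are placeholders of our own choosing.

```python
# Illustrative composition of Definition 3.19: Head o READOUT o GNN_m o ... o GNN_1.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 4, 3
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # random undirected graph, no self-loops
X = rng.standard_normal((n, d))

C = gcn_conv(A)                                # any of the three convolution matrices
for _ in range(m):                             # m stacked GNN layers
    X = gnn_layer(C, X, rng.standard_normal((d, d)))

h = readout_avg(X, B=list(range(n)))           # pool over all nodes
score = mlp_head(h, rng.standard_normal((d, d)),
                 rng.standard_normal(d), rng.standard_normal(d))
print(score)                                   # a single graph-level prediction
```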
3.3 Circuit Complexity Classes

In this subsection, we introduce fundamental concepts of circuit complexity, a key area of theoretical computer science and computational complexity.

Definition 3.20 (Boolean circuit, Definition 6.1 in [1]).

A Boolean circuit with $n$ binary inputs and one binary output is a mapping $C_{n}$ from $\{0,1\}^{n}$ to $\{0,1\}$, represented as a directed acyclic graph (DAG). The graph consists of:

  • $n$ input nodes, each with in-degree zero, corresponding to the input variables.

  • One output node, with out-degree zero, representing the output variable.

  • Intermediate nodes, called gates, which perform logical operations (e.g., $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$) on their inputs. Each gate has one out-edge, representing the result of its computation. The in-degree of a gate node is also referred to as its fan-in.

The structure of the graph allows the Boolean circuit to evaluate logical functions based on the input values, producing a corresponding output.

Definition 3.21 (Complexity measures of Boolean circuits).

The size of a Boolean circuit $C$ is defined as the number of nodes in its computation graph. The depth of $C$ is the length of the longest path in its computation graph.

To analyze the expressiveness of specific Boolean circuits, we first formally define the concept of languages recognized by a circuit family.

Definition 3.22 (Language recognition with circuit families, Definition 2 in [33]).

A circuit family $\mathcal{C}$ denotes a set of Boolean circuits. The circuit family $\mathcal{C}$ can recognize a language $L\subseteq\{0,1\}^{*}$ if, for every string $s\in\{0,1\}^{*}$, there exists a Boolean circuit $C_{|s|}\in\mathcal{C}$ with input size $|s|$ such that $C_{|s|}(s)=1$ if and only if $s\in L$.

We now define complexity classes of languages based on the circuit families capable of recognizing them, with a focus on the resources (e.g., depth, size) these circuits require:

Definition 3.23 ($\mathsf{NC}^{i}$, Definition 6.21 in [1]).

A language $L$ belongs to the class $\mathsf{NC}^{i}$ (Nick's Class) if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$ logical gates with bounded fan-in.

Definition 3.24 ($\mathsf{AC}^{i}$, Definition 6.22 in [1]).

A language $L$ belongs to the class $\mathsf{AC}^{i}$ if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$ logical gates with unbounded fan-in.

Definition 3.25 ($\mathsf{TC}^{i}$, Definition 4.34 in [45]).

A language $L$ belongs to the class $\mathsf{TC}^{i}$ if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and unbounded fan-in gates for $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$, and $\mathsf{MAJORITY}$ operations, where a $\mathsf{MAJORITY}$ gate outputs one if more than $1/2$ of its inputs are ones.

Remark 3.26.

The $\mathsf{MAJORITY}$ gates in Definition 3.25 can be substituted with prime-modulus $\mathsf{MOD}$ gates or $\mathsf{THRESHOLD}$ gates. Any Boolean circuit utilizing one of these gates is called a threshold circuit.

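To fix intuition, a $\mathsf{MAJORITY}$ gate is just a counting test over its unboundedly many input bits, and a general threshold gate compares the count against a threshold $t$; a toy evaluator (our own function names) is shown below.

```python
# Toy illustration of the unbounded fan-in gates in Definition 3.25.
def majority(bits: list[int]) -> int:
    """Fires when more than half of the input bits are ones."""
    return int(sum(bits) > len(bits) / 2)

def threshold(bits: list[int], t: int) -> int:
    """General THRESHOLD_t gate: fires when at least t input bits are ones."""
    return int(sum(bits) >= t)

assert majority([1, 1, 0]) == 1 and majority([1, 0, 0]) == 0
assert threshold([1, 0, 1, 0], t=2) == 1
```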
Next, we define the complexity class $\mathsf{DET}$, which plays a critical role in certain hardness results.

Definition 3.27 ($\mathsf{DET}$, page 12 of [13]).

A Boolean circuit belongs to the $\mathsf{DET}$ family if it computes a problem that is $\mathsf{NC}^{1}$-reducible to the computation of the determinant $\det(A)$ of an $n\times n$ matrix $A$ of $n$-bit integers.

Fact 3.28 (Inclusion of circuit complexity classes, page 110 of [1], Corollary 4.35 of [45], page 19 of [13]).

We have $\mathsf{NC}^{i}\subseteq\mathsf{AC}^{i}\subseteq\mathsf{TC}^{i}\subseteq\mathsf{NC}^{i+1}$ for all $i\in\mathbb{N}$. Specifically, for $i=1$, we have $\mathsf{NC}^{1}\subseteq\mathsf{DET}\subseteq\mathsf{NC}^{2}$.

Remark 3.29.

As noted on page 116 of [45] and page 110 of [1], it is well-known that $\mathsf{NC}^{i}\subsetneq\mathsf{AC}^{i}\subsetneq\mathsf{TC}^{i}$ for $i=0$. However, whether $\mathsf{TC}^{0}\subsetneq\mathsf{NC}^{1}$ remains an open problem in circuit complexity. Additionally, as discussed on page 18 of [13], it is unlikely that $\mathsf{DET}\subseteq\mathsf{AC}^{1}$, despite $\mathsf{DET}$ being bounded between $\mathsf{NC}^{1}$ and $\mathsf{NC}^{2}$.

We have formulated non-uniform circuit families that allow different structures for different input lengths. While flexible, this lack of uniformity is impractical compared to computational models like Turing machines, where the same device handles all input lengths. To address this, we introduce uniform circuit families, where circuits for all input lengths can be systematically generated by a Turing machine under specific time and space constraints. We begin by introducing $\mathsf{L}$-uniformity.

Definition 3.30 ($\mathsf{L}$-uniformity, Definition 6.5 in [1]).

Let $\mathcal{C}$ denote a circuit family, and let $\mathsf{C}$ denote a language class recognizable by $\mathcal{C}$. A language $L\subseteq\{0,1\}^{*}$ belongs to the $\mathsf{L}$-uniform class of $\mathsf{C}$ if there exists an $O(\log n)$-space Turing machine that can produce a circuit $C_{n}\in\mathcal{C}$ with $n$ variables for any input $1^{n}$. The circuit $C_{n}$ must recognize $L$ for inputs of size $n$.

Next, we define $\mathsf{DLOGTIME}$-uniformity, which refines $\mathsf{L}$-uniformity by introducing a more computationally practical time constraint. Throughout this paper, references to uniform circuit families specifically denote their $\mathsf{DLOGTIME}$-uniform versions.

Definition 3.31 ($\mathsf{DLOGTIME}$-uniformity, Definition 4.28 in [4]).

Let $\mathsf{C}$ be an $\mathsf{L}$-uniform language class as defined in Definition 3.30. A language $L\subseteq\{0,1\}^{*}$ belongs to the $\mathsf{DLOGTIME}$-uniform class of $\mathsf{C}$ if there exists a Turing machine that can produce a circuit $C_{n}\in\mathcal{C}$ with $n$ variables for any input $1^{n}$ within $O(\log n)$ time. The circuit $C_{n}$ must recognize the language $L$ for inputs of size $n$.

4 Complexity of Graph Neural Networks

In this section, we establish foundational complexity results for each component of a graph neural network (GNN) and then combine these results to derive a circuit complexity bound for the entire multi-layer GNN. Section 4.1 examines activation functions, which form the basis of graph convolution computations. Section 4.2 explores the computation of graph convolution matrices. Section 4.3 analyzes a single GNN layer, the fundamental building block of a multi-layer GNN. In Section 4.4, we investigate graph readout functions and the MLP prediction head, essential for making predictions with a multi-layer GNN. Finally, in Section 4.5, we integrate all components and analyze the complete multi-layer GNN structure, culminating in Section 4.6, where we present the key result: the circuit complexity bound of graph neural networks.

4.1 Computing Activation Functions

In this subsection, we first establish a useful fact about computing pairwise $\max$ and $\min$ functions with $\mathsf{TC}^{0}$ circuits. We then demonstrate that the $\mathsf{ReLU}$ and $\mathsf{LeakyReLU}$ activation functions on embedding matrices can be efficiently computed by uniform threshold circuits.

Fact 4.1 (Computing pairwise $\max$ and $\min$ with $\mathsf{TC}^{0}$ circuits).

Let $a,b\in\mathbb{F}_{p}$ be two $\mathsf{FPN}$s. If $p\leq\operatorname{poly}(n)$, there is an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+3)$ that can compute $\min\{a,b\}$ and $\max\{a,b\}$.

Proof.

For two $\mathsf{FPN}$s $a$ and $b$, we first compute the comparison result $c$, where $c=1$ if $a\leq b$ and $c=0$ otherwise, leveraging an $O(1)$-depth uniform threshold circuit with depth $d_{\mathrm{std}}$, as stated in Lemma 3.4. Once $c$ is computed, each bit $i\in[p]$ of $\min\{a,b\}$ and $\max\{a,b\}$ can be determined as $\min\{a,b\}_{i}=(c\land a_{i})\lor(\lnot c\land b_{i})$ and $\max\{a,b\}_{i}=(c\land b_{i})\lor(\lnot c\land a_{i})$, respectively, which can be computed in parallel with a $3$-depth Boolean circuit. Combining the depths for comparison and logical operations, the overall circuit depth is $d_{\mathrm{std}}+3$. Since $p\leq\operatorname{poly}(n)$ and each operation is computable with a polynomial-size circuit, the entire circuit has size $\operatorname{poly}(n)$. This completes the proof. ∎

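The bit-level selection used in this proof is a two-way multiplexer controlled by the comparison bit $c$; a small sketch on bit lists (our function name, purely illustrative) mirrors the formulas above.

```python
# Bit-level sketch of the selection step in Fact 4.1.
def mux_bits(c: int, a_bits: list[int], b_bits: list[int]) -> tuple[list[int], list[int]]:
    """Return (min_bits, max_bits): min_i = (c & a_i) | (~c & b_i), symmetrically for max."""
    min_bits = [(c & ai) | ((1 - c) & bi) for ai, bi in zip(a_bits, b_bits)]
    max_bits = [(c & bi) | ((1 - c) & ai) for ai, bi in zip(a_bits, b_bits)]
    return min_bits, max_bits

# Example with 4-bit encodings of a = 3 (0011) and b = 5 (0101): c = [a <= b] = 1.
print(mux_bits(1, [0, 0, 1, 1], [0, 1, 0, 1]))   # -> ([0, 0, 1, 1], [0, 1, 0, 1])
```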
Lemma 4.2 (Computing $\mathsf{ReLU}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is a constant-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+3)$ that can compute $\mathsf{ReLU}(X)$.

Proof.

For each pair of subscripts $i\in[n]$ and $j\in[d]$, the entry $\mathsf{ReLU}(X)_{i,j}$ is formulated as $\mathsf{ReLU}(X)_{i,j}=\max\{0,X_{i,j}\}$ following Definition 3.8. By Fact 4.1, the $\max$ function can be computed with a uniform threshold circuit of depth $(d_{\mathrm{std}}+3)$ and $\operatorname{poly}(n)$ size. If we compute all the $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries in parallel, we can compute $\mathsf{ReLU}(X)$ with depth $(d_{\mathrm{std}}+3)$ and size $\operatorname{poly}(n)$. Hence, we finish the proof. ∎

Lemma 4.3 (Computing $\mathsf{LeakyReLU}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix and let $s\in\mathbb{F}_{p}$ be the negative slope. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(3d_{\mathrm{std}}+3)$ that can compute $\mathsf{LeakyReLU}(X)$.

Proof.

Considering all $i\in[n]$ and $j\in[d]$, the entry $\mathsf{LeakyReLU}(X)_{i,j}$ is formulated as $\mathsf{LeakyReLU}(X)_{i,j}=\max\{0,X_{i,j}\}+s\cdot\min\{0,X_{i,j}\}$ following Definition 3.9. By Fact 4.1, both $\max\{0,X_{i,j}\}$ and $\min\{0,X_{i,j}\}$ can be computed with a uniform threshold circuit of depth $d_{\mathrm{std}}+3$ and $\operatorname{poly}(n)$ size. After that, we can multiply $\min\{0,X_{i,j}\}$ by the slope $s$ and add the result to $\max\{0,X_{i,j}\}$, which can be computed with a $2d_{\mathrm{std}}$-depth, polynomial-size circuit by applying Lemma 3.4 twice. Combining the above computation steps, we have $d_{\mathrm{total}}=d_{\mathrm{std}}+3+2d_{\mathrm{std}}=3d_{\mathrm{std}}+3$. Since there are $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries in $X$ and all the computations are finished with polynomial-size circuits, the size of the entire circuit is also $\operatorname{poly}(n)$. Thus, we complete the proof. ∎

4.2 Computing Graph Convolution Matrices

In this subsection, we present results on the computation of representative graph convolution matrices using uniform threshold circuits.

Lemma 4.4 (Computing GCN convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}})$, which can compute the graph convolution matrix $C_{\mathrm{GCN}}$ in Definition 3.11.

Proof.

Considering all the rows $i\in[n]$ of the adjacency matrix $A$, the corresponding diagonal element of the degree matrix $D$ is written as $D_{i,i}=\sum_{j=1}^{n}A_{i,j}$ by definition. Following Lemma 3.4, each iterated summation is computable with a $\operatorname{poly}(n)$ size uniform threshold circuit with $d_{\oplus}$ depth. After that, we can compute all the elements of $(D+I)$ and $(A+I)$ in parallel, which can be finished with a uniform threshold circuit having $\operatorname{poly}(n)$ size and $d_{\mathrm{std}}$ depth by Lemma 3.4. Then we can compute all the elements of $(D+I)^{-1/2}$ with a $d_{\mathrm{sqrt}}$-depth circuit following Lemma 3.6. Finally, we compute the matrix multiplication $(D+I)^{-1/2}(A+I)(D+I)^{-1/2}$ with a uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $2(d_{\mathrm{std}}+d_{\oplus})$ by applying Lemma 3.7 twice. Combining all the aforementioned circuits, we have

d_{\mathrm{total}}=d_{\oplus}+d_{\mathrm{std}}+d_{\mathrm{sqrt}}+2(d_{\mathrm{std}}+d_{\oplus})=3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}}.

Since each step utilizes a circuit of size $\operatorname{poly}(n)$, the size of the entire circuit is also $\operatorname{poly}(n)$. Therefore, we can conclude that a $\operatorname{poly}(n)$ size and $(3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}})$ depth uniform threshold circuit can compute the GCN convolution matrix $C_{\mathrm{GCN}}$. ∎

Lemma 4.5 (Computing GIN convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $3d_{\mathrm{std}}$, which can compute the graph convolution matrix $C_{\mathrm{GIN}}$ in Definition 3.12.

Proof.

By Lemma 3.4, we can first simply compute the constant $(1+\epsilon)$ with a $d_{\mathrm{std}}$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size. Then we can consider each element $i,j\in[n]$ of the adjacency matrix $A$, computing $(C_{\mathrm{GIN}})_{i,j}=A_{i,j}+(1+\epsilon)\cdot I_{i,j}$ in parallel. Applying Lemma 3.4 twice, we can conclude that all the matrix elements can be produced with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $2d_{\mathrm{std}}$. Thus, by combining all the aforementioned circuits, we have $d_{\mathrm{total}}=d_{\mathrm{std}}+2d_{\mathrm{std}}=3d_{\mathrm{std}}$. Therefore, we can conclude that a $\operatorname{poly}(n)$ size and $3d_{\mathrm{std}}$ depth uniform threshold circuit can compute the GIN convolution matrix $C_{\mathrm{GIN}}$. ∎

Before discussing the $\mathsf{TC}^{0}$ implementation of the GAT convolution matrix, we introduce a key lemma on the softmax mechanism, which is crucial for attention computation in GAT.

Lemma 4.6 (Computing $\mathsf{Softmax}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$ that can compute $\mathsf{Softmax}(X)$ in Definition 3.10.

Proof.

By Lemma 3.5, all the $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries of $\exp(X)$ can be produced in parallel through a polynomial-size uniform threshold circuit with depth $d_{\mathrm{exp}}$. Next, we compute $\exp(X)\cdot\boldsymbol{1}_{d}$ using a circuit of depth $(d_{\mathrm{std}}+d_{\oplus})$ as per Lemma 3.7, and extract the diagonal matrix $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})$ using a depth-$1$ circuit. Subsequently, we compute the inverse of the diagonal matrix with a depth-$d_{\mathrm{std}}$ circuit by inverting all diagonal elements in parallel, as guaranteed by Lemma 3.4. We then multiply $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}$ and $\exp(X)$ using a polynomial-size circuit with depth $(d_{\mathrm{std}}+d_{\oplus})$, again following Lemma 3.7.

Combining these circuits, the total depth is:

d_{\mathrm{total}}=d_{\mathrm{exp}}+(d_{\mathrm{std}}+d_{\oplus})+1+d_{\mathrm{std}}+(d_{\mathrm{std}}+d_{\oplus})=d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1.

Since the number of parallel operations is $\operatorname{poly}(n)$, $\mathsf{Softmax}(X)$ can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$. This completes the proof. ∎

Building on the softmax computation result, we show that the GAT convolution matrix can indeed be computed with $\mathsf{TC}^{0}$ circuits, as formalized in the following lemma.

Lemma 4.7 (Computing GAT convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4)$, which can compute the graph convolution matrix $C_{\mathrm{GAT}}$ in Definition 3.13.

Proof.

Let $\omega\in\mathbb{F}_{p}$ denote an $\mathsf{FPN}$ with negative infinity value, and let the model weight $a=(\alpha_{1},\dots,\alpha_{d},\alpha_{d+1},\dots,\alpha_{2d})\in\mathbb{F}_{p}^{2d}$. We define $a_{1}=(\alpha_{1},\dots,\alpha_{d})\in\mathbb{F}_{p}^{d}$ and $a_{2}=(\alpha_{d+1},\dots,\alpha_{2d})\in\mathbb{F}_{p}^{d}$, such that $a=a_{1}\|a_{2}$. For all $i,j\in[n]$, each entry $E_{i,j}$ of the attention weight matrix is given by:

E_{i,j}=A_{i,j}\cdot a^{\top}\mathsf{LeakyReLU}(WX_{i}\|WX_{j})+(1-A_{i,j})\cdot\omega=A_{i,j}\cdot a_{1}^{\top}\mathsf{LeakyReLU}(WX_{i})+A_{i,j}\cdot a_{2}^{\top}\mathsf{LeakyReLU}(WX_{j})+(1-A_{i,j})\cdot\omega,

following Definition 3.13. We examine the three terms separately:

E_{i,j}=\underbrace{A_{i,j}\cdot a_{1}^{\top}\mathsf{LeakyReLU}(WX_{i})}_{:=E^{1}_{i,j}}+\underbrace{A_{i,j}\cdot a_{2}^{\top}\mathsf{LeakyReLU}(WX_{j})}_{:=E^{2}_{i,j}}+\underbrace{(1-A_{i,j})\cdot\omega}_{:=E^{3}_{i,j}}.

For the first term $E^{1}_{i,j}$, there is a $\operatorname{poly}(n)$ size and $(d_{\mathrm{std}}+d_{\oplus})$ depth uniform threshold circuit to compute $WX_{i}$ following Lemma 3.7. Then we can compute $\mathsf{LeakyReLU}(WX_{i})$ with a $(3d_{\mathrm{std}}+3)$ depth circuit by Lemma 4.3. After that, we can combine Lemma 3.7 and Lemma 3.4 to multiply $A_{i,j}$, $a_{1}^{\top}$, and $\mathsf{LeakyReLU}(WX_{i})$ with a polynomial size and $(2d_{\mathrm{std}}+d_{\oplus})$ depth threshold circuit. Combining all the circuits above, we get the total depth $d_{1}$ for the first term $E^{1}_{i,j}$ as follows:

d_{1}=(d_{\mathrm{std}}+d_{\oplus})+(3d_{\mathrm{std}}+3)+(2d_{\mathrm{std}}+d_{\oplus})=6d_{\mathrm{std}}+2d_{\oplus}+3.

For the second term $E^{2}_{i,j}$, since its computation is equivalent to that of the first term, we can conclude that its depth $d_{2}$ equals $d_{1}$.

For the third term $E^{3}_{i,j}$, we can simply apply Lemma 3.4 twice and conclude that $(1-A_{i,j})\cdot\omega$ can be produced with a uniform threshold circuit, in which the size is $\operatorname{poly}(n)$ and the depth is $d_{3}=2d_{\mathrm{std}}$.

Integrating the computation of all three terms $E^{1}_{i,j},E^{2}_{i,j},E^{3}_{i,j}$ for $E_{i,j}$, we can compute these terms in parallel and obtain a polynomial-size uniform threshold circuit with depth $d_{e}=\max\{d_{1},d_{2},d_{3}\}=d_{1}=6d_{\mathrm{std}}+2d_{\oplus}+3$. Since we have obtained a uniform threshold circuit to compute each entry $E_{i,j}$, and the number of entries to be computed in parallel is $O(n^{2})\leq\operatorname{poly}(n)$, we can compute all the entries of $E$ in parallel without increasing the circuit's polynomial size and depth. At last, we can compute $\mathsf{Softmax}(E)$ to obtain the final convolution matrix, and this can be finished with a uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$ following Lemma 4.6. Therefore, the final depth of the circuit is the sum of the depth for computing the matrix $E$ and that for applying the $\mathsf{Softmax}$ operation, which is:

d_{\mathrm{total}}=d_{e}+d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1=6d_{\mathrm{std}}+2d_{\oplus}+3+d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1=d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4.

Therefore, we can conclude that the GAT convolution matrix is computable with a uniform threshold circuit of polynomial size and $(d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4)$ depth, which finishes the proof. ∎

After establishing the computation of all graph convolution matrices, we introduce a simplified notation for their circuit depths. Specifically, we define the depth of the uniform threshold circuit used for graph convolution matrices as $d_{\mathrm{conv}}=\max\{3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}},\,3d_{\mathrm{std}},\,d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4\}$, based on the results in Lemma 4.4, Lemma 4.5, and Lemma 4.7.

4.3 Computing Single Graph Neural Network Layer

In this subsection, we combine the results on graph convolution matrices and basic activation functions to determine the circuit depth requirement for a complete GNN layer.

Lemma 4.8 (Computing a single GNN layer with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the GNN layer in Definition 3.14 can be computed with a uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$.

Proof.

By Lemma 4.4, Lemma 4.5, and Lemma 4.7, the graph convolution matrix can be computed with a $d_{\mathrm{conv}}$ depth, polynomial-size threshold circuit. By applying Lemma 3.7 twice, we can further conclude that $CXW$ can be produced by a circuit with polynomial size and $2(d_{\mathrm{std}}+d_{\oplus})$ depth. Then we can compute $\mathsf{ReLU}(CXW)$ with a $(d_{\mathrm{std}}+3)$-depth, polynomial-size circuit following Lemma 4.2. Combining all the circuit depths mentioned above, we have:

d_{\mathrm{total}}=d_{\mathrm{conv}}+2(d_{\mathrm{std}}+d_{\oplus})+(d_{\mathrm{std}}+3)=d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3.

Since all the steps use polynomial-size circuits, we can conclude that a single GNN layer can be computed with a uniform threshold circuit having $\operatorname{poly}(n)$ size and $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$ depth. Thus, we finish the proof. ∎

4.4 Computing Pooling Layer and Prediction Head

Definitions 3.15, 3.16, and 3.18 describe the basic components for transforming GNN node embeddings into predictions. Here, we present results on computing these components with uniform threshold circuits, starting with the two graph readout layers.

Lemma 4.9 (Computing graph average readout with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the graph average readout layer in Definition 3.15 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(3d_{\mathrm{std}}+2d_{\oplus})$.

Proof.

Let $\beta=(\beta_{1},\beta_{2},\dots,\beta_{n})^{\top}\in\mathbb{F}_{p}^{n}$ be the vectorized form of the index set $B$, where $\beta_{i}=1$ if $i\in B$ and $\beta_{i}=0$ otherwise. Therefore, the output $\mathsf{READOUT}(X)$ in Definition 3.15 is equivalent to the following:

\mathsf{READOUT}(X)=\frac{1}{|B|}\underbrace{X^{\top}}_{d\times n}\cdot\underbrace{\operatorname{diag}(\beta)}_{n\times n}\cdot\underbrace{\boldsymbol{1}_{n}}_{n\times 1},

where $\boldsymbol{1}_{n}$ is the $n$-dimensional all-ones vector. Thus, we can apply Lemma 3.7 twice and obtain a $\operatorname{poly}(n)$ size and $2(d_{\mathrm{std}}+d_{\oplus})$ depth uniform threshold circuit that can compute the matrix product $X^{\top}\cdot\operatorname{diag}(\beta)\cdot\boldsymbol{1}_{n}$. Then, we follow Lemma 3.4 to multiply by $\frac{1}{|B|}$ with a depth $d_{\mathrm{std}}$ circuit of $\operatorname{poly}(n)$ size. Combining the two circuits mentioned above, we have the total depth:

d_{\mathrm{total}}=2(d_{\mathrm{std}}+d_{\oplus})+d_{\mathrm{std}}=3d_{\mathrm{std}}+2d_{\oplus}.

Since all the partial circuits are of polynomial size, we can conclude that there is a $\operatorname{poly}(n)$ size and $(3d_{\mathrm{std}}+2d_{\oplus})$ depth uniform threshold circuit which can compute the graph pooling layer. Hence, we finish the proof. ∎

Before delving into the graph maximum readout layer, we establish a useful fact about computing maximum and minimum values in $\mathsf{TC}^{0}$.

Fact 4.10 (Computing $\max$ and $\min$ of $n$ values with $\mathsf{TC}^{0}$ circuits).

Let $x_{1},x_{2},\cdots,x_{n}\in\mathbb{F}_{p}$ be $n$ $\mathsf{FPN}$s. If the precision $p\leq\operatorname{poly}(n)$, there is an $O(1)$-depth uniform threshold circuit of size $\operatorname{poly}(n)$ and depth $(d_{\mathrm{std}}+3)$ that can compute the maximum and minimum of these $\mathsf{FPN}$s, i.e., $\max\{x_{1},\cdots,x_{n}\}$ and $\min\{x_{1},\cdots,x_{n}\}$.

Proof.

For all pairs $i,j\in[n]$, we first compute the comparison results between all pairs of the $n$ numbers. Let $C^{\max}\in\{0,1\}^{n\times n}$, where $C^{\max}_{i,j}=1$ if $x_{i}\geq x_{j}$ and $C^{\max}_{i,j}=0$ otherwise, and let $C^{\min}\in\{0,1\}^{n\times n}$, where $C^{\min}_{i,j}=1$ if $x_{i}\leq x_{j}$ and $C^{\min}_{i,j}=0$ otherwise. Since there are in total $O(n^{2})\leq\operatorname{poly}(n)$ comparison operations, all entries can be produced in parallel using a $\operatorname{poly}(n)$ size uniform threshold circuit with $d_{\mathrm{std}}$ depth, following Lemma 3.4.

Once $C^{\max}$ and $C^{\min}$ are computed, we compute the dominance vector $D^{\max}\in\{0,1\}^{n}$ in parallel using a $1$-depth Boolean circuit, where $D^{\max}_{i}=\bigwedge_{j=1}^{n}C^{\max}_{i,j}$. Here, $D^{\max}_{i}=1$ indicates that $x_{i}$ is a maximum value. Similarly, we can compute $D^{\min}\in\{0,1\}^{n}$ in parallel, where $D^{\min}_{i}=\bigwedge_{j=1}^{n}C^{\min}_{i,j}$, to identify the minimum value.

To select the maximum value, we compute each bit $k\in[p]$ of the output $o^{\max}=\max\{x_{1},x_{2},\cdots,x_{n}\}$ in parallel using a $2$-depth Boolean circuit:

o_{k}^{\max}=\bigvee_{i=1}^{n}(D^{\max}_{i}\land(x_{i})_{k}),

in which $(x_{i})_{k}$ is the $k$-th bit of $x_{i}$. Since all maximum values are equal, if a specific bit $k$ of all maximum values is $1$, the OR operation ensures $o_{k}^{\max}=1$; similarly, if all maximum values have $0$ in bit $k$, then $o_{k}^{\max}=0$. Thus, this approach guarantees correctness regardless of the number of maximum values. Similarly, each bit $k\in[p]$ of the minimum output $o^{\min}=\min\{x_{1},x_{2},\cdots,x_{n}\}$ is computed using:

o_{k}^{\min}=\bigvee_{i=1}^{n}(D^{\min}_{i}\land(x_{i})_{k}).

Combining all the steps above, the total circuit depth is $d_{\mathrm{total}}=d_{\mathrm{std}}+1+2=d_{\mathrm{std}}+3$. Since $p\leq\operatorname{poly}(n)$ and all the steps are computed by a circuit of $\operatorname{poly}(n)$ size, we conclude that the total circuit size is also $\operatorname{poly}(n)$. This completes the proof. ∎

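The comparison-matrix and dominance-vector construction in this proof can be mirrored directly. The sketch below (our function name) operates on non-negative $p$-bit integers rather than $\mathsf{FPN}$s, but the selection logic is the same.

```python
# Sketch of Fact 4.10: all-pairs comparisons, a dominance vector marking the
# maxima, and a bitwise OR-selection that assembles the output.
def max_via_dominance(xs: list[int], p: int) -> int:
    n = len(xs)
    C = [[int(xs[i] >= xs[j]) for j in range(n)] for i in range(n)]   # C^max
    D = [int(all(C[i][j] for j in range(n))) for i in range(n)]       # dominance vector
    bits = [[(x >> (p - 1 - k)) & 1 for k in range(p)] for x in xs]   # p-bit encodings
    out = [int(any(D[i] and bits[i][k] for i in range(n))) for k in range(p)]
    return int("".join(map(str, out)), 2)

assert max_via_dominance([3, 7, 7, 1], p=4) == 7   # duplicate maxima are handled
```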
With this fact, the graph maximum readout layer can be computed seamlessly using $\mathsf{TC}^{0}$ circuits, as stated in the following lemma:

Lemma 4.11 (Computing graph maximum readout with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, the graph maximum readout layer in Definition 3.16 can be computed using a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{std}}+3)$.

Proof.

For each entry $i\in[d]$ of the output $\mathsf{READOUT}(X)$, since it corresponds to the maximum of $|B|\leq n$ numbers, we can compute all entries in parallel through a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{std}}+3)$ by Fact 4.10. This completes the proof. ∎

Following the computation of graph readout functions, we introduce a simplified notation for their circuit depths. Specifically, we define the depth of the uniform threshold circuit used for computing the graph readout layer as $d_{\mathrm{read}}=\max\{3d_{\mathrm{std}}+2d_{\oplus},\,d_{\mathrm{std}}+3\}$, which follows the results in Lemma 4.9 and Lemma 4.11.

Next, we introduce the circuit depth for computing an MLP prediction head.

Lemma 4.12 (Computing MLP head with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the MLP prediction head in Definition 3.18 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(4d_{\mathrm{std}}+2d_{\oplus}+3)$.

Proof.

By Lemma 3.7, matrix product WxWx is computable with a poly(n)\operatorname{poly}(n) size and (dstd+d)(d_{\mathrm{std}}+d_{\oplus}) depth uniform threshold circuit. After that, for each i[d]i\in[d], we can compute each element (Wx+b)i=(Wx)i+bi(Wx+b)_{i}=(Wx)_{i}+b_{i} in parallel with a poly(n)\operatorname{poly}(n) size and dstdd_{\mathrm{std}} size uniform threshold circuit by Lemma 3.4. Then, we can further apply the activation function with a polynomial size and (dstd+3)(d_{\mathrm{std}}+3) uniform threshold circuit by applying Lemma 4.2. Finally, following Lemma 3.7, the inner product between ww and 𝖱𝖾𝖫𝖴(Wx+b)\mathsf{ReLU}(Wx+b) is computable with a polynomial size and dstd+dd_{\mathrm{std}}+d_{\oplus} uniform threshold circuit.

Combining all the aforementioned circuits, we have the total circuit depth:

\[ d_{\mathrm{total}} = (d_{\mathrm{std}}+d_{\oplus})+d_{\mathrm{std}}+(d_{\mathrm{std}}+3)+(d_{\mathrm{std}}+d_{\oplus}) = 4d_{\mathrm{std}}+2d_{\oplus}+3. \]

Since the circuits to compute each step are all of polynomial size, we can conclude that the MLP prediction head is computable with a $\operatorname{poly}(n)$ size and $(4d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit. Hence, we finish our proof. ∎
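As an illustration of the four stages counted in this proof, the short sketch below evaluates $w^{\top}\mathsf{ReLU}(Wx+b)$ with NumPy; the shapes and random values are hypothetical, and ordinary floating-point arithmetic is used here only to mirror the order of the circuit stages, not their fixed-precision implementation.

```python
import numpy as np

# Hypothetical shapes; only the order of operations mirrors the proof:
# matrix product -> bias addition -> ReLU -> inner product with w.
d = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # hidden-layer weight matrix
b = rng.standard_normal(d)        # bias vector
w = rng.standard_normal(d)        # readout weights of the prediction head
x = rng.standard_normal(d)        # pooled graph representation

h = W @ x + b                     # stages 1-2: matrix product, entrywise addition
h = np.maximum(h, 0.0)            # stage 3: ReLU activation
y = w @ h                         # stage 4: inner product -> scalar prediction
print(float(y))
```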

4.5 Computing Multi-Layer Graph Neural Network

Having presented the circuit depths of all components of a multi-layer GNN, we now derive the overall circuit depth of the entire model in this subsection.

Lemma 4.13 (Computing multi-layer GNN with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the multi-layer graph neural network in Definition 3.19 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3)$.

Proof.

For each layer $i\in[m]$, $\mathsf{GNN}_{i}$ can be computed with a $\operatorname{poly}(n)$ size and $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit by Lemma 4.8. Hence, the composition of all $m$ layers $\mathsf{GNN}_{m}\circ\mathsf{GNN}_{m-1}\circ\cdots\circ\mathsf{GNN}_{1}(X)$ can be computed with a uniform threshold circuit within a total depth of $m(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$. Then, we can compute the graph readout layer with a $\operatorname{poly}(n)$ size and $d_{\mathrm{read}}$ depth uniform threshold circuit, following Lemma 4.9 and Lemma 4.11. Finally, we can apply Lemma 4.12 to compute the MLP prediction head with a $\operatorname{poly}(n)$ size and $(4d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit.

Combining the circuits to compute each layer of the entire model, we have the following total circuit depth:

\[ d_{\mathrm{total}} = m(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)+d_{\mathrm{read}}+(4d_{\mathrm{std}}+2d_{\oplus}+3) = md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3. \]

Since the circuits for computing each part of the model are all in $\operatorname{poly}(n)$ size, we can conclude that the multi-layer GNN can be computed with a $\operatorname{poly}(n)$ size and $(md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3)$ depth uniform threshold circuit. Hence, we finish the proof. ∎
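Since the depth bound in this lemma is obtained purely by arithmetic over the per-component depths, the following sketch (assuming the sympy package is available, with the depth symbols treated as opaque variables) verifies the expansion mechanically.

```python
# Symbolic check of the depth count in Lemma 4.13 (sympy assumed available).
from sympy import symbols, expand

m, d_conv, d_std, d_plus, d_read = symbols("m d_conv d_std d_plus d_read")

per_layer = d_conv + 3 * d_std + 2 * d_plus + 3      # one GNN layer (Lemma 4.8)
mlp_head = 4 * d_std + 2 * d_plus + 3                # MLP prediction head (Lemma 4.12)
total = m * per_layer + d_read + mlp_head            # m layers + readout + head

claimed = (m * d_conv + (3 * m + 4) * d_std + (2 * m + 2) * d_plus
           + d_read + 3 * m + 3)
assert expand(total - claimed) == 0                  # the two expressions agree
```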

4.6 Main Result: Graph Neural Networks Circuit Complexity

Finally, this subsection presents the circuit complexity bound of graph neural networks, which is the main result of this paper.

Theorem 4.14 (Main Result, Circuit complexity bound of graph neural networks).

If the precision $p\leq\operatorname{poly}(n)$, embedding size $d=O(n)$, and the number of layers $m=O(1)$, then the graph neural network $\mathsf{GNN}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}$ in Definition 3.19 can be simulated by a uniform $\mathsf{TC}^{0}$ circuit family.

Proof.

By Lemma 4.13 and $m=O(1)$, the circuit to compute $\mathsf{GNN}(x)$ has depth

\[ md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3=O(1) \]

and size $\operatorname{poly}(n)$. Then, we can conclude that a uniform $\mathsf{TC}^{0}$ circuit family can simulate $\mathsf{GNN}(x)$, following Definition 3.25. Hence, we finish the proof. ∎

The main result in Theorem 4.14 establishes that graph neural networks with $\operatorname{poly}(n)$ precision, constant depth, and $O(n)$ embedding size belong to the uniform $\mathsf{TC}^{0}$ circuit class. This highlights an inherent expressiveness limitation of GNNs, despite their empirical success: unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, they cannot solve problems beyond the capability of $\mathsf{TC}^{0}$ circuits. In the next section, we illustrate this limitation by analyzing two practical graph query problems.

5 Hardness

We explore two critical decision problems on graphs and their associated hardness results. Section 5.1 introduces two graph connectivity problems. Section 5.2 presents the basic concepts of the graph isomorphism problem. In Section 5.3, we present the main hardness result of this paper, which theoretically demonstrates that graph neural networks cannot solve these problems.

5.1 Graph Connectivity Problem

To formally define the graph connectivity problem, we begin by introducing two basic concepts related to sequences of connected nodes in graphs: walks and paths.

Definition 5.1 (Walk, Definition on page 26 of [48]).

Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, a walk in $\mathcal{G}$ is a finite sequence of nodes, denoted by

\[ v_{0}\rightarrow v_{1}\rightarrow v_{2}\rightarrow\cdots\rightarrow v_{m}, \]

where $v_{0},v_{1},v_{2},\dots,v_{m}\in\mathcal{V}$ and any two consecutive nodes are connected by an edge (i.e., $\forall i\in[m],(v_{i-1},v_{i})\in\mathcal{E}$).

Definition 5.2 (Path, Definition on page 26 of [48]).

For a walk $v_{0}\rightarrow v_{1}\rightarrow v_{2}\rightarrow\cdots\rightarrow v_{m}$, if all the nodes $v_{0},\dots,v_{m}$ and all the edges $(v_{0},v_{1}),\cdots,(v_{m-1},v_{m})$ are distinct, then we call this walk a path.
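To fix these two notions concretely, the following Python sketch checks whether a node sequence is a walk in an undirected graph, and whether that walk is additionally a path; the edge-list encoding and helper names are illustrative.

```python
# Illustrative checks of Definition 5.1 (walk) and Definition 5.2 (path).

def is_walk(edges, seq):
    """Every pair of consecutive nodes must be joined by an edge."""
    und = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in und for u, v in zip(seq, seq[1:]))

def is_path(edges, seq):
    """A walk whose nodes and edges are all distinct."""
    if not is_walk(edges, seq):
        return False
    walk_edges = [frozenset((u, v)) for u, v in zip(seq, seq[1:])]
    return len(set(seq)) == len(seq) and len(set(walk_edges)) == len(walk_edges)

edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
print(is_walk(edges, [0, 1, 2, 3, 1]))   # True: a walk that revisits node 1
print(is_path(edges, [0, 1, 2, 3, 1]))   # False: node 1 repeats
print(is_path(edges, [0, 1, 2, 3]))      # True: all nodes and edges are distinct
```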

Building upon these basic concepts, we now formally define two types of graph connectivity problems: the pairwise s-t connectivity problem, which checks whether two specific nodes are connected, and the global graph connectivity problem, which verifies the connectivity of all nodes.

Definition 5.3 (Undirected graph s-t connectivity problem, Definition on page 2 of [47]).

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an arbitrary undirected graph. The undirected s-t graph connectivity problem is: Does there exist a path between two specific nodes $s,t\in\mathcal{V}$?

Definition 5.4 (Undirected graph connectivity problem, Definition on page 2 of [47]).

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an arbitrary undirected graph. The undirected graph connectivity problem is: Does there exist a path between every pair of nodes in $\mathcal{V}$?
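As a plain reference decision procedure for both questions (a sequential polynomial-time algorithm, not a $\mathsf{TC}^{0}$ circuit), one may use breadth-first search, as in the sketch below; the adjacency-list encoding is illustrative.

```python
# Reference semantics for Definitions 5.3 and 5.4 via breadth-first search.
from collections import deque

def reachable(adj, s):
    """Return the set of nodes reachable from s."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def st_connected(adj, s, t):
    return t in reachable(adj, s)          # Definition 5.3

def connected(adj):
    nodes = list(adj)                      # Definition 5.4
    return len(reachable(adj, nodes[0])) == len(nodes) if nodes else True

adj = {0: [1], 1: [0, 2], 2: [1], 3: []}   # undirected; node 3 is isolated
print(st_connected(adj, 0, 2))             # True
print(connected(adj))                      # False
```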

With these problem formulations, we proceed to their computational complexity results.

Lemma 5.5 (Theorem 2 in [47]).

The undirected graph s-t connectivity problem in Definition 5.3 is $\mathsf{NC}^{1}$-complete.

Lemma 5.6 (Theorem 3 in [11]).

The undirected graph connectivity problem in Definition 5.4 is $\mathsf{NC}^{1}$-hard.

5.2 Graph Isomorphism Problem

In this subsection, we turn to the graph isomorphism problem, a core challenge in understanding the expressiveness of GNNs [52, 57]. This problem involves determining whether two graphs are structurally identical by examining possible permutations of their nodes.

Definition 5.7 (Graph isomorphism, on page 3 of [43]).

Let $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{E}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{E}_{2})$ be two graphs. An isomorphism between $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ is a bijection $\phi$ between their vertex sets $\mathcal{V}_{1}$ and $\mathcal{V}_{2}$ which preserves the edges, i.e., $\forall v_{1},v_{2}\in\mathcal{V}_{1},(v_{1},v_{2})\in\mathcal{E}_{1}\Leftrightarrow(\phi(v_{1}),\phi(v_{2}))\in\mathcal{E}_{2}$ and $\forall u_{1},u_{2}\in\mathcal{V}_{2},(u_{1},u_{2})\in\mathcal{E}_{2}\Leftrightarrow(\phi^{-1}(u_{1}),\phi^{-1}(u_{2}))\in\mathcal{E}_{1}$.

We can now define the graph isomorphism problem, which checks the existence of such an isomorphism.

Definition 5.8 (Graph isomorphism problem, on page 3 of [43]).

Let $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{E}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{E}_{2})$ be two graphs. The graph isomorphism problem is: Does there exist a graph isomorphism $\phi$ between $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$?
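For intuition only, the brute-force sketch below decides this problem by enumerating every vertex bijection and testing the edge-preservation condition of Definition 5.7; it takes factorial time and is meant purely to pin down the semantics, not as a practical algorithm.

```python
# Brute-force semantics of the graph isomorphism problem (Definition 5.8).
from itertools import permutations

def is_isomorphic(V1, E1, V2, E2):
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    und1 = {frozenset(e) for e in E1}
    und2 = {frozenset(e) for e in E2}
    for perm in permutations(V2):
        phi = dict(zip(V1, perm))          # candidate bijection V1 -> V2
        if all((frozenset((u, v)) in und1) == (frozenset((phi[u], phi[v])) in und2)
               for u in V1 for v in V1 if u != v):
            return True                    # edge-preserving bijection found
    return False

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]                      # 4-cycle
C4_relabeled = [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")]
P4 = [(0, 1), (1, 2), (2, 3)]                              # path on 4 nodes
print(is_isomorphic([0, 1, 2, 3], C4, ["a", "b", "c", "d"], C4_relabeled))  # True
print(is_isomorphic([0, 1, 2, 3], C4, [0, 1, 2, 3], P4))                    # False
```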

The computational complexity of the graph isomorphism problem is summarized in the following results:

Lemma 5.9 (Theorem 4.9 in [43]).

The graph isomorphism problem in Definition 5.8 is hard for the class $\mathsf{DET}$ under $\mathsf{AC}^{0}$ reductions.

Corollary 5.10.

The graph isomorphism problem in Definition 5.8 is $\mathsf{NC}^{1}$-hard.

Proof.

By Lemma 5.9, it is known that the graph isomorphism problem is hard for the class $\mathsf{DET}$ under $\mathsf{AC}^{0}$ reductions. Following Fact 3.28, we have $\mathsf{AC}^{0}\subseteq\mathsf{NC}^{1}\subseteq\mathsf{DET}$. Hence, every problem in $\mathsf{NC}^{1}$ also lies in $\mathsf{DET}$ and therefore reduces to the graph isomorphism problem, so the graph isomorphism problem is $\mathsf{NC}^{1}$-hard, which finishes the proof. ∎

5.3 Hardness Results

This subsection presents the main hardness results, which highlight the limitations of GNNs on two practical graph decision problems. We begin by showing that GNNs are unable to solve both types of graph connectivity problems.

Theorem 5.11.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph s-t connectivity problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. By Lemma 5.5, the graph s-t connectivity problem is $\mathsf{NC}^{1}$-complete, and in particular $\mathsf{NC}^{1}$-hard. Thus, combining these with Fact 3.28, we can complete the proof. ∎

Theorem 5.12.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph connectivity problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. We can also conclude that the graph connectivity problem is $\mathsf{NC}^{1}$-hard by Lemma 5.6. Thus, combining with Fact 3.28, we can complete the proof. ∎

Next, we show the hardness result for the graph isomorphism problem, which demonstrates the expressiveness limitations of GNNs from a non-Weisfeiler-Lehman (WL) perspective.

Theorem 5.13.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph isomorphism problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. We can also conclude that the graph isomorphism problem is $\mathsf{NC}^{1}$-hard by Corollary 5.10. Thus, combining with Fact 3.28, we can complete the proof. ∎

Remark 5.14.

Our hardness results allow an embedding size of $d=O(n)$, which subsumes the constant embedding sizes $d=O(1)$ typically used in practice and therefore yields a significantly stronger statement. This highlights that the computational limitations of GNNs cannot be easily mitigated by simply increasing the embedding size.

5.4 Discussion

After presenting the main hardness results of this paper, we explore the connections and comparisons with a broad range of relevant works, highlighting the contribution of this paper.

Connection and comparison to WL.

The Weisfeiler-Lehman (WL) hierarchy is the most widely used framework for analyzing the expressiveness of GNNs, particularly for their ability to solve graph isomorphism problems [52, 57]. While our Theorem 5.13 similarly shows that GNNs cannot solve the graph isomorphism problem unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, our results differ fundamentally from previous WL-based findings. Unlike the WL framework, which focuses solely on the topological structure of graphs, our analysis considers practical factors such as floating-point precision and node features. This broader perspective allows us to bound the expressiveness of feature-enhanced GNNs, including those with position encodings [53, 15] or random node features [42]. Moreover, while the WL framework is limited to graph isomorphism, we extend the analysis to show that GNNs cannot solve the graph connectivity problem (Theorem 5.11 and Theorem 5.12), a limitation that is difficult to capture using the WL framework.

Comparison to GNN computational models.

Recent studies have explored GNN expressiveness through computational models. For example, [14] investigates the RL-CONGEST model, which examines the communication complexity of GNNs in distributed settings. However, this work focuses on communication cost rather than expressiveness and is less relevant to our study. The most similar work to ours is [5], which shows that aggregate-combine GNNs (AC-GNNs) cannot capture a variant of first-order logic, $\mathsf{FOC}_{2}$. Since counting in $\mathsf{FOC}_{2}$ can be simulated by $\mathsf{FO}$ and $\mathsf{FO}=\mathsf{AC}^{0}$, their result implies that $\mathsf{AC}^{0}$ is not a complexity lower bound for AC-GNNs. In contrast, our work establishes that GNNs are upper-bounded by the $\mathsf{TC}^{0}$ circuit family, providing a clear upper bound rather than refuting a lower bound. This highlights a fundamental difference between our result and previous results in [5].

6 Conclusion

In this work, we show the computational limits of graph neural networks (GNNs). Unlike prior approaches based on the Weisfeiler-Lehman (WL) test, we adopt a fundamentally different perspective: circuit complexity. We show that GNNs with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$, regardless of the specific message-passing mechanisms or global readout functions, belong to the uniform $\mathsf{TC}^{0}$ circuit complexity class. As a result, we establish critical hardness results for two practical graph decision problems: graph connectivity and graph isomorphism. Our findings demonstrate that GNNs cannot solve these problems, which lie beyond $\mathsf{TC}^{0}$ unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$. These results are significant as they extend previous GNN expressiveness limitations, which were framed from the WL perspective, by introducing a fundamentally different circuit complexity bound. Our analysis not only incorporates previously overlooked factors, such as floating-point precision and the interactions between node embeddings and topological structure, but also applies to a broader range of graph query problems, such as graph connectivity.

For future work, we believe our theoretical framework offers a trustworthy foundation that can be generalized to assess the expressiveness of other GNN architectures and the hardness of additional graph query problems. Our research may also inspire the development of new architectures that go beyond the complexity of the $\mathsf{TC}^{0}$ circuit family.

References

  • AB [09] Sanjeev Arora and Boaz Barak. Computational complexity: a modern approach. Cambridge University Press, 2009.
  • Bab [16] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 684–697, 2016.
  • BAY [22] Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? In International Conference on Learning Representations, 2022.
  • BI [94] D Mix Barrington and Neil Immerman. Time, hardware, and uniformity. In Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory, pages 176–185. IEEE, 1994.
  • BKM+ [20] Pablo Barceló, Egor V. Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In International Conference on Learning Representations, 2020.
  • BL [23] Filippo Maria Bianchi and Veronica Lachi. The expressive power of pooling in graph neural networks. Advances in neural information processing systems, 36:71603–71618, 2023.
  • BRH+ [21] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoit Gaüzère, Sébastien Adam, and Paul Honeine. Analyzing the expressive power of graph neural networks in a spectral perspective. In International Conference on Learning Representations, 2021.
  • Chi [24] David Chiang. Transformers in uniform $\mathsf{TC}^{0}$. arXiv preprint arXiv:2409.13629, 2024.
  • [9] Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, and Zhao Song. Circuit complexity bounds for rope-based transformer architecture. arXiv preprint arXiv:2411.07602, 2024.
  • [10] Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, and Zhao Song. The computational limits of state-space models and mamba via the lens of circuit complexity. arXiv preprint arXiv:2412.06148, 2024.
  • CM [87] Stephen A Cook and Pierre McKenzie. Problems complete for deterministic logarithmic space. Journal of Algorithms, 8(3):385–394, 1987.
  • CMR [21] Leonardo Cotta, Christopher Morris, and Bruno Ribeiro. Reconstruction for powerful graph representations. Advances in Neural Information Processing Systems, 34:1713–1726, 2021.
  • Coo [85] Stephen A Cook. A taxonomy of problems with fast parallel algorithms. Information and control, 64(1-3):2–22, 1985.
  • CWS [24] Guanyu Cui, Zhewei Wei, and Hsin-Hao Su. Rethinking the expressiveness of gnns: A computational model perspective. arXiv preprint arXiv:2410.01308, 2024.
  • DLL+ [22] Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Graph neural networks with learnable structural and positional representations. In International Conference on Learning Representations, 2022.
  • FCL+ [22] Jiarui Feng, Yixin Chen, Fuhai Li, Anindya Sarkar, and Muhan Zhang. How powerful are k-hop message passing graph neural networks. Advances in Neural Information Processing Systems, 35:4776–4790, 2022.
  • FML+ [19] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pages 417–426, 2019.
  • GD [23] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
  • GJJ [20] Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning, pages 3419–3430. PMLR, 2020.
  • GS [20] Martin Grohe and Pascal Schweitzer. The graph isomorphism problem. Communications of the ACM, 63(11):128–134, 2020.
  • GSR+ [17] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
  • HAF [22] Yiding Hao, Dana Angluin, and Robert Frank. Formal language recognition by hard attention transformers: Perspectives from circuit complexity. Transactions of the Association for Computational Linguistics, 10:800–810, 2022.
  • HDW+ [20] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 639–648, 2020.
  • HYL [17] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
  • KW [17] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
  • LAG+ [23] Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang. Transformers learn shortcuts to automata. In The Eleventh International Conference on Learning Representations, 2023.
  • LLL+ [24] Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, and Zhao Song. On the expressive power of modern hopfield networks. arXiv preprint arXiv:2412.05562, 2024.
  • LLS+ [24] Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, and Mingda Wan. Theoretical constraints on the expressive power of rope-based tensor attention transformers. arXiv preprint arXiv:2412.18040, 2024.
  • LW [68] Andrei Leman and Boris Weisfeiler. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya, 2(9):12–16, 1968.
  • MBHSL [19] Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. Advances in neural information processing systems, 32, 2019.
  • MRF+ [19] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4602–4609, 2019.
  • MRM [20] Christopher Morris, Gaurav Rattan, and Petra Mutzel. Weisfeiler and leman go sparse: Towards scalable higher-order graph embeddings. Advances in Neural Information Processing Systems, 33:21824–21840, 2020.
  • MS [23] William Merrill and Ashish Sabharwal. The parallelism tradeoff: Limitations of log-precision transformers. Transactions of the Association for Computational Linguistics, 11:531–545, 2023.
  • MS [24] William Merrill and Ashish Sabharwal. A logic for expressing log-precision transformers. Advances in Neural Information Processing Systems, 36, 2024.
  • MSS [22] William Merrill, Ashish Sabharwal, and Noah A Smith. Saturated transformers are constant-depth threshold circuits. Transactions of the Association for Computational Linguistics, 10:843–856, 2022.
  • PW [22] Pál András Papp and Roger Wattenhofer. A theoretical comparison of graph neural network extensions. In International Conference on Machine Learning, pages 17323–17345. PMLR, 2022.
  • QRG+ [22] Chendi Qian, Gaurav Rattan, Floris Geerts, Mathias Niepert, and Christopher Morris. Ordered subgraph aggregation networks. Advances in Neural Information Processing Systems, 35:21030–21045, 2022.
  • Ruz [81] Walter L Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365–383, 1981.
  • SAL+ [24] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
  • Sat [20] Ryoma Sato. A survey on the expressive power of graph neural networks. arXiv preprint arXiv:2003.04078, 2020.
  • SLYS [21] Aravind Sankar, Yozen Liu, Jun Yu, and Neil Shah. Graph neural networks for friend ranking in large-scale social platforms. In Proceedings of the Web Conference 2021, pages 2535–2546, 2021.
  • SYK [21] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random features strengthen graph neural networks. In Proceedings of the 2021 SIAM international conference on data mining (SDM), pages 333–341. SIAM, 2021.
  • Tor [04] Jacobo Torán. On the hardness of graph isomorphism. SIAM Journal on Computing, 33(5):1093–1108, 2004.
  • VCC+ [18] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
  • Vol [99] Heribert Vollmer. Introduction to circuit complexity: a uniform approach. Springer Science & Business Media, 1999.
  • WHW+ [19] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, pages 165–174, 2019.
  • Wig [92] Avi Wigderson. The complexity of graph connectivity. In International Symposium on Mathematical Foundations of Computer Science, pages 112–132. Springer, 1992.
  • Wil [10] R.J. Wilson. Introduction to Graph Theory. Longman, 2010.
  • WSZ+ [19] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861–6871. PMLR, 2019.
  • WW [22] Asiri Wijesinghe and Qing Wang. A new perspective on ”how graph neural networks go beyond weisfeiler-lehman?”. In International Conference on Learning Representations, 2022.
  • WWCBF [22] Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022.
  • XHLJ [18] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2018.
  • YYL [19] Jiaxuan You, Rex Ying, and Jure Leskovec. Position-aware graph neural networks. In International conference on machine learning, pages 7134–7143. PMLR, 2019.
  • ZC [18] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. Advances in neural information processing systems, 31, 2018.
  • ZCNC [18] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • ZFD+ [23] Bohang Zhang, Guhao Feng, Yiheng Du, Di He, and Liwei Wang. A complete expressiveness hierarchy for subgraph gnns via subgraph weisfeiler-lehman tests. In International Conference on Machine Learning, pages 41019–41077. PMLR, 2023.
  • ZGD+ [24] Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, and Liwei Wang. Beyond weisfeiler-lehman: A quantitative framework for GNN expressiveness. In The Twelfth International Conference on Learning Representations, 2024.
  • ZZXT [21] Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, and Jian Tang. Neural bellman-ford networks: A general graph neural network framework for link prediction. Advances in Neural Information Processing Systems, 34:29476–29490, 2021.