
On the Computational Capability of Graph Neural Networks: A Circuit Complexity Bound Perspective

Xiaoyu Li [email protected]. Independent Researcher.    Yingyu Liang [email protected]. The University of Hong Kong. [email protected]. University of Wisconsin-Madison.    Zhenmei Shi [email protected]. University of Wisconsin-Madison.    Zhao Song [email protected]. The Simons Institute for the Theory of Computing at UC Berkeley.    Wei Wang [email protected]. UC Berkeley.    Jiahao Zhang [email protected]. Independent Researcher.

Graph Neural Networks (GNNs) have become the standard approach for learning and reasoning over relational data, leveraging the message-passing mechanism that iteratively propagates node embeddings through graph structures. While GNNs have achieved significant empirical success, their theoretical limitations remain an active area of research. Existing studies primarily focus on characterizing GNN expressiveness through Weisfeiler-Lehman (WL) graph isomorphism tests. In this paper, we take a fundamentally different approach by exploring the computational limitations of GNNs through the lens of circuit complexity. Specifically, we analyze the circuit complexity of common GNN architectures and prove that under constraints of constant-depth layers, linear or sublinear embedding sizes, and polynomial precision, GNNs cannot solve key problems such as graph connectivity and graph isomorphism unless $\mathsf{TC}^0=\mathsf{NC}^1$. These results reveal the intrinsic expressivity limitations of GNNs behind their empirical success and introduce a novel framework for analyzing GNN expressiveness that can be extended to a broader range of GNN models and graph decision problems.

1 Introduction

Graphs are ubiquitous representations for relational data, describing interconnected elements with interactions in domains such as molecules [21, 51], social networks [17, 41], and user-item interactions in recommendation systems [46, 23]. Graph Neural Networks (GNNs) [25, 24, 44] have emerged as a dominant tool for learning expressive representations from graphs, enabling tasks like node property prediction [49, 3], link prediction [54, 58], and graph classification [55, 52]. The key to GNNs’ success lies in the message-passing mechanism [21], which aggregates information from local neighborhoods through a graph convolution matrix, followed by nonlinear transformations to update node representations.

Despite their empirical success, concerns regarding the computational limitations of GNNs are emerging [19, 50]. A fundamental research question arises from the concern:

What computational capabilities do GNNs and their variants possess, and what classes of problems can they provably solve?

Addressing these questions is crucial for understanding GNNs from a principled and theoretically robust perspective, identifying their limitations, and building trust in their deployment for real-world applications.

Previous work has made significant strides in addressing these questions. A notable line of research connects GNN expressiveness to the Weisfeiler-Lehman (WL) graph isomorphism test [52, 31, 32, 57], which iteratively refines color labels on $k$-tuples of nodes (known as $k$-WL) and distinguishes graphs by the resulting color histograms. For example, the Graph Isomorphism Network (GIN) [52], equipped with a summation aggregator and an injective readout function, is as expressive as $1$-WL, while $k$-GNNs [31] achieve expressiveness equivalent to $k$-WL by encoding $k$-node subgraphs as hypernodes in message passing. However, these results focus on graph isomorphism tasks and do not address a broader range of graph query problems. Moreover, the WL framework only provides a specific level of expressiveness without establishing theoretical upper bounds, and it often ignores the interplay between node features and graph topology, which is central to GNNs.

In this paper, we take a fundamentally different approach to analyzing the computational limitations of GNNs by examining their circuit complexity bounds. Circuit complexity [22, 38, 45] is a foundational topic in theoretical computer science, characterizing computational models based on the types of Boolean gates they use and the depth and size of their circuits. Importantly, circuit complexity bounds provably reflect the set of problems solvable within a given model. For instance, models (e.g., Transformers) bounded by the $\mathsf{TC}^0$ complexity class can only solve $\mathsf{TC}^0$ problems (e.g., Dyck language recognition) but cannot solve $\mathsf{NC}^1$-complete problems like arithmetic formula evaluation unless $\mathsf{TC}^0=\mathsf{NC}^1$ [8, 33, 34]. Analogously, if we demonstrate that GNN computations lie within a particular circuit complexity class, we can formally identify problems that GNNs can and cannot solve.

Our approach evaluates the circuit complexity of GNN components, from basic activation functions to the entire graph convolution process. We show that GNNs with a constant number of layers, $\operatorname{poly}(n)$ precision, and embedding size $d=O(n)$ can be approximated by uniform $\mathsf{TC}^0$ circuits. Consequently, unless $\mathsf{TC}^0=\mathsf{NC}^1$, such GNNs cannot solve problems like graph connectivity or graph isomorphism. These findings illuminate the fundamental expressivity limitations of GNNs despite their empirical success and also establish a novel framework for analyzing their expressiveness, which can be seamlessly generalized to more GNN models and decision problems on graphs.

Our contributions are summarized as follows:

  • We prove that a uniform $\mathsf{TC}^0$ circuit family can approximate GNNs with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size (Theorem 4.14).

  • We establish that unless $\mathsf{TC}^0=\mathsf{NC}^1$, graph neural networks with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size cannot solve the graph connectivity problem (Theorem 5.11 and Theorem 5.12).

  • We establish that unless $\mathsf{TC}^0=\mathsf{NC}^1$, graph neural networks with a constant number of layers, $\operatorname{poly}(n)$ precision, and $d=O(n)$ embedding size cannot solve the graph isomorphism problem (Theorem 5.13).

Roadmap. Section 2 reviews relevant works. Section 3 introduces GNNs and basic concepts from theoretical computer science. Section 4 presents the circuit complexity bounds for GNNs. Section 5 discusses the hardness of specific graph problems. Finally, Section 6 concludes our findings.

2 Related Work

We present related works on the computational limitations of GNNs and existing circuit complexity bounds for neural networks.

2.1 Limitations of Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated impressive performance on graph learning and mining tasks. However, their inherent limitations in solving decision problems on graphs remain an open question. The predominant framework for analyzing GNN limitations is the Weisfeiler-Lehman (WL) hierarchy [29, 40, 57], a well-established tool for assessing GNNs' ability to address the graph isomorphism problem, a candidate NP-intermediate problem that is not known to be solvable in polynomial time [2, 20]. The WL hierarchy leverages the computationally efficient heuristic of color refinement to bound a model's ability to distinguish non-isomorphic graphs.

The expressiveness of message-passing GNNs is bounded by the $1$-WL test [52]. Standard GNN models such as GCN [25], GAT [44], and GIN [52] are either equivalent to or strictly limited by the expressiveness of $1$-WL. To go beyond $1$-WL, high-order GNNs extend message-passing mechanisms to $k$-node subgraphs [31, 32, 30], mimicking the $k$-WL or $k$-FWL (Folklore WL) tests. Models like $k$-GNN and $k$-FGNN match the expressiveness of these higher-order tests, offering stronger guarantees than standard message-passing GNNs. However, the parameter $k$ cannot be scaled to sufficiently large values due to inherent computational and memory constraints.

Another promising line of research involves subgraph GNNs, which aim to address the inherent symmetry of graphs that cannot be distinguished by WL tests. These models transform the original graph into a set of slightly perturbed subgraphs, which are then processed by GNNs [12, 36, 37]. Recent work has shown that for an $n$-node graph, subgraph GNNs operating on subgraphs with $k$ nodes are strictly bounded by $(k+1)$-FWL [37]. Besides, distance-aware GNNs inject distance information, which is overlooked by both message-passing GNNs and $1$-WL, into their architectures. For instance, $k$-hop MPNNs aggregate information from $k$-hop neighbors in each layer and have been shown to be strictly bounded by $2$-FWL [16]. Additionally, the subgraph WL hierarchy demonstrates that distance encoding can be represented by local $2$-WL tests [56].

Despite the widespread use of the WL framework in analyzing GNNs’ computational limitations, it often overlooks the role of node features [7] and focuses exclusively on the graph isomorphism problem, making it insufficiently comprehensive and unsuitable for generalizing to other graph decision problems. A detailed comparison of our work with WL-based GNN expressiveness is provided in Section 5.4.

2.2 Circuit Complexity and Neural Networks

Circuit complexity, a foundational area in theoretical computer science, studies the computational power of Boolean circuit families. Various circuit complexity classes play a significant role in analyzing machine learning models. For instance, the class $\mathsf{AC}^0$ represents problems solvable in parallel using standard Boolean gates, while $\mathsf{TC}^0$ extends this by incorporating threshold or modulo gates. The stronger class $\mathsf{NC}^1$ corresponds to problems solvable by circuits with $O(\log n)$ depth and bounded fan-in [35]. A key result relevant to machine learning is the complexity inclusion $\mathsf{AC}^0\subset\mathsf{TC}^0\subseteq\mathsf{NC}^1$, though whether $\mathsf{TC}^0=\mathsf{NC}^1$ remains an open question [45, 1].

Circuit complexity bounds have been effectively used to analyze the computational power of various neural network architectures. For example, Transformers, including two canonical variants, Average-Head Attention Transformers (AHATs) and SoftMax-Attention Transformers (SMATs), have been studied in this context. Specifically, [35] shows that AHATs can be simulated by non-uniform constant-depth threshold circuits in $\mathsf{TC}^0$, while [26] demonstrates that SMATs can also be simulated in an $\mathsf{L}$-uniform manner within $\mathsf{TC}^0$. A follow-up study [34] unifies these results, concluding that both AHATs and SMATs are approximable by $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ circuits. Beyond standard Transformers, RoPE-based Transformers [39], a widely adopted variant in large language models, have also been analyzed using circuit complexity frameworks [28, 9]. Similarly, the emerging Mamba architecture [18] falls within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ family [10]. Additionally, Hopfield networks, initially introduced as associative memory systems, have also been shown to exhibit $\mathsf{TC}^0$ circuit complexity bounds [27].

Despite the success of circuit complexity in analyzing other neural networks, its application to GNNs is underexplored. While some prior works have attempted to characterize the computational power of GNNs within circuit computation models [5, 14], these efforts are orthogonal to our contributions. A detailed discussion is provided in Section 5.4.

3 Preliminary

This section provides fundamental definitions for this paper. We first introduce some notations. In Section 3.1, we present an in-depth overview of the computation of floating point numbers. In Section 3.2, we review several basic definitions of the Graph Neural Networks (GNNs). In Section 3.3, we present some basic concepts of circuit families and their complexity.

Notations.

For any positive integer $n$, we let $[n]:=\{0,1,2,\dots,n\}$ denote the set of natural numbers from $0$ to $n$. For a vector $x$, $\operatorname{diag}(x)$ denotes the diagonal matrix with entries from $x$. The concatenation operator $\|$ represents combining two vectors or matrices along their last dimension. For example, given $A\in\mathbb{R}^{n\times d_1}$ and $B\in\mathbb{R}^{n\times d_2}$, the result $A\|B$ has shape $\mathbb{R}^{n\times(d_1+d_2)}$. We use $\boldsymbol{1}_n$ to represent the $n$-dimensional all-ones vector. Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an undirected graph with vertex set $\mathcal{V}=\{v_1,\dots,v_n\}$ and edge set $\mathcal{E}=\{e_1,\dots,e_m\}$, where $n$ and $m$ are the numbers of vertices and edges, respectively. The adjacency matrix $A\in\{0,1\}^{n\times n}$ encodes $\mathcal{E}$, with $A_{i,j}=1$ if $(i,j)\in\mathcal{E}$ and $A_{i,j}=0$ otherwise. The Kleene closure of a set $V$, denoted $V^{*}$, is the set of all finite sequences whose elements belong to $V$; specifically, $x\in V^{*}$ means $x$ is a sequence of arbitrary length with elements from $V$. For convenience, we abbreviate floating-point numbers as $\mathsf{FPN}$s.

3.1 Floating Point Numbers

In this subsection, we introduce fundamental definitions of floating-point numbers and their operations. These concepts establish a critical computational framework for implementing GNNs on real-world machines.

Definition 3.1 (Floating Point Numbers ($\mathsf{FPN}$s), Definition 9 in [8]).

A $p$-bit floating-point number ($\mathsf{FPN}$) is represented as a $2$-tuple of binary integers $\langle s,e\rangle$, in which the significand $|s|\in\{0\}\cup[2^{p-1},2^{p})$ and the exponent $e\in[-2^{p},2^{p}-1]$. The value of the $\mathsf{FPN}$ is given by $s\cdot 2^{e}$. Specifically, when $e=2^{p}$, the floating-point number represents positive or negative infinity, depending on the sign of the significand $s$. We denote the set of all $p$-bit $\mathsf{FPN}$s as $\mathbb{F}_{p}$.

Definition 3.2 (Rounding, Definition 9 in [8]).

Let the real number $r\in\mathbb{R}$ be of infinite precision. The closest $p$-bit precision $\mathsf{FPN}$ for $r$ is denoted by $\mathrm{round}_{p}(r)\in\mathbb{F}_{p}$. If two such numbers exist, we denote $\mathrm{round}_{p}(r)$ as the $\mathsf{FPN}$ with an even significand.

Building upon the fundamental concepts introduced above, we present the critical floating-point operations used to compute the outputs of graph neural networks.

Definition 3.3 ($\mathsf{FPN}$ operations, page 5 of [8]).

Let $x,y$ be two integers. We first denote the integer division operation $\mathbin{/\!/}$ as:

x\mathbin{/\!/}y:=\begin{cases}x/y&\text{if $x/y$ is a multiple of $1/4$}\\ x/y+1/8&\text{otherwise.}\end{cases}

Given two $p$-bit $\mathsf{FPN}$s $\left\langle s_{1},e_{1}\right\rangle,\left\langle s_{2},e_{2}\right\rangle\in\mathbb{F}_{p}$, we define the following basic operations:

\left\langle s_{1},e_{1}\right\rangle+\left\langle s_{2},e_{2}\right\rangle:=\begin{cases}\mathrm{round}_{p}(\left\langle s_{1}+s_{2}\mathbin{/\!/}2^{e_{1}-e_{2}},e_{1}\right\rangle)&\text{if $e_{1}\geq e_{2}$}\\ \mathrm{round}_{p}(\left\langle s_{1}\mathbin{/\!/}2^{e_{2}-e_{1}}+s_{2},e_{2}\right\rangle)&\text{if $e_{1}\leq e_{2}$}\end{cases}
\left\langle s_{1},e_{1}\right\rangle\times\left\langle s_{2},e_{2}\right\rangle:=\mathrm{round}_{p}(\left\langle s_{1}s_{2},e_{1}+e_{2}\right\rangle)
\left\langle s_{1},e_{1}\right\rangle\div\left\langle s_{2},e_{2}\right\rangle:=\mathrm{round}_{p}(\left\langle s_{1}\cdot 2^{p-1}\mathbin{/\!/}s_{2},e_{1}-e_{2}-p+1\right\rangle)
\left\langle s_{1},e_{1}\right\rangle\leq\left\langle s_{2},e_{2}\right\rangle\Leftrightarrow\begin{cases}s_{1}\leq s_{2}\mathbin{/\!/}2^{e_{1}-e_{2}}&\text{if $e_{1}\geq e_{2}$}\\ s_{1}\mathbin{/\!/}2^{e_{2}-e_{1}}\leq s_{2}&\text{if $e_{1}\leq e_{2}$.}\end{cases}

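For concreteness, the following Python sketch simulates the rounding rule and the addition operation above at the level of values, using exact rational arithmetic. The helper names (`int_div`, `round_p`, `fpn_add`) are ours, and edge cases such as overflow of the significand after rounding and the infinity exponent are omitted; it is only an illustration of Definitions 3.1-3.3, not a reference implementation.

```python
# A minimal sketch of the p-bit FPN model: numbers are pairs <s, e> with value
# s * 2^e; rounding keeps the closest p-bit significand, with ties broken to even.
from fractions import Fraction

def int_div(x: Fraction, y: Fraction) -> Fraction:
    """The // operation: keep x/y if it is a multiple of 1/4, else add 1/8."""
    q = x / y
    return q if q % Fraction(1, 4) == 0 else q + Fraction(1, 8)

def round_p(value: Fraction, p: int) -> tuple[int, int]:
    """Nearest p-bit FPN <s, e> for an exact value (overflow handling omitted)."""
    if value == 0:
        return 0, 0
    sign = 1 if value > 0 else -1
    mag, e = abs(value), 0
    while mag >= 2 ** p:              # normalize |s| into [2^(p-1), 2^p)
        mag, e = mag / 2, e + 1
    while mag < 2 ** (p - 1):
        mag, e = mag * 2, e - 1
    s, frac = int(mag), mag - int(mag)
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and s % 2 == 1):
        s += 1                        # round half to even
    return sign * s, e

def fpn_add(a: tuple[int, int], b: tuple[int, int], p: int) -> tuple[int, int]:
    """<s1,e1> + <s2,e2> as in Definition 3.3: align exponents with //, then round."""
    (s1, e1), (s2, e2) = a, b
    if e1 >= e2:
        s = Fraction(s1) + int_div(Fraction(s2), Fraction(2) ** (e1 - e2))
        return round_p(s * Fraction(2) ** e1, p)
    s = int_div(Fraction(s1), Fraction(2) ** (e2 - e1)) + Fraction(s2)
    return round_p(s * Fraction(2) ** e2, p)

# Example with p = 4: 1.5 = <12, -3> and 0.25 = <8, -5>.
print(fpn_add((12, -3), (8, -5), p=4))   # -> (14, -3), i.e. 1.75
```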
The basic operations described above can be efficiently computed in parallel using simple hardware implementations in $\mathsf{TC}^{0}$ circuits, as formalized in the following lemma:

Lemma 3.4 (Computing $\mathsf{FPN}$ operations with $\mathsf{TC}^{0}$ circuits, Lemma 10 and Lemma 11 in [8]).

We denote the number of digits by a positive integer $p$. If $p\leq\operatorname{poly}(n)$, then:

  • Basic Operations: The "$+$", "$\times$", "$\div$", and comparison ($\leq$) of two $p$-bit $\mathsf{FPN}$s, as defined in Definition 3.1, can be computed with $O(1)$-depth uniform threshold circuits of $\operatorname{poly}(n)$ size. Let the maximum circuit depth required for these basic operations be $d_{\mathrm{std}}$.

  • Iterated Operations: The product of $n$ $p$-bit $\mathsf{FPN}$s and the sum of $n$ $p$-bit $\mathsf{FPN}$s (with rounding applied after summation) can both be computed with $O(1)$-depth uniform threshold circuits of $\operatorname{poly}(n)$ size. Let the maximum circuit depth required for iterated multiplication be $d_{\otimes}$ and for iterated addition be $d_{\oplus}$.

Beyond these basic floating-point operations, certain specialized floating-point operations are also known to be computable within $\mathsf{TC}^{0}$ circuits, as demonstrated in the following lemmas:

Lemma 3.5 (Computing $\exp$ with $\mathsf{TC}^{0}$ circuits, Lemma 12 in [8]).

Let $x\in\mathbb{F}_{p}$ be a $p$-bit $\mathsf{FPN}$. If $p\leq\operatorname{poly}(n)$, there exists an $O(1)$-depth uniform threshold circuit of size $\operatorname{poly}(n)$ that can compute $\exp(x)$ with relative error less than $2^{-p}$. Let the maximum circuit depth required for approximating $\exp$ be $d_{\mathrm{exp}}$.

Lemma 3.6 (Computing square root with $\mathsf{TC}^{0}$ circuits, Lemma 12 in [8]).

Let $x\in\mathbb{F}_{p}$ be a $p$-bit $\mathsf{FPN}$. If $p\leq\operatorname{poly}(n)$, there exists an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size that can compute $\sqrt{x}$ with relative error less than $2^{-p}$. Let the maximum circuit depth required for approximating square roots be $d_{\mathrm{sqrt}}$.

Lemma 3.7 (Computing matrix multiplication with $\mathsf{TC}^{0}$ circuits, Lemma 4.2 in [9]).

Let the two matrix operands be $A\in\mathbb{F}_{p}^{n_{1}\times n_{2}}$ and $B\in\mathbb{F}_{p}^{n_{2}\times n_{3}}$. If $p\leq\operatorname{poly}(n)$ and $n_{1},n_{2},n_{3}\leq n$, then there exists an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+d_{\oplus})$ that can compute the matrix product $AB$.

3.2 Graph Neural Networks

With the foundational framework of $\mathsf{FPN}$ operations, we now formalize the components of Graph Neural Networks (GNNs) in floating-point representations. This subsection commences by introducing activation functions and the softmax operation, which are fundamental building blocks for GNN layers.

Definition 3.8 (ReLU).

For an embedding matrix $X\in\mathbb{F}_{p}^{n\times d}$, the output of the $\mathsf{ReLU}$ activation function is a matrix $Y\in\mathbb{F}_{p}^{n\times d}$, where each element is defined as $Y_{i,j}:=\max\{0,X_{i,j}\}$ for all $1\leq i\leq n$ and $1\leq j\leq d$.

Definition 3.9 (Leaky ReLU).

For an embedding matrix $X\in\mathbb{F}_{p}^{n\times d}$ and a predefined negative slope $s\in\mathbb{F}_{p}$, the output of the $\mathsf{LeakyReLU}$ function is a matrix $Y\in\mathbb{F}_{p}^{n\times d}$, where each element is defined as $Y_{i,j}:=\max\{0,X_{i,j}\}+s\cdot\min\{0,X_{i,j}\}$ for all $1\leq i\leq n$ and $1\leq j\leq d$.

Definition 3.10 (Softmax).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be an embedding matrix. The row-wise softmax function is defined element-wise as:

\mathsf{Softmax}(X)_{i,j}:=\frac{\exp(X_{i,j})}{\sum_{k=1}^{d}\exp(X_{i,k})},

or equivalently in matrix form as:

\mathsf{Softmax}(X):=\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}\exp(X),

where $\boldsymbol{1}_{d}$ is a $d$-dimensional column vector of ones.

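As a quick sanity check of the matrix form, the small numpy sketch below (real arithmetic rather than the $p$-bit $\mathsf{FPN}$ model, with our own function name) confirms that $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}\exp(X)$ coincides with the usual row-wise softmax.

```python
# Numpy sketch of Definition 3.10 in real arithmetic.
import numpy as np

def softmax_rows(X: np.ndarray) -> np.ndarray:
    E = np.exp(X)                          # entrywise exp(X)
    row_sums = E @ np.ones(X.shape[1])     # exp(X) * 1_d
    return np.diag(1.0 / row_sums) @ E     # diag(exp(X) 1_d)^{-1} exp(X)

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
Y = softmax_rows(X)
assert np.allclose(Y.sum(axis=1), 1.0)     # every row sums to one
```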
These activation functions and softmax operations form the basis of GNN computation. We now introduce the convolution matrices central to the message-passing scheme of GNNs, focusing on three widely used GNN models: GCN [25], GIN [52], and GAT [44].

Definition 3.11 (GCN convolution matrix).

Let $A\in\mathbb{F}_{p}^{n\times n}$ be the adjacency matrix of a graph with $n$ nodes, and let $D\in\mathbb{F}_{p}^{n\times n}$ be the diagonal degree matrix, where $D_{i,i}=\sum_{j=1}^{n}A_{i,j}$. Let $I\in\mathbb{F}_{p}^{n\times n}$ denote the identity matrix. The GCN convolution matrix is defined as:

C_{\mathrm{GCN}}:=(D+I)^{-1/2}(A+I)(D+I)^{-1/2}\in\mathbb{F}_{p}^{n\times n}.
Definition 3.12 (GIN convolution matrix).

Let $A\in\mathbb{F}_{p}^{n\times n}$ be the adjacency matrix, and let $\epsilon\in\mathbb{F}_{p}$ be a constant. The GIN convolution matrix is defined as:

C_{\mathrm{GIN}}:=A+(1+\epsilon)I\in\mathbb{F}_{p}^{n\times n}.
Definition 3.13 (GAT convolution matrix).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be the embedding matrix, and let $W\in\mathbb{F}_{p}^{d\times d}$ and $a\in\mathbb{F}_{p}^{2d}$ denote model weights. The attention weight matrix $E\in\mathbb{F}_{p}^{n\times n}$ is defined as:

E_{i,j}:=\begin{cases}a^{\top}\mathsf{LeakyReLU}(WX_{i}\,\|\,WX_{j}),&A_{i,j}=1,\\ -\infty,&\text{otherwise}.\end{cases}

The GAT convolution matrix is then given by:

C_{\mathrm{GAT}}:=\mathsf{Softmax}(E).

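For concreteness, here is a dense numpy sketch of the three convolution matrices (function names are ours, real arithmetic replaces the $p$-bit $\mathsf{FPN}$ model, and the GAT sketch assumes every node has at least one incident edge so that no softmax row is entirely $-\infty$).

```python
# Dense numpy sketches of C_GCN, C_GIN, and C_GAT (Definitions 3.11-3.13).
import numpy as np

def gcn_conv(A: np.ndarray) -> np.ndarray:
    """C_GCN = (D+I)^{-1/2} (A+I) (D+I)^{-1/2}."""
    n = A.shape[0]
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1) + 1.0))   # (D+I)^{-1/2}
    return d_inv_sqrt @ (A + np.eye(n)) @ d_inv_sqrt

def gin_conv(A: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """C_GIN = A + (1+eps) I."""
    return A + (1.0 + eps) * np.eye(A.shape[0])

def gat_conv(A: np.ndarray, X: np.ndarray, W: np.ndarray, a: np.ndarray,
             slope: float = 0.2) -> np.ndarray:
    """C_GAT = Softmax(E), E_ij = a^T LeakyReLU(W x_i || W x_j) on edges, -inf elsewhere."""
    H = X @ W.T                                     # row i holds W x_i
    L = np.where(H > 0, H, slope * H)               # elementwise LeakyReLU
    d = H.shape[1]
    E = (L @ a[:d])[:, None] + (L @ a[d:])[None, :] # a_1 part + a_2 part
    E = np.where(A > 0, E, -np.inf)                 # mask non-edges
    P = np.exp(E)
    return P / P.sum(axis=1, keepdims=True)         # row-wise softmax
```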
We now unify the three commonly used graph convolution matrices with the basic components above to define a general GNN layer.

Definition 3.14 (One GNN layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be the embedding matrix, let $W\in\mathbb{F}_{p}^{d\times d}$ be the model weights, and let $C\in\mathbb{F}_{p}^{n\times n}$ be an arbitrary convolution matrix (e.g., $C_{\mathrm{GCN}}$, $C_{\mathrm{GIN}}$, $C_{\mathrm{GAT}}$). A single GNN layer is defined as:

\mathsf{GNN}_{i}(X):=\mathsf{ReLU}(CXW).

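A single layer is thus one convolution, one linear map, and one elementwise ReLU; a minimal sketch (our function name, real arithmetic) could look as follows.

```python
# Minimal sketch of one GNN layer (Definition 3.14): ReLU(C X W).
import numpy as np

def gnn_layer(C: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    Z = C @ X @ W                # graph convolution followed by the linear map
    return np.maximum(Z, 0.0)    # elementwise ReLU
```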
By stacking multiple GNN layers, we obtain a multi-layer GNN capable of learning expressive node embeddings. Different prediction tasks, such as node-level, link-level, or graph-level tasks, require graph pooling operations to aggregate information from specific node subsets. We introduce two commonly used $\mathsf{READOUT}$ functions [6] for graph pooling:

Definition 3.15 (Graph average readout layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a node embedding matrix. A graph average readout layer $\mathsf{READOUT}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}^{d}$ selects a subset $B=\{i_{1},i_{2},\dots,i_{|B|}\}\subseteq[n]$ and computes:

\mathsf{READOUT}(X):=\frac{1}{|B|}(X_{i_{1}}+X_{i_{2}}+\dots+X_{i_{|B|}}).
Definition 3.16 (Graph maximum readout layer).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a node embedding matrix. A graph maximum readout layer $\mathsf{READOUT}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}^{d}$ selects a subset $B=\{i_{1},i_{2},\dots,i_{|B|}\}\subseteq[n]$ and computes each dimension $j\in[d]$ as:

\mathsf{READOUT}(X)_{j}:=\max\{X_{i_{1},j},X_{i_{2},j},\cdots,X_{i_{|B|},j}\}.
Remark 3.17.

Graph readout functions in Definitions 3.15 and 3.16 support decision problems at various levels, including but not limited to node, link, and graph tasks, since one can target specific nodes, edges, or the entire graph by appropriately selecting the subset $B$.

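A small sketch of the two readout layers above over a chosen index subset $B$ (our function and variable names, real arithmetic) follows.

```python
# Sketches of the average and maximum readouts (Definitions 3.15 and 3.16).
import numpy as np

def readout_avg(X: np.ndarray, B: list[int]) -> np.ndarray:
    return X[B].mean(axis=0)     # (1/|B|) * sum of the selected rows

def readout_max(X: np.ndarray, B: list[int]) -> np.ndarray:
    return X[B].max(axis=0)      # dimension-wise maximum over B
```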
Finally, we introduce the MLP prediction head, essential for converting the aggregated embedding into specific predictions:

Definition 3.18 (MLP prediction head).

Let $x\in\mathbb{F}_{p}^{d}$ be a single output embedding. With model weights $W\in\mathbb{F}_{p}^{d\times d}$, $w\in\mathbb{F}_{p}^{d}$, and bias $b\in\mathbb{F}_{p}^{d}$, the MLP prediction head is defined as:

\mathsf{Head}(x):=w^{\top}\mathsf{ReLU}(Wx+b).

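The head is a one-hidden-layer MLP that returns a scalar; a minimal sketch (our function name, real arithmetic):

```python
# Sketch of the MLP prediction head (Definition 3.18): w^T ReLU(W x + b).
import numpy as np

def mlp_head(x: np.ndarray, W: np.ndarray, w: np.ndarray, b: np.ndarray) -> float:
    return float(w @ np.maximum(W @ x + b, 0.0))
```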
Finally, we integrate all the previously defined GNN components to present the complete formulation of a multi-layer GNN.

Definition 3.19 (Multi-layer GNN).

Let $m$ be the number of GNN layers. Let $X\in\mathbb{F}_{p}^{n\times d}$ denote the input feature matrix. For each $i\in\{1,\dots,m\}$, let $\mathsf{GNN}_{i}$ represent the $i$-th GNN layer in Definition 3.14. Let $\mathsf{READOUT}$ be a graph readout function (Definition 3.15 or Definition 3.16), and let $\mathsf{Head}$ be the MLP prediction head from Definition 3.18. The $m$-layer GNN $\mathsf{GNN}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}$ is then defined as:

\mathsf{GNN}(X):=\mathsf{Head}\circ\mathsf{READOUT}\circ\mathsf{GNN}_{m}\circ\mathsf{GNN}_{m-1}\circ\cdots\circ\mathsf{GNN}_{1}(X).

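Putting the pieces together, an $m$-layer GNN composes $m$ layers, one readout, and the head. The illustrative sketch below reuses the helper functions from the sketches above (gcn_conv, gnn_layer, readout_avg, mlp_head); the random graph and weights are placeholders of our own choosing.

```python
# Illustrative composition of Definition 3.19: Head o READOUT o GNN_m o ... o GNN_1.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 4, 3
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # random undirected graph, no self-loops
X = rng.standard_normal((n, d))

C = gcn_conv(A)                                # any of the three convolution matrices
for _ in range(m):                             # m stacked GNN layers
    X = gnn_layer(C, X, rng.standard_normal((d, d)))

h = readout_avg(X, B=list(range(n)))           # pool over all nodes
score = mlp_head(h, rng.standard_normal((d, d)),
                 rng.standard_normal(d), rng.standard_normal(d))
print(score)                                   # a single graph-level prediction
```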
3.3 Circuit Complexity Classes

In this subsection, we introduce fundamental concepts of circuit complexity, a key area of theoretical computer science and computational complexity.

Definition 3.20 (Boolean circuit, Definition 6.1 in [1]).

A Boolean circuit with $n$ binary inputs and one binary output is a mapping $C_{n}$ from $\{0,1\}^{n}$ to $\{0,1\}$, represented as a directed acyclic graph (DAG). The graph consists of:

  • $n$ input nodes, each with in-degree zero, corresponding to the input variables.

  • One output node, with out-degree zero, representing the output variable.

  • Intermediate nodes, called gates, which perform logical operations (e.g., $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$) on their inputs. Each gate has one out-edge, representing the result of its computation. The in-degree of a gate node is also referred to as its fan-in.

The structure of the graph allows the Boolean circuit to evaluate logical functions based on the input values, producing a corresponding output.

Definition 3.21 (Complexity measures of Boolean circuits).

The size of a Boolean circuit $C$ is defined as the number of nodes in its computation graph. The depth of $C$ is the length of the longest path in its computation graph.

To analyze the expressiveness of specific Boolean circuits, we first formally define the concept of languages recognized by a circuit family.

Definition 3.22 (Language recognition with circuit families, Definition 2 in [33]).

A circuit family $\mathcal{C}$ denotes a set of Boolean circuits. The circuit family $\mathcal{C}$ can recognize a language $L\subseteq\{0,1\}^{*}$ if, for every string $s\in\{0,1\}^{*}$, there exists a Boolean circuit $C_{|s|}\in\mathcal{C}$ with input size $|s|$ such that $C_{|s|}(s)=1$ if and only if $s\in L$.

We now define complexity classes of languages based on the circuit families capable of recognizing them, with a focus on the resources (e.g., depth, size) these circuits require:

Definition 3.23 ($\mathsf{NC}^{i}$, Definition 6.21 in [1]).

A language $L$ belongs to the class $\mathsf{NC}^{i}$ (Nick's Class) if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$ logical gates with bounded fan-in.

Definition 3.24 ($\mathsf{AC}^{i}$, Definition 6.22 in [1]).

A language $L$ belongs to the class $\mathsf{AC}^{i}$ if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$ logical gates with unbounded fan-in.

Definition 3.25 ($\mathsf{TC}^{i}$, Definition 4.34 in [45]).

A language $L$ belongs to the class $\mathsf{TC}^{i}$ if there is a family of Boolean circuits that can recognize $L$, where the circuits have $\operatorname{poly}(n)$ size, $O((\log n)^{i})$ depth, and unbounded fan-in gates for $\mathsf{NOT}$, $\mathsf{OR}$, $\mathsf{AND}$, and $\mathsf{MAJORITY}$ operations, where a $\mathsf{MAJORITY}$ gate outputs one if more than $1/2$ of its inputs are ones.

Remark 3.26.

The $\mathsf{MAJORITY}$ gates in Definition 3.25 can be substituted with prime-modulus $\mathsf{MOD}$ gates or $\mathsf{THRESHOLD}$ gates. Any Boolean circuit utilizing one of these gates is called a threshold circuit.

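To fix intuition, a $\mathsf{MAJORITY}$ gate is just a counting test over its unboundedly many input bits, and a general threshold gate compares the count against a threshold $t$; a toy evaluator (our own function names) is shown below.

```python
# Toy illustration of the unbounded fan-in gates in Definition 3.25.
def majority(bits: list[int]) -> int:
    """Fires when more than half of the input bits are ones."""
    return int(sum(bits) > len(bits) / 2)

def threshold(bits: list[int], t: int) -> int:
    """General THRESHOLD_t gate: fires when at least t input bits are ones."""
    return int(sum(bits) >= t)

assert majority([1, 1, 0]) == 1 and majority([1, 0, 0]) == 0
assert threshold([1, 0, 1, 0], t=2) == 1
```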
Next, we define the complexity class $\mathsf{DET}$, which plays a critical role in certain hardness results.

Definition 3.27 ($\mathsf{DET}$, page 12 of [13]).

A Boolean circuit belongs to the $\mathsf{DET}$ family if it computes a problem that is $\mathsf{NC}^{1}$-reducible to the computation of the determinant $\det(A)$ of an $n\times n$ matrix $A$ of $n$-bit integers.

Fact 3.28 (Inclusion of circuit complexity classes, page 110 of [1], Corollary 4.35 of [45], page 19 of [13]).

We have $\mathsf{NC}^{i}\subseteq\mathsf{AC}^{i}\subseteq\mathsf{TC}^{i}\subseteq\mathsf{NC}^{i+1}$ for all $i\in\mathbb{N}$. Specifically, for $i=1$, we have $\mathsf{NC}^{1}\subseteq\mathsf{DET}\subseteq\mathsf{NC}^{2}$.

Remark 3.29.

As noted on page 116 of [45] and page 110 of [1], it is well-known that $\mathsf{NC}^{i}\subsetneq\mathsf{AC}^{i}\subsetneq\mathsf{TC}^{i}$ for $i=0$. However, whether $\mathsf{TC}^{0}\subsetneq\mathsf{NC}^{1}$ remains an open problem in circuit complexity. Additionally, as discussed on page 18 of [13], it is unlikely that $\mathsf{DET}\subseteq\mathsf{AC}^{1}$, despite $\mathsf{DET}$ being bounded between $\mathsf{NC}^{1}$ and $\mathsf{NC}^{2}$.

We have formulated non-uniform circuit families that allow different structures for different input lengths. While flexible, this lack of uniformity is impractical compared to computational models like Turing machines, where the same device handles all input lengths. To address this, we introduce uniform circuit families, where circuits for all input lengths can be systematically generated by a Turing machine under specific time and space constraints. We begin by introducing $\mathsf{L}$-uniformity.

Definition 3.30 ($\mathsf{L}$-uniformity, Definition 6.5 in [1]).

Let $\mathcal{C}$ denote a circuit family, and let $\mathsf{C}$ denote a language class recognizable by $\mathcal{C}$. A language $L\subseteq\{0,1\}^{*}$ belongs to the $\mathsf{L}$-uniform class of $\mathsf{C}$ if there exists an $O(\log n)$-space Turing machine that can produce a circuit $C_{n}\in\mathcal{C}$ with $n$ variables for any input $1^{n}$. The circuit $C_{n}$ must recognize $L$ for inputs of size $n$.

Next, we define $\mathsf{DLOGTIME}$-uniformity, which refines $\mathsf{L}$-uniformity by introducing a more computationally practical time constraint. Throughout this paper, references to uniform circuit families specifically denote their $\mathsf{DLOGTIME}$-uniform versions.

Definition 3.31 ($\mathsf{DLOGTIME}$-uniformity, Definition 4.28 in [4]).

Let $\mathsf{C}$ be an $\mathsf{L}$-uniform language class as defined in Definition 3.30. A language $L\subseteq\{0,1\}^{*}$ belongs to the $\mathsf{DLOGTIME}$-uniform class of $\mathsf{C}$ if there exists a Turing machine that can produce a circuit $C_{n}\in\mathcal{C}$ with $n$ variables for any input $1^{n}$ within $O(\log n)$ time. The circuit $C_{n}$ must recognize the language $L$ for inputs of size $n$.

4 Complexity of Graph Neural Networks

In this section, we establish foundational complexity results for each component of a graph neural network (GNN) and then combine these results to derive a circuit complexity bound for the entire multi-layer GNN. Section 4.1 examines activation functions, which form the basis of graph convolution computations. Section 4.2 explores the computation of graph convolution matrices. Section 4.3 analyzes a single GNN layer, the fundamental building block of a multi-layer GNN. In Section 4.4, we investigate graph readout functions and the MLP prediction head, essential for making predictions with a multi-layer GNN. Finally, in Section 4.5, we integrate all components and analyze the complete multi-layer GNN structure, culminating in Section 4.6, where we present the key result: the circuit complexity bound of graph neural networks.

4.1 Computing Activation Functions

In this subsection, we first establish a useful fact about computing pairwise $\max$ and $\min$ functions with $\mathsf{TC}^{0}$ circuits. We then demonstrate that the $\mathsf{ReLU}$ and $\mathsf{LeakyReLU}$ activation functions on embedding matrices can be efficiently computed by uniform threshold circuits.

Fact 4.1 (Computing pairwise $\max$ and $\min$ with $\mathsf{TC}^{0}$ circuits).

Let $a,b\in\mathbb{F}_{p}$ be two $\mathsf{FPN}$s. If $p\leq\operatorname{poly}(n)$, there is an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+3)$ that can compute $\min\{a,b\}$ and $\max\{a,b\}$.

Proof.

For two $\mathsf{FPN}$s $a$ and $b$, we first compute the comparison result $c$, where $c=1$ if $a\leq b$ and $c=0$ otherwise, leveraging an $O(1)$-depth uniform threshold circuit with depth $d_{\mathrm{std}}$, as stated in Lemma 3.4. Once $c$ is computed, each bit $i\in[p]$ of $\min\{a,b\}$ and $\max\{a,b\}$ can be determined as $\min\{a,b\}_{i}=(c\land a_{i})\lor(\lnot c\land b_{i})$ and $\max\{a,b\}_{i}=(c\land b_{i})\lor(\lnot c\land a_{i})$, respectively, which can be computed in parallel with a $3$-depth Boolean circuit. Combining the depths for comparison and logical operations, the overall circuit depth is $d_{\mathrm{std}}+3$. Since $p\leq\operatorname{poly}(n)$ and each operation is computable with a polynomial-size circuit, the entire circuit has size $\operatorname{poly}(n)$. This completes the proof. ∎

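The bit-level selection used in this proof is a two-way multiplexer controlled by the comparison bit $c$; a small sketch on bit lists (our function name, purely illustrative) mirrors the formulas above.

```python
# Bit-level sketch of the selection step in Fact 4.1.
def mux_bits(c: int, a_bits: list[int], b_bits: list[int]) -> tuple[list[int], list[int]]:
    """Return (min_bits, max_bits): min_i = (c & a_i) | (~c & b_i), symmetrically for max."""
    min_bits = [(c & ai) | ((1 - c) & bi) for ai, bi in zip(a_bits, b_bits)]
    max_bits = [(c & bi) | ((1 - c) & ai) for ai, bi in zip(a_bits, b_bits)]
    return min_bits, max_bits

# Example with 4-bit encodings of a = 3 (0011) and b = 5 (0101): c = [a <= b] = 1.
print(mux_bits(1, [0, 0, 1, 1], [0, 1, 0, 1]))   # -> ([0, 0, 1, 1], [0, 1, 0, 1])
```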
Lemma 4.2 (Computing $\mathsf{ReLU}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is a constant-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{std}}+3)$ that can compute $\mathsf{ReLU}(X)$.

Proof.

For each pair of subscripts $i\in[n]$ and $j\in[d]$, the entry $\mathsf{ReLU}(X)_{i,j}$ is formulated as $\mathsf{ReLU}(X)_{i,j}=\max\{0,X_{i,j}\}$ following Definition 3.8. By Fact 4.1, the $\max$ function can be computed with a uniform threshold circuit of depth $(d_{\mathrm{std}}+3)$ and $\operatorname{poly}(n)$ size. If we compute all the $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries in parallel, we can compute $\mathsf{ReLU}(X)$ with depth $(d_{\mathrm{std}}+3)$ and size $\operatorname{poly}(n)$. Hence, we finish the proof. ∎

Lemma 4.3 (Computing $\mathsf{LeakyReLU}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix and let $s\in\mathbb{F}_{p}$ be the negative slope. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is an $O(1)$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(3d_{\mathrm{std}}+3)$ that can compute $\mathsf{LeakyReLU}(X)$.

Proof.

Considering all $i\in[n]$ and $j\in[d]$, the entry $\mathsf{LeakyReLU}(X)_{i,j}$ is formulated as $\mathsf{LeakyReLU}(X)_{i,j}=\max\{0,X_{i,j}\}+s\cdot\min\{0,X_{i,j}\}$ following Definition 3.9. By Fact 4.1, both $\max\{0,X_{i,j}\}$ and $\min\{0,X_{i,j}\}$ can be computed with a uniform threshold circuit of depth $d_{\mathrm{std}}+3$ and $\operatorname{poly}(n)$ size. After that, we can multiply $\min\{0,X_{i,j}\}$ by the slope $s$ and add the result to $\max\{0,X_{i,j}\}$, which can be computed with a $2d_{\mathrm{std}}$-depth, polynomial-size circuit by applying Lemma 3.4 twice. Combining the above computation steps, we have $d_{\mathrm{total}}=d_{\mathrm{std}}+3+2d_{\mathrm{std}}=3d_{\mathrm{std}}+3$. Since there are $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries in $X$ and all the computations are finished with polynomial-size circuits, the size of the entire circuit is also $\operatorname{poly}(n)$. Thus, we complete the proof. ∎

4.2 Computing Graph Convolution Matrices

In this subsection, we present results on the computation of representative graph convolution matrices using uniform threshold circuits.

Lemma 4.4 (Computing GCN convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}})$, which can compute the graph convolution matrix $C_{\mathrm{GCN}}$ in Definition 3.11.

Proof.

Considering all the rows $i\in[n]$ of the adjacency matrix $A$, the corresponding diagonal element of the degree matrix $D$ is written as $D_{i,i}=\sum_{j=1}^{n}A_{i,j}$ by definition. Following Lemma 3.4, each iterated summation is computable with a $\operatorname{poly}(n)$ size uniform threshold circuit with $d_{\oplus}$ depth. After that, we can compute all the elements of $(D+I)$ and $(A+I)$ in parallel, which can be finished with a uniform threshold circuit having $\operatorname{poly}(n)$ size and $d_{\mathrm{std}}$ depth by Lemma 3.4. Then we can compute all the elements of $(D+I)^{-1/2}$ with a $d_{\mathrm{sqrt}}$-depth circuit following Lemma 3.6. Finally, we compute the matrix multiplication $(D+I)^{-1/2}(A+I)(D+I)^{-1/2}$ with a uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $2(d_{\mathrm{std}}+d_{\oplus})$ by applying Lemma 3.7 twice. Combining all the aforementioned circuits, we have

d_{\mathrm{total}}=d_{\oplus}+d_{\mathrm{std}}+d_{\mathrm{sqrt}}+2(d_{\mathrm{std}}+d_{\oplus})=3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}}.

Since each step utilizes a circuit of size $\operatorname{poly}(n)$, the size of the entire circuit is also $\operatorname{poly}(n)$. Therefore, we can conclude that a $\operatorname{poly}(n)$ size and $(3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}})$ depth uniform threshold circuit can compute the GCN convolution matrix $C_{\mathrm{GCN}}$. ∎

Lemma 4.5 (Computing GIN convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $3d_{\mathrm{std}}$, which can compute the graph convolution matrix $C_{\mathrm{GIN}}$ in Definition 3.12.

Proof.

By Lemma 3.4, we can first simply compute the constant $(1+\epsilon)$ with a $d_{\mathrm{std}}$-depth uniform threshold circuit of $\operatorname{poly}(n)$ size. Then we can consider each element $i,j\in[n]$ of the adjacency matrix $A$, computing $(C_{\mathrm{GIN}})_{i,j}=A_{i,j}+(1+\epsilon)\cdot I_{i,j}$ in parallel. Applying Lemma 3.4 twice, we can conclude that all the matrix elements can be produced with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $2d_{\mathrm{std}}$. Thus, by combining all the aforementioned circuits, we have $d_{\mathrm{total}}=d_{\mathrm{std}}+2d_{\mathrm{std}}=3d_{\mathrm{std}}$. Therefore, we can conclude that a $\operatorname{poly}(n)$ size and $3d_{\mathrm{std}}$ depth uniform threshold circuit can compute the GIN convolution matrix $C_{\mathrm{GIN}}$. ∎

Before discussing the $\mathsf{TC}^{0}$ implementation of the GAT convolution matrix, we introduce a key lemma on the softmax mechanism, which is crucial for attention computation in GAT.

Lemma 4.6 (Computing $\mathsf{Softmax}$ with $\mathsf{TC}^{0}$ circuits).

Let $X\in\mathbb{F}_{p}^{n\times d}$ be a matrix. If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$ that can compute $\mathsf{Softmax}(X)$ in Definition 3.10.

Proof.

By Lemma 3.5, all the $O(nd)\leq O(n^{2})\leq\operatorname{poly}(n)$ entries of $\exp(X)$ can be produced in parallel through a polynomial-size uniform threshold circuit with depth $d_{\mathrm{exp}}$. Next, we compute $\exp(X)\cdot\boldsymbol{1}_{d}$ using a circuit of depth $(d_{\mathrm{std}}+d_{\oplus})$ as per Lemma 3.7, and extract the diagonal matrix $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})$ using a depth-$1$ circuit. Subsequently, we compute the inverse of the diagonal matrix with a depth-$d_{\mathrm{std}}$ circuit by inverting all diagonal elements in parallel, as guaranteed by Lemma 3.4. We then multiply $\operatorname{diag}(\exp(X)\cdot\boldsymbol{1}_{d})^{-1}$ and $\exp(X)$ using a polynomial-size circuit with depth $(d_{\mathrm{std}}+d_{\oplus})$, again following Lemma 3.7.

Combining these circuits, the total depth is:

d_{\mathrm{total}}=d_{\mathrm{exp}}+(d_{\mathrm{std}}+d_{\oplus})+1+d_{\mathrm{std}}+(d_{\mathrm{std}}+d_{\oplus})=d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1.

Since the number of parallel operations is $\operatorname{poly}(n)$, $\mathsf{Softmax}(X)$ can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$. This completes the proof. ∎

Building on the softmax computation result, we show that the GAT convolution matrix can indeed be computed with $\mathsf{TC}^{0}$ circuits, as formalized in the following lemma.

Lemma 4.7 (Computing GAT convolution matrix with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then there is a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4)$, which can compute the graph convolution matrix $C_{\mathrm{GAT}}$ in Definition 3.13.

Proof.

Let $\omega\in\mathbb{F}_{p}$ denote an $\mathsf{FPN}$ with negative infinity value, and let the model weight $a=(\alpha_{1},\dots,\alpha_{d},\alpha_{d+1},\dots,\alpha_{2d})\in\mathbb{F}_{p}^{2d}$. We define $a_{1}=(\alpha_{1},\dots,\alpha_{d})\in\mathbb{F}_{p}^{d}$ and $a_{2}=(\alpha_{d+1},\dots,\alpha_{2d})\in\mathbb{F}_{p}^{d}$, such that $a=a_{1}\|a_{2}$. For all $i,j\in[n]$, each entry $E_{i,j}$ of the attention weight matrix is given by:

E_{i,j}=A_{i,j}\cdot a^{\top}\mathsf{LeakyReLU}(WX_{i}\|WX_{j})+(1-A_{i,j})\cdot\omega=A_{i,j}\cdot a_{1}^{\top}\mathsf{LeakyReLU}(WX_{i})+A_{i,j}\cdot a_{2}^{\top}\mathsf{LeakyReLU}(WX_{j})+(1-A_{i,j})\cdot\omega,

following Definition 3.13. We examine the three terms separately:

E_{i,j}=\underbrace{A_{i,j}\cdot a_{1}^{\top}\mathsf{LeakyReLU}(WX_{i})}_{:=E^{1}_{i,j}}+\underbrace{A_{i,j}\cdot a_{2}^{\top}\mathsf{LeakyReLU}(WX_{j})}_{:=E^{2}_{i,j}}+\underbrace{(1-A_{i,j})\cdot\omega}_{:=E^{3}_{i,j}}.

For the first term $E^{1}_{i,j}$, there is a $\operatorname{poly}(n)$ size and $(d_{\mathrm{std}}+d_{\oplus})$ depth uniform threshold circuit to compute $WX_{i}$ following Lemma 3.7. Then we can compute $\mathsf{LeakyReLU}(WX_{i})$ with a $(3d_{\mathrm{std}}+3)$ depth circuit by Lemma 4.3. After that, we can combine Lemma 3.7 and Lemma 3.4 to multiply $A_{i,j}$, $a_{1}^{\top}$, and $\mathsf{LeakyReLU}(WX_{i})$ with a polynomial size and $(2d_{\mathrm{std}}+d_{\oplus})$ depth threshold circuit. Combining all the circuits above, we get the total depth $d_{1}$ for the first term $E^{1}_{i,j}$ as follows:

d_{1}=(d_{\mathrm{std}}+d_{\oplus})+(3d_{\mathrm{std}}+3)+(2d_{\mathrm{std}}+d_{\oplus})=6d_{\mathrm{std}}+2d_{\oplus}+3.

For the second term $E^{2}_{i,j}$, since its computation is equivalent to that of the first term, we can conclude that its depth $d_{2}$ equals $d_{1}$.

For the third term $E^{3}_{i,j}$, we can simply apply Lemma 3.4 twice and conclude that $(1-A_{i,j})\cdot\omega$ can be produced with a uniform threshold circuit, in which the size is $\operatorname{poly}(n)$ and the depth is $d_{3}=2d_{\mathrm{std}}$.

Integrating the computation of all three terms $E^{1}_{i,j},E^{2}_{i,j},E^{3}_{i,j}$ for $E_{i,j}$, we can compute these terms in parallel and obtain a polynomial-size uniform threshold circuit with depth $d_{e}=\max\{d_{1},d_{2},d_{3}\}=d_{1}=6d_{\mathrm{std}}+2d_{\oplus}+3$. Since we have obtained a uniform threshold circuit to compute each entry $E_{i,j}$, and the number of entries to be computed in parallel is $O(n^{2})\leq\operatorname{poly}(n)$, we can compute all the entries of $E$ in parallel without increasing the circuit's polynomial size and depth. At last, we can compute $\mathsf{Softmax}(E)$ to obtain the final convolution matrix, and this can be finished with a uniform threshold circuit with depth $(d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1)$ following Lemma 4.6. Therefore, the final depth of the circuit is the sum of the depth for computing the matrix $E$ and that for applying the $\mathsf{Softmax}$ operation, which is:

d_{\mathrm{total}}=d_{e}+d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1=6d_{\mathrm{std}}+2d_{\oplus}+3+d_{\mathrm{exp}}+3d_{\mathrm{std}}+2d_{\oplus}+1=d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4.

Therefore, we can conclude that the GAT convolution matrix is computable with a uniform threshold circuit of polynomial size and $(d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4)$ depth, which finishes the proof. ∎

After establishing the computation of all graph convolution matrices, we introduce a simplified notation for their circuit depths. Specifically, we define the depth of the uniform threshold circuit used for graph convolution matrices as $d_{\mathrm{conv}}=\max\{3d_{\oplus}+3d_{\mathrm{std}}+d_{\mathrm{sqrt}},\,3d_{\mathrm{std}},\,d_{\mathrm{exp}}+9d_{\mathrm{std}}+4d_{\oplus}+4\}$, based on the results in Lemma 4.4, Lemma 4.5, and Lemma 4.7.

4.3 Computing Single Graph Neural Network Layer

In this subsection, we combine the results on graph convolution matrices and basic activation functions to determine the circuit depth requirement for a complete GNN layer.

Lemma 4.8 (Computing a single GNN layer with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the GNN layer in Definition 3.14 can be computed with a uniform threshold circuit of $\operatorname{poly}(n)$ size and depth $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$.

Proof.

By Lemma 4.4, Lemma 4.5, and Lemma 4.7, the graph convolution matrix can be computed with a $d_{\mathrm{conv}}$ depth, polynomial-size threshold circuit. By applying Lemma 3.7 twice, we can further conclude that $CXW$ can be produced by a circuit with polynomial size and $2(d_{\mathrm{std}}+d_{\oplus})$ depth. Then we can compute $\mathsf{ReLU}(CXW)$ with a $(d_{\mathrm{std}}+3)$-depth, polynomial-size circuit following Lemma 4.2. Combining all the circuit depths mentioned above, we have:

d_{\mathrm{total}}=d_{\mathrm{conv}}+2(d_{\mathrm{std}}+d_{\oplus})+(d_{\mathrm{std}}+3)=d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3.

Since all the steps use polynomial-size circuits, we can conclude that a single GNN layer can be computed with a uniform threshold circuit having $\operatorname{poly}(n)$ size and $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$ depth. Thus, we finish the proof. ∎

4.4 Computing Pooling Layer and Prediction Head

Definitions 3.15, 3.16, and 3.18 describe the basic components for transforming GNN node embeddings into predictions. Here, we present results on computing these components with uniform threshold circuits, starting with the two graph readout layers.

Lemma 4.9 (Computing graph average readout with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the graph average readout layer in Definition 3.15 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(3d_{\mathrm{std}}+2d_{\oplus})$.

Proof.

Let $\beta=(\beta_{1},\beta_{2},\dots,\beta_{n})^{\top}\in\mathbb{F}_{p}^{n}$ be the vectorized form of the index set $B$, where $\beta_{i}=1$ if $i\in B$ and $\beta_{i}=0$ otherwise. Therefore, the output $\mathsf{READOUT}(X)$ in Definition 3.15 is equivalent to the following:

\mathsf{READOUT}(X)=\frac{1}{|B|}\underbrace{X^{\top}}_{d\times n}\cdot\underbrace{\operatorname{diag}(\beta)}_{n\times n}\cdot\underbrace{\boldsymbol{1}_{n}}_{n\times 1},

where $\boldsymbol{1}_{n}$ is the $n$-dimensional all-ones vector. Thus, we can apply Lemma 3.7 twice and obtain a $\operatorname{poly}(n)$ size and $2(d_{\mathrm{std}}+d_{\oplus})$ depth uniform threshold circuit that can compute the matrix product $X^{\top}\cdot\operatorname{diag}(\beta)\cdot\boldsymbol{1}_{n}$. Then, we follow Lemma 3.4 to multiply by $\frac{1}{|B|}$ with a depth $d_{\mathrm{std}}$ circuit of $\operatorname{poly}(n)$ size. Combining the two circuits mentioned above, we have the total depth:

d_{\mathrm{total}}=2(d_{\mathrm{std}}+d_{\oplus})+d_{\mathrm{std}}=3d_{\mathrm{std}}+2d_{\oplus}.

Since all the partial circuits are of polynomial size, we can conclude that there is a $\operatorname{poly}(n)$ size and $(3d_{\mathrm{std}}+2d_{\oplus})$ depth uniform threshold circuit which can compute the graph pooling layer. Hence, we finish the proof. ∎

Before delving into the graph maximum readout layer, we establish a useful fact about computing maximum and minimum values in $\mathsf{TC}^{0}$.

Fact 4.10 (Computing $\max$ and $\min$ of $n$ values with $\mathsf{TC}^{0}$ circuits).

Let $x_{1},x_{2},\cdots,x_{n}\in\mathbb{F}_{p}$ be $n$ $\mathsf{FPN}$s. If the precision $p\leq\operatorname{poly}(n)$, there is an $O(1)$-depth uniform threshold circuit of size $\operatorname{poly}(n)$ and depth $(d_{\mathrm{std}}+3)$ that can compute the maximum and minimum of these $\mathsf{FPN}$s, i.e., $\max\{x_{1},\cdots,x_{n}\}$ and $\min\{x_{1},\cdots,x_{n}\}$.

Proof.

For all pairs $i,j\in[n]$, we first compute the comparison results between all pairs of the $n$ numbers. Let $C^{\max}\in\{0,1\}^{n\times n}$, where $C^{\max}_{i,j}=1$ if $x_{i}\geq x_{j}$ and $C^{\max}_{i,j}=0$ otherwise, and let $C^{\min}\in\{0,1\}^{n\times n}$, where $C^{\min}_{i,j}=1$ if $x_{i}\leq x_{j}$ and $C^{\min}_{i,j}=0$ otherwise. Since there are in total $O(n^{2})\leq\operatorname{poly}(n)$ comparison operations, all entries can be produced in parallel using a $\operatorname{poly}(n)$ size uniform threshold circuit with $d_{\mathrm{std}}$ depth, following Lemma 3.4.

Once $C^{\max}$ and $C^{\min}$ are computed, we compute the dominance vector $D^{\max}\in\{0,1\}^{n}$ in parallel using a $1$-depth Boolean circuit, where $D^{\max}_{i}=\bigwedge_{j=1}^{n}C^{\max}_{i,j}$. Here, $D^{\max}_{i}=1$ indicates that $x_{i}$ is a maximum value. Similarly, we can compute $D^{\min}\in\{0,1\}^{n}$ in parallel, where $D^{\min}_{i}=\bigwedge_{j=1}^{n}C^{\min}_{i,j}$, to identify the minimum value.

To select the maximum value, we compute each bit $k\in[p]$ of the output $o^{\max}=\max\{x_{1},x_{2},\cdots,x_{n}\}$ in parallel using a $2$-depth Boolean circuit:

o_{k}^{\max}=\bigvee_{i=1}^{n}(D^{\max}_{i}\land(x_{i})_{k}),

in which $(x_{i})_{k}$ is the $k$-th bit of $x_{i}$. Since all maximum values are equal, if a specific bit $k$ of all maximum values is $1$, the OR operation ensures $o_{k}^{\max}=1$; similarly, if all maximum values have $0$ in bit $k$, then $o_{k}^{\max}=0$. Thus, this approach guarantees correctness regardless of the number of maximum values. Similarly, each bit $k\in[p]$ of the minimum output $o^{\min}=\min\{x_{1},x_{2},\cdots,x_{n}\}$ is computed using:

o_{k}^{\min}=\bigvee_{i=1}^{n}(D^{\min}_{i}\land(x_{i})_{k}).

Combining all the steps above, the total circuit depth is $d_{\mathrm{total}}=d_{\mathrm{std}}+1+2=d_{\mathrm{std}}+3$. Since $p\leq\operatorname{poly}(n)$ and all the steps are computed by a circuit of $\operatorname{poly}(n)$ size, we conclude that the total circuit size is also $\operatorname{poly}(n)$. This completes the proof. ∎

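The comparison-matrix and dominance-vector construction in this proof can be mirrored directly. The sketch below (our function name) operates on non-negative $p$-bit integers rather than $\mathsf{FPN}$s, but the selection logic is the same.

```python
# Sketch of Fact 4.10: all-pairs comparisons, a dominance vector marking the
# maxima, and a bitwise OR-selection that assembles the output.
def max_via_dominance(xs: list[int], p: int) -> int:
    n = len(xs)
    C = [[int(xs[i] >= xs[j]) for j in range(n)] for i in range(n)]   # C^max
    D = [int(all(C[i][j] for j in range(n))) for i in range(n)]       # dominance vector
    bits = [[(x >> (p - 1 - k)) & 1 for k in range(p)] for x in xs]   # p-bit encodings
    out = [int(any(D[i] and bits[i][k] for i in range(n))) for k in range(p)]
    return int("".join(map(str, out)), 2)

assert max_via_dominance([3, 7, 7, 1], p=4) == 7   # duplicate maxima are handled
```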
With this fact, the graph maximum readout layer can be computed seamlessly using $\mathsf{TC}^{0}$ circuits, as stated in the following lemma:

Lemma 4.11 (Computing graph maximum readout with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, the graph maximum readout layer in Definition 3.16 can be computed using a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{std}}+3)$.

Proof.

For each entry $i\in[d]$ of the output $\mathsf{READOUT}(X)$, since it corresponds to the maximum of $|B|\leq n$ numbers, we can compute all entries in parallel through a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(d_{\mathrm{std}}+3)$ by Fact 4.10. This completes the proof. ∎

Following the computation of graph readout functions, we introduce a simplified notation for their circuit depths. Specifically, we define the depth of the uniform threshold circuit used for computing the graph readout layer as $d_{\mathrm{read}}=\max\{3d_{\mathrm{std}}+2d_{\oplus},\,d_{\mathrm{std}}+3\}$, which follows the results in Lemma 4.9 and Lemma 4.11.

Next, we introduce the circuit depth for computing an MLP prediction head.

Lemma 4.12 (Computing MLP head with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the MLP prediction head in Definition 3.18 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(4d_{\mathrm{std}}+2d_{\oplus}+3)$.

Proof.

By Lemma 3.7, matrix product WxWx is computable with a poly(n)\operatorname{poly}(n) size and (dstd+d)(d_{\mathrm{std}}+d_{\oplus}) depth uniform threshold circuit. After that, for each i[d]i\in[d], we can compute each element (Wx+b)i=(Wx)i+bi(Wx+b)_{i}=(Wx)_{i}+b_{i} in parallel with a poly(n)\operatorname{poly}(n) size and dstdd_{\mathrm{std}} size uniform threshold circuit by Lemma 3.4. Then, we can further apply the activation function with a polynomial size and (dstd+3)(d_{\mathrm{std}}+3) uniform threshold circuit by applying Lemma 4.2. Finally, following Lemma 3.7, the inner product between ww and 𝖱𝖾𝖫𝖴(Wx+b)\mathsf{ReLU}(Wx+b) is computable with a polynomial size and dstd+dd_{\mathrm{std}}+d_{\oplus} uniform threshold circuit.

Combining all the aforementioned circuits, we have the total circuit depth:

\[ d_{\mathrm{total}} = (d_{\mathrm{std}}+d_{\oplus})+d_{\mathrm{std}}+(d_{\mathrm{std}}+3)+(d_{\mathrm{std}}+d_{\oplus}) = 4d_{\mathrm{std}}+2d_{\oplus}+3. \]

Since the circuits to compute each step are all of polynomial size, we can conclude that the MLP prediction head is computable with a $\operatorname{poly}(n)$ size and $(4d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit. Hence, we finish our proof. ∎
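As an illustration of the four stages counted in this proof, the short sketch below evaluates $w^{\top}\mathsf{ReLU}(Wx+b)$ with NumPy; the shapes and random values are hypothetical, and ordinary floating-point arithmetic is used here only to mirror the order of the circuit stages, not their fixed-precision implementation.

```python
import numpy as np

# Hypothetical shapes; only the order of operations mirrors the proof:
# matrix product -> bias addition -> ReLU -> inner product with w.
d = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # hidden-layer weight matrix
b = rng.standard_normal(d)        # bias vector
w = rng.standard_normal(d)        # readout weights of the prediction head
x = rng.standard_normal(d)        # pooled graph representation

h = W @ x + b                     # stages 1-2: matrix product, entrywise addition
h = np.maximum(h, 0.0)            # stage 3: ReLU activation
y = w @ h                         # stage 4: inner product -> scalar prediction
print(float(y))
```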

4.5 Computing Multi-Layer Graph Neural Network

Having presented the circuit depths of all components of a multi-layer GNN, we now derive the overall circuit depth of the entire model in this subsection.

Lemma 4.13 (Computing multi-layer GNN with $\mathsf{TC}^{0}$ circuits).

If $p\leq\operatorname{poly}(n)$ and $d=O(n)$, then the multi-layer graph neural network in Definition 3.19 can be computed with a $\operatorname{poly}(n)$ size uniform threshold circuit with depth $(md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3)$.

Proof.

For each layer $i\in[m]$, $\mathsf{GNN}_{i}$ can be computed with a $\operatorname{poly}(n)$ size and $(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit by Lemma 4.8. Hence, the composition of all $m$ layers $\mathsf{GNN}_{m}\circ\mathsf{GNN}_{m-1}\circ\cdots\circ\mathsf{GNN}_{1}(X)$ can be computed with a uniform threshold circuit within a total depth of $m(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)$. Then, we can compute the graph readout layer with a $\operatorname{poly}(n)$ size and $d_{\mathrm{read}}$ depth uniform threshold circuit, following Lemma 4.9 and Lemma 4.11. Finally, we can apply Lemma 4.12 to compute the MLP prediction head with a $\operatorname{poly}(n)$ size and $(4d_{\mathrm{std}}+2d_{\oplus}+3)$ depth uniform threshold circuit.

Combining the circuits to compute each layer of the entire model, we have the following total circuit depth:

\[ d_{\mathrm{total}} = m(d_{\mathrm{conv}}+3d_{\mathrm{std}}+2d_{\oplus}+3)+d_{\mathrm{read}}+(4d_{\mathrm{std}}+2d_{\oplus}+3) = md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3. \]

Since the circuits for computing each part of the model are all in $\operatorname{poly}(n)$ size, we can conclude that the multi-layer GNN can be computed with a $\operatorname{poly}(n)$ size and $(md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3)$ depth uniform threshold circuit. Hence, we finish the proof. ∎
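Since the depth bound in this lemma is obtained purely by arithmetic over the per-component depths, the following sketch (assuming the sympy package is available, with the depth symbols treated as opaque variables) verifies the expansion mechanically.

```python
# Symbolic check of the depth count in Lemma 4.13 (sympy assumed available).
from sympy import symbols, expand

m, d_conv, d_std, d_plus, d_read = symbols("m d_conv d_std d_plus d_read")

per_layer = d_conv + 3 * d_std + 2 * d_plus + 3      # one GNN layer (Lemma 4.8)
mlp_head = 4 * d_std + 2 * d_plus + 3                # MLP prediction head (Lemma 4.12)
total = m * per_layer + d_read + mlp_head            # m layers + readout + head

claimed = (m * d_conv + (3 * m + 4) * d_std + (2 * m + 2) * d_plus
           + d_read + 3 * m + 3)
assert expand(total - claimed) == 0                  # the two expressions agree
```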

4.6 Main Result: Graph Neural Networks Circuit Complexity

Finally, this subsection presents the circuit complexity bound of graph neural networks, which is the main result of this paper.

Theorem 4.14 (Main Result, Circuit complexity bound of graph neural networks).

If the precision $p\leq\operatorname{poly}(n)$, embedding size $d=O(n)$, and the number of layers $m=O(1)$, then the graph neural network $\mathsf{GNN}:\mathbb{F}_{p}^{n\times d}\rightarrow\mathbb{F}_{p}$ in Definition 3.19 can be simulated by a uniform $\mathsf{TC}^{0}$ circuit family.

Proof.

By Lemma 4.13 and $m=O(1)$, the circuit to compute $\mathsf{GNN}(x)$ has depth

\[ md_{\mathrm{conv}}+(3m+4)d_{\mathrm{std}}+(2m+2)d_{\oplus}+d_{\mathrm{read}}+3m+3=O(1) \]

and size $\operatorname{poly}(n)$. Then, we can conclude that a uniform $\mathsf{TC}^{0}$ circuit family can simulate $\mathsf{GNN}(x)$, following Definition 3.25. Hence, we finish the proof. ∎

The main result in Theorem 4.14 establishes that graph neural networks with $\operatorname{poly}(n)$ precision, constant depth, and $O(n)$ embedding size belong to the uniform $\mathsf{TC}^{0}$ circuit class. This highlights an inherent expressiveness limitation of GNNs, despite their empirical success: unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, they cannot solve problems beyond the capability of $\mathsf{TC}^{0}$ circuits. In the next section, we illustrate this limitation by analyzing two practical graph query problems.

5 Hardness

We explore two critical decision problems on graphs and their associated hardness results. Section 5.1 introduces two graph connectivity problems. Section 5.2 presents the basic concepts of the graph isomorphism problem. In Section 5.3, we present the main hardness result of this paper, which theoretically demonstrates that graph neural networks cannot solve these problems.

5.1 Graph Connectivity Problem

To formally define the graph connectivity problem, we begin by introducing two basic concepts related to sequences of connected nodes in graphs: walks and paths.

Definition 5.1 (Walk, Definition on page 26 of [48]).

Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, a walk in $\mathcal{G}$ is a finite sequence of nodes, denoted by

\[ v_{0}\rightarrow v_{1}\rightarrow v_{2}\rightarrow\cdots\rightarrow v_{m}, \]

where $v_{0},v_{1},v_{2},\dots,v_{m}\in\mathcal{V}$ and any two consecutive nodes are connected by an edge (i.e., $\forall i\in[m],(v_{i-1},v_{i})\in\mathcal{E}$).

Definition 5.2 (Path, Definition on page 26 of [48]).

For a walk $v_{0}\rightarrow v_{1}\rightarrow v_{2}\rightarrow\cdots\rightarrow v_{m}$, if all the nodes $v_{0},\dots,v_{m}$ and all the edges $(v_{0},v_{1}),\cdots,(v_{m-1},v_{m})$ are distinct, then we call this walk a path.
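To fix these two notions concretely, the following Python sketch checks whether a node sequence is a walk in an undirected graph, and whether that walk is additionally a path; the edge-list encoding and helper names are illustrative.

```python
# Illustrative checks of Definition 5.1 (walk) and Definition 5.2 (path).

def is_walk(edges, seq):
    """Every pair of consecutive nodes must be joined by an edge."""
    und = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in und for u, v in zip(seq, seq[1:]))

def is_path(edges, seq):
    """A walk whose nodes and edges are all distinct."""
    if not is_walk(edges, seq):
        return False
    walk_edges = [frozenset((u, v)) for u, v in zip(seq, seq[1:])]
    return len(set(seq)) == len(seq) and len(set(walk_edges)) == len(walk_edges)

edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
print(is_walk(edges, [0, 1, 2, 3, 1]))   # True: a walk that revisits node 1
print(is_path(edges, [0, 1, 2, 3, 1]))   # False: node 1 repeats
print(is_path(edges, [0, 1, 2, 3]))      # True: all nodes and edges are distinct
```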

Building upon these basic concepts, we now formally define two types of graph connectivity problems: the pairwise s-t connectivity problem, which checks whether two specific nodes are connected, and the global graph connectivity problem, which verifies the connectivity of all nodes.

Definition 5.3 (Undirected graph s-t connectivity problem, Definition on page 2 of [47]).

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an arbitrary undirected graph. The undirected s-t graph connectivity problem is: Does there exist a path between two specific nodes $s,t\in\mathcal{V}$?

Definition 5.4 (Undirected graph connectivity problem, Definition on page 2 of [47]).

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an arbitrary undirected graph. The undirected graph connectivity problem is: Does there exist a path between every pair of nodes in $\mathcal{V}$?
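As a plain reference decision procedure for both questions (a sequential polynomial-time algorithm, not a $\mathsf{TC}^{0}$ circuit), one may use breadth-first search, as in the sketch below; the adjacency-list encoding is illustrative.

```python
# Reference semantics for Definitions 5.3 and 5.4 via breadth-first search.
from collections import deque

def reachable(adj, s):
    """Return the set of nodes reachable from s."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def st_connected(adj, s, t):
    return t in reachable(adj, s)          # Definition 5.3

def connected(adj):
    nodes = list(adj)                      # Definition 5.4
    return len(reachable(adj, nodes[0])) == len(nodes) if nodes else True

adj = {0: [1], 1: [0, 2], 2: [1], 3: []}   # undirected; node 3 is isolated
print(st_connected(adj, 0, 2))             # True
print(connected(adj))                      # False
```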

With these problem formulations, we proceed to their computational complexity results.

Lemma 5.5 (Theorem 2 in [47]).

The undirected graph s-t connectivity problem in Definition 5.3 is $\mathsf{NC}^{1}$-complete.

Lemma 5.6 (Theorem 3 in [11]).

The undirected graph connectivity problem in Definition 5.4 is $\mathsf{NC}^{1}$-hard.

5.2 Graph Isomorphism Problem

In this subsection, we turn to the graph isomorphism problem, a core challenge in understanding the expressiveness of GNNs [52, 57]. This problem involves determining whether two graphs are structurally identical by examining possible permutations of their nodes.

Definition 5.7 (Graph isomorphism, on page 3 of [43]).

Let $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{E}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{E}_{2})$ be two graphs. An isomorphism between $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ is a bijection $\phi$ between their vertex sets $\mathcal{V}_{1}$ and $\mathcal{V}_{2}$ which preserves the edges, i.e., $\forall v_{1},v_{2}\in\mathcal{V}_{1},(v_{1},v_{2})\in\mathcal{E}_{1}\Leftrightarrow(\phi(v_{1}),\phi(v_{2}))\in\mathcal{E}_{2}$ and $\forall u_{1},u_{2}\in\mathcal{V}_{2},(u_{1},u_{2})\in\mathcal{E}_{2}\Leftrightarrow(\phi^{-1}(u_{1}),\phi^{-1}(u_{2}))\in\mathcal{E}_{1}$.

We can now define the graph isomorphism problem, which checks the existence of such an isomorphism.

Definition 5.8 (Graph isomorphism problem, on page 3 of [43]).

Let $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{E}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{E}_{2})$ be two graphs. The graph isomorphism problem is: Does there exist a graph isomorphism $\phi$ between $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$?
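For intuition only, the brute-force sketch below decides this problem by enumerating every vertex bijection and testing the edge-preservation condition of Definition 5.7; it takes factorial time and is meant purely to pin down the semantics, not as a practical algorithm.

```python
# Brute-force semantics of the graph isomorphism problem (Definition 5.8).
from itertools import permutations

def is_isomorphic(V1, E1, V2, E2):
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    und1 = {frozenset(e) for e in E1}
    und2 = {frozenset(e) for e in E2}
    for perm in permutations(V2):
        phi = dict(zip(V1, perm))          # candidate bijection V1 -> V2
        if all((frozenset((u, v)) in und1) == (frozenset((phi[u], phi[v])) in und2)
               for u in V1 for v in V1 if u != v):
            return True                    # edge-preserving bijection found
    return False

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]                      # 4-cycle
C4_relabeled = [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")]
P4 = [(0, 1), (1, 2), (2, 3)]                              # path on 4 nodes
print(is_isomorphic([0, 1, 2, 3], C4, ["a", "b", "c", "d"], C4_relabeled))  # True
print(is_isomorphic([0, 1, 2, 3], C4, [0, 1, 2, 3], P4))                    # False
```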

The computational complexity of the graph isomorphism problem is summarized in the following results:

Lemma 5.9 (Theorem 4.9 in [43]).

The graph isomorphism problem in Definition 5.8 is hard for the class $\mathsf{DET}$ under $\mathsf{AC}^{0}$ reductions.

Corollary 5.10.

The graph isomorphism problem in Definition 5.8 is $\mathsf{NC}^{1}$-hard.

Proof.

By Lemma 5.9, it is known that the graph isomorphism problem is hard for the class $\mathsf{DET}$ under $\mathsf{AC}^{0}$ reductions. Following Fact 3.28, we have $\mathsf{AC}^{0}\subseteq\mathsf{NC}^{1}\subseteq\mathsf{DET}$. Hence, every problem in $\mathsf{NC}^{1}$ also lies in $\mathsf{DET}$ and therefore reduces to the graph isomorphism problem, so the graph isomorphism problem is $\mathsf{NC}^{1}$-hard, which finishes the proof. ∎

5.3 Hardness Results

This subsection presents the main hardness results, which highlight the limitations of GNNs on two practical graph decision problems. We begin by showing that GNNs are unable to solve both types of graph connectivity problems.

Theorem 5.11.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph s-t connectivity problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. By Lemma 5.5, the graph s-t connectivity problem is $\mathsf{NC}^{1}$-complete, and in particular $\mathsf{NC}^{1}$-hard. Thus, combining these with Fact 3.28, we can complete the proof. ∎

Theorem 5.12.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph connectivity problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. We can also conclude that the graph connectivity problem is $\mathsf{NC}^{1}$-hard by Lemma 5.6. Thus, combining with Fact 3.28, we can complete the proof. ∎

Next, we show the hardness result for the graph isomorphism problem, which demonstrates the expressiveness limitations of GNNs from a non-Weisfeiler-Lehman (WL) perspective.

Theorem 5.13.

Unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, a graph neural network with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$ cannot solve the graph isomorphism problem.

Proof.

By Theorem 4.14, we have already shown that the graph neural network is in the $\mathsf{TC}^{0}$ circuit family. We can also conclude that the graph isomorphism problem is $\mathsf{NC}^{1}$-hard by Corollary 5.10. Thus, combining with Fact 3.28, we can complete the proof. ∎

Remark 5.14.

Our hardness results allow an embedding size of $d=O(n)$, which subsumes the constant embedding sizes $d=O(1)$ typically used in practice and therefore yields a significantly stronger statement. This highlights that the computational limitations of GNNs cannot be easily mitigated by simply increasing the embedding size.

5.4 Discussion

After presenting the main hardness results of this paper, we explore the connections and comparisons with a broad range of relevant works, highlighting the contribution of this paper.

Connection and comparison to WL.

The Weisfeiler-Lehman (WL) hierarchy is the most widely used framework for analyzing the expressiveness of GNNs, particularly for their ability to solve graph isomorphism problems [52, 57]. While our Theorem 5.13 similarly shows that GNNs cannot solve the graph isomorphism problem unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$, our results differ fundamentally from previous WL-based findings. Unlike the WL framework, which focuses solely on the topological structure of graphs, our analysis considers practical factors such as floating-point precision and node features. This broader perspective allows us to bound the expressiveness of feature-enhanced GNNs, including those with position encodings [53, 15] or random node features [42]. Moreover, while the WL framework is limited to graph isomorphism, we extend the analysis to show that GNNs cannot solve the graph connectivity problem (Theorem 5.11 and Theorem 5.12), a limitation that is difficult to capture using the WL framework.

Comparison to GNN computational models.

Recent studies have explored GNN expressiveness through computational models. For example, [14] investigates the RL-CONGEST model, which examines the communication complexity of GNNs in distributed settings. However, this work focuses on communication cost rather than expressiveness and is less relevant to our study. The most similar work to ours is [5], which shows that aggregate-combine GNNs (AC-GNNs) cannot capture a variant of first-order logic, $\mathsf{FOC}_{2}$. Since counting in $\mathsf{FOC}_{2}$ can be simulated by $\mathsf{FO}$ and $\mathsf{FO}=\mathsf{AC}^{0}$, their result implies that $\mathsf{AC}^{0}$ is not a complexity lower bound for AC-GNNs. In contrast, our work establishes that GNNs are upper-bounded by the $\mathsf{TC}^{0}$ circuit family, providing a clear upper bound rather than refuting a lower bound. This highlights a fundamental difference between our result and previous results in [5].

6 Conclusion

In this work, we show the computational limits of graph neural networks (GNNs). Unlike prior approaches based on the Weisfeiler-Lehman (WL) test, we adopt a fundamentally different perspective: circuit complexity. We show that GNNs with $\operatorname{poly}(n)$ precision, a constant number of layers, and embedding size $d=O(n)$, regardless of the specific message-passing mechanisms or global readout functions, belong to the uniform $\mathsf{TC}^{0}$ circuit complexity class. As a result, we establish critical hardness results for two practical graph decision problems: graph connectivity and graph isomorphism. Our findings demonstrate that GNNs cannot solve these problems, which lie beyond $\mathsf{TC}^{0}$ unless $\mathsf{TC}^{0}=\mathsf{NC}^{1}$. These results are significant as they extend previous GNN expressiveness limitations, which were framed from the WL perspective, by introducing a fundamentally different circuit complexity bound. Our analysis not only incorporates previously overlooked factors, such as floating-point precision and the interactions between node embeddings and topological structure, but also applies to a broader range of graph query problems, such as graph connectivity.

For future work, we believe our theoretical framework offers a trustworthy foundation that can be generalized to assess the expressiveness of other GNN architectures and the hardness of additional graph query problems. Our research may also inspire the development of new architectures that go beyond the complexity of the $\mathsf{TC}^{0}$ circuit family.

References

  • AB [09] Sanjeev Arora and Boaz Barak. Computational complexity: a modern approach. Cambridge University Press, 2009.
  • Bab [16] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 684–697, 2016.
  • BAY [22] Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? In International Conference on Learning Representations, 2022.
  • BI [94] D Mix Barrington and Neil Immerman. Time, hardware, and uniformity. In Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory, pages 176–185. IEEE, 1994.
  • BKM+ [20] Pablo Barceló, Egor V. Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In International Conference on Learning Representations, 2020.
  • BL [23] Filippo Maria Bianchi and Veronica Lachi. The expressive power of pooling in graph neural networks. Advances in neural information processing systems, 36:71603–71618, 2023.
  • BRH+ [21] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoit Gaüzère, Sébastien Adam, and Paul Honeine. Analyzing the expressive power of graph neural networks in a spectral perspective. In International Conference on Learning Representations, 2021.
  • Chi [24] David Chiang. Transformers in uniform $\mathsf{TC}^{0}$. arXiv preprint arXiv:2409.13629, 2024.
  • [9] Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, and Zhao Song. Circuit complexity bounds for rope-based transformer architecture. arXiv preprint arXiv:2411.07602, 2024.
  • [10] Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, and Zhao Song. The computational limits of state-space models and mamba via the lens of circuit complexity. arXiv preprint arXiv:2412.06148, 2024.
  • CM [87] Stephen A Cook and Pierre McKenzie. Problems complete for deterministic logarithmic space. Journal of Algorithms, 8(3):385–394, 1987.
  • CMR [21] Leonardo Cotta, Christopher Morris, and Bruno Ribeiro. Reconstruction for powerful graph representations. Advances in Neural Information Processing Systems, 34:1713–1726, 2021.
  • Coo [85] Stephen A Cook. A taxonomy of problems with fast parallel algorithms. Information and control, 64(1-3):2–22, 1985.
  • CWS [24] Guanyu Cui, Zhewei Wei, and Hsin-Hao Su. Rethinking the expressiveness of gnns: A computational model perspective. arXiv preprint arXiv:2410.01308, 2024.
  • DLL+ [22] Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Graph neural networks with learnable structural and positional representations. In International Conference on Learning Representations, 2022.
  • FCL+ [22] Jiarui Feng, Yixin Chen, Fuhai Li, Anindya Sarkar, and Muhan Zhang. How powerful are k-hop message passing graph neural networks. Advances in Neural Information Processing Systems, 35:4776–4790, 2022.
  • FML+ [19] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pages 417–426, 2019.
  • GD [23] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
  • GJJ [20] Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. Generalization and representational limits of graph neural networks. In International Conference on Machine Learning, pages 3419–3430. PMLR, 2020.
  • GS [20] Martin Grohe and Pascal Schweitzer. The graph isomorphism problem. Communications of the ACM, 63(11):128–134, 2020.
  • GSR+ [17] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
  • HAF [22] Yiding Hao, Dana Angluin, and Robert Frank. Formal language recognition by hard attention transformers: Perspectives from circuit complexity. Transactions of the Association for Computational Linguistics, 10:800–810, 2022.
  • HDW+ [20] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 639–648, 2020.
  • HYL [17] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.
  • KW [17] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
  • LAG+ [23] Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, and Cyril Zhang. Transformers learn shortcuts to automata. In The Eleventh International Conference on Learning Representations, 2023.
  • LLL+ [24] Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, and Zhao Song. On the expressive power of modern hopfield networks. arXiv preprint arXiv:2412.05562, 2024.
  • LLS+ [24] Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, and Mingda Wan. Theoretical constraints on the expressive power of rope-based tensor attention transformers. arXiv preprint arXiv:2412.18040, 2024.
  • LW [68] Andrei Leman and Boris Weisfeiler. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya, 2(9):12–16, 1968.
  • MBHSL [19] Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. Advances in neural information processing systems, 32, 2019.
  • MRF+ [19] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4602–4609, 2019.
  • MRM [20] Christopher Morris, Gaurav Rattan, and Petra Mutzel. Weisfeiler and leman go sparse: Towards scalable higher-order graph embeddings. Advances in Neural Information Processing Systems, 33:21824–21840, 2020.
  • MS [23] William Merrill and Ashish Sabharwal. The parallelism tradeoff: Limitations of log-precision transformers. Transactions of the Association for Computational Linguistics, 11:531–545, 2023.
  • MS [24] William Merrill and Ashish Sabharwal. A logic for expressing log-precision transformers. Advances in Neural Information Processing Systems, 36, 2024.
  • MSS [22] William Merrill, Ashish Sabharwal, and Noah A Smith. Saturated transformers are constant-depth threshold circuits. Transactions of the Association for Computational Linguistics, 10:843–856, 2022.
  • PW [22] Pál András Papp and Roger Wattenhofer. A theoretical comparison of graph neural network extensions. In International Conference on Machine Learning, pages 17323–17345. PMLR, 2022.
  • QRG+ [22] Chendi Qian, Gaurav Rattan, Floris Geerts, Mathias Niepert, and Christopher Morris. Ordered subgraph aggregation networks. Advances in Neural Information Processing Systems, 35:21030–21045, 2022.
  • Ruz [81] Walter L Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365–383, 1981.
  • SAL+ [24] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
  • Sat [20] Ryoma Sato. A survey on the expressive power of graph neural networks. arXiv preprint arXiv:2003.04078, 2020.
  • SLYS [21] Aravind Sankar, Yozen Liu, Jun Yu, and Neil Shah. Graph neural networks for friend ranking in large-scale social platforms. In Proceedings of the Web Conference 2021, pages 2535–2546, 2021.
  • SYK [21] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random features strengthen graph neural networks. In Proceedings of the 2021 SIAM international conference on data mining (SDM), pages 333–341. SIAM, 2021.
  • Tor [04] Jacobo Torán. On the hardness of graph isomorphism. SIAM Journal on Computing, 33(5):1093–1108, 2004.
  • VCC+ [18] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
  • Vol [99] Heribert Vollmer. Introduction to circuit complexity: a uniform approach. Springer Science & Business Media, 1999.
  • WHW+ [19] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, pages 165–174, 2019.
  • Wig [92] Avi Wigderson. The complexity of graph connectivity. In International Symposium on Mathematical Foundations of Computer Science, pages 112–132. Springer, 1992.
  • Wil [10] R.J. Wilson. Introduction to Graph Theory. Longman, 2010.
  • WSZ+ [19] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861–6871. PMLR, 2019.
  • WW [22] Asiri Wijesinghe and Qing Wang. A new perspective on ”how graph neural networks go beyond weisfeiler-lehman?”. In International Conference on Learning Representations, 2022.
  • WWCBF [22] Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022.
  • XHLJ [18] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2018.
  • YYL [19] Jiaxuan You, Rex Ying, and Jure Leskovec. Position-aware graph neural networks. In International conference on machine learning, pages 7134–7143. PMLR, 2019.
  • ZC [18] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. Advances in neural information processing systems, 31, 2018.
  • ZCNC [18] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • ZFD+ [23] Bohang Zhang, Guhao Feng, Yiheng Du, Di He, and Liwei Wang. A complete expressiveness hierarchy for subgraph gnns via subgraph weisfeiler-lehman tests. In International Conference on Machine Learning, pages 41019–41077. PMLR, 2023.
  • ZGD+ [24] Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, and Liwei Wang. Beyond weisfeiler-lehman: A quantitative framework for GNN expressiveness. In The Twelfth International Conference on Learning Representations, 2024.
  • ZZXT [21] Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, and Jian Tang. Neural bellman-ford networks: A general graph neural network framework for link prediction. Advances in Neural Information Processing Systems, 34:29476–29490, 2021.