
Jet Discrimination with Quantum Complete Graph Neural Network

Yi-An Chen [email protected]    Kai-Feng Chen Department of Physics, National Taiwan University, Taipei, Taiwan
Abstract

Machine learning, particularly deep neural networks, has been widely used in high-energy physics, demonstrating remarkable results in various applications. Furthermore, the extension of machine learning to quantum computers has given rise to the emerging field of quantum machine learning. In this paper, we propose the Quantum Complete Graph Neural Network (QCGNN), a model based on variational quantum algorithms designed for learning on complete graphs. With sufficiently deep parametrized operators, QCGNN offers a polynomial speedup over its classical and quantum counterparts by leveraging quantum parallelism. We investigate the application of QCGNN through the challenging task of jet discrimination, where the jets are represented as complete graphs. Additionally, we conduct a comparative analysis with classical models to establish a performance benchmark. The code is available at https://github.com/NTUHEP-QML/QCGNN.

I Introduction

The proton-proton collisions at the Large Hadron Collider (LHC) produce jets from hard scattering events. Jets are collimated sprays of particles formed through the hadronization of elementary particles. Jet discrimination, i.e., identifying the type of elementary particle that initiates the jet, is one of the most challenging tasks in particle physics.

Deep neural networks (DNNs), celebrated for their architectural flexibility and expressive power, have been widely adopted in high-energy physics (HEP) [1, 2, 3]. Designing a DNN model tailored for jet discrimination poses a significant challenge due to the variable number of constituent particles within jets. Various data representations and DNN models have been proposed for jet discrimination, including images [4, 5, 6, 7, 8, 9, 10, 11], sequences [12, 13, 14, 15, 16, 17, 18], trees [19, 20], graphs [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], and sets [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. Jet images are typically two-dimensional representations (e.g., pseudorapidity η versus azimuthal angle ϕ) where particle information is encoded in a discretized two-dimensional grid. Sequences or trees order particles according to specific criteria (e.g., by transverse momentum or distance parameter). Despite the simplicity of these representations, they often lose information about individual particles, lack translational or rotational invariance, or disregard permutation invariance. To preserve the information of each constituent particle and the relevant symmetries, graphs or sets are widely used for jet representation, with each particle represented as a node in a graph or an element of a set.

In the upcoming high-luminosity LHC (HL-LHC), the data volume is expected to increase by several orders of magnitude compared to the LHC. The increased luminosity and event complexity due to pile-up will make data analysis even more challenging. Consequently, efficient methodologies and novel technologies in data analysis, such as parallel computing and machine learning, are in high demand. Furthermore, quantum computing has made significant strides in recent decades, leading to the development of quantum machine learning (QML) [45, 46, 47, 48, 49]. QML leverages the unique properties of quantum systems, such as superposition and entanglement, to potentially achieve learning capabilities unattainable with classical computers. QML has been explored in several HEP analyses [50], including reconstruction [51, 52, 53], classification [54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67], anomaly detection [68, 69, 70, 71, 72], and data generation [73, 74, 75, 76, 77].

In this paper, we introduce the Quantum Complete Graph Neural Network (QCGNN), a model based on variational quantum algorithms [78, 79, 80], specifically designed for learning on complete graphs [81]. With sufficiently deep parametrized operators, QCGNN achieves a polynomial speedup over its classical counterparts by exploiting quantum parallelism. The application of QCGNN is studied through jet discrimination using two public datasets: the Top dataset [82, 83] for binary classification and the JetNet dataset [84, 85] for multi-class classification.

The structure of this paper is as follows: In Sec. II, we describe the architectures of QCGNN and the classical graph neural networks used for benchmarking, as well as discuss the computational complexity of learning with classical and quantum models. Sec. III details the experimental setup for jet discrimination, with the results presented in Sec. IV. Finally, we summarize our findings in Sec. V.

II Methodology

II.1 Graph Neural Network

Graphs are ubiquitous data structures that represent relationships and connections between entities. Analyzing and extracting valuable information from graph-structured data is a fundamental challenge in modern data science and machine learning. To address this challenge, Graph Neural Networks (GNNs) have emerged as a powerful and versatile framework for learning from graph-structured data [86, 87, 88].

A graph G is described by its set of nodes and edges. Let N denote the number of nodes, and E_{ij} the edge from the i-th node to the j-th node. Throughout this paper, the graphs are assumed to be undirected (E_{ij} = E_{ji}) and unweighted (all edges have equal weights). Furthermore, a graph is considered complete [81] if all pairs of nodes are connected. A complete graph that is undirected and unweighted can also be viewed as a set. Each node is associated with a feature vector \mathbf{x} \in \mathbb{R}^{d}, where d is the dimension of the features.

Graphs are permutation invariant, meaning that reordering the indices of the nodes does not alter the information contained in the graph. The general structure of permutation-invariant models has been studied in [89, 34]. In this work, we focus on a specific type of GNN, the Message Passing Graph Neural Network (MPGNN) [90], which provides a simple and intuitive approach to designing GNNs. The MPGNN is constructed by iterating the following formula (adapted from the PyTorch Geometric documentation, https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_gnn.html; the edge-feature term e_{j,i} is omitted here as we only consider undirected and unweighted graphs):

\mathbf{x}^{(k)}_{i}=\gamma^{(k)}\left(\mathbf{x}^{(k-1)}_{i},\bigoplus_{j\in\mathcal{N}(i)}\Phi^{(k)}\left(\mathbf{x}^{(k-1)}_{i},\mathbf{x}^{(k-1)}_{j}\right)\right), (1)

where \Phi^{(k)} extracts information from the neighboring nodes \mathcal{N}(i) of the i-th node, and \gamma^{(k)} updates the node features in the k-th iteration. As pointed out in [89], summation over the set elements is a necessary and sufficient condition for permutation invariance. This concept is generalized by the aggregation function \bigoplus, which is typically chosen as SUM, MEAN, MAX, or MIN (note that MAX and MIN can be expressed as summations over p-norms, which can be absorbed into \Phi), among others. A brief discussion on the relation between state-of-the-art models and the MPGNN is provided in Appendix A.
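
For concreteness, a minimal sketch of one message-passing iteration of Eq. 1, written with PyTorch Geometric (the library used later for the classical benchmarks), is given below. The complete-graph construction, layer sizes, and class name are illustrative choices, not the exact benchmark model of Sec. III.2.

```python
import torch
from torch import nn
from torch_geometric.nn import MessagePassing

class MPGNNLayer(MessagePassing):
    def __init__(self, dim_in, dim_hidden):
        super().__init__(aggr="add")  # SUM aggregation, cf. Eq. (1)
        # Phi acts on the concatenated features of a node pair
        self.phi = nn.Sequential(
            nn.Linear(2 * dim_in, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, dim_hidden),
        )

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # Phi(x_i, x_j) evaluated on every edge of the complete graph
        return self.phi(torch.cat([x_i, x_j], dim=-1))

# Complete graph on N = 4 nodes with d = 3 features per node
N, d = 4, 3
x = torch.randn(N, d)
src, dst = torch.meshgrid(torch.arange(N), torch.arange(N), indexing="ij")
mask = src != dst                                  # drop self-loops
edge_index = torch.stack([src[mask], dst[mask]])   # all ordered pairs (i, j)
out = MPGNNLayer(d, 6)(x, edge_index)              # shape (N, 6)
```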

II.2 Quantum Complete Graph Neural Network

Figure 1: Example of a QCGNN ansatz for learning a 3-node complete graph, with n_I = 2 and n_Q = 4. The quantum state is initially prepared in a uniform superposition as described in Eq. 2 via a uniform state oracle (blue block). The dashed box contains the encoding (red blocks) and parametrized operators (green blocks), which may be re-uploaded R times [91]. The encoding operators consist of multi-controlled operators corresponding to Eq. 3, transforming the quantum state to Eq. 4, or, in this example, Eq. 5. The empty (∘) and filled (∙) circles in the IR represent the control conditions: the encoding operators are applied if the corresponding IR qubit is in state |0⟩ or |1⟩, respectively. Since the encoding operators act under different control conditions, the red controlled encoding blocks mutually commute, i.e., the quantum state is unchanged if one first encodes \mathbf{x}_1 and then \mathbf{x}_0 under their corresponding control conditions. The quantum state then evolves to Eq. 6 through the parametrized operators. If the dashed box is re-uploaded R times, the quantum state evolves to Eq. 7. Finally, the qubits in the IR are measured in the X basis as described in Eq. 10, while the NR qubits are measured with Pauli-Z observables to compute the result in Eq. 9. The details of the encoding and parametrized operators used are provided in Sec. III.2.

In the noisy intermediate-scale quantum (NISQ) era [92], variational quantum algorithms [78, 79, 80] present an intuitive approach to implementing quantum neural networks. These networks utilize a variational quantum circuit (VQC) with tunable parameters updated via classical optimization routines. Typically, an n-qubit VQC can be expressed as

f(\mathbf{x},\bm{\theta})=\bra{0}^{\otimes n}U^{\dagger}(\mathbf{x},\bm{\theta})\,P\,U(\mathbf{x},\bm{\theta})\ket{0}^{\otimes n}

for some Pauli string observable P and unitary operator U. The unitary operator encodes the data \mathbf{x} into the quantum state and includes several tunable parameters \bm{\theta}. These parameters are optimized via gradient descent [93, 94, 95, 96] using an appropriate loss function. Typically, U can be divided into encoding and parametrized operators, denoted as U_{\text{ENC}} and U_{\text{PARAM}}, respectively.

The QCGNN architecture comprises two qubit registers: an index register (IR) and a neural network register (NR), with n_I and n_Q qubits, respectively. Fig. 1 illustrates an example of a QCGNN circuit with n_I = 2 and n_Q = 4. To encode the information of an undirected and unweighted complete graph with N nodes, we set n_I = \lceil\log_2(N)\rceil. The required n_I may therefore differ between graphs with different numbers of nodes, e.g., a 4-particle jet needs n_I = 2 qubits, while a 5-particle jet requires n_I = 3. As we will show, the form of the QCGNN output ensures that graphs of variable size can be handled. The quantum state in the IR is initialized to a uniform superposition of basis states through a uniform state oracle (USO) and evolves as

\ket{\psi_{0}}=\frac{1}{\sqrt{N}}\sum^{N-1}_{i=0}\ket{i}\ket{0}^{\otimes n_{Q}}, (2)

where \ket{0}^{\otimes n_{Q}} is the initial quantum state in the NR with all qubits in the |0⟩ state. The decimal representation, e.g., |0⟩, |1⟩, |2⟩, |3⟩, is used here, equivalent to the binary representation |00⟩, |01⟩, |10⟩, |11⟩; the context should clarify which representation is being used. If N is a power of 2, i.e., n_I = \log_2 N, the USO can be constructed using Hadamard gates. Otherwise, other methods described in [97, 98] can be employed.
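
As a minimal illustration of Eq. 2, the following PennyLane sketch prepares the uniform IR state for the case where N is a power of 2, so that the USO reduces to Hadamard gates; the register sizes and wire labels are illustrative.

```python
import pennylane as qml
import numpy as np

n_I, n_Q = 2, 4                       # index and neural network registers
IR = list(range(n_I))
NR = list(range(n_I, n_I + n_Q))
dev = qml.device("default.qubit", wires=n_I + n_Q)

@qml.qnode(dev)
def psi0():
    for q in IR:                      # uniform superposition over the 2**n_I basis states
        qml.Hadamard(wires=q)
    return qml.state()

state = psi0()
# amplitude 1/sqrt(N) on each |i>|0...0>, zero elsewhere
print(np.round(state[:: 2 ** n_Q], 3))
```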

The node features \mathbf{x}_i of the i-th node are encoded through a series of unitary operators U_{\text{ENC}}(\mathbf{x}_i), controlled by the corresponding basis state in the IR, where the control condition is the binary representation of i with n_I digits, denoted as C^{n_I}_{\text{bin}(i)}(U_{\text{ENC}}(\mathbf{x}_i)). This controlled operator acts on the quantum state as follows:

\begin{split}
C^{n_{I}}_{\text{bin}(i)}\Bigl(U_{\text{ENC}}(\mathbf{x}_{i})\Bigr)\Bigl[\ket{j}\ket{0}^{\otimes n_{Q}}\Bigr] ={}& \delta_{ij}\ket{j}U_{\text{ENC}}(\mathbf{x}_{i})\ket{0}^{\otimes n_{Q}} \\
&+(1-\delta_{ij})\ket{j}\ket{0}^{\otimes n_{Q}},
\end{split} (3)

where \delta_{ij} is the Kronecker delta. The controlled operator C^{n_I}_{\text{bin}(i)}(U_{\text{ENC}}(\mathbf{x}_i)) on the left-hand side acts on both the IR and NR, while U_{\text{ENC}}(\mathbf{x}_i) on the right-hand side acts only on the NR. After encoding the information of all N nodes, the quantum state evolves as

\begin{split}
\ket{\psi_{0}}\rightarrow\ket{\psi_{1}} &= \Biggl[\bigotimes^{N-1}_{i=0}C^{n_{I}}_{\text{bin}(i)}\Bigl(U_{\text{ENC}}(\mathbf{x}_{i})\Bigr)\Biggr]\ket{\psi_{0}} \\
&= \frac{1}{\sqrt{N}}\sum^{N-1}_{j=0}\Biggl[\bigotimes^{N-1}_{i=0}C^{n_{I}}_{\text{bin}(i)}\Bigl(U_{\text{ENC}}(\mathbf{x}_{i})\Bigr)\Biggr]\Bigl(\ket{j}\ket{0}^{\otimes n_{Q}}\Bigr) \\
&= \frac{1}{\sqrt{N}}\sum^{N-1}_{i=0}\ket{i}U_{\text{ENC}}(\mathbf{x}_{i})\ket{0}^{\otimes n_{Q}},
\end{split} (4)

where we use the fact that each |j⟩ is acted upon by exactly one of the C^{n_I}_{\text{bin}(i)}(U_{\text{ENC}}(\mathbf{x}_i)) as i ranges from 0 to N-1. For example, in Fig. 1, where N = 3, n_Q = 4, and n_I = \lceil\log_2 N\rceil = 2, the quantum state \ket{\psi_1} can be expressed as

\begin{split}
\ket{\psi^{N=3}_{1}} = \frac{1}{\sqrt{3}}\Bigl[&\ket{0}U_{\text{ENC}}(\mathbf{x}_{0})\ket{0000} \\
&+\ket{1}U_{\text{ENC}}(\mathbf{x}_{1})\ket{0000}+\ket{2}U_{\text{ENC}}(\mathbf{x}_{2})\ket{0000}\Bigr],
\end{split} (5)

where |0000⟩ is equivalent to \ket{0}^{\otimes 4}. At first glance, Eq. 4 and Eq. 5 appear to depend on the node ordering. However, as we will demonstrate, by selecting appropriate observables, the final output of QCGNN is permutation invariant.
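
A minimal PennyLane sketch of the controlled encoding in Eqs. 3-5 is shown below: U_ENC(x_i) is applied to the NR only when the IR is in basis state |i⟩, with the control pattern given by the binary representation of i. The single-RY-per-feature encoding here is purely illustrative and is not the ansatz of Fig. 2.

```python
import pennylane as qml
import numpy as np

n_I, n_Q = 2, 4
IR = list(range(n_I))
NR = list(range(n_I, n_I + n_Q))
dev = qml.device("default.qubit", wires=n_I + n_Q)

def u_enc(x):
    # toy single-qubit encoding on the NR (not the Fig. 2 ansatz)
    for q, wire in enumerate(NR):
        qml.RY(x[q % len(x)], wires=wire)

@qml.qnode(dev)
def encode(nodes):
    for q in IR:                                    # uniform IR state, Eq. (2)
        qml.Hadamard(wires=q)
    for i, x_i in enumerate(nodes):
        bits = [int(b) for b in np.binary_repr(i, width=n_I)]
        # apply U_ENC(x_i) only on the |i> branch of the IR, cf. Eq. (3)
        qml.ctrl(u_enc, control=IR, control_values=bits)(x_i)
    return qml.state()

nodes = np.random.uniform(-np.pi, np.pi, size=(3, 3))   # N = 3 particles, d = 3 features
state = encode(nodes)                                    # realizes Eq. (5)
```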

Next, a series of parametrized unitary operators U_{\text{PARAM}}(\bm{\theta}) with tunable parameters \bm{\theta} is applied to the NR, evolving the quantum state to

\begin{split}
\ket{\psi_{1}}\rightarrow\ket{\psi_{2}} &= U_{\text{PARAM}}(\bm{\theta})\ket{\psi_{1}} \\
&= \frac{1}{\sqrt{N}}\sum^{N-1}_{i=0}\ket{i}U_{\text{PARAM}}(\bm{\theta})U_{\text{ENC}}(\mathbf{x}_{i})\ket{0}^{\otimes n_{Q}}.
\end{split} (6)

To increase the expressive power of VQCs, the data re-uploading technique [91] can be employed. The idea is to encode the data multiple times, allowing the quantum state to interact with the data in a more complex manner. This is implemented by alternately applying the encoding and parametrized operators several times, each time with different parameters. Since the encoding operators in \ket{\psi_2} are tied to the node indices |i⟩ in the IR, data re-uploading can be implemented straightforwardly without entangling the information of different particles. After re-uploading R times, the final quantum state \ket{\psi} becomes

\ket{\psi}=\frac{1}{\sqrt{N}}\sum^{N-1}_{i=0}\ket{i}\Biggl[\bigotimes^{R}_{r=1}U_{\text{PARAM}}(\bm{\theta}^{(r)})U_{\text{ENC}}(\mathbf{x}_{i})\Biggr]\ket{0}^{\otimes n_{Q}}, (7)

where the parameters \bm{\theta} may differ across the U_{\text{PARAM}} operators, as denoted by the superscript r. We denote the quantum state of the NR as

\ket{\mathbf{x}_{i},\bm{\theta}}=\Biggl[\bigotimes^{R}_{r=1}U_{\text{PARAM}}(\bm{\theta}^{(r)})U_{\text{ENC}}(\mathbf{x}_{i})\Biggr]\ket{0}^{\otimes n_{Q}}.

Given a Pauli string observable P applied to the NR, the expectation value of the measurement is given by

\begin{split}
\bra{\psi}P\ket{\psi} &= \frac{1}{N}\sum^{N-1}_{i=0}\sum^{N-1}_{j=0}\braket{i}{j}\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{j},\bm{\theta}} \\
&= \frac{1}{N}\sum^{N-1}_{i=0}\sum^{N-1}_{j=0}\delta_{ij}\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{j},\bm{\theta}} \\
&= \frac{1}{N}\sum^{N-1}_{i=0}\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{i},\bm{\theta}},
\end{split} (8)

which results in a sum over the individual nodes. Now, consider an additional 2^{n_I} \times 2^{n_I} Hermitian matrix J filled with ones, used as an observable on the IR. Observing that

\begin{split}
\bra{\psi}J\otimes P\ket{\psi} &= \frac{1}{N}\sum^{N-1}_{i=0}\sum^{N-1}_{j=0}\bra{i}J\ket{j}\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{j},\bm{\theta}} \\
&= \frac{1}{N}\sum^{N-1}_{i=0}\sum^{N-1}_{j=0}\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{j},\bm{\theta}} \\
&= \frac{1}{N}\sum^{N-1}_{i=0}\sum^{N-1}_{j=0}h(\mathbf{x}_{i},\mathbf{x}_{j};P),
\end{split} (9)

where \bra{i}J\ket{j} = J_{ij} = 1, and we define

h(\mathbf{x}_{i},\mathbf{x}_{j};P)=\frac{1}{2}\Bigl[\bra{\mathbf{x}_{i},\bm{\theta}}P\ket{\mathbf{x}_{j},\bm{\theta}}+\bra{\mathbf{x}_{j},\bm{\theta}}P\ket{\mathbf{x}_{i},\bm{\theta}}\Bigr],

which is symmetric, i.e., h(\mathbf{x}_i, \mathbf{x}_j; P) = h(\mathbf{x}_j, \mathbf{x}_i; P). Notice that Eq. 9 computes the average value over all possible pairs, leading to the permutation invariance of the final output. In practice, the observable J can be decomposed as

J=\bigotimes^{n_{I}}_{q=1}\begin{bmatrix}1&1\\1&1\end{bmatrix}=\bigotimes^{n_{I}}_{q=1}\bigl(I^{\text{IR}}_{q}+X^{\text{IR}}_{q}\bigr), (10)

where X^{\text{IR}}_q refers to the Pauli-X observable of the q-th qubit in the IR, and I^{\text{IR}}_q is the 2 × 2 identity matrix. Expanding the right-hand side of Eq. 10 yields 2^{n_I} \approx N different combinations of Pauli string observables. The value of Eq. 9 can be obtained by summing the expectation values of all Pauli strings in this expansion, corresponding to the SUM operation in the classical MPGNN's aggregation function. In certain cases, one might wish to exclude the contributions of nodes paired with themselves; these terms can be removed by considering

J\longrightarrow J-\bigotimes^{n_{I}}_{q=1}I^{\text{IR}}_{q},

which is equivalent to subtracting the value of Eq. 8 from Eq. 9.
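
The measurement prescription of Eqs. 8-10 can be sketched as follows: J is expanded into the 2^{n_I} Pauli strings of Eq. 10, each tensored with P, and the expectation values are summed. The toy encoding below (one RY angle per node on a single NR qubit) only illustrates the bookkeeping and is not the ansatz used in Sec. III.2.

```python
import itertools
import pennylane as qml
import numpy as np

n_I, n_Q = 2, 1
IR, NR = [0, 1], [2]
dev = qml.device("default.qubit", wires=n_I + n_Q)

# Expand J = prod_q (I_q + X_q) on the IR into 2**n_I Pauli strings, each
# tensored with P = Z on the single NR qubit (cf. Eq. 10).
paulis = []
for combo in itertools.product([qml.Identity, qml.PauliX], repeat=n_I):
    obs = qml.PauliZ(NR[0])
    for q, op in zip(IR, combo):
        obs = obs @ op(q)
    paulis.append(obs)

@qml.qnode(dev)
def expectations(angles):
    for q in IR:
        qml.Hadamard(wires=q)
    for i, theta in enumerate(angles):              # toy encoding: one angle per node
        bits = [int(b) for b in np.binary_repr(i, width=n_I)]
        qml.ctrl(qml.RY, control=IR, control_values=bits)(theta, wires=NR[0])
    return [qml.expval(p) for p in paulis]

vals = expectations(np.array([0.1, 0.5, 1.2, 2.0]))  # N = 4 nodes
j_p = sum(vals)                                       # <psi| J (x) P |psi>, Eq. (9)
self_terms = vals[0]                                  # identity-only string, Eq. (8)
pairwise_only = j_p - self_terms                      # the J -> J - I substitution above
```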

II.3 Computational Complexity

In this subsection, we examine the computational complexity of both MPGNN and QCGNN in the context of complete graphs. We focus on the simplest case where both models map a set of d-dimensional features to a scalar, i.e., \mathbb{R}^d \rightarrow \mathbb{R}. For instance, the MPGNN might employ a feed-forward neural network terminating in a single neuron, while the QCGNN could utilize one qubit in the NR, measured in the Z basis (n_Q = 1 and P = Z). Additionally, we assume that the computational cost of obtaining a scalar output in classical models is roughly equivalent to the cost of measuring a Pauli string observable in quantum models.

To compute all pairwise information, MPGNN requires O(N^2) computations, with each pair passing through the neural network once. In contrast, due to the quantum parallelism inherent in QCGNN, U_{\text{PARAM}} can process all pairs of nodes simultaneously. To aggregate the final result, QCGNN requires measuring only O(2^{n_I}) \approx O(N) Pauli string observables, as indicated by Eq. 10. This suggests that QCGNN could offer a polynomial speedup over MPGNN. However, in the case of QCGNN, additional costs associated with multi-controlled operators and the USO should be taken into account. Although various methods exist for decomposing multi-controlled operators, such as those discussed in [99, 100, 101], for simplicity we adhere to the basic approach outlined in [102], which requires an additional O(n_I) \approx O(\log_2 N) ancilla qubits and Toffoli gates. Based on the results in [97, 98], preparing a uniform quantum state requires O(\log_2 N) gates. If the parametrized operators are sufficiently deep, these additional costs become negligible, enabling QCGNN to achieve an O(N) speedup over MPGNN.

Certain traditional VQC approaches, such as quantum kernel methods [103, 104], also share similarities with QCGNN. Quantum kernel methods compute the kernel function k(\mathbf{x}_i, \mathbf{x}_j) = |\braket{\mathbf{x}_i, \bm{\theta}}{\mathbf{x}_j, \bm{\theta}}|^2 and are typically constructed using U_{\text{ENC}} and U_{\text{PARAM}}. These methods have a computational complexity of O(N^2), as they still require computing each pair individually. Again, if U_{\text{PARAM}} is sufficiently deep, the additional costs from data encoding are negligible, allowing QCGNN to achieve an O(N) speedup over quantum kernel methods. This advantage arises because QCGNN computes the O(N^2) pairwise terms simultaneously, with only O(2^{n_I}) \approx O(N) additional measurement cost, as given by Eq. 10.

II.4 Extending QCGNN to General Graphs

QCGNN can also be extended to weighted graphs, but the additional cost might render it impractical. Consider a simple case where an undirected, weighted graph has an adjacency matrix A that can be expressed as the outer product of a vector \ket{w} = \sum_i w_i \ket{i} with itself, i.e., with edge weights A_{ij} = w_i w_j. Instead of initializing the IR uniformly, we initialize the quantum state as

\sum^{N-1}_{i=0}\frac{1}{\sqrt{N}}\ket{i}\ket{0}^{\otimes n_{Q}}\longrightarrow\sum^{N-1}_{i=0}\frac{w_{i}}{\sqrt{\braket{w}{w}}}\ket{i}\ket{0}^{\otimes n_{Q}},

so that the terms in Eq. 9 are modified as

\frac{1}{N}h(\mathbf{x}_{i},\mathbf{x}_{j};P)\rightarrow\frac{w_{i}w_{j}}{\braket{w}{w}}h(\mathbf{x}_{i},\mathbf{x}_{j};P)=\frac{A_{ij}}{\operatorname{Tr}(A)}h(\mathbf{x}_{i},\mathbf{x}_{j};P).

Instead of initializing with a uniform state, the quantum state \ket{w} can be prepared using AMPLITUDE EMBEDDING, where the information of the weights w_i is embedded in the amplitudes of |i⟩.
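
A minimal sketch of this weighted initialization with PennyLane's amplitude embedding is given below; the weights and register size are illustrative.

```python
import pennylane as qml
import numpy as np

w = np.array([0.7, 0.2, 0.4])             # node weights w_i, N = 3
n_I = int(np.ceil(np.log2(len(w))))       # 2 IR qubits
dev = qml.device("default.qubit", wires=n_I)

@qml.qnode(dev)
def weighted_ir():
    # IR amplitudes set to w_i / sqrt(<w|w>) (padded with zeros up to 2**n_I)
    qml.AmplitudeEmbedding(w, wires=range(n_I), normalize=True, pad_with=0.0)
    return qml.state()

print(np.round(weighted_ir(), 3))          # amplitudes proportional to w_i
```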

To generalize to directed and weighted graphs, note that any matrix can be decomposed into symmetric and skew-symmetric matrices. Since both types are normal matrices, they are diagonalizable according to the spectral theorem. One can apply the method described above for each eigenbasis individually and multiply by a factor proportional to the corresponding eigenvalue. However, the additional computational cost associated with diagonalizing matrices and AMPLITUDE EMBEDDING might be substantial, potentially negating the advantages of QCGNN.

For practical applications, we primarily consider the use of QCGNN for undirected, unweighted, and complete graphs. The added complexity of handling weighted, directed, and incomplete graphs may diminish the computational benefits of QCGNN, making it less feasible for real-world applications without further optimizations.

III Experimental Setup

III.1 Dataset for Jet Discrimination

We demonstrate the feasibility of QCGNN using two publicly available Monte Carlo simulated datasets for jet discrimination: the Top dataset [82] and the JetNet dataset [84]. (In the first published version, we used a dataset generated ourselves with [105, 106, 107, 108, 109]; for the revised version, we switched to these existing public datasets, which provide substantially more data.) The jets in both datasets are clustered using the anti-k_T algorithm [110, 111] with a distance parameter R = 0.8.

The Top dataset [82] is used for binary classification, distinguishing signal jets from top quarks (Top) and background jets from mixed quark-gluon interactions (QCD). The transverse momentum of the jets is in the range [550, 650] GeV. The dataset is divided into 1.2 million training samples, 400 thousand validation samples, and 400 thousand testing samples. Further details of the Top dataset can be found in [83].

The JetNet dataset [84] is used for multi-class classification, with jets originating from gluons (g), top quarks (t), light quarks (q), W bosons (w), and Z bosons (z). Each class of jets has a transverse momentum of approximately 1 TeV, with around 170 thousand samples. For each jet, only the 30 particles with the highest transverse momentum are retained if the number of particles exceeds 30. Further details of the JetNet dataset can be found in [85].

In our approach, each jet is represented as a complete graph. Each node corresponds to a particle in the jet, with node features given by particle flow information. For the i-th particle, the input features \mathbf{x}^{(0)}_i include the transverse momentum fraction z_i = p_{T,i} / p_{T,\text{jet}}, the relative pseudorapidity \Delta\eta_i = \eta_i - \eta_{\text{jet}}, and the relative azimuthal angle \Delta\phi_i = \phi_i - \phi_{\text{jet}}. For QCGNN, these input features are further preprocessed as follows:

\begin{split}
z_{i}&\longrightarrow\tan^{-1}(z_{i}),\\
\Delta\eta_{i}&\longrightarrow\frac{\pi}{2}\frac{\Delta\eta_{i}}{R},\\
\Delta\phi_{i}&\longrightarrow\frac{\pi}{2}\frac{\Delta\phi_{i}}{R},
\end{split} (11)

since rotation gates are used for data encoding (see Sec. III.2). Note that the indices of the particles are arbitrary due to the use of permutation-invariant models for graphs.
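
A small sketch of the preprocessing in Eq. 11 is given below, with R = 0.8 taken from the jet clustering described in Sec. III.1; the function name is illustrative.

```python
import numpy as np

def preprocess(z, d_eta, d_phi, R=0.8):
    """Map (z_i, delta-eta_i, delta-phi_i) to rotation-friendly angles, Eq. (11)."""
    return (np.arctan(z),
            0.5 * np.pi * d_eta / R,
            0.5 * np.pi * d_phi / R)

print(preprocess(z=0.12, d_eta=0.21, d_phi=-0.35))
```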

III.2 Classical and Quantum Models

Figure 2: The ansatz of the encoding operator used in QCGNN with n_Q = 4. The rotation gates R_x and R_y are defined in Eq. 13. The rotation angle \mathbf{x}^{(0)}_{i,j} corresponds to the j-th feature of the i-th particle, with the three features being the transformed particle flow information described in Eq. 11. Since the data encoding method used in each NR qubit is identical, this ansatz can be generalized to any number of qubits.
Figure 3: The ansatz of the strongly entangling layers [112] used in QCGNN, illustrated with n_Q = 4 as an example. The three-angle rotation gate R is defined in Eq. 14. The parameters \bm{\theta} are tunable, with distinct parameters for each repetition l. Specifically, \bm{\theta}_{l,1}, \bm{\theta}_{l,2}, \bm{\theta}_{l,3}, and \bm{\theta}_{l,4} are all three-dimensional parameters corresponding to the arguments of R(\alpha, \beta, \gamma) in Eq. 14. Note that this ansatz generalizes naturally for n_Q \geq 3; for n_Q < 3, an alternative ansatz should be considered.
Figure 4: The detailed structure of the quantum model based on QCGNN and the classical model based on MPGNN used for benchmarking. The particle flow features are defined in Sec. III.1, and the hyperparameters of the models are discussed in Sec. III.3. The classical model on the left is based on MPGNN, with the aggregation function chosen to be SUM. The number of hidden neurons in MPGNN is denoted as n_M, set equal to n_Q (3 or 6) for comparison with QCGNN. The quantum model on the right is based on QCGNN, with features preprocessed as described in Eq. 11. The ansatz for the parametrized operators follows the pattern depicted in Fig. 3. Note that the data re-uploading technique introduced in [91] is used, which involves encoding the data twice and applying the parametrized operators with different parameters each time. The dimension of the final output is denoted as n_C, with n_C = 1 for the Top dataset and n_C = 5 for the JetNet dataset. The final output is passed through a Sigmoid function for binary classification (n_C = 1) or a Softmax function for multi-class classification (n_C = 5).

The classical model for benchmarking is based on the MPGNN of Eq. 1, with the aggregation function chosen to be SUM. The function \Phi is implemented as a feed-forward neural network consisting of linear layers and ReLU activation functions [113], while \gamma simply returns the aggregated sum, i.e., \mathbf{x}^{(1)}_i = \sum_{j \neq i} \Phi(\mathbf{x}^{(0)}_i, \mathbf{x}^{(0)}_j), where j ranges from 0 to N-1 excluding i. The input to \Phi is the concatenation of \mathbf{x}^{(0)}_i and \mathbf{x}^{(0)}_j, requiring only 6 neurons in the input layer of \Phi. Consequently, the graph feature, denoted as \mathbf{x}^C, is computed through

\begin{split}
\mathbf{x}^{C} &= \sum^{N-1}_{i=0}\mathbf{x}^{(1)}_{i} \\
&= \sum^{N-1}_{i=0}\sum_{j\neq i}\Phi(\mathbf{x}^{(0)}_{i},\mathbf{x}^{(0)}_{j}) \\
&= \sum^{N-1}_{i=0}\sum^{N-1}_{j=0}\Phi(\mathbf{x}^{(0)}_{i},\mathbf{x}^{(0)}_{j})-\sum^{N-1}_{i=0}\Phi(\mathbf{x}^{(0)}_{i},\mathbf{x}^{(0)}_{i}).
\end{split} (12)

The structure of MPGNN is similar to the Particle Flow Network (PFN) in [34], with the distinction that PFN calculates \Phi(\mathbf{x}^{(0)}_i) for each particle individually, whereas MPGNN calculates the pairwise information \Phi(\mathbf{x}^{(0)}_i, \mathbf{x}^{(0)}_j) between particles.
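
A minimal sketch of Eq. 12 is shown below: \Phi is evaluated on all ordered pairs and the diagonal (i = j) terms are subtracted. The layer sizes are illustrative; the actual hyperparameters are given in Sec. III.3.

```python
import torch
from torch import nn

N, d, n_M = 5, 3, 6
# Phi: small MLP acting on the concatenated features of a particle pair
phi = nn.Sequential(nn.Linear(2 * d, n_M), nn.ReLU(), nn.Linear(n_M, n_M))

x = torch.randn(N, d)
pairs = torch.cat([x.repeat_interleave(N, dim=0),      # x_i repeated
                   x.repeat(N, 1)], dim=-1)            # x_j tiled
out = phi(pairs).view(N, N, n_M)                       # Phi(x_i, x_j) for all (i, j)
x_C = out.sum(dim=(0, 1)) - torch.diagonal(out, dim1=0, dim2=1).sum(dim=-1)
```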

The quantum model is based on QCGNN, which consists of encoding operators and parametrized operators. The data re-uploading technique [91] is employed twice before the final measurements (indicated by the dashed box in Fig. 1 with R = 2). For simplicity, we use single-angle rotation gates, defined as

\begin{split}
R_{x}(\theta)&=\begin{bmatrix}\cos(\theta/2)&-i\sin(\theta/2)\\-i\sin(\theta/2)&\cos(\theta/2)\end{bmatrix},\\
R_{y}(\theta)&=\begin{bmatrix}\cos(\theta/2)&-\sin(\theta/2)\\\sin(\theta/2)&\cos(\theta/2)\end{bmatrix},\\
R_{z}(\theta)&=\begin{bmatrix}e^{-i\theta/2}&0\\0&e^{i\theta/2}\end{bmatrix},
\end{split} (13)

and triple-angle rotation gates, defined as

\begin{split}
R(\alpha,\beta,\gamma)&=R_{z}(\gamma)R_{y}(\beta)R_{z}(\alpha)\\
&=\begin{bmatrix}e^{-i\frac{\alpha+\gamma}{2}}\cos(\frac{\beta}{2})&-e^{i\frac{\alpha-\gamma}{2}}\sin(\frac{\beta}{2})\\e^{-i\frac{\alpha-\gamma}{2}}\sin(\frac{\beta}{2})&e^{i\frac{\alpha+\gamma}{2}}\cos(\frac{\beta}{2})\end{bmatrix}
\end{split} (14)

to encode the particle flow information. The parametrized operators are constructed from strongly entangling layers [112] using rotation gates and CNOT gates. The ansatz for encoding the i-th particle's features and the strongly entangling layers are shown in Fig. 2 and Fig. 3, respectively. The q-th component of the graph feature \mathbf{x}^Q \in \mathbb{R}^{n_Q} is computed by

\begin{split}
\mathbf{x}^{Q}_{q} &= N\Bigl[\bra{\psi}J\otimes Z^{\text{NR}}_{q}\ket{\psi}-\bra{\psi}Z^{\text{NR}}_{q}\ket{\psi}\Bigr] \\
&= \sum^{N-1}_{i=0}\sum^{N-1}_{j=0}h(\mathbf{x}^{(0)}_{i},\mathbf{x}^{(0)}_{j};Z^{\text{NR}}_{q})-\sum^{N-1}_{i=0}h(\mathbf{x}^{(0)}_{i},\mathbf{x}^{(0)}_{i};Z^{\text{NR}}_{q}),
\end{split} (15)

where the observable Z^{\text{NR}}_q refers to the Pauli-Z measurement of the q-th qubit in the NR only, and the summations are computed through Eq. 8 and Eq. 9. To clarify, the subscript i of \mathbf{x}^{(0)}_i corresponds to the i-th node, while the subscript q of \mathbf{x}^Q_q refers to the q-th component of the QCGNN output. Note that all qubits in the NR can be measured simultaneously, but the measurement outcomes of the other qubits are ignored when calculating the expectation value of Z^{\text{NR}}_q. This setup can be thought of as a classical feed-forward neural network with n_Q neurons in the output layer. Notice how Eq. 15 resembles Eq. 12, indicating the permutation invariance of the final output.

Eventually, both \mathbf{x}^C and \mathbf{x}^Q are followed by another feed-forward neural network consisting of linear layers and ReLU activation functions. The full setup of the classical and quantum models is depicted in Fig. 4. For binary classification using the Top dataset, the output layer has a single neuron followed by a Sigmoid function and is trained with binary cross-entropy loss. For multi-class classification using the JetNet dataset, the output layer has five neurons followed by a Softmax function and is trained with multi-class cross-entropy loss.

III.3 Training Setup and Model Hyperparameters

Figure 5: Histograms of the number of particles per jet for the Top and the JetNet datasets. A detailed description of both datasets is provided in Sec. III.1. The original JetNet dataset exhibits a sharp peak at 30 particles, as the original data retain only the 30 particles with the highest transverse momentum. The histograms for particles with a relative transverse momentum z_i \geq 0.025 are also given. This threshold is selected to ensure that the majority of the distribution of the number of particles per jet falls between 4 and 16.

The complete training process was conducted with 5 different random seeds, each for 30 epochs. The Top and the JetNet datasets comprise 2 and 5 classes, respectively. For each class, we selected 25,000 training samples, 2,500 validation samples, and 2,500 testing samples. This limited data selection is due to the extensive training time required for the QCGNN, as discussed in Sec. III.4. For each random seed, the data were randomly sampled from the original dataset. To balance training performance against computational demands, particles with transverse momentum less than 2.5% of p_{T,\text{jet}}, i.e., z_i < 0.025, were discarded, so that the majority of the distribution of the number of particles per jet lies between 4 and 16. The histograms of the number of particles per jet for the original and preprocessed Top and JetNet datasets are shown in Fig. 5. To mitigate the extensive training time associated with simulating QML, events with fewer than 4 or more than 16 particles were discarded. These choices strike a balance between performance and the amount of training data, as discussed in Appendix B.

Due to limited computational resources, the number of qubits n_Q in the NR was tested with n_Q = 3 and n_Q = 6, with the number of strongly entangling layers in each parametrized operator (U_{\text{PARAM}}) set to n_Q / 3. Given that the maximum number of particles in a jet is 16, we used n_I = \lceil\log_2 16\rceil = 4 qubits for the IR. The number of hidden neurons n_M in the MPGNN was set to 3 or 6 for comparison with the QCGNN, ensuring that both models have a comparable number of parameters, as discussed in Appendix C.

We also evaluated the performance of classical state-of-the-art models, including the Particle Flow Network (PFN) [34], Particle Net (PNet) [35], and Particle Transformer (ParT) [36]. The structure and hyperparameters of PFN, PNet, and ParT were configured according to their respective original publications. Notably, we excluded mass information from the interaction matrix of ParT, as only particle flow information (p_T, \Delta\eta, \Delta\phi) was used. The k-nearest neighbor method used in the original PNet was configured with k = 3, given that the minimum number of particles per jet was 4.

The classical models were implemented using PyTorch [114] and PyTorch Geometric [115], while the quantum circuit of QCGNN was simulated using PennyLane [116]. The cross-entropy loss was optimized using the Adam optimizer [117] with a learning rate of 10^{-3} for all models. The batch size was set to 64, the maximum allowable due to memory constraints, as simulating quantum circuits requires substantial memory resources.
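
A minimal sketch of this common optimization setup (Adam, learning rate 10^{-3}, batch size 64, cross-entropy loss) is given below; the toy model and random tensors stand in for the actual networks and jet datasets.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

n_C = 5                                       # JetNet: 5 classes (use n_C = 1 with a binary loss for Top)
model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, n_C))  # placeholder network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

data = TensorDataset(torch.randn(1024, 6), torch.randint(0, n_C, (1024,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

for epoch in range(30):
    for features, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
```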

III.4 Implementing QCGNN with Simulators

Unlike classical models, parameter gradients for real quantum computers cannot be computed using traditional methods such as finite difference methods. Instead, the parameter-shift rule (PSR) [93, 94, 95, 96] can be employed to calculate gradients. However, applying PSR on quantum computers necessitates extensive requests and long queue times with actual quantum devices. Furthermore, the noise in current quantum computers is insufficiently low to enable stable training of quantum neural networks, often resulting in training failures.

To circumvent these issues during the NISQ era [92], we trained the QCGNNs on classical computers using PennyLane [116] quantum circuit simulators with zero noise. Nonetheless, simulating quantum circuits is highly time-consuming, even for a few qubits. Although PennyLane supports QML on GPUs, speed improvements over CPUs are significant only with many qubits, typically more than 20 (see the benchmark of quantum simulation on GPUs in PennyLane's blog post "Lightning-fast simulations with PennyLane and the NVIDIA cuQuantum SDK"). For this study, we used CPUs to train the QCGNNs. Training a 10-qubit (n_I = 4 and n_Q = 6) QCGNN with 10,000 samples and a batch size of 64 takes approximately 1,000 seconds per epoch. Consequently, the training over 5 random seeds for 30 epochs required nearly a month.

IV Results

IV.1 Performance of Classical and Quantum Models

Model | Top Dataset (2 classes): # params / AUC / Accuracy | JetNet Dataset (5 classes): # params / AUC / Accuracy
Particle Transformer | 2.2M / 0.946±0.005 / 0.868±0.009 | 2.2M / 0.889±0.002 / 0.656±0.006
Particle Net | 177K / 0.953±0.003 / 0.885±0.006 | 178K / 0.896±0.003 / 0.669±0.004
Particle Flow Network | 72.3K / 0.954±0.004 / 0.885±0.005 | 72.7K / 0.900±0.003 / 0.675±0.005
MPGNN (n_M = 64) | 13K / 0.961±0.003 / 0.896±0.003 | 13.3K / 0.903±0.002 / 0.683±0.007
MPGNN (n_M = 6) | 255 / 0.924±0.006 / 0.866±0.006 | 323 / 0.865±0.004 / 0.615±0.010
MPGNN (n_M = 3) | 126 / 0.922±0.005 / 0.864±0.006 | 194 / 0.757±0.110 / 0.475±0.141
QCGNN (n_Q = 6) | 201 / 0.932±0.004 / 0.868±0.005 | 269 / 0.822±0.003 / 0.543±0.006
QCGNN (n_Q = 3) | 99 / 0.919±0.006 / 0.864±0.005 | 167 / 0.796±0.009 / 0.505±0.014
Table 1: The performance of different models on the Top and the JetNet datasets. As detailed in Sec. III.2, n_M denotes the number of hidden neurons in the MPGNN, and n_Q represents the number of qubits in the NR of the QCGNN. The number of parameters for each model is provided, along with the area under the ROC curve (AUC) and accuracy. The AUC score is computed as the average over all possible pairwise combinations of classes, while the accuracy is calculated across all classes simultaneously. The results are averaged over 5 random seeds, with one standard deviation shown.

The performance of the classical and quantum models on the Top and the JetNet datasets is summarized in Table 1. The inference scores of the MPGNN and QCGNN are comparable when the numbers of parameters are of roughly the same order. We anticipate that QCGNN has the potential to achieve performance on par with MPGNN as the number of qubits increases. When training with a smaller number of parameters on the multi-class JetNet dataset, we observe that QCGNN is more stable than MPGNN, with the latter exhibiting a larger standard deviation.

Among the models considered, MPGNN with n_M = 64 attains the highest inference scores. This is likely because part of the jet information is lost during data preprocessing, where only 4 to 16 particles per jet are utilized. When training with the full information from the original jet dataset, i.e., without discarding soft particles, the other state-of-the-art models become competitive with, or even surpass, MPGNN. Details of the state-of-the-art models trained on the original jet dataset are provided in Appendix B.

IV.2 Executing Pre-trained QCGNN on IBMQ

Figure 6: Extrapolation of noise probabilities (depolarizing error and amplitude damping) using quantum circuit simulators with pre-trained QCGNNs. An ideal quantum computer corresponds to a noise probability of zero on the x-axis, while "IBMQ" refers to results obtained from ibm_brussels. The y-axis shows the area under the ROC curve (AUC) and accuracy for 400 jet data samples, each containing only 4 particles, which requires n_I = 2 qubits for the IR. The error bars represent one standard deviation across 5 full executions with different random seeds.

Although implementing full training on quantum computers is impractical in the NISQ era, we can still evaluate the performance of the pre-trained QCGNN on IBMQ real devices [118]. To minimize the noise effects caused by real quantum gates, we select events with only four particles from the Top dataset, i.e., using n_I = 2 qubits in the IR, thereby reducing the number of gates required for initial state preparation and data encoding. In this setup, the USO can be efficiently implemented using Hadamard gates on each qubit in the IR. On IBMQ real devices, only 1-qubit and 2-qubit gates are available, and the multi-controlled gates used in data encoding are decomposed using the methods described in [102].

We selected ibm_brussels with 1024 shots to test the performance of QCGNN on an IBMQ real device. However, quantum computers in the NISQ era are currently too noisy to yield usable results: for binary classification, the inference of QCGNN on ibm_brussels gives approximately 0.5 AUC and 0.5 accuracy, equivalent to random guessing. To assess how noise affects the performance of QCGNN, we perform an extrapolation over noise using PennyLane simulators, with the results shown in Fig. 6. We simulate quantum noise, including depolarizing error and amplitude damping, occurring after each quantum operation with a certain probability. As indicated in Fig. 6, the simulated noise probability must be reduced below 10^{-3} to achieve reliable results.

IV.3 QCGNN Runtime on IBMQ

IBMQ Backend | N | T_ENC [s] | T_PARAM [s]
ibm_nazca | 2 | 2.567 | 0.209
ibm_nazca | 4 | 5.352 | 0.197
ibm_nazca | 8 | 10.551 | 0.219
ibm_strasbourg | 2 | 2.595 | 0.217
ibm_strasbourg | 4 | 5.416 | 0.197
ibm_strasbourg | 8 | 11.085 | 0.211
Table 2: The runtime of quantum circuit gate operations for encoding layers and parametrized layers on different IBMQ backends. The number of particles per jet is denoted as N. The runtimes T_ENC and T_PARAM are defined in Sec. IV.3 and are given in seconds.

To validate the time complexity analysis discussed in Sec. II.3, we initialized untrained QCGNNs and executed them on various IBMQ backends, including ibm_nazca and ibm_strasbourg, with n_Q = 100 and 1024 shots. We set the number of nodes to 2, 4, and 8, so that only Hadamard gates are required for the initial state preparation. To determine the quantum gate runtime for encoding and parametrized operators, we first ran QCGNN without any operators to measure the runtime T_0 for quantum state initialization and measurement. We then applied encoding operators with 10 rounds of re-uploading to obtain the runtime T_1. Finally, we applied parametrized operators, constructed with 10 strongly entangling layers and 10 rounds of re-uploading (resulting in 100 strongly entangling layers in total), to measure the runtime T_2. Each runtime measurement was averaged over 10 executions. The runtime of the encoding operators was computed as

T_{\text{ENC}}=\frac{T_{1}-T_{0}}{10},

and the runtime of each strongly entangling layer in parametrized operators is calculated by

T_{\text{PARAM}}=\frac{T_{2}-T_{1}}{100}.

The results presented in Table 2 indicate that the runtime of the encoding operators scales approximately linearly with the number of particles per jet, while the runtime of the parametrized operators remains approximately constant, as expected. As discussed in Sec. II.3, when the parametrized operators are sufficiently deep, the runtime is dominated by these operators, making the additional computational cost associated with data encoding negligible.

V Summary

The representation of jets as graphs, leveraging the property of permutation invariance, has been widely utilized in particle physics. However, constructing graphs from particle jets in a physically meaningful manner remains an unresolved challenge. In the absence of specific physical assumptions, we adopt a straightforward approach by representing jets as complete graphs with undirected, unweighted edges. Motivated by the structure of complete graphs, we propose the Quantum Complete Graph Neural Network (QCGNN) for learning through aggregation using SUM or MEAN operations. When training on N particles, QCGNN exhibits O(N) computational complexity if the parametrized operators are sufficiently deep, offering a polynomial speedup over classical models that require O(N^2).

To demonstrate the practicality of QCGNN, we conduct experiments on jet discrimination. Sec. IV.1 shows that QCGNN performs comparably to classical models with a similar number of parameters. Moreover, QCGNN displays a more stable training process across different random seeds. Although the pre-trained QCGNN has been tested on IBMQ real devices, the noise in current quantum circuits remains too significant to yield reliable results. To assess the impact of noise in the NISQ era, we perform noise extrapolation using simulators, as detailed in Sec. IV.2. We also conducted a series of executions on IBMQ quantum devices to estimate the runtime of QCGNN, as discussed in Sec. IV.3. The time cost of the encoding operators scales approximately linearly with the number of particles per jet, while that of the parametrized operators remains approximately constant.

In conclusion, QCGNN provides a more efficient method for learning unstructured jets using QML. The additional computational costs associated with quantum state initialization and data encoding are negligible when the parametrized operators are sufficiently deep, as discussed in Sec. II.3. However, it remains an open question whether QML provides a definitive quantum advantage in HEP. Moreover, developing more expressive and suitable methods for HEP data encoding continues to be an intriguing and ongoing area of research.

Acknowledgement

The authors thank Chiao-Hsuan Wang for helpful discussions and suggestions about quantum computation. The accessibility of IBMQ resources is supported by the IBM Quantum Hub at National Taiwan University.

Appendix A Relation Between State-of-the-Art Models and MPGNN

In Sec. II.1, we introduce the MPGNN. Here, we show that several state-of-the-art models can be considered special cases of the MPGNN, i.e., they can be written in the form

\mathbf{x}^{(k)}_{i}=\gamma^{(k)}\left(\mathbf{x}^{(k-1)}_{i},\bigoplus_{j\in\mathcal{N}(i)}\Phi^{(k)}\left(\mathbf{x}^{(k-1)}_{i},\mathbf{x}^{(k-1)}_{j}\right)\right). (16)

A.1 Particle Flow Networks (PFN) as MPGNN

The PFN introduced in [34] first transforms the particle features into a latent space via a feed-forward neural network \Phi, followed by a summation. Another feed-forward neural network \gamma is then applied to obtain the final jet discrimination score F. In the form of the MPGNN, the PFN can be written as

F=\gamma\left(\sum_{\mathbf{x}_{i}\in G}\Phi(\mathbf{x}_{i})\right).

A.2 Particle Net (PNet) as MPGNN

The PNet introduced in [35] turns jets into graphs by dynamically determining the edges through the distance in feature space or latent space. The EdgeConv of PNet can be written as

\mathbf{x}^{(k)}_{i}=\gamma^{(k)}\left(\bigoplus_{j\in\mathcal{N}(i)}\Phi^{(k)}\bigl(\mathbf{x}^{(k-1)}_{i},\mathbf{x}^{(k-1)}_{i}-\mathbf{x}^{(k-1)}_{j}\bigr)\right),

where the neighbors are dynamically determined through the k-nearest neighbor method. The EdgeConv also calculates the difference between features, which is then passed through either a convolutional neural network or a feed-forward neural network, captured by \gamma and \Phi.

A.3 Particle Transformer (ParT) as MPGNN

The ParT introduced in [36] uses the transformer architecture to learn the jet features. The structure of the transformer is rather complicated, but each attention block can still be written in the form of the MPGNN. As ParT considers all pairs of particles without positional embedding, it can be seen as operating on complete graphs. The queries (Q) and keys (K) in the attention mechanism are captured by \Phi, with the aggregation function chosen to be SOFTMAX (which can be seen as a summation over a particular transformation that can be absorbed into \Phi), and the values (V) are captured by \gamma. Note that functions in the transformer such as GeLU or LayerNorm can also be absorbed into \Phi and \gamma.

Appendix B Performance of Classical Models on Different Number of Training Samples

Figure 7: Performance of state-of-the-art models with varying numbers of training samples. The blue, orange, green, and red lines represent the Particle Flow Network (PFN), Particle Net (PNet), Particle Transformer (ParT), and MPGNN with n_M = 64, respectively. The upper row displays performance metrics on the Top dataset, while the lower row shows performance metrics on the JetNet dataset. The left column presents the area under the ROC curve (AUC), and the right column shows accuracy. The training samples are preprocessed as described in Sec. III.1, except that 'Full-100K' uses all particle data without applying the transverse momentum cutoff.

As described in Sec. III.1, we selected 25,000 training samples with a maximum of 16 particles per jet for each class. In this appendix, we justify that this setup is sufficient to evaluate the performance of each model. We trained state-of-the-art classical models, including the Particle Flow Network (PFN) [34], Particle Net (PNet) [35], Particle Transformer (ParT) [36], and MPGNN with 64 hidden neurons (n_M = 64).

The performance of each model on both the Top and the JetNet datasets is obtained by training with varying numbers of samples per class, across 5 different random seeds. The results are presented in Fig. 7. The training samples were preprocessed as outlined in Sec. III.1, using only events with at least 4 and at most 16 particles. The performance of each state-of-the-art model saturates between 25,000 and 50,000 training samples, indicating that the choice of 25,000 samples in Sec. III.1 is adequate for demonstrating model performance. We also conducted experiments with the full-particle jets from the original dataset, without applying the transverse momentum cutoff, using 100,000 samples per class. We found that, when training with only a few particles per jet, the simplest MPGNN model performs better than the other models; however, when using the full original dataset, ParT outperforms the other models.

Appendix C Number of Parameters in MPGNN and QCGNN

In this appendix, we compute the number of parameters for the MPGNN and QCGNN models based on the structures outlined in Sec. III.2. It is important to distinguish these calculations from the total number of parameters reported in Table 1, which includes the parameters of the final feed-forward network in both MPGNN and QCGNN.

For MPGNN with n_M hidden neurons in both the hidden and output layers, and an input dimension of 6 (since the features of two particles are concatenated), if there are L_C hidden layers, the total number of parameters is given by

\begin{split}
\#_{C}&=6(n_{M}+1)+n_{M}L_{C}(n_{M}+1)\\
&=L_{C}n^{2}_{M}+(6+L_{C})n_{M}+6,
\end{split} (17)

where the +1 in each parenthesis accounts for the bias term in the linear layers.

For QCGNN, suppose there are n_Q qubits in the NR with the strongly entangling layers ansatz depicted in Fig. 3. Each strongly entangling layer consists of n_Q rotation gates, each with 3 parameters. With L_Q strongly entangling layers and L_R rounds of re-uploading, the total number of parameters is

\#_{Q}=3L_{R}L_{Q}n_{Q}. (18)

To ensure that both models have the same output dimension, we set n_M = n_Q = D. Assuming n_Q is a multiple of 3, and setting L_Q = n_Q / 3 = D / 3, the numbers of parameters for MPGNN and QCGNN are

\begin{split}
\#_{C}&=L_{C}D^{2}+(6+L_{C})D+6,\\
\#_{Q}&=L_{R}D^{2}.
\end{split} (19)

It is evident that by choosing L_C = L_R = L, the leading term for both models scales as O(LD^2). In this study, we set L = 2. To approximate the linear term O(LD) in MPGNN, one could set L_R = (n_Q + 1)/3, resulting in \#_Q = LD^2 + LD. However, this approach was not considered in this study due to the increased simulation time required for longer circuits.
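
As a quick numerical check of Eqs. 17-18 for the settings used in this study (L_C = L_R = 2, L_Q = n_Q/3, D = n_M = n_Q in {3, 6}), the small sketch below evaluates both counts; these numbers exclude the final feed-forward network included in Table 1.

```python
def n_params_mpgnn(n_M, L_C=2):
    # Eq. (17)
    return 6 * (n_M + 1) + n_M * L_C * (n_M + 1)

def n_params_qcgnn(n_Q, L_R=2):
    # Eq. (18) with L_Q = n_Q // 3
    return 3 * L_R * (n_Q // 3) * n_Q

for D in (3, 6):
    print(D, n_params_mpgnn(D), n_params_qcgnn(D))
# D = 3: 48 vs 18;  D = 6: 126 vs 72
```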

References