
Quantum reservoir computing using arrays of Rydberg atoms

Rodrigo Araiza Bravo¹, Khadijeh Najafi¹﹐², Xun Gao¹, Susanne F. Yelin¹
¹Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
²IBM Quantum, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
Abstract

Quantum computing promises to speed up machine learning algorithms. However, noisy intermediate-scale quantum (NISQ) devices pose engineering challenges to realizing quantum machine learning (QML) advantages. Recently, a series of QML computational models inspired by the noise-tolerant dynamics of the brain has emerged as a means to circumvent the hardware limitations of NISQ devices. In this article, we introduce a quantum version of a recurrent neural network (RNN), a well-known model for neural circuits in the brain. Our quantum RNN (qRNN) makes use of the natural Hamiltonian dynamics of an ensemble of interacting spin-1/2 particles as a means for computation. In the limit where the Hamiltonian is diagonal, the qRNN recovers the dynamics of the classical version. Beyond this limit, we observe that the quantum dynamics of the qRNN endow it with quantum computational features that can aid it in computation. To this end, we study a fixed-geometry qRNN, i.e., a quantum reservoir computer, based on arrays of Rydberg atoms, and show that the Rydberg reservoir is indeed capable of replicating the learning of several cognitive tasks such as multitasking, decision-making, and long-term memory by taking advantage of several key features of this platform, such as interactions between different atomic species and quantum many-body scars.


I Introduction

Quantum computing promises to enhance machine learning algorithms. However, implementing these advantages often relies on either fault-tolerant quantum computers not yet available [1, 2, 3, 4, 5], or on decoherence-limited, variational quantum circuits which may experience training bottlenecks [6, 7]. Thus, currently available noisy intermediate-scale quantum (NISQ) devices thwart quantum advantages in machine learning algorithms.

Recently, to counteract these challenges, several quantum machine learning architectures have emerged that are inspired by models for computation in the brain [8, 9, 10]. These brain-inspired algorithms are motivated by the inherent robustness of brain-like computation to input and hardware noise, and by the possibility of using the analogue dynamics of controllable, many-body quantum systems for computation without relying on a digital circuit architecture. Broadly speaking, these brain-inspired algorithms can be put into two categories. The first encompasses systems quantizing the dynamics of biological computational models at the single-neuron level, so that the dynamics of single qubits or groups of qubits resemble those of neurons in a neural circuit of interest. Examples include quantum memristors [11], which are electrical circuits with a history-dependent resistance, quantum versions of the biologically realistic Hodgkin-Huxley model for single neurons [12, 13], and unitary adiabatic quantum perceptrons [14].

The second category of brain-inspired algorithms relies on a macroscopic resemblance between many-body quantum systems and neural circuits. In this regard, the algorithms that have received the most attention are quantum reservoir computers. Quantum reservoir computers use ensembles of quantum emitters with fixed interactions to perform versatile machine learning tasks, relying on the complexity of the unitary evolution of the system. Since these systems can couple to both classical and quantum devices, which may encode the tasks' inputs, quantum reservoirs have been used for time-series prediction [15, 16, 17], entanglement measurement [18, 19], quantum state preparation [20], continuous-variable computation [21], which can be made universal [22], reducing the depth of quantum circuits [23], finding ground states [24], and long-term memory employing ergodicity-breaking dynamics [25, 26, 27]. See [10] for a comprehensive review of quantum reservoir computing.

In both categories, however, a thorough understanding of the potential computational advantages and their origins is only slowly emerging. In this article, we contribute to this direction by proposing a quantum extension of a well-known neural circuit model called recurrent neural networks (RNNs), of which reservoir computers are a special case [28]. Our extension uses the Hamiltonian dynamics of ensembles of two-level systems. In the limit where the Hamiltonian is diagonal, we recover the classical single-neuron dynamics, naturally encoding RNNs into quantum hardware. Recently, another natural encoding of a reservoir computer was proposed using superconducting qubits [29]. In our case, the general dynamics of the quantum RNN (qRNN) present several new features that can aid in the computation of both classical and quantum tasks. In particular, a qRNN used for simulating stochastic dynamics can exhibit speedups compared to classical RNNs.

To show that our scheme is experimentally realizable, we propose that arrays of Rydberg atoms can be used as qRNNs (Sec. IV). Although our Rydberg qRNNs have restricted connectivity, we are motivated to use Rydberg arrays by recent studies with equally restricted qRNNs, which show significant computational capacity when driven near criticality [17, 24]. Moreover, recent experiments using optical tweezers [30, 31, 32, 33, 34, 35, 36, 37] have catapulted the community's interest in Rydberg arrays, as they exhibit long coherence times, controllable and scalable geometries, and increasing levels of single-atom control [38]. Additionally, Rydberg arrays can be used for novel, programmable quantum simulations and universal computation [39, 30, 40, 41, 42, 43].

We numerically implement fixed-geometry Rydberg qRNNs, i.e., Rydberg reservoir computers, and successfully perform cognitive tasks even when only a few atoms are available (Sec. V). The success of these tasks is explained by the physics of Rydberg atoms. For example, our Rydberg qRNNs excel at learning to multitask since they can naturally encode RNNs with inhibitory and excitatory neurons, which are vital for many cognitive tasks [44]. This encoding relies on the different types of interactions between Rydberg atoms with different principal quantum numbers [45]. Likewise, a Rydberg qRNN exhibits long-term memory due to the weak-ergodicity-breaking dynamics of quantum many-body scars [35, 46, 47]. Lastly, we discuss possible further research directions in Sec. VI.

We remark that the notion of a qRNN has previously been coined for schemes relying on universal quantum circuits and using measurements to implement the nonlinear dynamics of an RNN [48]. Instead, what we define as a “quantum RNN” leverages the inherent unitary dynamics of ensembles of two-level systems to compute, deviating from the digital quantum circuit model for computation.

II Classical recurrent neural networks

We begin by reviewing an archetypal RNN consisting of $N$ binary neurons. Each neuron is in one of two possible states $s_n(t)\in\{-1,1\}$ and is updated from time-step $t$ to $t+1$ following the update rule

$$s_{n}(t+1)=\text{sign}\left(h_{n}(t)\,s_{n}(t)\right),\qquad h_{n}(t)\equiv-\Delta_{n}(t)+\sum_{m}J_{nm}s_{m}(t),\qquad(1)$$

where $J_{nm}=J_{mn}$ are symmetric synaptic connections between neurons $n$ and $m$. The time-dependent biases $\Delta_n(t)$ encode the RNN's inputs. To avoid memorization during a learning task with inputs $u_n^{\text{task}}(t)$, the RNN receives Gaussian-whitened inputs

$$\Delta_{n}(t)=u_{n}^{\text{task}}(t)+\xi_{n},\qquad(2)$$

where $\xi_n$ is a zero-mean Gaussian random variable with variance $\sigma_{in}^{2}$, making the evolution of the RNN stochastic. In RNNs, the value of $\sigma_{in}^{2}$ is proportional to the value of the tasks' inputs $u_n^{\text{task}}$.
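As a concrete illustration, the following minimal sketch iterates the update rule (II) with Gaussian-whitened inputs; the random symmetric $J_{nm}$, the zero placeholder input `u_task`, and all parameter values are illustrative choices of ours, not those of any experiment in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma_in = 10, 50, 0.1          # neurons, time steps, input-noise std

J = rng.normal(size=(N, N)) / np.sqrt(N)
J = (J + J.T) / 2                     # symmetric synapses, J_nm = J_mn
np.fill_diagonal(J, 0.0)

s = rng.choice([-1, 1], size=N)       # binary neuron states s_n in {-1, 1}
u_task = np.zeros(N)                  # placeholder task input

for t in range(T):
    Delta = u_task + sigma_in * rng.normal(size=N)   # Eq. (2)
    h = -Delta + J @ s                               # effective field h_n
    s_new = np.sign(h * s)                           # Eq. (1)
    s_new[s_new == 0] = s[s_new == 0]                # tie-break: keep state if h = 0
    s = s_new.astype(int)
```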

When studying learning tasks similar to those in the mammalian cortex [44], one turns to a continuous version of the rule in (II), obtained when the time interval $\tau$ in which neurons update is small compared to $J_{nm}$. In this limit,

$$\tau\dot{s}_{n}(t)=-s_{n}(t)+\text{sign}\left(h_{n}(t)\,s_{n}(t)\right).\qquad(3)$$

Thus, the RNN obeys a system of nonlinear differential equations. Note that (3) implies that $s_n\in[-1,1]$ is a continuous and bounded variable [28].
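A forward-Euler integration of (3) makes these continuous dynamics concrete; as before, $J_{nm}$, the time step, and the input are illustrative placeholders of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
N, tau, dt, steps = 10, 1.0, 0.01, 2000
sigma_in = 0.1

J = rng.normal(size=(N, N)) / np.sqrt(N)
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

s = rng.uniform(-1, 1, size=N)        # s_n is now continuous in [-1, 1]
u_task = np.zeros(N)

for _ in range(steps):
    Delta = u_task + sigma_in * rng.normal(size=N)
    h = -Delta + J @ s
    s += (dt / tau) * (-s + np.sign(h * s))   # forward-Euler step of Eq. (3)
```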

A third way to describe an RNN is via the probability distribution $p_t(\bm{s})$ of observing each of the $2^N$ different configurations $\bm{s}$ at the $t^{\text{th}}$ time-step. Due to the noise in the inputs $\Delta_n$, the dynamics of the distribution follow a Markov-chain description [28]. This description is particularly useful for analyzing the stochastic dynamics simulatable by an RNN. As we shall see in Sec. III.1, this representation will be useful in explaining how, relative to classical RNNs, the unitary dynamics of a qRNN can speed up stochastic process simulations.

Lastly, we describe how to use an RNN for computation. After the RNN evolves for a time $t_f$, a subset of $M$ neurons is used to collect the vector $\bm{r}(t_f)=(s_{n_1}(t_f),\dots,s_{n_M}(t_f),1)$, with the last entry accommodating a bias. The other $N-M$ neurons are called hidden neurons. The RNN's output is obtained via a linear transformation $\bm{y}^{\text{out}}=W^{\text{out}}\bm{r}(t_f)$, where $W^{\text{out}}$ is a real-valued matrix. Thus, the computational complexity of the RNN comes from the nonlinear activation function in (II), which enables $\bm{y}^{\text{out}}$ to be a nonlinear function of the inputs.

In a learning task with a target output $\bm{y}^{\text{targ}}$, the RNN is trained by minimizing a loss function $\mathcal{L}(\bm{y}^{\text{out}},\bm{y}^{\text{targ}})$ with respect to the network parameters such as $W^{\text{out}}$, $J_{nm}$, etc., subject to the task-determined inputs in (2). We choose the square loss

$$\mathcal{L}(\bm{y}^{\text{out}},\bm{y}^{\text{targ}})=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\lVert\bm{y}_{i}^{\text{targ}}-\bm{y}_{i}^{\text{out}}\rVert^{2},\qquad(4)$$

where $i$ labels the $N_s$ different input instances. For the tasks in Sec. V, we fix the connections $J_{nm}$, such that our qRNNs more closely resemble quantum reservoir computers.
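Since only $W^{\text{out}}$ is trained in this setting, minimizing (4) reduces to an ordinary least-squares problem. A sketch, with random placeholder readout vectors standing in for actual network states:

```python
import numpy as np

rng = np.random.default_rng(2)
Ns, M = 200, 5                        # input instances, readout neurons

# r_i: readout vectors (s_{n_1}, ..., s_{n_M}, 1) collected at time t_f;
# random placeholders stand in for actual network states here
R = rng.choice([-1.0, 1.0], size=(Ns, M))
R = np.hstack([R, np.ones((Ns, 1))])  # trailing 1 accommodates the bias
y_targ = rng.choice([-1.0, 1.0], size=(Ns, 1))

# the W_out minimizing the square loss (4) is the least-squares solution
W_out, *_ = np.linalg.lstsq(R, y_targ, rcond=None)
loss = np.mean(np.sum((y_targ - R @ W_out) ** 2, axis=1))
```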

III Quantum recurrent neural networks

III.1 Quantum update rule

Let us now extend the classical RNN in (II) to the quantum setting. We replace each of the $N$ neurons with a spin-1/2 particle for which a spin measurement along the $z$-axis yields the values $\{-1,1\}$. Thus, each neuron $n$ is in a normalized quantum state in the Hilbert space $\mathcal{H}_n$ with basis vectors $\{|\text{-}1\rangle_n,|1\rangle_n\}$, which are eigenstates of the Pauli-Z operator $\sigma_n^z=|1\rangle\langle 1|_n-|\text{-}1\rangle\langle\text{-}1|_n$. The state of the composite system lives in the product Hilbert space $\mathcal{H}=\bigotimes_{n=1}^{N}\mathcal{H}_n$.

We choose spins interacting via the time-dependent Hamiltonian

$$H(t)=-\sum_{n=1}^{N}\Delta_{n}(t)\sigma_{n}^{z}+\sum_{nm}J_{nm}\sigma_{n}^{z}\sigma_{m}^{z}+\frac{\Omega(t)}{2}\sum_{n=1}^{N}\sigma_{n}^{x},\qquad(5)$$

where $\sigma_n^x=|1\rangle\langle\text{-}1|_n+|\text{-}1\rangle\langle 1|_n$ is the Pauli-X operator. Indeed, the evolution under (III.1) encompasses the update rule in (II). To see this, note that in the classical case of (II), the RNN evolves under the rules

If $h_n>0$, $s_n$ doesn't change. (1C)
If $h_n<0$, $s_n$ flips. (2C)

Here, “C” stands for “classical”. Now, consider a qRNN starting in the configuration $|s_1,s_2,\dots,s_N\rangle$ and evolving for a time $t=2\pi\Omega^{-1}$. In the limit where $\Delta_n\gg\Omega$ or $J_{nm}\gg\Omega$, each spin experiences the Hamiltonian $H_n=h_n\sigma_n^z+\frac{\Omega}{2}\sigma_n^x$, where $h_n=-\Delta_n+\sum_m J_{nm}s_m$ is the effective field generated by the rest of the spins and $s_m$ stands for the measurement result of $\sigma_m^z$ on the initial configuration. We then obtain the quantum update rules

If $|h_n|\gg\Omega$, $|s_n\rangle$ doesn't change. (1Q)
If $|h_n|\ll\Omega$, $|s_n\rangle$ flips. (2Q)

Here, “Q” stands for “quantum”. Therefore, (III.1) can implement (1C)-(2C) but without the use of the nonlinear activation function in (II). Nonetheless, (III.1) allows for more general dynamics beyond the perturbative limit in which (1Q)-(2Q) hold. We now highlight three features arising from the quantum evolution of the qRNN: (i) the ability to compute complex functions on the input by using quantum interference, (ii) the ability to exploit the choice of measurement basis, and (iii) the ability to efficiently achieve stochastic processes inaccessible to classical RNNs with no hidden neurons.
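Before turning to these features, the following sketch shows how one might assemble the Hamiltonian (III.1) numerically for a few spins and generate one update step of duration $t=2\pi\Omega^{-1}$; the couplings and fields are illustrative placeholders of ours.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli-X
sz = np.array([[1, 0], [0, -1]], dtype=complex)  # Pauli-Z, |1> = [1, 0]
I2 = np.eye(2, dtype=complex)

def site(op, n, N):
    """Embed a single-spin operator op at site n of an N-spin register."""
    return reduce(np.kron, [op if k == n else I2 for k in range(N)])

N, Omega = 3, 1.0
Delta = np.zeros(N)                            # input biases (placeholders)
J = 0.5 * np.ones((N, N)); np.fill_diagonal(J, 0.0)

H = -sum(Delta[n] * site(sz, n, N) for n in range(N))
H = H + sum(J[n, m] * site(sz, n, N) @ site(sz, m, N)
            for n in range(N) for m in range(n + 1, N))
H = H + (Omega / 2) * sum(site(sx, n, N) for n in range(N))

U = expm(-1j * H * 2 * np.pi / Omega)  # one update step of duration 2*pi/Omega
```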

Figure 1: Computing the parity, XOR$(s_1,s_2)$, of two inputs $s_1$ and $s_2$ with a qRNN. Spin 3 (the output spin) experiences an effective field $\tilde{J}=J(s_1+s_2)$ with $J\gg\Omega$. After evolving for a time $t=2\pi\Omega^{-1}$, we measure the output spin. The measurement outcome $+1$ is obtained when $s_1=-s_2$, since $\tilde{J}=0$. If $s_1=s_2$, so that $\tilde{J}\neq 0$, the inputs constructively interfere to generate a large detuning on the output such that the measurement yields the outcome $-1$.

Quantum feature 1: quantum interference as a means for computation

The computational power of (II) is a result of its nonlinear dynamics. For example, an RNN with linear dynamics is incapable of computing the parity function $\text{XOR}(s_1,s_2)=s_1 s_2$ between two classical binary inputs. On the other hand, quantum mechanics is a unitary theory. Yet, this does not limit a qRNN to linear computation. Indeed, a qRNN can compute XOR by leveraging quantum interference, a resource fundamental to quantum computation. Thus, we can use a qRNN for complex computing tasks.

As illustrated in Fig. 1, we can compute XOR$(s_1,s_2)$ using a qRNN of three spins initially in the state $|s_1,s_2,\text{-}1\rangle$. The third spin is an output spin: it is measured to tell us information about the parity of $s_1$ and $s_2$. We let these spins evolve under the dynamics dictated by (III.1), choosing $\Delta_n=J_{12}=0$ and $J_{13}=J_{23}=J\gg\Omega$. Let $\tilde{J}=J(s_1+s_2)$. In the frame rotating at the rate $\tilde{J}$, the output spin experiences the Hamiltonian

$$H_{3}=\frac{\Omega}{2}\left(e^{2i\tilde{J}\tau}|1\rangle\langle\text{-}1|+\text{h.c.}\right).\qquad(6)$$

It is clear that if the spins have odd parity (i.e., $s_1=-s_2$, so that $\tilde{J}=0$), the output spin flips to the state $|1\rangle$ when we choose to evolve for $t=2\pi\Omega^{-1}$. On the other hand, if $\tilde{J}\neq 0$, $H_3$ contains only fast-rotating terms, and the rotating-wave approximation (RWA) allows us to neglect the evolution of the output spin [49]. Physically, the RWA can be thought of as the spin rotating along the $x$-axis by a small amount followed by a rapid precession of the spin around the $z$-axis. Indeed, as illustrated in Fig. 1, $J\gg t^{-1}$ amounts to averaging out the spin's position so that the spin remains along the $z$-axis. Overall, this computation realizes the operation $|s_1,s_2,-1\rangle\rightarrow|s_1,s_2,\text{XOR}(s_1,s_2)\rangle$.

Note that this is a result of $s_1+s_2$ constructively interfering to produce a large effective detuning on the output and blocking its evolution. Thus, interference serves as a means for computation in qRNNs.
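A minimal numerical check of this interference mechanism is sketched below, with illustrative values satisfying $J\gg\Omega$. Note that in the convention of this snippet a resonant spin completes a full flip after a $\pi$-pulse of duration $t=\pi/\Omega$.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)   # |1> = [1, 0], |-1> = [0, 1]
I2 = np.eye(2, dtype=complex)
site = lambda op, n, N: reduce(np.kron, [op if k == n else I2 for k in range(N)])

Omega, J, N = 1.0, 50.0, 3            # J >> Omega, illustrative values
H = J * (site(sz, 0, N) @ site(sz, 2, N) + site(sz, 1, N) @ site(sz, 2, N))
H += (Omega / 2) * sum(site(sx, n, N) for n in range(N))
U = expm(-1j * H * np.pi / Omega)     # pi-pulse on the resonant output spin

def ket(bits):                        # |s1, s2, s3>, with s = +1 -> [1, 0]
    one, minus = np.array([1, 0], complex), np.array([0, 1], complex)
    return reduce(np.kron, [one if s == 1 else minus for s in bits])

for s1 in (1, -1):
    for s2 in (1, -1):
        psi = U @ ket([s1, s2, -1])
        p_flip = np.linalg.norm(psi.reshape(2, 2, 2)[:, :, 0]) ** 2
        print(s1, s2, round(p_flip, 3))   # ~1 iff s1 = -s2, ~0 otherwise
```

Running the loop prints a flip probability near 1 exactly when $s_1=-s_2$ and near 0 otherwise, realizing the truth table of Fig. 1.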

Figure 2: Detection of a Z-error on three spins $L_{1,2,3}$ using a quantum RNN. A Z-error is conjugated into a bit-flip-like error using a Hamiltonian generating a rotation along the $x$-axis, where $t=\pi/2\Omega$ and $\Omega$ is the dominant field of the Hamiltonian. The state of each of the $L_i$ after the rotation (orange region) depends on whether a Z-error occurred, as illustrated at the bottom of the figure. As exemplified here for $L_3$, a Z-error results in a spin flipping from what we would expect in the absence of errors. To detect the Z-error, a set of auxiliary qubits $A_{1,2}$ is brought in to perform parity measurements of the pairs $(L_1,L_2)$ and $(L_2,L_3)$. Since under no Z-error the parity measurements must match, the parity measurements allow us to detect the location of the Z-error, as specified in Table 1.

Quantum feature 2: arbitrary measurement basis as a means for computation

Equations (1Q)-(2Q) recover (II) when $t=2\pi\Omega^{-1}$. However, $t=2\pi\Omega^{-1}$ is not a necessary restriction. This freedom results in the ability to rotate each quantum neuron, which can be used as a means for computing in a different basis. Measuring in different bases reveals the quantum correlations enhancing the performance of a qRNN relative to its classical counterpart. In this section, we show how to use the qRNN's evolution to change the basis in which an error occurs. This freedom can detect a Z-error, an error proper to quantum computation.

Consider the repetition code $|0_L\rangle=|\text{-}_y\rangle^{\otimes 3}$ and $|1_L\rangle=|+_y\rangle^{\otimes 3}$ on qubits labeled $L_{1,2,3}$, where $|\pm_y\rangle=\frac{1}{\sqrt{2}}(|\text{-}1\rangle\pm i|1\rangle)$. Suppose we prepare the state $|\psi\rangle=a|0_L\rangle+b|1_L\rangle$ and subsequently a Z-error occurs. We can detect the error by rotating all three spins $L_{1,2,3}$ using (III.1), with the dominant field being $\Omega$, for a time $t=\pi/2\Omega$. Note that the rotation conjugates the Z-error by

$$e^{-i\pi\sigma^{x}/4}\sigma^{z}e^{i\pi\sigma^{x}/4}\propto\sigma^{y}\qquad(7)$$

where $\sigma_n^y=i|\text{-}1\rangle\langle 1|-i|1\rangle\langle\text{-}1|$ is like a bit-flip error except for a state-dependent phase. A bit-flip error can then be detected by bringing in two extra spins $A_{1,2}$ and performing parity measurements of the pairs $(L_1,L_2)$ and $(L_2,L_3)$, as described in Sec. III.1. Using Table 1, the final parities of $(L_1,L_2)$ and $(L_2,L_3)$ give the measurement results $a_1$ and $a_2$, which can be used to discern where the Z-error occurred.

As an example, Fig. 2 illustrates the two final states of $L_3$ if no error occurs (bottom left) and if a Z-error occurs on $L_3$ (bottom right).

Detecting the Z-error hinges on (7), which can be achieved by using the qRNN's evolution to rotate the measurement basis. Note that the rotation allows us to measure the error syndrome of the stabilizer state $|\psi\rangle$, bringing out the quantum correlations of the state. Thus, the qRNN's native evolution can be used to perform quantum computational tasks. After the error is detected on spin $L_i$, all qubits are rotated again by $U^{\dagger}$, and $\sigma_i^z$ can be applied to correct the error. We note that using a repetition code for error detection is a well-known technique in the quantum computing community.

The previous two quantum features show that qRNNs are naturally suited to solve important problems in machine learning and quantum computing. Recently, qRNNs were used to compress quantum circuits [23]. However, studies on using qRNNs for error correction in circuit-like quantum computing are warranted and left for future work.

$a_2\,\backslash\,a_1$      $-1$               $+1$
$-1$                        Error in $L_2$     Error in $L_1$
$+1$                        Error in $L_3$     No error
Table 1: Results of parity measurements for the detection of a Z-error. Measuring spin $A_i$ results in the outcome $a_i$. By comparing the outcomes, one can detect the location of the Z-error.
Figure 3: Comparing a classical and a quantum RNN stochastically evolving a distribution $p_{t_f}$ from an initial distribution $p_0$. Here, we consider $p_0(\bm{s})=p_{t_f}(\bm{s}')$ when $s_n=-s'_n$ for all $n$. In this case, the RNN needs to produce a stochastic process matrix $L^{\text{targ}}$ that flips all the spins through several time-steps. The classical RNN (top) requires $\mathcal{O}(2^{N-m})$ time-steps (i.e., applications of $P$) while using $m$ hidden neurons. A qRNN (bottom) requires one time-step and no hidden neurons.

Quantum feature 3: stochastic processes accessible to a qRNN

We now explore how a qRNN can be used to stochastically evolve a probability distribution faster than any classical RNN. Firstly, we note that if we initialize an RNN according to an initial distribution $p_0(\bm{s})$, the dynamics in (II) dictate that for $t>0$ the RNN obeys a distribution given by the Markov-chain dynamics

$$p_{t}(\bm{s})=\sum_{\bm{s}'}P(\bm{s}|\bm{s}')\,p_{t-1}(\bm{s}')\qquad(8)$$

where $P(\bm{s}|\bm{s}')$ is the transition probability between states $\bm{s}'$ and $\bm{s}$, whose particular value is given by (II) [28] (see Appendix A for details).
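In matrix form, (8) is simply the repeated application of a column-stochastic matrix to a distribution vector. A toy sketch with a random placeholder $P$ (in an actual RNN, $P(\bm{s}|\bm{s}')$ is fixed by the biases and connection weights):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3
dim = 2 ** N                            # one entry per configuration s

# placeholder column-stochastic transition matrix: sum_s P(s|s') = 1
P = rng.random((dim, dim))
P /= P.sum(axis=0, keepdims=True)

p = np.zeros(dim)
p[0] = 1.0                              # start localized on one configuration
for t in range(10):
    p = P @ p                           # Eq. (8): p_t = P p_{t-1}
```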

Given this observation, we see that an RNN can be used for the task of evolving a probability distribution $p_0$ into $p_f=L^{\text{targ}}p_0$ via a series of stochastic transition matrices, $L^{\text{out}}=P^{t_f}$. The goal is to adjust the parameters of the RNN (i.e., biases and connection weights) to simulate the stochastic matrix $L^{\text{targ}}$, i.e., $L^{\text{out}}\approx L^{\text{targ}}$, in as few steps as possible. One may then ask if a qRNN can do this more efficiently than any classical RNN.

We answer this in the positive. It is worth noting that not all stochastic transition matrices $L^{\text{targ}}$ are embeddable in a Markov process (for a review of classical and quantum embeddability, see Appendix A). To simulate a stochastic system's future behavior, information about its past must be stored, and thus memory is a key resource. Quantum information processing promises a memory advantage for stochastic simulation [50]. In simulating stochastic evolution with classical resources, there is a trade-off between the temporal and physical resources needed [51], and it has been shown that certain stochastic evolutions, when simulated with quantum hardware, need not suffer from such a trade-off, since evolutions arising from quantum Lindbladian dynamics are far more general than classical Markovian evolution [52]. That is, there exist matrices $L^{\text{targ}}$ that are quantum embeddable but not classically embeddable. Moreover, even if $L^{\text{targ}}$ is embeddable, quantum evolution can lower the number of steps needed to produce $L^{\text{targ}}$, since the unitary dynamics of a quantum system allow a simultaneous, continuous, and coherent update of every neuron.

Let us now give an example of a matrix $L^{\text{targ}}$ that can be realized exponentially faster in a qRNN. Consider the task of realizing a transformation $F$ corresponding to a global “spin-flip”:

$$F_{\bm{s}|\bm{s}'}=\begin{cases}1&\text{if }s_{n}\neq s'_{n}\ \forall n\\ 0&\text{otherwise.}\end{cases}\qquad(9)$$

Realizing $F$ on $N$ neurons using a classical Markov process requires a number of time steps of order $\mathcal{O}(2^{N-m})$, where $m$ is the number of hidden neurons (for details, see Sec. III.A in Ref. [52]). In other words, a classical RNN cannot produce $F$ efficiently when all available neurons must be flipped. This is a result of (II) and the fact that flipping neuron $n$ requires another neuron $m$ in the opposite state so that $J_{nm}>0$ dominates $h_n$.

On the other hand, a qRNN can perform $F$ in a single step, regardless of whether all neurons need to flip. To see this, consider (III.1) with $\Omega\gg h_n$. In this case, all neurons flip simultaneously, in a single time step, under a unitary $U$. That is, if $|\psi_0\rangle=\sum_{\bm{s}}\sqrt{p_0(\bm{s})}\,|\bm{s}\rangle$, then

$$|\psi_{f}\rangle=U|\psi_{0}\rangle=\sum_{\bm{s}}\sqrt{p_{f}(\bm{s})}\,|\bm{s}\rangle.\qquad(10)$$
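A minimal check of this single-step flip, with $\Delta_n=J_{nm}=0$ and a global drive; as in the earlier snippet, the full flip corresponds to a $\pi$-pulse of duration $t=\pi/\Omega$ in this convention, and all values are illustrative.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

N, Omega = 3, 1.0
Hx = (Omega / 2) * sum(
    reduce(np.kron, [sx if k == n else I2 for k in range(N)])
    for n in range(N))
U = expm(-1j * Hx * np.pi / Omega)       # one global pi-pulse

rng = np.random.default_rng(4)
p0 = rng.random(2 ** N); p0 /= p0.sum()  # arbitrary initial distribution
psi0 = np.sqrt(p0).astype(complex)       # |psi_0> = sum_s sqrt(p0(s)) |s>
pf = np.abs(U @ psi0) ** 2

flip = np.arange(2 ** N) ^ (2 ** N - 1)  # index of the globally flipped configuration
assert np.allclose(pf, p0[flip])         # p_f = F p_0 in a single step
```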

While the realization of the matrix $F$ via (III.1) signals a quantum advantage, we highlight that this advantage is extremely sensitive to the decoherence arising from spontaneous emission (i.e., spontaneous relaxation from $|1\rangle$ to $|\text{-}1\rangle$), a main source of noise in NISQ devices (see Appendix A). It remains an open problem whether there exist stochastic processes enabled by (III.1) that are robust to noise, and in the future we hope to explore how to shield unitary stochastic processes against noise in experimentally realizable NISQ devices.

The spin-flip process $F$ is efficiently simulated using a classical computer. However, $F$ exemplifies the qRNN's ability to access stochastic processes inaccessible to classical RNNs without hidden neurons. This implies that if an RNN is employed to simulate evolving $p_0$ to $p_{t_f}$ stochastically by passing it through several linear transformations, there are instances where the qRNN requires exponentially fewer steps. Stochastic simulation, of course, has applications in finance, biology, and ecology, among other fields. As an example, Ref. [53] used this quantum advantage to propose a quantum circuit algorithm for stochastic process characterization and presented applications in finance and correlated random walks. The separation above illustrates the computational advantage of quantizing an RNN.

III.2 qRNNs under spontaneous emission

Having seen how (III.1) recovers the discrete update rule (II), we now show that a qRNN under dissipation naturally evolves under continuous-time dynamics analogous to (3). This establishes a mathematical similarity between the evolution of NISQ devices and neural circuits, allowing us to use available quantum hardware for cognitive tasks, an idea that we explore further in Sec. V.

Consider the qRNN in (III.1) under spontaneous emission, where a spin relaxes from $|1\rangle$ to $|\text{-}1\rangle$ at a rate $\gamma$. To extract the dynamics of continuous variables, we focus on the dynamics of the expectation values of local Pauli operators.

The expectation value of an observable $A$ is $\langle A\rangle=\text{Tr}(A\rho)$, where $\rho$ is the density matrix describing the system. In particular, we focus on the expectations of the operators $\sigma_n^x$ and $\sigma_n^y=i|\text{-}1\rangle\langle 1|_n-i|1\rangle\langle\text{-}1|_n$. If we start the qRNN in a state for which $\langle\sigma_n^z(0)\rangle=-1$, then (see Appendix B)

$$\dot{\langle\sigma^{y}_{n}\rangle}=-\frac{1}{\tau}\langle\sigma^{y}_{n}(t)\rangle-\frac{\Omega}{2\gamma}\sum_{m}J_{nm}\langle\sigma_{n}^{x}(t)\sigma_{m}^{y}(t)\rangle+\Delta_{n}(t)\langle\sigma_{n}^{x}(t)\rangle,\qquad(11)$$

where we have defined the neural time-scale $\tau^{-1}=\gamma/2+\Omega^{2}/4\gamma$, which is different from that in (3) but bears the analogous significance of the time-scale on which $\langle\sigma_n^y\rangle$ decays.

Differently from (3), notice that the dynamics of $\langle\sigma_n^y\rangle$ are influenced by the spin's value along the $x$-axis, a consequence of the nontrivial commutation relations of spin variables. The commutation relations also make (III.2) quadratic, and therefore nonlinear. The quadratic term in (III.2) is analogous to the nonlinear term that gives RNNs their computational power.

In Appendix B, we explore the dynamics of $\langle\sigma_n^x\rangle$ as well and show that, together with $\langle\sigma_n^y\rangle$, we recover dynamics analogous to the integrate-and-fire RNN model [54], a more realistic model of neural networks in the brain than the one in (3).

Figure 4: Schematic picture of RNNs with classical and quantum neurons. (A) Classical RNN. The inputs are local biases, and the inter-neural connections $J_{nm}$ are arbitrary. A set of neurons is used for readout to produce the output $\bm{y}^{\text{out}}=W^{\text{out}}\bm{r}$. (B) qRNN made from Rydberg atoms, which restricts the connections to $J_{nm}\sim 1/R_{nm}^{6}$, where $R_{nm}$ is the physical distance between atoms $n$ and $m$. Here, we depict interactions between nearest and next-nearest neighbors; however, each neuron interacts with all others in the chain via $J_{nm}\sim 1/R_{nm}^{6}$. Local expectation values of a subset of atoms are used for readout. (C) Arrays of Rydberg atoms as qRNNs. Each atom experiences a Rabi drive $\Omega$ and a local detuning $\Delta_n$ encoding the RNN's inputs. One of the main sources of decoherence in Rydberg atoms is spontaneous emission at a rate $\gamma$.

IV Quantum reservoir computers using Rydberg atoms: An experimental proposal

The similarities between (III.2) and the evolution of RNNs suggest the ability of qRNNs to emulate neurological learning. To explore neurological learning in qRNNs, we propose to fix the architecture of the qRNN's coupling constants $J_{nm}$ based on optical-tweezer arrays of Rydberg atoms.

The natural Hamiltonian of a Rydberg array closely resembles the one in (III.1). A Rydberg atom is a single valence-electron atom that can be coherently driven between an atomic ground state $|g\rangle$ and a highly excited state $|r\rangle$ with a much larger principal quantum number. These states represent our $|\text{-}1\rangle$ and $|1\rangle$ neuronal states, respectively. A Rydberg atom in its excited state exhibits a large electronic dipole moment; consequently, a collection of Rydberg atoms interacts via a $1/R^{6}$ van der Waals potential, where $R$ denotes the physical distance between two atoms. For an array of Rydberg atoms at fixed positions, the Hamiltonian of the system is [35]

$$H_{\text{Ryd}}=\Delta\sum_{n}\hat{n}_{n}+\frac{\Omega}{2}\sum_{n}\sigma_{n}^{x}+\sum_{nm}\frac{V}{R_{nm}^{6}}\hat{n}_{n}\hat{n}_{m}\qquad(12)$$

where $\hat{n}_n=|1\rangle\langle 1|_n$, $\Omega$ is the coherent Rabi drive coupling the $|\text{-}1\rangle$ and $|1\rangle$ states, $\Delta<0$ is a global detuning of the drive frequency from the atomic transition, and $V$ is the nearest-neighbor interaction strength. Using acousto-optic deflectors (AODs) and spatial light modulators (SLMs), one can create spatially dependent light shifts resulting in site- and time-dependent detunings $\Delta_n(t)=\Delta+\alpha(t)\Delta_n$, where $\alpha(t)$ is a time-dependent envelope. With this in mind, the Hamiltonian in (12) can be mapped to a Hamiltonian like that in (III.1) with $J_{nm}=V/R_{nm}^{6}$, since $\hat{n}_n=(\sigma_n^z+\mathds{1}_n)/2$. In this paper, for concreteness, we compare our numerics against the experimental realization of Rydberg arrays in Refs. [35, 38], where the rates $\Omega,\Delta_n,V$ are all in units of megahertz, while time constants are in units of microseconds. In these experiments, an off-resonant intermediate state, $|6P_{3/2},F=3,m_F=-3\rangle$, is used to couple $|g\rangle=|5S_{1/2},F=2,m_F=-2\rangle$ and $|r\rangle=|70S_{1/2},m_J=-1/2,m_I=-3/2\rangle$ of Rubidium-87 atoms through a two-photon transition. Thus, photon scattering off the intermediate state is the dominant source of decoherence. As we show in Appendix D, we can model this with a modified spontaneous-emission process given by the jump operator

$$L^{+}=\sqrt{\gamma}\,|g\rangle\left(\alpha\langle r|+\beta\langle g|\right)\qquad(13)$$

instead of the typical $\sqrt{\gamma}|g\rangle\langle r|$ jump operator. In the equation above, $\gamma=2\pi/(20\ \mu\text{s})$ and $(\alpha,\beta)=(0.05,0.16)$ for the realistic settings we simulate. With the full unitary and dissipative dynamics, we can think of an array of Rydberg atoms as a quantum analog of a continuous-time RNN. Fig. 4 compares the architecture of a classical RNN (Fig. 4A) and a Rydberg qRNN (Figs. 4B-C).
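As a sketch of how (12) can be assembled numerically for a small chain (parameter values are illustrative, loosely modeled on the megahertz-scale rates quoted above; distances are in arbitrary units):

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]], dtype=complex)
nhat = np.array([[0, 0], [0, 1]], dtype=complex)   # |1><1| in the (g, r) basis
I2 = np.eye(2, dtype=complex)

def site(op, n, N):
    return reduce(np.kron, [op if k == n else I2 for k in range(N)])

# illustrative 1D chain; rates in units of 2*pi x MHz, unit lattice spacing
N = 4
x = np.arange(N, dtype=float)                      # atom positions
Omega, Delta, V = 2 * np.pi * 4.2, -2 * np.pi * 1.0, 2 * np.pi * 10.0

H = Delta * sum(site(nhat, n, N) for n in range(N))
H = H + (Omega / 2) * sum(site(sx, n, N) for n in range(N))
for n in range(N):
    for m in range(n + 1, N):
        Rnm = abs(x[n] - x[m])                     # J_nm = V / R_nm^6
        H = H + (V / Rnm ** 6) * site(nhat, n, N) @ site(nhat, m, N)
```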

We note that training RNNs can be unstable, as it often relies on (truncated) back-propagation through time or real-time recurrent learning. One way to circumvent this problem is to keep the system's parameters fixed and train only the output filter $W^{\text{out}}$. This easier training schedule motivated the introduction of reservoir computers [55] and their quantum analogs [10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27]. Thus, in the following numerical experiments we fix the positions of the atoms in either a 1D chain or a 2D square lattice and train only $W^{\text{out}}$ and some temporal parameters, depending on the task. That is, in this article we implement Rydberg reservoir computers. Logically, successful performance on the tasks presented here suffices to show the computational ability of qRNNs. While we include the effect of small imperfections in the positions of the atoms, we see no significant effect on the performance of the tasks after averaging our results over 10 realizations of the atoms' positions. We leave full optimization of the qRNN for future work.

Lastly, several features of the many-body dynamics of arrays of Rydberg atoms are particularly well suited for emulating biological tasks. In Sec. V.1, we show how Rydberg arrays can be used to implement inhibitory and excitatory neurons, which are vital in many biological tasks such as multitasking [56]. The key idea behind encoding inhibitory neurons is to leverage positive and negative interactions between Rydberg atoms with different principal quantum numbers [45]. Additionally, in Sec. V.4 we show that Rydberg arrays can store long-term memory by taking advantage of the weak-ergodicity-breaking dynamics of quantum many-body scars [35, 46, 47].

V Learning biological tasks via reservoir computers

We focus on analyzing the Rydberg reservoir's potential to learn biologically plausible tasks. In the tasks analyzed, we fix the geometry of the atoms depending on the task at hand. As a proof of principle, we focus on four simple neurological tasks, which indicate good performance even with a small number of atoms. We show that a Rydberg reservoir can encode inhibitory and excitatory neurons vital for successful multitasking. Likewise, we show that Rydberg reservoirs can learn to make decisions by distinguishing properties of stimuli, have a working memory, and exhibit long-term memory enhanced by quantum many-body scars. Simulation details for each task can be found in Appendix D.

V.1 Multitasking

Figure 5: Encoding inhibitory neurons using Rydberg atoms and using them for multitasking. Multitasking consists of fixing the qRNN's parameters and training $W^{\text{out}}$ to produce three conflicting outputs. (A) Scheme for encoding inhibitory neurons. Rydberg atoms with different principal quantum numbers are used such that $(n_Q)(n_Q')$ pairs interact attractively while $(n_Q')(n_Q')$ and $(n_Q)(n_Q)$ pairs interact repulsively. The network receives two binary inputs $x,y$. (B) Square error for learning the functions XOR, OR, and AND on the inputs with different numbers of inhibitory neurons. Better performance is observed when 1 in every 4 neurons is inhibitory. (C)-(E) Example of learned functions using eight neurons, two of which are inhibitory, which results in performance 40% better than without inhibitory neurons.

A hallmark of classical RNNs is their ability to multitask, i.e., to simultaneously learn several output functions. Dale's principle defines an inhibitory neuron, indexed by $n$, as one with a negative sign in its interactions with all other neurons [57]:

$$J_{nm}\leq 0\quad\forall m.\qquad(14)$$

Two Rydberg atoms with different principal quantum numbers $n_Q$ and $n_Q'$, but the same angular-momentum quantum numbers, can interact via an attractive $1/R^{6}$ potential $V_{n_Q,n_Q'}$ [45]. Using the Python package PairInteraction [58], we note that if $n_Q$ represents the state $|r\rangle=|70S_{1/2},m_J=-1/2,m_I=-3/2\rangle$ and $n_Q'$ represents $|r'\rangle=|73S_{1/2},m_J=-1/2,m_I=-3/2\rangle$, then $V_{n_Q,n_Q}=V\approx-V_{n_Q,n_Q'}$, where $V$ is the interaction strength between atoms with principal quantum number $n_Q$ (see Appendix D). We can use this fact to encode inhibitory neurons. We restrict the concentration of $n_Q'$ Rydberg atoms to be sparse, such that pairs of $n_Q'$ atoms are placed as far apart as possible, at a distance $d_{\max}$, in a 1D chain arrangement. We choose the field strength $V$ so that $V/d_{\max}^{6}=10^{-2}$; as a result, we can neglect the interactions between pairs of $n_Q'$ atoms, but not the interactions between $(n_Q)(n_Q')$ and $(n_Q)(n_Q)$ pairs. This amounts to saying that if atom $n$ is driven to $n_Q'$, then $J_{nm}\lesssim 0$ for all $m$, as in (14). By implementing this in our reservoir, we can learn XOR, AND, and OR simultaneously for different concentrations of inhibitory neurons, as illustrated in Fig. 5(A).
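The resulting sign structure of the coupling matrix can be sketched as follows; the chain length, the positions of the $n_Q'$ atoms, and $V$ are illustrative placeholders of ours.

```python
import numpy as np

# illustrative 1D chain of N atoms at unit spacing; sites in `inhibitory`
# are driven to the n_Q' Rydberg level
N, V = 8, 2 * np.pi * 10.0
x = np.arange(N, dtype=float)
inhibitory = {1, 5}                      # 1 in 4 neurons, as in Fig. 5(B)

J = np.zeros((N, N))
for n in range(N):
    for m in range(N):
        if n == m:
            continue
        R6 = abs(x[n] - x[m]) ** 6
        if n in inhibitory and m in inhibitory:
            J[n, m] = 0.0                # sparse (n_Q')(n_Q') pairs: negligible
        elif n in inhibitory or m in inhibitory:
            J[n, m] = -V / R6            # mixed pairs: attractive, cf. Eq. (14)
        else:
            J[n, m] = +V / R6            # (n_Q)(n_Q) pairs: repulsive
```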

Fig. 5(B) shows the errors of simultaneously learning XOR, OR, and AND as a function of the system size $N$ for different numbers of inhibitory neurons in the array. The network is initialized in the state $|g\rangle^{\otimes N}$ and receives two binary inputs $x,y\in\{0,1\}$ (in units of MHz) for a time $\Delta t$ (in units of $\mu$s) with input noise $\sigma_{in}=0.1$. Afterwards, the network is interrogated to give XOR$(x,y)$, OR$(x,y)$, and AND$(x,y)$. $W^{\text{out}}$ is trained using the loss in (4). The errors shown in Fig. 5(B) are the minima achieved over a wide range of choices of interaction time $\Delta t\in[0,5]\ \mu$s. This shows that in some cases our reservoir can benefit from having a connectivity matrix $J_{nm}$ with both positive (excitatory) and negative (inhibitory) values, analogously to the mammalian brain. For small system sizes, a ratio of 1:4 inhibitory neurons improves the learning performance, similar to the results in [56]. This is supported by the performance at 4 and 8 neurons in Fig. 5(B). In particular, $N=8$ neurons, two of which are inhibitory, result in a 40% decrease in the loss. Nonetheless, we observe that having no inhibitory neurons is best for $N=6$ and 10 neurons; having no inhibitory neurons is never the worst choice. Fig. 5(C)-(E) shows the results of learning XOR, OR, and AND simultaneously using $N=8$ neurons, two of which are inhibitory. Note that the network is fully capable of classification with errors well below the input-noise threshold $\sigma_{in}$.

Lastly, while this task shows the success of Rydberg reservoirs at approximating Boolean functions of the input, one may also want to compute other nonlinear functions of the input. We remark that our Rydberg reservoir can approximate biologically relevant nonlinear functions such as ReLU and sigmoid.

V.2 Decision-making

Figure 6: Decision-making task using a Rydberg reservoir. (A) Schematic of the input stimuli as a pair of time-dependent detunings on two atoms. The stimuli are turned on for a normally distributed time $\Delta t$ with standard deviation $\sigma_{in}=0.1$. The network decides on a relaxation time $t_{out}$ to output the decision $\text{sign}\left(\Delta_1^{in}-\Delta_2^{in}\right)$. (B) The psychometric response of the decision-making task, which maps the accuracy of deciding that $\Delta_1^{in}$ is the largest as a function of the inputs' difference. The simulated response (dotted) is well fitted by a sigmoid function (solid curve).

One of the great successes of classical RNNs is their ability to integrate sensory stimuli to choose between two actions. Here, we present the Rydberg reservoir with a variant of the dot-motion decision-making task initially studied in monkeys, in which several inputs are analyzed to produce a scalar nonlinear function [59]. This function represents a decision. This task shows the Rydberg reservoir's ability to produce nonlinear functions of the input and perform simple cognitive tasks, a feature of most reservoirs proposed thus far [60].

In this task, a reservoir is presented with two inputs $\Delta_1^{in}$ and $\Delta_2^{in}$, and the goal is to train the network to choose which input is the largest. That is,

$$y^{\text{targ}}=\text{sign}\left(\Delta_{1}^{in}-\Delta_{2}^{in}\right).\qquad(15)$$

The stimuli, which in the case of a qRNN are local detunings on a pair of atoms, are turned on for a normally distributed time $\Delta t$ with standard deviation $\sigma_{in}=0.1$ and mean $\langle\Delta t\rangle=0.1\ \mu$s (see Fig. 6(A)). The stimuli are then turned off, and the network chooses a relaxation time $t_{out}$ after which it “makes a decision” by approximating (15). This is known as the fixed-duration protocol, since the experimentalist fixes the stimulation period and the subject, in this case the reservoir, learns to choose a response time $t_{out}$.

In the brain, we expect the performance on a decision-making task to follow a sigmoidal psychometric response [44, 59]. A psychometric response maps out the accuracy of a decision-making task as a function of stimuli distinguishability. As an example of a psychometric response, the reader could think of paying a routine visit to the eye doctor and having to discern the letters “b” and “p” written on the wall. If the letters are large enough, they become distinguishable; if the letters are too small, one often fails to make out the right letter.

Classically, a decision-making task benefits from connectivity between all neurons. Since our connectivity is limited by physical constraints, a 2D square-lattice structure was chosen to prevent neurons from being isolated from the rest. Moreover, a 2D square lattice is experimentally friendly. We set up a Rydberg reservoir of $3\times 2$ atoms with two input atoms and two different output atoms (for details, see Appendix D). The reservoir is then trained by optimizing over $t_{out}$ and $W^{\text{out}}$ such that the reservoir's output approximates (15), while keeping the network parameters $J_{nm}$, $\Omega$, and $\Delta_n^{in}$ fixed. We observe that $t_{out}\approx 1\ \mu$s is regularly obtained, as this is the time scale on which the information about $\Delta_{1,2}^{in}$ propagates through the network. In our case, $c_1=\Delta_1^{in}-\Delta_2^{in}$ is a natural choice for a measure of stimuli distinguishability. Fig. 6(B) shows the psychometric response of the task, which is qualitatively similar to those obtained in classical RNNs [44]. Moreover, we see in Fig. 6(B) that if $|c_1|\geq\sigma_{in}$, such that it is above the input-noise level, our network succeeds more than 80% of the time. The success of this task shows the Rydberg reservoir's ability to emulate simple cognitive tasks.

V.3 Parametric working memory

Figure 7: Working memory of a Rydberg quantum reservoir computer. (A) Schematic of the network's inputs, where two atoms are detuned for a time $\Delta t$ but temporally separated by a time $t_{delay}$. Two different output neurons are used for readout at a time $t_{out}$ after the second input is turned off. (B) Loss of the working-memory task as a function of the total input time $2\Delta t+t_{delay}$ (gray), and entanglement entropy between the input qubits and the rest of the reservoir as a function of $2\Delta t+t_{delay}$ (blue). Here, the mean value of $t_{out}$ is 0.1. The loss stays large for small input times, until the input qubits start entangling with the rest of the reservoir. (C) Accuracy as a function of the time the inputs are turned on ($\Delta t$) for four different choices of $t_{out}$ and with fixed $t_{delay}=0.1$. These curves show that the accuracy is largely independent of $t_{out}$ and $\Delta t$ as long as $\Delta t<0.3$. (D) Accuracy of the working-memory task at $\Delta t=0.15$ and $t_{out}=0.5$ as a function of $t_{delay}$. The blue curve is the performance when $V>\Omega$ puts the reservoir in the Rydberg-blockaded regime, while the red curve is the performance when $V<\Omega$ puts the reservoir in the disordered regime. These plots show that when $V>\Omega$, the Rydberg reservoir can hold memory for later manipulation better than when $V<\Omega$. Shaded regions indicate error bars.

Our next neurological task is parametric working memory. Working memory, one of the most important cognitive functions, deals with the brain's ability to retain and manipulate information for the later execution of a task. Here, we train a network to perform a task based on the decision-making task of Sec. V.2 but with two temporally separated stimuli (see Fig. 7(A)). We use the fixed-time protocol, where the separation between stimuli, denoted by $t_{delay}$, is fixed by us. The stimuli are both turned on for a time $\Delta t$, and after the second input the network is left to relax for a time $t_{out}$ before two output neurons are used to approximate (15). To avoid overfitting, we add Gaussian noise with zero mean and standard deviation $\sigma_{in}=0.1$ to the times $\Delta t$, $t_{out}$, and $t_{delay}$. The network optimizes over $W^{\text{out}}$. Thus, the network has to retain information about $\Delta_1^{in}$ for a few “seconds” to then compare against $\Delta_2^{in}$ and make a decision.

We set up a Rydberg reservoir of $3\times 2$ atoms with two input atoms and two different output atoms (for details, see Appendix D). Fig. 7(B) shows the loss of the network as a function of the total time the inputs are injected into the network ($\tau=2\Delta t+t_{delay}$). We note that the loss is high for small $\tau$, since it takes time for the input neurons to correlate with the rest of the reservoir. Accordingly, in Fig. 7(B) we show that the growth of the entanglement entropy of the input qubits accompanies a decrease in the loss. For Fig. 7(B) we fixed $t_{out}=0.1$, a choice which has little effect on the reservoir's performance.

In Fig. 7(C) we show the accuracy of the reservoir at reproducing (15) as a function of the time the inputs are turned on ($\Delta t$) and for different choices of $t_{out}$. For these plots, $t_{delay}=0.1$ is fixed. We notice that the accuracy is largely invariant under our sampled choices of $t_{out}$.

Lastly, in Fig. 7(D), we probe the reservoir's accuracy as a function of $t_{delay}$. For these experiments, we fix $t_{out}=0.5$ and $\Delta t=0.15$. Importantly, we set $V=2\pi\times 10$ MHz and $\Omega=2\pi\times 4.2$ MHz, such that $V>\Omega$ and neighboring Rydberg excitations are off-resonant, putting our reservoir in the so-called blockaded regime [61, 62]. While one might initially expect the accuracy to decrease with increasing $t_{delay}$, we find that this is not the case; instead, the accuracy oscillates, persistently reaching high values, as shown in the blue curve of Fig. 7(D). Interestingly, this behavior disappears when the coupling is $V=2\pi\times 0.1$ MHz, such that $V<\Omega$, as shown in the red curve of Fig. 7(D), although the performance remains statistically significant even for long $t_{delay}$, with an accuracy greater than 50%. We conclude that, in the blockaded regime, the reservoir can hold information for longer periods. We can understand this dependence on $V/\Omega$ as follows. In the disordered regime, the atoms are mostly uncorrelated and free to oscillate, with the dynamics dominated by the drive $\Omega$. Thus, after a short time, the inputs coming through a $z$-field are largely irrelevant, and the network is unable to hold the information about the first input. On the other hand, when $V>\Omega$, the atoms are strongly correlated, since neighboring excitations of Rydberg atoms are blockaded and the dynamics are slowed down. These slow dynamics allow for longer memory times. In Sec. V.4, we explore longer-term memory in the blockaded regime and show that long-term memory in a reservoir can be stabilized by the presence of quantum many-body scars.

V.4 Long-term Memory via Quantum Many-body Scars

Figure 8: A state encoding a memory $m$ is prepared. The state evolves under its natural Hamiltonian before being interrogated via local measurements to retrieve $m$. If the evolution time is short, the system is still out of equilibrium and remembers its initial condition; thus, $m$ can be retrieved. On the other hand, after a long time, the system may thermalize, and local measurements fail to provide information about the initial state. Thus, the memory-retrieval time is upper bounded by the thermalization time of the initial state $|\psi_m(0)\rangle$ under the system's dynamics. In the example in Sec. V.4, the system is a chain of Rydberg atoms, and final measurements are performed on a single atom and then linearly post-processed to retrieve $m$. In this case, a thermal state can be detected by measuring whether the entanglement entropy of the region obeys a volume law. If the dynamics can be stabilized against thermalization, the memory can be retrieved at longer times.

Finally, we examine a reservoir's ability to encode long-term memory. The task consists of encoding a classical bit $m$ in the initial state of a reservoir, $|\psi_m(0)\rangle$, so that after the system is left to evolve under its inherent dynamics for a time $T$, local measurements of the state $|\psi_m(T)\rangle$ can be used to recover $m$. However, $m$ cannot be recovered from local measurements if the dynamics obey the eigenstate-thermalization hypothesis (ETH) [63]. Instead, local measurements of $|\psi_m(T)\rangle$ obey thermal statistics described by the energy spectrum of the Hamiltonian and bear no information about the initial condition $|\psi_m(0)\rangle$. Thus, reservoirs that violate the ETH are naturally suited for memory tasks, since they can locally retain information about their initial state. Indeed, this notion has begun to be studied in quantum reservoirs [25, 27]. Recent experiments using quench dynamics in arrays of Rydberg atoms have revealed quantum many-body scarring behavior [35], which can be stabilized [46, 47] to delay the thermalization of the system. Here, we use these results to enlarge the memory lifetime of a reservoir. Simulation details are found in Appendix D.

In the case of a kicked ring of Rydberg atoms experiencing nearest-neighbor blockade, the dynamics are captured by the so-called PXP model [35, 47, 64, 65]:

$$H(t)=H_{PXP}+\hat{N}\sum_{k\in\mathbb{Z}}\theta_{k}\,\delta(t-k\tau)\qquad(16)$$
$$H_{PXP}=\Omega\sum_{n=1}^{N}P_{n-1}\sigma_{n}^{x}P_{n+1},\qquad\hat{N}=\sum_{n}\hat{n}_{n}\qquad(17)$$

where $P_n=|g\rangle\langle g|_n$ projects the atom at the $n^{\text{th}}$ site onto the ground state, and we choose periodic boundary conditions to mitigate edge effects. In (16) we let $\theta_k=\pi+\epsilon_k$, where $\epsilon_k$ is a Gaussian random variable with mean $\epsilon$ and variance $\sigma_{in}^{2}$; that is, $\epsilon_k$ plays the role of added noise in the reservoir. For this discussion we let $\gamma=0$, since we know from experiments that the quantum scarring behavior is robust to the atoms' decoherence, and the choice to work with Hamiltonian evolution helps speed up the acquisition of numerical data.

We denote $\chi_\tau=\exp(-i\pi\hat{N})\exp(-i\tau H_{PXP})$. It has been empirically observed that $\chi_\tau$ approximately exchanges the Neel states $|AF\rangle=|1010\ldots\rangle$ and $|AF'\rangle=|0101\ldots\rangle$ for $\tau\approx 1.51\pi\Omega^{-1}$ [35]. Note that $\chi_\tau\chi_\tau=\mathds{1}$, and so, under no noise, any state $|\psi\rangle$ is recovered after a cycle of evolution of duration $2\tau$. However, the noise $\epsilon_k$ destroys the revival of all initial states except $|AF\rangle$ and $|AF'\rangle$ (see Appendix C). This leads to many-body quantum scars stabilized by the operator $\exp(-i\pi\hat{N})$ [46, 47].
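A noiseless sketch of this cycle ($\theta_k=\pi$; $N$ and $\Omega$ are illustrative) that checks the approximate Neel-state revival under $\chi_\tau\chi_\tau$:

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
Pg = np.array([[1, 0], [0, 0]], dtype=complex)   # |g><g| in the (g, r) basis
nhat = np.array([[0, 0], [0, 1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
site = lambda op, n, N: reduce(np.kron, [op if k == n else I2 for k in range(N)])

N, Omega = 8, 1.0                                # ring with periodic boundaries
H_pxp = Omega * sum(
    site(Pg, (n - 1) % N, N) @ site(sx, n, N) @ site(Pg, (n + 1) % N, N)
    for n in range(N))
N_hat = sum(site(nhat, n, N) for n in range(N))

tau = 1.51 * np.pi / Omega
chi = expm(-1j * np.pi * N_hat) @ expm(-1j * tau * H_pxp)

AF = np.zeros(2 ** N, dtype=complex)
AF[int("10" * (N // 2), 2)] = 1.0                # Neel state |rgrg...>
print(abs(AF.conj() @ (chi @ (chi @ AF))) ** 2)  # revival fidelity after one 2*tau cycle
```

The printed fidelity is below unity, reflecting that the exchange is only approximate.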

Given the dynamics in (16), we propose the following scheme for encoding a binary memory $m\in\{0,1\}$. We choose a reference state $|\psi\rangle$ and let $|\psi_0(0)\rangle=|\psi\rangle$ and $|\psi_1(0)\rangle=\chi_\tau|\psi\rangle$. Subsequently, the state $|\psi_m(0)\rangle$ is left to evolve for $n$ cycles of duration $2\tau=2(1.51\pi)$, after which the populations $\bm{r}_m(n)=(P_g(2n\tau|m),P_r(2n\tau|m))$ of the single-atom reduced density matrix are used to retrieve $m$. The retrieval is done by training a vector $W_n^{\text{out}}$ on $M$ instances of $\bm{r}_m(n)$ in order to minimize (4), with $\bm{y}^{\text{targ}}=\bm{m}$ the binary vector of memories and $\bm{y}^{\text{out}}(n)=W_n^{\text{out}}\bm{r}(n)$ our network's output after $n$ cycles.

To quantify the quality of the memory retrieval, $R(n)$, we use the squared Pearson $r$-factor

$$R(n)=\frac{\text{cov}^{2}(\bm{m},\bm{y}^{\text{out}}(n))}{\sigma^{2}(\bm{m})\,\sigma^{2}(\bm{y}^{\text{out}}(n))}.\qquad(18)$$
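Equation (18) is straightforward to evaluate; a sketch with stand-in readouts:

```python
import numpy as np

def retrieval_quality(m, y_out):
    """Squared Pearson r between memories m and readouts y_out, Eq. (18)."""
    cov = np.cov(m, y_out, bias=True)[0, 1]
    return cov ** 2 / (np.var(m) * np.var(y_out))

rng = np.random.default_rng(5)
m = rng.integers(0, 2, size=30).astype(float)   # balanced Bernoulli memories
y_out = m + 0.1 * rng.normal(size=30)           # stand-in readout values
print(retrieval_quality(m, y_out))              # close to 1 for faithful recall
```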

Fig. 9(a) shows the memory-retrieval error as a function of the number of cycles for three different choices of reference state. Fig. 9(b) shows the average entanglement entropy ($\bar{S}_E$) of the left-most atom in the ring. Saturation of $\bar{S}_E$ signals growth in the memory-retrieval error as the state “forgets” its initial condition. From these studies, we see that memory is retrieved at longer times because quantum many-body scars slow the thermalization of the Neel states [35, 47, 64, 65, 46]. The time-crystalline nature of the reservoir using $|\psi\rangle=|AF\rangle$ signals long-time correlations, and thus the reservoir can be used to encode and predict series with long-time correlations [17].

The Neel states exhibit long-term memory due to the scarring behavior of the evolution. This can be understood by analyzing the average evolution produced by a single cycle. Up to second order in $\epsilon_k$, the state at time $2\tau n$, $\rho(n)$, evolves to the state at time $2\tau(n+1)$, $\rho(n+1)$, where (see Appendix C)

$$\rho(n+1)=\rho(n)-i\epsilon[H^{+},\rho(n)]+\sigma^{2}_{in}\left(H^{+}\rho(n)H^{+}-\tfrac{1}{2}\{H^{+}H^{+},\rho(n)\}\right)+\sigma^{2}_{in}\left(H^{-}\rho(n)H^{-}-\tfrac{1}{2}\{H^{-}H^{-},\rho(n)\}\right).\qquad(19)$$

Here, $H^{\pm}=\hat{N}\pm\chi_\tau\hat{N}\chi_\tau$ are Hermitian operators. We can rewrite (V.4) as $\rho(n+1)=\rho(n)+\mathcal{L}_{\epsilon,\sigma}(\rho(n))$. Since $[H^{+},\chi_\tau]=0$, the operator $H^{+}$ has an emergent $\mathbb{Z}_2$ symmetry, which means that the ground states of $H^{+}$ are well approximated by the states $|\pm\rangle=\frac{1}{\sqrt{2}}\left(|AF\rangle\pm|AF'\rangle\right)$ [47]. Note that

H^{+}|+\rangle\approx N|+\rangle,\quad H^{-}|+\rangle\approx 0, \quad (20)
H^{+}|-\rangle\approx N|-\rangle,\quad H^{-}|-\rangle\approx 0, \quad (21)

where $N$ is the system size. We conclude that if $\rho(n)=|AF\rangle\langle AF|$ then $\rho(n+1)\approx\rho(n)$, as this state is (approximately) in the kernel of $\mathcal{L}_{\epsilon,\sigma}$. Therefore, the Neel states are suitable memory states.

Figure 9: Dependence of memory retrieval on different reference states. We use a ring of $N=8$ Rydberg atoms with $\epsilon=\sigma=0.1$, and $M=100,30$ samples for the training and testing sets, respectively. The memories are sampled from a balanced Bernoulli distribution. (a) Memory retrieval error for three different choices of reference state, $|AF\rangle=|grgrgrgr\rangle$, $|gg\rangle=|gg...g\rangle$, and $|d_{2}\rangle=|grggggrg\rangle$. Due to the scarring behavior of $|AF\rangle$, the memory length is greatly improved. (b) The left-most atom's entanglement entropy averaged over the $M$ memory instances ($\bar{S}_{E}$). Saturation of $\bar{S}_{E}$ signals the thermalization of the system and thus a decrease in $R$.

Equation (19) also tells us that any density matrix in the kernel of $\mathcal{L}_{\epsilon,\sigma}$ may serve as a memory state, since it is a steady state of the evolution. This would allow us to enlarge the number of memories accessible in a qRNN. In Appendix C we show the existence of a large number of steady states, and we present a scheme to prepare a number of them. It is worth noting, however, that these memories may have to be distinguished from one another via global measurements. How to efficiently prepare and distinguish these memory states remains open, and is key to determining whether a memory quantum advantage can be claimed in qRNNs. As it stands, the use of quantum scars suggests that Rydberg-inspired RNNs may present enhanced memory, since quantum scars are classically simulatable owing to their low entanglement entropy. However, it is unclear whether the system can be classically simulated at late times due to the onset of thermalization. These questions are left for future studies.

Quite recently, another proposal to enlarge the number of memories accessible in a quantum reservoir was introduced using the emergent scale-free network dynamics of a melting discrete time-crystal in an Ising chain [25]. The proposal in [25] can be seen as a generalization of the quantum reservoir presented in (16), obtained by dropping the Rydberg-blockade constraint. Our results, as well as those in [25], raise the possibility of an RNN with a memory capacity that outpaces that of classical RNNs such as the Hopfield network [66].

VI Conclusions and outlook

In this article, we present a quantum extension of a classical RNN of binary neurons. This implies a deep connection between controllable many-body quantum systems and brain-inspired computational models. Our qRNN makes it possible to employ the analogue dynamics of quantum systems for computation instead of the circuit-based paradigm. We show how features of the quantum evolution of our qRNN can be used for quantum learning tasks and to speed up the simulation of stochastic dynamics. We implement a quantum reservoir using arrays of Rydberg atoms and show how Rydberg atoms perform analogues of biological tasks even with only a few atoms. This can be explained via the physics of the system; for example, we show how weak-ergodicity-breaking collective dynamics in Rydberg atoms can be employed for long-term memory.

While this article takes a first step in connecting controllable quantum systems and neural networks from a fundamental perspective, several questions remain unanswered. Firstly, the first two quantum features presented here warrant studies of how qRNNs can be used for quantum error correction in circuit-based quantum computing. Following directly from this work, investigations into advantageous stochastic processes in qRNNs that are robust to decoherence are enticing. These advantages will likely emerge from the collective behavior of quantum neurons. Therefore, the field will soon require a thorough understanding of the collective dissipative dynamics of neurons in qRNNs, which would also enable rigorous studies of the computational power of these architectures. Guided by the fact that neural networks become universal approximators by interconnecting many neurons, one may also consider the spatial and control requirements necessary for universal brain-inspired quantum machine learning.

Given the vast number of classical computational models for the brain, there are several immediate research directions. One of these is the exploration of a systematic way to quantize more biologically realistic models of a neural circuit. A possible starting point for translating different neural circuits is to exploit key engineering and fundamental features of different NISQ platforms. For example, recent experiments using Rydberg atoms in photonic cavities may provide the ability to capture neural plasticity in qRNNs by arbitrarily tuning the inter-neural interactions [67]. Likewise, superconducting circuits have lately been used to encode biologically realistic single-neuron models [13]. Alongside these explorations, it will be imperative to establish a variety of methods to analyze how quantum neural networks recover the classical protocols in certain limits, as well as the source and extent of the quantum advantages that each platform can offer.

Lastly, while our memory encoding scheme in Sec. V.4 offers a way to encode a binary memory, whether a larger number of memories can be encoded efficiently remains an important open question. In Appendix C we offer a proposal based on the steady states of the effective dissipative evolution in the pre-thermalization regime introduced by the noise in the qRNN. This already yields a theoretical number of memories greater than that attainable by the vanilla Hopfield network [66]. However, distinguishing these memories, or producing Hamiltonians with a desired memory state in mind, is left for future research. It is clear, however, that memory in a quantum reservoir relies on ergodicity-breaking dynamics [25, 27]. Hamiltonian engineering techniques, together with more general driven Hamiltonians such as those in [25], may pave the way towards programmable memories in a qRNN.

ACKNOWLEDGMENTS

The authors thank Mikhail D. Lukin and Nishad Maskara for insightful discussions. RAB acknowledges support from the NSF Graduate Research Fellowship under Grant No. DGE1745303, as well as funding from Harvard University's Graduate Prize Fellowship. XG acknowledges support from the Harvard-MPQ Center for Quantum Optics, the Templeton Religion Trust grant TRT 0159, and the Army Research Office under Grant W911NF1910302 and MURI Grant W911NF-20-1-0082. SFY acknowledges funding from NSF and AFOSR.

References

  • Biamonte et al. [2017] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
  • Harrow et al. [2009] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum algorithm for linear systems of equations, Physical Review Letters 103, 150502 (2009).
  • Wiebe et al. [2012] N. Wiebe, D. Braun, and S. Lloyd, Quantum algorithm for data fitting, Physical Review Letters 109, 050505 (2012).
  • Low et al. [2014] G. H. Low, T. J. Yoder, and I. L. Chuang, Quantum inference on Bayesian networks, Physical Review A 89, 062315 (2014).
  • Lloyd et al. [2014] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).
  • McClean et al. [2018] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature communications 9, 1 (2018).
  • Cerezo et al. [2021] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nature communications 12, 1 (2021).
  • Marković and Grollier [2020] D. Marković and J. Grollier, Quantum neuromorphic computing, Applied Physics Letters 117, 150501 (2020).
  • Kiraly et al. [2021] B. Kiraly, E. J. Knol, W. M. J. van Weerdenburg, H. J. Kappen, and A. A. Khajetoorians, An atomic Boltzmann machine capable of self-adaption, Nature Nanotechnology 10.1038/s41565-020-00838-4 (2021).
  • Mujal et al. [2021] P. Mujal, R. Martínez-Peña, J. Nokkala, J. García-Beni, G. L. Giorgi, M. C. Soriano, and R. Zambrini, Opportunities in quantum reservoir computing and extreme learning machines, arXiv preprint arXiv:2102.11831  (2021).
  • Pfeiffer et al. [2016] P. Pfeiffer, I. Egusquiza, M. Di Ventra, M. Sanz, and E. Solano, Quantum memristors, Scientific reports 6, 1 (2016).
  • Gonzalez-Raya et al. [2019] T. Gonzalez-Raya, X.-H. Cheng, I. L. Egusquiza, X. Chen, M. Sanz, and E. Solano, Quantized single-ion-channel Hodgkin-Huxley model for quantum neurons, Physical Review Applied 12, 014037 (2019).
  • Gonzalez-Raya et al. [2020] T. Gonzalez-Raya, E. Solano, and M. Sanz, Quantized three-ion-channel neuron model for neural action potentials, Quantum 4, 224 (2020).
  • Torrontegui and García-Ripoll [2019] E. Torrontegui and J. J. García-Ripoll, Unitary quantum perceptron as efficient universal approximator, EPL (Europhysics Letters) 125, 30004 (2019).
  • Fujii and Nakajima [2020] K. Fujii and K. Nakajima, Quantum reservoir computing: a reservoir approach toward quantum machine learning on near-term quantum devices, arXiv preprint arXiv:2011.04890  (2020).
  • Nakajima et al. [2019] K. Nakajima, K. Fujii, M. Negoro, K. Mitarai, and M. Kitagawa, Boosting computational power through spatial multiplexing in quantum reservoir computing, Phys. Rev. Applied 11, 034021 (2019).
  • Kutvonen et al. [2020] A. Kutvonen, K. Fujii, and T. Sagawa, Optimizing a quantum reservoir computer for time series prediction, Scientific Reports 10, 14687 (2020).
  • Ghosh et al. [2019a] S. Ghosh, A. Opala, M. Matuszewski, T. Paterek, and T. C. H. Liew, Quantum reservoir processing, npj Quantum Information 5, 10.1038/s41534-019-0149-8 (2019a).
  • Khan et al. [2021] S. A. Khan, F. Hu, G. Angelatos, and H. E. Türeci, Physical reservoir computing using finitely-sampled quantum systems, arXiv preprint arXiv:2110.13849  (2021).
  • Ghosh et al. [2019b] S. Ghosh, T. Paterek, and T. C. H. Liew, Quantum neuromorphic platform for quantum state preparation, Phys. Rev. Lett. 123, 260404 (2019b).
  • Govia et al. [2021] L. Govia, G. Ribeill, G. Rowlands, H. Krovi, and T. Ohki, Quantum reservoir computing with a single nonlinear oscillator, Physical Review Research 3, 013077 (2021).
  • Nokkala et al. [2021] J. Nokkala, R. Martínez-Peña, G. L. Giorgi, V. Parigi, M. C. Soriano, and R. Zambrini, Gaussian states of continuous-variable quantum systems provide universal and versatile reservoir computing, Communications Physics 4, 1 (2021).
  • Ghosh et al. [2021] S. Ghosh, T. Krisnanda, T. Paterek, and T. C. Liew, Realising and compressing quantum circuits with quantum reservoir computing, Communications Physics 4, 1 (2021).
  • Mujal [2022] P. Mujal, Quantum reservoir computing for speckle-disorder potentials, arXiv preprint arXiv:2201.11096  (2022).
  • Sakurai et al. [2021] A. Sakurai, M. P. Estarellas, W. J. Munro, and K. Nemoto, Quantum reservoir computation utilising scale-free networks, arXiv preprint arXiv:2108.12131  (2021).
  • Xia et al. [2022] W. Xia, J. Zou, X. Qiu, and X. Li, The reservoir learning power across quantum many-body localization transition, Frontiers of Physics 17, 1 (2022).
  • Martínez-Peña et al. [2021] R. Martínez-Peña, G. L. Giorgi, J. Nokkala, M. C. Soriano, and R. Zambrini, Dynamical phase transitions in quantum reservoir computing, Phys. Rev. Lett. 127, 100502 (2021).
  • Coolen [2001] A. Coolen, Statistical mechanics of recurrent neural networks I—statics, in Handbook of biological physics, Vol. 4 (Elsevier, 2001) Chap. 14, pp. 553–618.
  • Suzuki et al. [2022] Y. Suzuki, Q. Gao, K. C. Pradel, K. Yasuoka, and N. Yamamoto, Natural quantum reservoir computing for temporal information processing, Scientific Reports 12, 1 (2022).
  • Saffman et al. [2010] M. Saffman, T. G. Walker, and K. Mølmer, Quantum information with Rydberg atoms, Rev. Mod. Phys. 82, 2313 (2010).
  • Lester et al. [2015] B. J. Lester, N. Luick, A. M. Kaufman, C. M. Reynolds, and C. A. Regal, Rapid production of uniformly filled arrays of neutral atoms, Physical review letters 115, 073003 (2015).
  • Barredo et al. [2016] D. Barredo, S. De Léséleuc, V. Lienhard, T. Lahaye, and A. Browaeys, An atom-by-atom assembler of defect-free arbitrary two-dimensional atomic arrays, Science 354, 1021 (2016).
  • Endres et al. [2016] M. Endres, H. Bernien, A. Keesling, H. Levine, E. R. Anschuetz, A. Krajenbrink, C. Senko, V. Vuletic, M. Greiner, and M. D. Lukin, Atom-by-atom assembly of defect-free one-dimensional cold atom arrays, Science 354, 1024 (2016).
  • Labuhn et al. [2016] H. Labuhn, D. Barredo, S. Ravets, S. De Léséleuc, T. Macrì, T. Lahaye, and A. Browaeys, Tunable two-dimensional arrays of single Rydberg atoms for realizing quantum Ising models, Nature 534, 667 (2016).
  • Bernien et al. [2017] H. Bernien, S. Schwartz, A. Keesling, H. Levine, A. Omran, H. Pichler, S. Choi, A. S. Zibrov, M. Endres, M. Greiner, V. Vuletić, and M. D. Lukin, Probing many-body dynamics on a 51-atom quantum simulator, Nature 551, 579 (2017).
  • Cooper et al. [2018] A. Cooper, J. P. Covey, I. S. Madjarov, S. G. Porsev, M. S. Safronova, and M. Endres, Alkaline-earth atoms in optical tweezers, Physical Review X 8, 041055 (2018).
  • Wilson et al. [2022] J. Wilson, S. Saskin, Y. Meng, S. Ma, R. Dilip, A. Burgers, and J. Thompson, Trapping alkaline earth Rydberg atoms in optical tweezer arrays, Physical Review Letters 128, 033201 (2022).
  • Ebadi et al. [2020] S. Ebadi, T. T. Wang, H. Levine, A. Keesling, G. Semeghini, A. Omran, D. Bluvstein, R. Samajdar, H. Pichler, W. W. Ho, et al., Quantum phases of matter on a 256-atom programmable quantum simulator (2020), arXiv:2012.12281 [quant-ph].
  • Isenhower et al. [2010] L. Isenhower, E. Urban, X. Zhang, A. Gill, T. Henage, T. A. Johnson, T. Walker, and M. Saffman, Demonstration of a neutral atom controlled-not quantum gate, Physical review letters 104, 010503 (2010).
  • Pichler et al. [2018] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D. Lukin, Quantum optimization for maximum independent set using Rydberg atom arrays (2018), arXiv:1808.10816 [quant-ph] .
  • Omran et al. [2019] A. Omran, H. Levine, A. Keesling, G. Semeghini, T. Wang, S. Ebadi, H. Bernien, A. Zibrov, H. Pichler, S. Choi, et al., Generation and manipulation of Schrödinger cat states in Rydberg atom arrays, Science 365, 570 (2019).
  • Henriet et al. [2020] L. Henriet, L. Beguin, A. Signoles, T. Lahaye, A. Browaeys, G.-O. Reymond, and C. Jurczak, Quantum computing with neutral atoms, Quantum 4, 327 (2020).
  • Cohen and Thompson [2021] S. R. Cohen and J. D. Thompson, Quantum computing with circular Rydberg atoms, PRX Quantum 2, 030322 (2021).
  • Song et al. [2016] H. F. Song, G. R. Yang, and X.-J. Wang, Training excitatory-inhibitory recurrent neural networks for cognitive tasks: A simple and flexible framework, PLOS Computational Biology 12, 1 (2016).
  • Han and Gallagher [2009] J. Han and T. F. Gallagher, Millimeter-wave rubidium Rydberg van der Waals spectroscopy, Phys. Rev. A 79, 053409 (2009).
  • Bluvstein et al. [2021] D. Bluvstein, A. Omran, H. Levine, A. Keesling, G. Semeghini, S. Ebadi, T. T. Wang, A. A. Michailidis, N. Maskara, W. W. Ho, et al., Controlling quantum many-body dynamics in driven Rydberg atom arrays, Science 371, 1355 (2021).
  • Maskara et al. [2021] N. Maskara, A. A. Michailidis, W. W. Ho, D. Bluvstein, S. Choi, M. D. Lukin, and M. Serbyn, Discrete time-crystalline order enabled by quantum many-body scars: entanglement steering via periodic driving, arXiv preprint arXiv:2102.13160  (2021).
  • Bausch [2020] J. Bausch, Recurrent quantum neural networks, arXiv preprint arXiv:2006.14619  (2020).
  • Wu and Yang [2007] Y. Wu and X. Yang, Strong-coupling theory of periodically driven two-level systems, Phys. Rev. Lett. 98, 013601 (2007).
  • Ghafari et al. [2019] F. Ghafari, N. Tischler, J. Thompson, M. Gu, L. K. Shalm, V. B. Verma, S. W. Nam, R. B. Patel, H. M. Wiseman, and G. J. Pryde, Dimensional quantum memory advantage in the simulation of stochastic processes, Phys. Rev. X 9, 041013 (2019).
  • Wolpert et al. [2019] D. H. Wolpert, A. Kolchinsky, and J. A. Owen, A space–time tradeoff for implementing a function with master equation dynamics, Nature communications 10, 1 (2019).
  • Korzekwa and Lostaglio [2021] K. Korzekwa and M. Lostaglio, Quantum advantage in simulating stochastic processes, Phys. Rev. X 11, 021019 (2021).
  • Blank et al. [2021] C. Blank, D. K. Park, and F. Petruccione, Quantum-enhanced analysis of discrete stochastic processes, npj Quantum Information 7, 1 (2021).
  • Burkitt [2006] A. N. Burkitt, A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input, Biological Cybernetics 95, 1 (2006).
  • Jaeger [2002] H. Jaeger, Short term memory in echo state networks, GMD-Report 152 (German National Research Institute for Computer Science, 2002).
  • Capano et al. [2015] V. Capano, H. J. Herrmann, and L. de Arcangelis, Optimal percentage of inhibitory synapses in multi-task learning, Scientific Reports 5, 9895 (2015).
  • Eccles et al. [1954] J. C. Eccles, P. Fatt, and K. Koketsu, Cholinergic and inhibitory synapses in a pathway from motor-axon collaterals to motoneurones, The Journal of Physiology 126, 524 (1954).
  • Weber et al. [2017] S. Weber, C. Tresp, H. Menke, A. Urvoy, O. Firstenberg, H. P. Büchler, and S. Hofferberth, Tutorial: Calculation of Rydberg interaction potentials, J. Phys. B: At. Mol. Opt. Phys. 50, 133001 (2017).
  • Roitman and Shadlen [2002] J. D. Roitman and M. N. Shadlen, Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task, The Journal of Neuroscience 22, 9475 (2002).
  • Govia et al. [2022] L. Govia, G. Ribeill, G. Rowlands, and T. Ohki, Nonlinear input transformations are ubiquitous in quantum reservoir computing, Neuromorphic Computing and Engineering 2, 014008 (2022).
  • Urban et al. [2009] E. Urban, T. A. Johnson, T. Henage, L. Isenhower, D. Yavuz, T. Walker, and M. Saffman, Observation of Rydberg blockade between two atoms, Nature Physics 5, 110 (2009).
  • Gaetan et al. [2009] A. Gaetan, Y. Miroshnychenko, T. Wilk, A. Chotia, M. Viteau, D. Comparat, P. Pillet, A. Browaeys, and P. Grangier, Observation of collective excitation of two individual atoms in the Rydberg blockade regime, Nature Physics 5, 115 (2009).
  • D’Alessio et al. [2016] L. D’Alessio, Y. Kafri, A. Polkovnikov, and M. Rigol, From quantum chaos and eigenstate thermalization to statistical mechanics and thermodynamics, Advances in Physics 65, 239 (2016).
  • Fendley et al. [2004] P. Fendley, K. Sengupta, and S. Sachdev, Competing density-wave orders in a one-dimensional hard-boson model, Physical Review B 69, 075106 (2004).
  • Lesanovsky and Katsura [2012] I. Lesanovsky and H. Katsura, Interacting Fibonacci anyons in a Rydberg gas, Physical Review A 86, 041601 (2012).
  • Folli et al. [2017] V. Folli, M. Leonetti, and G. Ruocco, On the maximum storage capacity of the Hopfield model, Frontiers in Computational Neuroscience 10, 144 (2017).
  • Periwal et al. [2021] A. Periwal, E. S. Cooper, P. Kunkel, J. F. Wienand, E. J. Davis, and M. Schleier-Smith, Programmable interactions and emergent geometry in an atomic array, arXiv preprint arXiv:2106.04070  (2021).
  • Goodman [1970] G. S. Goodman, An intrinsic time for non-stationary finite Markov chains, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 16, 165 (1970).
  • Reiter and Sørensen [2012] F. Reiter and A. S. Sørensen, Effective operator formalism for open quantum systems, Physical Review A 85, 032111 (2012).
  • Nelder and Mead [1965] J. A. Nelder and R. Mead, A simplex method for function minimization, The computer journal 7, 308 (1965).
  • Serbyn et al. [2021] M. Serbyn, D. A. Abanin, and Z. Papić, Quantum many-body scars and weak breaking of ergodicity, Nature Physics 17, 675 (2021).

Appendix A Probability transformations using qRNNs

In the case of the RNN presented in (II), using Ref. [28] we can derive that $P(\bm{s}|\bm{s}^{\prime})$ in (8) is given by

P(\bm{s}|\bm{s}^{\prime})=\prod_{i=1}^{N}\frac{1}{2}\left(1+s_{i}\,g\left[h_{i}(\bm{s}^{\prime})/\sigma^{2}_{in}-1\right]\right)

where $g[x]=\text{Erf}[x/\sqrt{2}]$ is the error function arising from the Gaussian noise. Regarding the task in Sec. III.1 of flipping all neurons at once, one could naively think that this can be done classically by taking the inputs $\Delta_{n}\rightarrow\infty$. However, since the noise's strength $\sigma_{in}^{2}$ scales with the size of the inputs, one obtains $P(\bm{s}|\bm{s}^{\prime})\rightarrow\prod_{i=1}^{N}\frac{1}{2}(1+s_{i}/2)$, which is a completely random update independent of the original state.
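A minimal numerical transcription of this product formula, assuming the local fields $h_{i}(\bm{s}^{\prime})$ have already been computed (the field values below are hypothetical):

```python
import numpy as np
from scipy.special import erf

def transition_prob(s, h, sigma_in):
    """P(s|s') for spins s_i = +/-1, given local fields h_i = h_i(s')."""
    g = erf((h / sigma_in ** 2 - 1) / np.sqrt(2))   # g[x] = Erf[x/sqrt(2)]
    return float(np.prod(0.5 * (1 + s * g)))

# Example with hypothetical fields: strong positive fields make s = (+1,...,+1) likely.
s = np.ones(4)
h = 5.0 * np.ones(4)
print(transition_prob(s, h, sigma_in=1.0))           # close to 1
```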

A transition matrix $L$ obeys $L_{\bm{s}^{\prime}|\bm{s}}\geq 0$ and $\sum_{\bm{s}^{\prime}}L_{\bm{s}^{\prime}|\bm{s}}=1$. $L$ is said to be classically embeddable if it can be generated by a continuous Markov process via

\frac{d}{dt}P(t)=K(t)P(t),\quad P(0)=\mathds{1},\quad P(t_{f})=L, \quad (A.22)

where $K$ is called a generator matrix; it preserves the positivity of $P$ via the constraint $K_{\bm{s}|\bm{s}^{\prime}}\geq 0$ for $\bm{s}\neq\bm{s}^{\prime}$, and normalization via the constraint $\sum_{\bm{s}}K_{\bm{s}|\bm{s}^{\prime}}=0$. Applied to our setup, a classically embeddable stochastic process is one that can transform $p_{t_{f}}=Lp_{0}$ via an RNN without employing any hidden neurons (i.e., $M=N$ neurons are used for readout), and in a single step. In general, determining whether a matrix $L$ is embeddable is an open question, but any embeddable matrix must necessarily satisfy [68]

\prod_{\bm{s}}L_{\bm{s}|\bm{s}}\geq\det L\geq 0. \quad (A.23)

From (A.23), it immediately follows that the global "spin-flip" matrix $F$ defined in (9) is not classically embeddable: $\det F=1$ and $\prod_{\bm{s}}F_{\bm{s}|\bm{s}}=0$, violating (A.23). Notice that the impossibility of performing $F$ without hidden neurons is quite general, and is not limited to the stochastic processes allowed by (II). Moreover, the number of time-steps needed to achieve $F$ using $m$ hidden neurons is of order $\mathcal{O}(2^{N-m})$ (for details, see Sec. III.A in Ref. [52]).
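The violation of (A.23) by $F$ can be verified directly for small $N$. Below is a minimal numpy sketch; the construction of $F$ as the bit-complement permutation is our own, chosen to match the definition of the global spin-flip.

```python
import numpy as np

N = 4
dim = 2 ** N

# Global spin-flip: the basis state with bit string b maps to its complement,
# i.e. index i -> dim - 1 - i in the usual binary ordering.
F = np.zeros((dim, dim))
for i in range(dim):
    F[dim - 1 - i, i] = 1.0

det_F = np.linalg.det(F)                  # = 1 for N >= 2 (even number of swaps)
diag_prod = np.prod(np.diag(F))           # = 0: no state is its own complement
print(det_F, diag_prod)
print("necessary condition (A.23):", diag_prod >= det_F >= 0)   # False
```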

Similar definitions of embeddability exist in the quantum setting. A stochastic process $L$ is said to be quantum embeddable if there exists a Markovian quantum channel $\mathcal{E}$ such that

L_{\bm{s}^{\prime}|\bm{s}}=\langle\bm{s}^{\prime}|\mathcal{E}(|\bm{s}\rangle\langle\bm{s}|)|\bm{s}^{\prime}\rangle. \quad (A.24)

A Markovian quantum channel $\mathcal{E}$ is a channel arising from time-evolution under a master equation, and thus $\mathcal{E}$ may include unitary and dissipative terms. As pointed out in Ref. [52], all classically embeddable stochastic processes are also quantum embeddable. Moreover, permutations such as $F$ in (9) are quantum embeddable since all permutations are unitary operators.

We highlight that realizing $F$ is extremely sensitive to the decoherence arising from spontaneous emission, a main source of noise in NISQ devices. If $\gamma$ is the rate at which spin $|1\rangle$ relaxes to $|\text{-}1\rangle$, one can show that the unitary evolution leads to the stochastic process $F^{\gamma}$ with $\det F^{\gamma}=e^{-\mathcal{O}(2^{N})}$. Notice that whether $F^{\gamma}$ violates (A.23) becomes rapidly inconclusive with increasing system size.

Appendix B Continuous-time dynamics for a qRNN

A successful neural circuit model is the integrate-and-fire RNN (IF-RNN). In an IF-RNN, each of the $N$ neurons is influenced by pre-synaptic firing rates and produces a post-synaptic firing rate as an output. Each neuron is endowed with a firing rate $s_{n}(t)$, where $n$ denotes the $n^{th}$ neuron. The pre-synaptic firing rates arriving at the $n^{th}$ neuron are integrated to produce a pre-synaptic current $I_{n}(t)$. In turn, the neuron produces a firing rate $s_{n}$ influenced by its current and the firings of other neurons. Additionally, each neuron can receive a temporal input stimulus $\Delta^{in}_{n}(t)$ which affects both the currents and the firing rates. Generally, the firing rates and currents are described by non-linear, coupled differential equations of the form

\dot{I}_{n}=-\tau_{I}^{-1}I_{n}+G_{n}(\bm{s}(t),\bm{I}(t),J_{nm},\bm{\Delta}^{in}(t)) \quad (B.25)
\dot{s}_{n}=-\tau^{-1}_{s}s_{n}+F_{n}(\bm{s}(t),\bm{I}(t),J_{nm},\bm{\Delta}^{in}(t)) \quad (B.26)

where $\tau_{I}$ and $\tau_{s}$ are relaxation time constants for the currents and firing rates, respectively. The vector $\bm{s}(t)$ is defined as $\bm{s}(t)=(s_{1}(t),...,s_{N}(t))$, with $\bm{I}(t)$ and $\bm{\Delta}^{in}(t)$ defined analogously. The functions $G_{n}$ and $F_{n}$ make the dynamics non-linear, which gives RNNs their vast computational complexity. The specific forms of $G_{n}$ and $F_{n}$ depend on the application and on the relation between the currents and the firing rates one is trying to capture with the model.

The qRNN in Sec. III.2 follows the Heisenberg-Langevin equations of motion

\dot{A}=i[H,A]+\sum_{n}\left(\frac{\gamma}{2}\sigma_{n}^{+}+f_{n}^{\dagger}\right)[A,\sigma_{n}^{-}]+\sum_{n}[A,\sigma_{n}^{+}]\left(\frac{\gamma}{2}\sigma_{n}^{-}+f_{n}\right) \quad (B.27)

for any operator $A$. In (B.27), $\sigma_{n}^{+}=|1\rangle\langle\text{-}1|_{n}$, $\sigma^{-}_{n}=(\sigma_{n}^{+})^{\dagger}$, and $f_{n}$ is a Langevin noise operator with Gaussian statistics $\left\langle f_{n}(t)\right\rangle=0$ and $\left\langle f_{n}(t)f_{m}^{\dagger}(t^{\prime})\right\rangle\propto\delta_{mn}\delta(t-t^{\prime})$. In the equation above, $[A,B]\equiv AB-BA$ stands for the commutator between operators $A$ and $B$.

To extract the statistics of the system, one may choose to look at the dynamics of two different local observables' expectation values. For example, the equations of motion for the expectations of the local Pauli operators $\sigma^{x}_{n}=|\text{-}1\rangle\langle 1|_{n}+|1\rangle\langle\text{-}1|_{n}$ and $\sigma^{y}_{n}=i|\text{-}1\rangle\langle 1|_{n}-i|1\rangle\langle\text{-}1|_{n}$ are given by

\dot{\left\langle\sigma^{x}_{n}\right\rangle}=-\frac{\gamma}{2}\left\langle\sigma^{x}_{n}\right\rangle+i\left\langle\left[H(t),\sigma_{n}^{x}\right]\right\rangle \quad (B.28)
\dot{\left\langle\sigma^{y}_{n}\right\rangle}=-\frac{\gamma}{2}\left\langle\sigma^{y}_{n}\right\rangle+i\left\langle\left[H(t),\sigma_{n}^{y}\right]\right\rangle \quad (B.29)

with $H(t)$ specified by (III.1). The expectation values are calculated in the quantum-mechanical sense such that for an operator $A$, $\left\langle A\right\rangle=\text{Tr}(A\rho)$, and terms linear in $f_{n}$ cancel out. Notice that the commutators in equations (B.28)-(B.29) play the role of the functions $G_{n}$ and $F_{n}$ in (B.25)-(B.26).

For $\sigma_{n}^{z}$, (B.27) gives

\dot{\sigma^{z}_{n}}=-\frac{\gamma}{2}\sigma^{z}_{n}-\frac{\Omega}{2}\sigma_{n}^{y}+\frac{\gamma}{2}\mathds{1}-2f_{n}^{\dagger}\sigma_{n}^{-}. \quad (B.30)

This can be formally integrated to give

\sigma_{n}^{z}(t)-\sigma_{n}^{z}(0)=\int_{0}^{t}dt^{\prime}e^{-\gamma(t-t^{\prime})/2}\left(-\frac{\Omega}{2}\sigma_{n}^{y}(t^{\prime})-2f_{n}^{\dagger}(t^{\prime})\sigma_{n}^{-}(t^{\prime})+\frac{\gamma}{2}\mathds{1}\right) \quad (B.31)

We choose to start the network at $\left\langle\sigma^{z}_{n}\right\rangle=-1$ for all $n$. We plug this back into (B.29), and take expectation values to eliminate terms linear in $f_{n}$. We obtain

\dot{\left\langle\sigma^{y}_{n}\right\rangle}=-\frac{\gamma}{2}\left\langle\sigma_{n}^{y}\right\rangle+\frac{\Omega}{2}\sum_{m=1}^{N}J_{nm}\int_{0}^{t}dt^{\prime}e^{-\gamma(t-t^{\prime})}\left\langle\sigma^{x}_{n}(t)\sigma^{y}_{m}(t^{\prime})\right\rangle+\Delta_{n}(t)\left\langle\sigma_{n}^{x}\right\rangle-\frac{\Omega^{2}}{4}\int_{0}^{t}\left\langle\sigma^{y}_{n}(t^{\prime})\right\rangle e^{-\gamma(t-t^{\prime})}dt^{\prime}. \quad (B.32)

Similar equations can be found for $\left\langle\sigma_{n}^{x}\right\rangle$. Equation (B.32) tells us that $\left\langle\sigma_{n}^{y}\right\rangle$ depends on past statistics, and thus our network has a memory time bounded by $1/\gamma$. Let $J$ denote the matrix $J_{nm}$. For $\gamma t\gg 1$, we can extend the lower limit of integration to $-\infty$. Using the approximation $\int_{-\infty}^{t}e^{-\gamma(t-t^{\prime})}f(t^{\prime})dt^{\prime}\approx\gamma^{-1}f(t)$, we obtain

\dot{\left\langle\sigma_{n}^{x}\right\rangle}=-\frac{\gamma}{2}\left\langle\sigma_{n}^{x}\right\rangle-\Delta_{n}^{in}\left\langle\sigma_{n}^{y}\right\rangle-\frac{\Omega}{2\gamma}\sum_{m}J_{nm}\left\langle\sigma_{n}^{y}\sigma_{m}^{y}\right\rangle \quad (B.33)
\dot{\left\langle\sigma_{n}^{y}\right\rangle}=-\left(\frac{\gamma}{2}+\frac{\Omega^{2}}{4\gamma}\right)\left\langle\sigma_{n}^{y}\right\rangle+\Delta_{n}^{in}\left\langle\sigma_{n}^{x}\right\rangle-\frac{\Omega}{2\gamma}\sum_{m}J_{nm}\left\langle\sigma_{n}^{x}\sigma_{m}^{y}\right\rangle \quad (B.34)

thus leading to (III.2). In (B.33)-(B.34), the time dependence is implied.

Let us now define $s_{n}(t)\equiv\left\langle\sigma^{y}_{n}(t)\right\rangle$ and $I_{n}(t)\equiv\left\langle\sigma^{x}_{n}(t)\right\rangle$ so that $\bm{s}(t)=(s_{1}(t),...,s_{N}(t))$ and $\bm{I}(t)=(I_{1}(t),...,I_{N}(t))$. We see that (B.33)-(B.34) match (B.25)-(B.26) with

G_{n}=-\Delta_{n}s_{n}-\frac{\Omega}{2\gamma}\sum_{m}J_{nm}\left\langle I_{n}I_{m}\right\rangle,\quad\tau_{I}^{-1}=\frac{\gamma}{2}, \quad (B.35)
F_{n}=\Delta_{n}I_{n}-\frac{\Omega}{2\gamma}\sum_{m}J_{nm}\left\langle I_{n}s_{m}\right\rangle,\quad\tau_{s}^{-1}=\frac{\gamma}{2}+\frac{\Omega^{2}}{4\gamma}. \quad (B.36)

Equations (B.33)-(B.34) allow us to naturally interpret $\left\langle\sigma_{n}^{y}\right\rangle$ as the firing rate of the $n^{th}$ neuron, and $\left\langle\sigma_{n}^{x}\right\rangle$ as its current. That is, the rate of the pre-synaptic neuron $\left\langle\sigma^{y}_{k}\right\rangle$ amounts to a current in the post-synaptic neuron $\left\langle\sigma^{x}_{n}\right\rangle$, which in turn drives its rate $\left\langle\sigma_{n}^{y}\right\rangle$.

Equations (B.33)-(B.34) comprise a system of coupled quadratic differential equations, where the quadratic terms arise from the nontrivial commutation relations of the Pauli operators, $[\sigma_{n}^{\alpha},\sigma_{m}^{\beta}]=2i\delta_{nm}\epsilon_{\alpha\beta\gamma}\sigma_{n}^{\gamma}$, where $\epsilon_{\alpha\beta\gamma}$ is the Levi-Civita symbol. These quadratic terms make a qRNN a powerful computational system, similar to how the functions $G_{n}$ and $F_{n}$ make an RNN a powerful computational system.
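Equations (B.33)-(B.34) can be integrated numerically once the two-point functions are approximated. The sketch below uses a simple Euler scheme together with a mean-field factorization $\langle\sigma_{n}^{\alpha}\sigma_{m}^{\beta}\rangle\approx\langle\sigma_{n}^{\alpha}\rangle\langle\sigma_{m}^{\beta}\rangle$; the factorization and all parameter values are our own illustrative assumptions, not part of the derivation above.

```python
import numpy as np

# Hypothetical parameters (rates in MHz, time in microseconds).
N, gamma, Omega = 4, 1.0, 2.0
Delta = 0.5 * np.ones(N)                       # input detunings Delta_n^in
rng = np.random.default_rng(1)
J = rng.normal(0, 0.3, (N, N))
np.fill_diagonal(J, 0.0)

x = np.zeros(N)                                # <sigma_n^x>, the "currents"
y = 1e-3 * rng.standard_normal(N)              # <sigma_n^y>, the "firing rates"

dt, steps = 1e-3, 5000
for _ in range(steps):
    # Mean-field factorization of the quadratic terms in (B.33)-(B.34).
    dx = -gamma / 2 * x - Delta * y - Omega / (2 * gamma) * y * (J @ y)
    dy = (-(gamma / 2 + Omega ** 2 / (4 * gamma)) * y + Delta * x
          - Omega / (2 * gamma) * x * (J @ y))
    x, y = x + dt * dx, y + dt * dy
```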

Appendix C Memory and quantum many-body scars

As described in the main text, and more thoroughly discussed in Ref. [47], the scarring behavior of the kicked PXP model is robust to fixed imperfections in the drive. The robustness persists even for random noise. Fig. 10 shows the overlap with the initial condition for a noisy, kicked PXP model for different values of $\epsilon$ and $\sigma_{in}$, a natural extension of the model in [47]. The Neel state $|AF\rangle$ exhibits robust revivals regardless of $\sigma^{2}_{in}$. This fact can be explained with the effective theory presented below.

Figure 10: Fidelities with the initial state after evolving for $n=100$ cycles of noisy, kicked dynamics. The fidelity is defined as $F\equiv|\langle\psi|\psi(2n\tau)\rangle|^{2}$ where $|\psi\rangle$ is the initial state. Here, we use $N=8$ Rydberg atoms and define $|AF\rangle=|grgrgrgr\rangle$, $|gg\rangle=|gggggggg\rangle$, and $|d_{2}\rangle=|grggggrg\rangle$. The Neel state $|AF\rangle$ is robust to the noise in the drive since this state is invariant under the decoherence up to second order in $\epsilon_{i}$.
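The revivals of Fig. 10 can be reproduced with a short exact-diagonalization sketch. The implementation below is our own minimal one: it works on the full $2^{N}$-dimensional space, takes $H_{PXP}=\sum_{k}P_{k-1}(\sigma^{x}_{k}/2)P_{k+1}$ on a periodic ring with $\Omega=1$ (the factor-of-two convention is our assumption), and draws the kick noise as in the text.

```python
import numpy as np
from scipy.linalg import expm

N, tau = 8, 1.51 * np.pi                        # ring of N atoms, Omega = 1
dim = 2 ** N
bit = lambda i, k: (i >> k) & 1

# Diagonal of the number operator N_hat, and the PXP Hamiltonian in the
# computational basis: flip site k only if both neighbors are unexcited.
ndiag = np.array([sum(bit(i, k) for k in range(N)) for i in range(dim)], dtype=float)
H = np.zeros((dim, dim), dtype=complex)
for i in range(dim):
    for k in range(N):
        if bit(i, (k - 1) % N) == 0 and bit(i, (k + 1) % N) == 0:
            H[i ^ (1 << k), i] += 0.5            # (Omega/2) sigma^x convention

# chi_tau = exp(-i pi N_hat) exp(-i tau H); the phase factor is diagonal.
chi = np.exp(-1j * np.pi * ndiag)[:, None] * expm(-1j * tau * H)

# Noisy kicked evolution: each half-cycle applies e^{-i eps N_hat} chi,
# with eps ~ Normal(0.1, 0.1) as in the text.
rng = np.random.default_rng(0)
AF = np.zeros(dim, dtype=complex)
AF[int("10" * (N // 2), 2)] = 1.0                # Neel state |1010...>
psi = AF.copy()
for _ in range(2 * 20):                          # 20 cycles of two kicks each
    psi = np.exp(-1j * rng.normal(0.1, 0.1) * ndiag) * (chi @ psi)
print("revival fidelity after 20 cycles:", abs(AF.conj() @ psi) ** 2)
```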

To understand the robustness of the quantum scarring behavior in the Rydberg reservoir, it is instructive to seek an effective description of the system's evolution. Recall that a cycle is defined as two imperfect applications of $\chi_{\tau}$. The Hamiltonian in (16) produces the single-cycle unitary

U_{\tau}(\epsilon_{1},\epsilon_{2})=e^{-i\epsilon_{2}\hat{N}}\chi_{\tau}e^{-i\epsilon_{1}\hat{N}}\chi_{\tau}=e^{-i\epsilon_{2}\hat{N}}e^{-i\epsilon_{1}\chi_{\tau}\hat{N}\chi_{\tau}} \quad (C.37)

where we use the fact that $\chi_{\tau}$ is both Hermitian and unitary. Using the Baker-Campbell-Hausdorff formula to second order in $\epsilon_{i}$, we can rewrite (C.37) as

U_{\tau}(\epsilon_{1},\epsilon_{2})\approx e^{-i(\epsilon_{2}\hat{N}+\epsilon_{1}\chi_{\tau}\hat{N}\chi_{\tau})}. \quad (C.38)

A state $\rho(n)$ evolves to $\rho(n+1)=U_{\tau}(\epsilon_{1},\epsilon_{2})\rho(n)U^{\dagger}_{\tau}(\epsilon_{1},\epsilon_{2})$ after a cycle. Expanding this to second order in $\epsilon_{k}$ and using the fact that $\langle\epsilon_{k}\rangle=\epsilon$ and $\langle\epsilon_{k}\epsilon_{l}\rangle=\sigma_{in}^{2}\delta_{kl}$, we obtain the average evolution of the state

\rho(n+1)-\rho(n)=-i\epsilon[H^{+},\rho(n)]+\sigma_{in}^{2}\left(\hat{N}\rho(n)\hat{N}-\frac{1}{2}\{\hat{N}^{2},\rho(n)\}\right)+\sigma_{in}^{2}\left(\chi_{\tau}\hat{N}\chi_{\tau}\rho(n)\chi_{\tau}\hat{N}\chi_{\tau}-\frac{1}{2}\{\chi_{\tau}\hat{N}^{2}\chi_{\tau},\rho(n)\}\right). \quad (C.39)

Here, $\{A,B\}=AB+BA$ denotes the anticommutator, and we define $H^{+}=\hat{N}+\chi_{\tau}\hat{N}\chi_{\tau}$. For times $T\gg 2\tau$, we can take (C.39) to be a Lindbladian evolution since the noise is Markovian. We can rewrite (C.39) as

\dot{\rho}=\mathcal{L}_{\epsilon,\sigma}(\rho) \quad (C.40)
\mathcal{L}_{\epsilon,\sigma}(\cdot)=-i\frac{\epsilon}{2\tau}[H^{+},\cdot]+\frac{\sigma_{in}^{2}}{2\tau}D^{+}(\cdot)+\frac{\sigma_{in}^{2}}{2\tau}D^{-}(\cdot) \quad (C.41)
D^{\pm}(\cdot)=H^{\pm}\cdot H^{\pm}-\frac{1}{2}\{H^{\pm}H^{\pm},\cdot\} \quad (C.42)

where $H^{-}=\hat{N}-\chi_{\tau}\hat{N}\chi_{\tau}$. For $\tau=1.51\pi\Omega^{-1}$, the Neel states are approximately simultaneous eigenstates of $\chi_{\tau}\hat{N}\chi_{\tau}$ and $\hat{N}$, each with eigenvalue $N/2$ for a system of size $N$. Thus, they are simultaneous eigenstates of $H^{\pm}$, and so

\mathcal{L}_{\epsilon,\sigma}(|AF\rangle\langle AF|)\approx 0,\quad\mathcal{L}_{\epsilon,\sigma}(|AF^{\prime}\rangle\langle AF^{\prime}|)\approx 0. \quad (C.43)

Therefore, the Neel states are steady states. It is worth noting that $\mathcal{L}_{\epsilon,\sigma}$ captures only the pre-thermal evolution. Ultimately, higher-order effects in $\epsilon_{k}$ take over and lead to the thermalization of the Neel states, similar to the results in [47] and as seen in Fig. 9. Nonetheless, the thermalization of the Neel states is delayed relative to other states due to (C.43).

Figure 11: Number of zero eigenvalues of the super-operator $\mathcal{L}_{\epsilon,\sigma}$ as a function of the system size. $\mathcal{L}_{\epsilon,\sigma}$ describes the effective dynamics of a reservoir composed of kicked Rydberg atoms. The number of zeros surpasses the linear number of memories available in the Hopfield network.
Figure 12: Empirical memory states $\rho_{ss}(s)$ obtained from evolving the initial states $|s\rangle$, which are basis states of the Rydberg-blockaded Hilbert space. $N_{m}^{e}$ denotes the number of memories found using this procedure. The different plots show the fidelities $F(\rho_{ss}(s),\rho_{ss}(s^{\prime}))$ between different steady states. The red squares delimit the basis states with different numbers of Rydberg excitations, starting with the zero-excitation sector on the top-left square and ending with the $N/2$-excitation sector on the bottom-right square. Red arrows denote the initial configurations for each of the $N^{e}_{m}$ memories found empirically. While this procedure produces a number of memory states smaller than the number of zeros of $\mathcal{L}_{\epsilon,\sigma}$, we find $N_{m}^{e}>N$, a bound unattainable by common classical RNNs.

Moreover, any density matrix $\rho_{ss}$ in the kernel of $\mathcal{L}_{\epsilon,\sigma}$ can be used as a memory state. Expressing $\mathcal{L}_{\epsilon,\sigma}$ as a super-operator on density matrices, we can look at its spectrum, which is in general complex. Fig. 11 shows the number of zero eigenvalues of $\mathcal{L}_{\epsilon,\sigma}$ for different system sizes $N$. The number of zeros scales superlinearly in $N$. Therefore, a quantum reservoir evolving under $\mathcal{L}_{\epsilon,\sigma}$ may have a larger number of memory states than a classical RNN. To prepare these states, we propose to initialize the reservoir in different string configurations $|s\rangle$ satisfying the Rydberg-blockade constraint. For example, $s=rgg..g$ is allowed while $s=rrg...g$ is not. The system is left to evolve for some time $T_{ss}$ to reach a steady state $\rho_{ss}(s)$, which can then be used as a memory. Different initial strings can lead to different steady states, as exemplified in Fig. 12. Fig. 12 shows the fidelity between $\rho_{ss}(s)$ and $\rho_{ss}(s^{\prime})$, defined as

F(\rho_{ss}(s),\rho_{ss}(s^{\prime}))=\left(\text{Tr}\sqrt{\sqrt{\rho_{ss}(s)}\,\rho_{ss}(s^{\prime})\sqrt{\rho_{ss}(s)}}\right)^{2}. \quad (C.44)
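The fidelity (C.44) can be evaluated with scipy's matrix square root; a minimal sketch with two commuting test states:

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """Uhlmann fidelity (C.44): (Tr sqrt( sqrt(rho) sigma sqrt(rho) ))^2."""
    s = sqrtm(rho)
    return float(np.real(np.trace(sqrtm(s @ sigma @ s))) ** 2)

# Sanity checks on commuting mixed states: F(rho, rho) = 1, and for
# diagonal states F reduces to (sum_i sqrt(p_i q_i))^2.
rho = np.diag([0.9, 0.1])
sigma = np.diag([0.1, 0.9])
print(fidelity(rho, rho), fidelity(rho, sigma))   # 1.0, 0.36
```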

The red arrows in Fig. 12 indicate the different memory states obtained by this scheme. It is worth noting that this scheme offers an empirical number of memories $N^{e}_{m}$ that scales at most as $\phi^{N}$, where $\phi\approx 1.62$ is the golden ratio, since the dimension of the Rydberg-blockaded Hilbert space grows as the Fibonacci numbers. We see that $N^{e}_{m}>N$ in all instances, a bound unattainable by classical RNNs such as the Hopfield network [66]. However, this scheme relies on an efficient way to recognize the different memory states through measurements, a question that we leave for future investigations.
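The zero-eigenvalue count of Fig. 11 can be estimated by vectorizing $\mathcal{L}_{\epsilon,\sigma}$ into a matrix acting on vectorized density matrices. The following self-contained sketch is our own construction at small size ($N=4$ for tractability); the tolerance used to declare an eigenvalue "zero" is arbitrary.

```python
import numpy as np
from scipy.linalg import expm

# Build H^+- = N_hat +- chi_tau N_hat chi_tau for a small periodic ring,
# following Eq. (16) and Appendix C; N = 4 keeps the superoperator small.
N, tau, eps, sig = 4, 1.51 * np.pi, 0.1, 0.1
dim = 2 ** N
bit = lambda i, k: (i >> k) & 1
ndiag = np.array([sum(bit(i, k) for k in range(N)) for i in range(dim)], dtype=float)
H = np.zeros((dim, dim), dtype=complex)
for i in range(dim):
    for k in range(N):
        if bit(i, (k - 1) % N) == 0 and bit(i, (k + 1) % N) == 0:
            H[i ^ (1 << k), i] += 0.5            # (Omega/2) sigma^x convention
chi = np.exp(-1j * np.pi * ndiag)[:, None] * expm(-1j * tau * H)
Nhat = np.diag(ndiag).astype(complex)
# We conjugate with chi^dagger, which equals chi N_hat chi when chi is
# Hermitian, as stated in the text; this keeps H^+- exactly Hermitian.
chiNchi = chi.conj().T @ Nhat @ chi
Hp, Hm = Nhat + chiNchi, Nhat - chiNchi

# Column-stacking convention: vec(A rho B) = (B^T kron A) vec(rho).
I = np.eye(dim)
def dissipator(L):
    LdL = L.conj().T @ L
    return np.kron(L.conj(), L) - 0.5 * np.kron(I, LdL) - 0.5 * np.kron(LdL.T, I)

Lsup = (-1j * eps * (np.kron(I, Hp) - np.kron(Hp.T, I))
        + sig ** 2 * (dissipator(Hp) + dissipator(Hm))) / (2 * tau)
evals = np.linalg.eigvals(Lsup)
print("near-zero eigenvalues:", int(np.sum(np.abs(evals) < 1e-8)))
```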

Appendix D Experimental values and numerical simulations

Figure 13: Schematic of Rydberg atoms as used in Ref. [38]. The ground state $|g\rangle=|5S_{1/2}\rangle$ and the Rydberg state $|r\rangle=|50S_{1/2}\rangle$ are coupled via a two-photon transition. An off-resonance 420 nm laser ($\Omega_{420}=2\pi\times 160\text{ MHz}$, $\delta=2\pi\times 1\text{ GHz}$) couples $|g\rangle$ with the intermediate $|6P_{3/2}\rangle$ state, and a 1013 nm laser ($\Omega_{1013}=2\pi\times 50\text{ MHz}$) couples the intermediate state and $|r\rangle$, creating an effective drive between $|g\rangle$ and $|r\rangle$ at rate $\Omega=\frac{\Omega_{420}\Omega_{1013}}{2\delta}=2\pi\times 4.2\text{ MHz}$. Four spontaneous emission processes are at play: emission to nearby Rydberg states due to black-body radiation at a rate $\gamma_{BBR}=2\pi/(250\text{ }\mu\text{s})$, photon scattering out of the intermediate state into the ground state at rate $\gamma_{420}=2\pi/(20\text{ }\mu\text{s})$ and into the Rydberg state at rate $\gamma_{1013}=2\pi/(150\text{ }\mu\text{s})$, and spontaneous emission from $|r\rangle$ to $|g\rangle$ at rate $\gamma_{SE}=2\pi/(375\text{ }\mu\text{s})$. Since $\gamma_{BBR}+\gamma_{SE}+\gamma_{1013}=2\pi/(75\text{ }\mu\text{s})$ is smaller than $\gamma_{420}$, the leading source of decoherence for short periods of time ($<10\text{ }\mu\text{s}$) is the $\gamma_{420}$ decay.

In this section, we outline the experimental values used for the numerical simulations of Sec. V. For concreteness, for simulating Rydberg atoms we use the experimental values of Ref. [38] (see Fig. 13). In this experimental platform, a two-photon transition couples $|g\rangle=|5S_{1/2}\rangle$ and $|r\rangle=|50S_{1/2}\rangle$ via an off-resonance state $|6P_{3/2}\rangle$. For this setup, and for short simulation times ($<10\text{ }\mu\text{s}$), the dominant source of decoherence is photon scattering out of the intermediate state. Using the fact that the intermediate state is off-resonance, we can adiabatically eliminate it to produce an effective decay operator (see Sec. IV.B in Ref. [69])

\sigma^{-}_{eff}=\frac{\sqrt{\gamma_{420}}}{2\delta}|g\rangle\left(\Omega_{420}\langle g|+\Omega_{1013}\langle r|\right) \quad (D.45)

which is an effective spontaneous emission from |r|r\rangle to |g|g\rangle accompanied by decoherence on the ground state.

We chose $\Omega=4.2\text{ MHz}$. Additionally, a pair of $|r\rangle$ atoms interact with a strength $C_{6}=862.9\text{ GHz}(\mu\text{m})^{6}$. We used the pairinteraction Python package [58] to determine that a pair of $|r\rangle=|70S_{1/2}\rangle$ and $|r^{\prime}\rangle=|73S_{1/2}\rangle$ atoms has a similar interaction strength of $C_{6}^{rr^{\prime}}=-836.6\text{ GHz}(\mu\text{m})^{6}\approx-C_{6}$. We used this interaction to model the inhibitory and excitatory neurons in Sec. V.1 ($V_{n_{Q},n_{Q}}=V$, $V_{n_{Q},n_{Q}^{\prime}}=-V$). We denote $V=C_{6}/a^{6}_{0}$, where $a_{0}$ is tuned to give different nearest-neighbor interaction strengths.

Next, we explain and report the numerical parameters chosen for each of the biological tasks.

D.1 Multitasking

Our scheme to encode inhibitory and excitatory neurons relies on approximating (13); as a result, one needs the "inhibitory neurons" to be as far away from each other as possible, so that they do not interact positively with each other. For this reason, this task uses a 1D open chain of atoms separated by a distance $a_{0}$, with the inhibitory neurons at opposite ends of the chain and in the bulk with maximum spacing from each other. The input neurons are chosen to be the two at one end of the chain, while the output neuron is chosen to be at the opposite end. This choice ensures that the input neurons interact with the whole chain before readout.

The inputs are uniformly sampled from $\{0,2\pi\}$ MHz with added Gaussian noise $\sigma_{in}=0.1$, and all $\Delta t$ are sampled from a Gaussian with average $\langle\Delta t\rangle\in[0,5]$ $\mu$s and standard deviation $\sigma_{in}$. For each network size and number of inhibitory neurons, we choose $a_{0}$ such that the separation $d_{max}$ between inhibitory neurons results in $V/d_{max}^{6}=10^{-2}$. For example, for the case of 4 neurons with two inhibitory neurons on either end, one needs $V/3^{6}=10^{-2}$, which amounts to choosing $V\approx 7.3$ MHz. Note that this value of $V$ is of the same order of magnitude as $\Omega=4.2$ MHz, and so the reservoir in this case is well into the non-classical regime.
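A worked check of this spacing rule for the four-neuron example (the numbers follow directly from the text):

```python
# Spacing rule V / d_max^6 = 1e-2 with the inhibitory neurons separated
# by d_max = 3 lattice spacings.
d_max = 3
V = 1e-2 * d_max ** 6
print(V)   # 7.29, i.e. V ~ 7.3 MHz, the same order as Omega = 4.2 MHz
```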

The learned parameters are the entries of the output linear map $W^{\text{out}}$, which in this case is a matrix in $\mathbb{R}^{(3+1)\times 1}$ with the last row representing a bias term. The dimensions are such because only one neuron is measured but three functions have to be fitted.

D.2 Decision making

In classical RNNs, tasks such as decision making and working memory require connectivity between all neurons. Since our connectivity is limited by physical constraints, an open 2D square-lattice structure was chosen to prevent neurons from being isolated from the rest. Moreover, a 2D square lattice is experimentally friendly. In our case, we use an open $2\times 3$ lattice with the two input neurons at the top-left corner of the lattice and the two output neurons at the bottom-right corner. Again, this architecture was chosen so that the input neurons have to interact with the rest of the system before readout. We use $V=2\pi\times 10$ MHz for our simulations, and choose $\Delta t=2\pi/V$ as the time the inputs are turned on, as that is the timescale on which the input atoms entangle with the rest of the system.

The inputs are uniformly sampled from $\{0,\pi/2,\pi,3\pi/2,2\pi\}$ MHz with added Gaussian noise $\sigma_{in}=0.1$. The time $\Delta t$ that the stimuli are turned on is fixed to a mean of $\langle\Delta t\rangle=0.1$ $\mu$s with added Gaussian noise $\sigma_{in}=0.1$. We optimize over the linear output map $W^{\text{out}}$, a matrix in $\mathbb{R}^{(1+1)\times 2}$ since one function is fitted and two neurons are measured. Additionally, we train the output time $t_{out}$, after the stimuli are turned off and before the network is probed, so that the output satisfies (15). To do the optimization, we make use of the Nelder-Mead algorithm [70], as in the sketch below.
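A minimal sketch of this optimization step. The loss below is a cheap toy surrogate: in the actual task, evaluating (4) requires simulating the reservoir for each candidate $(W^{\text{out}},t_{out})$, which we do not reproduce here.

```python
import numpy as np
from scipy.optimize import minimize

# Toy surrogate for the mean-square loss (4) as a function of the flattened
# readout W^out (2 measured neurons + bias, 1 fitted function) and t_out.
def loss(params):
    W_out = params[:3]
    t_out = params[3]
    return float(np.sum((W_out - 1.0) ** 2) + (t_out - 0.5) ** 2)

res = minimize(loss, x0=np.zeros(4), method="Nelder-Mead")
print(res.x)   # converges near [1, 1, 1, 0.5] for this surrogate
```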

In order to compute the psychometric response plotted in Fig. 6B, we measure the expectation values of the two output neurons and produce the vector $\bm{r}(\Delta^{in}_{1},\Delta^{in}_{2})=(\langle\sigma_{out1}^{y}\rangle,\langle\sigma_{out2}^{y}\rangle,1)$, which depends on the inputs $\Delta^{in}_{1,2}$ as well as the temporal parameters $(\Delta t,t_{out})$. We then compute $y^{\text{out}}(\Delta^{in}_{1,2})=W^{\text{out}}\cdot\bm{r}(\Delta^{in}_{1,2})$, and $(W^{\text{out}},t_{out})$ are optimized such that $y^{\text{out}}(\Delta^{in}_{1,2})\approx y^{targ}$ in (15). The optimization is done by generating about 40,000 different values of $\Delta_{1,2}^{in}$ with different levels of contrast $|\Delta^{in}_{1}-\Delta^{in}_{2}|$ ranging from 0 to 1 MHz. Once the optimization is done, we look at the loss with respect to $\Delta_{2}^{in}$, obtained as the error in classifying $\Delta_{2}^{in}$ as greater than $\Delta_{1}^{in}$ when indeed $\Delta_{2}^{in}>\Delta_{1}^{in}$. The error is quantified using the mean-square loss in (4).

D.3 Working memory

This task’s setup is identical to the decision-making task except that the two inputs are separated by a delay time tdelayt_{delay}. The values of the interaction strength VV used for Fig. 7 are V=2π×10V=2\pi\times 10 MHz and V=2π×0.1V=2\pi\times 0.1 MHz corresponding to V/Ω>1V/\Omega>1 and V/Ω<1V/\Omega<1 respectively. The former of which sets us in the Rydberg blockaded regime while the latter is not. In this task, the times Δt\Delta t and tdelayt_{delay} are fixed up to an added Gaussian with noise σin=0.1\sigma_{in}=0.1. In this task, we optimize over the linear output map WoutW^{\text{out}}, a matrix in 1+1,2\mathbb{R}^{1+1,2} since one function is fitted and two neurons are measured.

D.4 Long-term memory

Although quantum scars are known to exist in other geometries and dimensions [71], for this task we use a 1D chain of Rydberg atoms, since in this case quantum many-body scars have been experimentally observed [35, 46]. Furthermore, our chain has periodic boundary conditions to avoid edge effects. Since we know that scars are robust to decoherence, we set $\gamma_{420}=0$ so that we can evolve our states for longer times. The number of cycles $n$ in Fig. 9 corresponds to $n$ cycles of duration $2\tau$, each consisting of two evolutions under the PXP Hamiltonian for a time $\tau=1.51\pi\Omega^{-1}$. In this case, we take $V\gg\Omega$ and set $\Omega=1$. The noisy field in (16) is sampled according to $\epsilon_{k}\sim N(\mu=0.1,\sigma=0.1)$. The input $m$ is sampled as a fair coin flip. Lastly, after each number of cycles $n$, the only trained parameter is $W_{n}^{\text{out}}\in\mathbb{R}^{(1+1)\times 1}$, since only one atom is probed to produce an estimate of the input $m$.