
Realistic quantum photonic neural networks

Jacob Ewaniuk ([email protected]), Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6
Jacques Carolan, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
Bhavin J. Shastri, Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6 and Vector Institute, Toronto, Ontario, Canada M5G 1M1
Nir Rotenberg, Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6
Abstract

Quantum photonic neural networks are variational photonic circuits that can be trained to implement high-fidelity quantum operations. However, work to date has assumed idealized components, including a perfect $\pi$ Kerr nonlinearity. Here, we investigate the limitations of realistic quantum photonic neural networks that suffer from fabrication imperfections, which lead to photon loss and imperfect routing, as well as weak nonlinearities, and we show that they can learn to overcome most of these errors. Using the example of a Bell-state analyzer, we demonstrate that there is an optimal network size that balances imperfections against the ability to compensate for lacking nonlinearities. With a sub-optimal $\pi/10$ effective Kerr nonlinearity, we show that a network fabricated with current state-of-the-art processes can achieve an unconditional fidelity of 0.891, which increases to 0.999999 if it is possible to precondition success on the detection of a photon in each logical photonic qubit. Our results provide a guide to the construction of viable, brain-inspired quantum photonic devices for emerging quantum technologies.

I Introduction

Quantum neural networks, brain-inspired quantum circuits, harness artificial intelligence to enhance quantum information processing. When driven with light, quantum photonic neural networks (QPNNs) leverage the strengths of mature photonic platforms [1], including multiplexing, low latency, and ultra-low operational powers already being exploited by conventional neural networks [2] and linear-optical quantum processors [3]. This allows QPNNs to perform quantum state tomography [4], act as quantum simulators [5, 6], process [7] or reduce the noise [8] of quantum states, or speed up tasks normally carried out by classical neural networks, such as image recognition [9] and natural language processing [10].

An example of a QPNN circuit, a two-layer network trained to act as a Bell-state analyzer (BSA), is shown in Fig. 1a.

Figure 1: A realistic QPNN-based BSA. (a) An exemplary two-layer QPNN consisting of meshes $(\mathbf{U})$ of parameterized Mach-Zehnder interferometers (inset) separated by single-site nonlinearities $(\boldsymbol{\Sigma})$. The network features two dual-rail encoded qubits, one where a single photon occupies the upper two spatial modes, the other in the lower two modes. Here, realistic losses (0.3 dB/cm) and errors in the 50:50 directional couplers (5.08%), as well as a weak $\pi/4$ nonlinear phase shift, are assumed. The network was trained to act as a BSA according to the truth table shown, with a resultant unconditional fidelity of 0.825. As an example, the network is colored to portray the propagation of the photons through the network when the $\left|\Phi^{+}\right\rangle$ Bell-state is incident. The colors represent the probabilities that there are zero, one, or two photons in each spatial mode at each part of the network (colorbar), showing the evolution of the state as it propagates through the circuit. For this example, there is an 82.5% chance of measuring the correct $\left|00\right\rangle$ target state. (b) Probabilities of measuring a state $\left|\psi_{\mathrm{out}}\right\rangle$ when a state $\left|\psi_{\mathrm{in}}\right\rangle$ is fed into the network shown in (a). (c) Comparison between the success rates of ideal linear-optical and realistic QPNN-based BSAs when up to ten are operated in series. The linear-optical BSA has a maximal unconditional fidelity of 0.5 [11] and is compared to realistic QPNNs with varying numbers of layers and effective nonlinear phase shifts $(\varphi)$, as explained in the main text. The purple marker highlights the success rate for the network shown in (a).

Here, the connectivity and activation function for the network are provided by linear, rectangular interferometer meshes $(\mathbf{U})$ [12] and single-site optical nonlinearities $(\boldsymbol{\Sigma})$, respectively. A BSA can distinguish between, or create, all four highly entangled Bell-states, and the addition of this nonlinearity ideally increases the success probability of the circuit to unity [7] from the 0.5 possible solely with linear optics in the absence of ancillary photons [11]. The operation performed by the QPNN in this example is therefore crucial to entanglement swapping [13] and hence provides a route toward a deterministic quantum repeater node [14], a vital component of a future quantum internet [15].

Quantum photonic circuits are not ideal, and here we report on the performance of realistic, imperfect QPNNs. Specifically, we consider how propagation losses and imperfect optical nonlinearities affect the fidelity of the QPNN, using the example of a BSA to benchmark our results. As shown in Figs. 1b and c, we find that even realistic networks with weak nonlinearities can vastly outperform those based on (ideal) linear optics. In Fig. 1c, we observe that using state-of-the-art waveguide fabrication (as described in Sec. II) and a perfect $(\pi)$ two-photon nonlinearity, 10 BSA nodes made from 2-layer QPNNs can be applied in series with a success rate of 72%. Surprisingly, this rate only decreases to 61% for a much weaker $(\pi/4)$ two-photon nonlinearity if a third (lossy) layer is added. Moreover, if each operation is conditioned on the measurement of two photons, the conditional success rate of 10 nodes becomes 99.99999% and 99.9% for $\pi$ and $\pi/4$, respectively. In what follows, we unravel the dependence on loss, effective nonlinearity, and network size, providing a methodology for the design of optimal QPNNs with realistic components.

II Network Architecture & Nonidealities

The architecture of a realistic QPNN is the same as that of an ideal network, and is thus designed to operate on dual-rail encoded photonic qubits [7]. Each layer consists of a mesh of tunable Mach-Zehnder interferometers (MZIs) with two controllable phase shifters $(\phi,\theta)$, as shown in the inset to Fig. 1a. The interferometer mesh can be programmed to perform an arbitrary linear unitary transformation $\mathbf{U}(\boldsymbol{\phi},\boldsymbol{\theta})$ on the spatial modes of the photons [12]. Single-site nonlinearities, of strength $\varphi$, are placed between consecutive layers. In the Supplementary Information S1, we provide further details on the construction of the system transfer function.

The components of linear photonic networks are not perfect, and various techniques have been developed to mitigate the effects of these imperfections. Specifically, both imperfect splitting ratios of the directional couplers (DCs) that form the MZIs and imperfectly calibrated phase shifters lead to errors that can be mitigated by optimizing the circuits after fabrication [16, 17, 18]. In contrast, we account for these errors and those due to imbalanced photon loss or imperfect nonlinearities by training the variational parameters $\{\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\}$ in situ, as would be done on-chip, post-fabrication.

We model a realistic linear mesh by allowing each element to suffer from slightly different imperfections, resulting in unbalanced, photon-path-dependent errors. We define the transmittance of each DC as $t$, randomly selected from a normal distribution with a mean of 50% and standard deviation of 5.08%, matching experimental results of a broadband DC fabricated for silicon-on-insulator (SOI) platforms [19]. Likewise, propagation losses, where photons are scattered out of the circuit due to, for example, surface roughness, or are absorbed by the waveguides, are parameterized by

\alpha=1-10^{-\frac{\alpha_{\mathrm{WG}}\ell}{10}}, (1)

for an element of length $\ell$ and propagation losses per unit length $\alpha_{\mathrm{WG}}$. These losses depend on the platform upon which the photonic circuit is constructed [20, 21, 22, 23, 24, 25, 26], which additionally determines the size of each photonic element. In our analysis, we select $\alpha_{\mathrm{WG}}$ from a normal distribution with a standard deviation of 6.67% of the mean, as is the case for current state-of-the-art photonic circuits built on SOI, which suffer from $\alpha_{\mathrm{WG}}=0.3\pm 0.02$ dB/cm at 1550 nm [26]. More information on the inclusion of fabrication imperfections can be found in Sec. VI and the Supplementary Information S1.
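As an illustration of this error model, the following sketch (not the authors' code; the function names and element length are our own) samples a directional-coupler transmittance and an element loss probability per Eq. 1, using the SOI values quoted above.

```python
# Illustrative sketch of the imperfection sampling described above.
import numpy as np

rng = np.random.default_rng()

def sample_dc_transmittance(mean=0.5, std=0.0508):
    """Transmittance t of a nominally 50:50 directional coupler [19]."""
    return rng.normal(mean, std)

def sample_element_loss(length_cm, alpha_wg_mean=0.3, rel_std=0.0667):
    """Loss probability alpha of an element of length `length_cm` (Eq. 1),
    with alpha_WG drawn from a normal distribution (0.3 +/- 0.02 dB/cm [26])."""
    alpha_wg = rng.normal(alpha_wg_mean, rel_std * alpha_wg_mean)  # dB/cm
    return 1.0 - 10.0 ** (-alpha_wg * length_cm / 10.0)

# e.g. a hypothetical 300-um-long element on SOI:
t = sample_dc_transmittance()
alpha = sample_element_loss(0.03)
```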

In our architecture, as in previous realizations [7], a Kerr nonlinearity, resolved in the Fock basis as,

\boldsymbol{\Sigma}\left(\varphi\right)=\sum_{n}\exp\left[in(n-1)\frac{\varphi}{2}\right]\left|n\right\rangle\left\langle n\right|, (2)

is assumed, where in the ideal case $\varphi=\pi$, such that a single photon passing through experiences no phase change while two photons undergo a $\pi$ phase change. To date, a $\pi$ Kerr nonlinearity has yet to be observed at the single-photon level; however, this efficiency has been reached by other nonlinearities, such as those based on electromagnetically induced transparency [27] or the saturation [28] of atoms. While these enable neural networks capable of quantum state tomography [29] or image recognition [30], respectively, neither is scalable nor compatible with most quantum information processing applications, since they lead to photon loss. It is, however, likely that in the near future, efficiencies approaching $\pi$ will be demonstrated, either through the coherent chiral scattering of photons from single quantum emitters [31] or using integrated nanophotonic cavities designed to address this need [32, 33, 34]. Hence, we consider single-site Kerr nonlinearities, but also examine the performance of the QPNN in the realistic scenario where $\varphi\lesssim\pi$.
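A minimal sketch of Eq. 2 in a truncated Fock basis, assuming a NumPy matrix representation and a photon-number cutoff `n_max` (our choices, not specified in the text):

```python
# Single-site Kerr phase operator of Eq. (2), truncated at n_max photons.
import numpy as np

def kerr_nonlinearity(phi, n_max=2):
    """Diagonal operator exp[i n(n-1) phi/2] |n><n| for n = 0..n_max."""
    n = np.arange(n_max + 1)
    return np.diag(np.exp(1j * n * (n - 1) * phi / 2))

# For phi = pi, one photon acquires no phase while two photons acquire pi:
print(np.round(np.angle(np.diag(kerr_nonlinearity(np.pi))) / np.pi, 3))  # [0. 0. 1.]
```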

In sum, the total transfer function for an $L$-layer QPNN is given by,

\mathbf{S}=\mathbf{U}\left(\boldsymbol{\phi}_{L},\boldsymbol{\theta}_{L}\right)\cdot\prod_{i=1}^{L-1}\boldsymbol{\Sigma}\left(\varphi\right)\cdot\mathbf{U}\left(\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\right). (3)
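Schematically, Eq. 3 is an alternating composition of mesh and nonlinearity layers. The sketch below assumes hypothetical helpers `layer_unitary` and `single_site_nonlinearity` that return the corresponding operators already lifted to the multi-photon Fock space (see the Supplementary Information S1 for how that lifting is actually constructed):

```python
# Schematic composition of Eq. (3); helper functions are hypothetical.
def qpnn_transfer(phis, thetas, varphi, layer_unitary, single_site_nonlinearity):
    """Compose S = U_L * prod_{i=1}^{L-1} Sigma(varphi) U_i for an L-layer QPNN.

    phis, thetas : length-L lists of phase-shifter settings, one set per layer.
    """
    L = len(phis)
    S = layer_unitary(phis[0], thetas[0])          # U_1 acts first
    for i in range(1, L):
        S = layer_unitary(phis[i], thetas[i]) @ single_site_nonlinearity(varphi) @ S
    return S
```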

This transfer function will act on any input state to produce an actual output state $\left|\psi_{\mathrm{out,act}}^{(i)}\right\rangle=\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$, which is compared to the ideal output state $\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$ to determine the unconditional fidelity for that input-output pair,

\mathcal{F}^{(\mathrm{unc})}_{i}=\left|\left\langle\psi_{\mathrm{out}}^{(i)}\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}. (4)

The total unconditional fidelity, the chance that the network provides the correct output state for any given input state without preconditions, is then found by averaging over all $K$ input-output pairs according to

\mathcal{F}^{(\mathrm{unc})}=\frac{1}{K}\sum_{i=1}^{K}\mathcal{F}^{(\mathrm{unc})}_{i}. (5)

Conversely, we can calculate the unconditional infidelity (i.e., the network error) according to $\mathcal{C}^{(\mathrm{unc})}=1-\mathcal{F}^{(\mathrm{unc})}$, which we minimize in training the QPNN, using the local, gradient-free BOBYQA nonlinear optimization algorithm [35], as available in the NLopt library [36]. Further details on network optimization are provided in Sec. VI.
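For concreteness, a minimal sketch of Eqs. 4 and 5 and the cost $\mathcal{C}^{(\mathrm{unc})}$, assuming the transfer function `S` and the $K$ training states are held as Fock-basis NumPy arrays; in an experiment these overlaps would instead be estimated from measured detection statistics:

```python
# Unconditional fidelity (Eqs. 4-5) and infidelity for column-stacked state pairs.
import numpy as np

def unconditional_fidelity(S, psi_in, psi_out):
    """F^(unc): overlap of S|psi_in^(i)> with the ideal |psi_out^(i)>, averaged
    over the K training pairs stored as columns of psi_in and psi_out."""
    K = psi_in.shape[1]
    fids = [np.abs(np.vdot(psi_out[:, i], S @ psi_in[:, i])) ** 2 for i in range(K)]
    return np.mean(fids)

def unconditional_infidelity(S, psi_in, psi_out):
    """Training cost C^(unc) = 1 - F^(unc)."""
    return 1.0 - unconditional_fidelity(S, psi_in, psi_out)
```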

The success of the network may be conditioned on the detection of photons only in ports that abide by the dual-rail encoding scheme, as in this case we know that a logical output was produced. For the BSA shown in Fig. 1a, this means that a single photon was detected in one of the top two modes and the other in the bottom two modes. We call this the conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, each $i^{\mathrm{th}}$ term of which is related to the unconditional fidelity through the probability $\mathcal{P}^{(\mathrm{cb})}$ that the network operation results in a computational basis state (that is, no photons are lost and each logical photonic qubit contains a single photon at the output) by,

\mathcal{F}_{i}^{(\mathrm{unc})}=\mathcal{F}_{i}^{(\mathrm{con})}\mathcal{P}_{i}^{(\mathrm{cb})}. (6)

In Sec. VI, we provide the expressions used to calculate these conditional measures. For a network operation that requires detection, such as the BSA (unlike, for example, the realization of a quantum logic gate for quantum computation), $\mathcal{P}^{(\mathrm{cb})}$ gives the probability that the network successfully performed its task, while $\mathcal{F}^{(\mathrm{con})}$ provides the quality of the result; in practice, one may optimize either the conditional or unconditional infidelity, depending on the task under consideration. In the following, we train solely on the unconditional infidelity; however, we provide results from optimizing the conditional infidelity in the Supplementary Information S2.

III Correcting Imperfect Linear Interferometer Meshes

We begin to consider the effects of imperfections on QPNNs by holding the nonlinearity at the ideal value ($\varphi=\pi$) but introducing DC splitting ratio variations and photon loss as described above. The resultant unconditional infidelity of a BSA for 2- to 6-layer QPNNs with losses ranging from 0.001 to 3 dB/cm, as a function of training iterations, is shown in Figs. 2a-c.

Figure 2: Performance of a QPNN-based BSA suffering from fabrication imperfections. The unconditional infidelity $\mathcal{C}^{(\mathrm{unc})}$ of (a) 2-, (b) 4-, and (c) 6-layer networks is shown as a function of the training iteration for increasingly lossy networks. In each pane, the results of 50 optimization trials are displayed, with clear plateaus visible in $\mathcal{C}^{(\mathrm{unc})}$ that increase with the losses. In each case, only trials that result in infidelity at or below those achieved by offline training (colored ticks in (a)-(c), shaded regions in (d)-(f)) are considered successful (shaded blue region shows an example for 0.001 dB/cm). The unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ of (d) 2-, (e) 4-, and (f) 6-layer networks is plotted with respect to the average losses $\alpha_{\mathrm{WG}}$, with colored symbols (shaded regions) corresponding to the mean (95% confidence interval) of a logarithmic normal distribution fitted to the successful trials of (a)-(c) (see the Supplementary Information S3 for more details). These points are seen to lie on the (dashed) loss limit curve, where the performance of the network is only limited by uniform photon loss (assumes perfect DCs; see Sec. VI for additional details), in contrast to networks that are trained offline (solid black curves and shaded grey regions), demonstrating the ability of QPNNs to learn to overcome imperfections. (g) Unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$, (h) conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, and (i) computational basis probability $\mathcal{P}^{(\mathrm{cb})}$, as a function of $L$ for the QPNNs trained in situ, where the mean (symbols) and 95% confidence intervals (shaded regions in (g), (i), error bars in (h)) are determined via the same method as (d)-(f).

Each case is repeated 50 times, resulting in plateaus of $\mathcal{C}^{(\mathrm{unc})}$ that increase in value for increasing losses, as expected. Interestingly, for 2-layer BSAs, we observe a large spread in the final $\mathcal{C}^{(\mathrm{unc})}$, particularly for low-loss networks (cf. blue and purple curves in Fig. 2a), indicating that the final performance of the QPNN is largely dictated by imbalance due to imperfect DCs. Adding more layers to the network, as in Figs. 2b and c, reduces this spread, showing that larger QPNNs may learn to correct for these errors and more often reach optimal performance.

This is reflected in Figs. 2d-f, which show the unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ as a function of waveguide loss for different sized networks. Here, we compare the in situ trained QPNNs of Figs. 2a-c, denoted by the symbols, to the case where the network is trained offline. Offline training means that a perfect network was trained, and losses and DC errors were subsequently added to the solution. This was repeated 50 times for each $\alpha_{\mathrm{WG}}$, selecting different random imperfections at each repetition, with the mean given by the solid-black curve and standard deviation by the grey region. When trained in situ, the QPNN learns to overcome these imperfections, as seen by the convergence toward loss-limited performance (see Sec. VI for more information on network training and the loss limit). This is more apparent for larger networks, where the fidelity of networks trained offline drops significantly due to increased losses and DC errors, while those trained in situ maintain and even increase $\mathcal{F}^{(\mathrm{unc})}$.

The balance between fabrication imperfections and network size, as a function of losses, is summarized in Fig. 2g. Here, we observe that for state-of-the-art losses ($\alpha_{\mathrm{WG}}=0.3$ dB/cm) or worse, the unconditional fidelity decreases as expected when more layers are added to the network. When $\alpha_{\mathrm{WG}}=0.3$ dB/cm, $\mathcal{F}^{(\mathrm{unc})}\geq 0.905$ (0.904), where the bracketed result is the lower bound of the 95% confidence interval, even for a 6-layer QPNN, demonstrating that high-efficiency performance is possible on realistic state-of-the-art systems. Conversely, a more complex evolution is seen in Fig. 2g for 0.01 dB/cm losses or less, where there exists an optimal network size other than 2 layers. Losses at 0.01 dB/cm are similar to those of the silicon nitride platform at 1550 nm, reported as low as 0.007 dB/cm [37], but more typically near 0.01 dB/cm [38, 39, 40]. In this low-loss case ($\alpha_{\mathrm{WG}}=0.01$ dB/cm), $\mathcal{F}^{(\mathrm{unc})}$ first increases from 0.993 (0.949) to 0.998 (0.997) by adding two layers to the base size, as the network is better able to account for imperfections. The unconditional fidelity then only slightly decreases to 0.996 (0.996) as the network grows to 7 layers. That is, near-deterministic QPNN-based quantum elements such as BSAs will be realistic in the near future as platform losses continue to decrease.

The situation is even more promising if the success of the network is preconditioned on detection in the computational basis, as shown in Figs. 2h and i. Here, we present $\mathcal{F}^{(\mathrm{con})}$ and $\mathcal{P}^{(\mathrm{cb})}$ for different sized QPNNs and for differing $\alpha_{\mathrm{WG}}$. Even for extremely lossy networks, where $\alpha_{\mathrm{WG}}=2$ dB/cm, $\mathcal{F}^{(\mathrm{con})}$ remains above 0.9999 (0.9888) for all $L\leq 7$, while for state-of-the-art losses this conditional fidelity does not drop below 0.999999 (0.999784) for $3\leq L\leq 7$, as we observe in Fig. 2h. In fact, as shown in Fig. 2i, it is mainly the rate at which the QPNN produces a logical output that is affected by an increase in network size, showing the potential of even lossy networks if fault-tolerant protocols are used.

IV Embracing Weak Nonlinear Interactions

Having studied the effect of fabrication errors on network performance, we now turn to the consequences of sub-optimal nonlinearities. Assuming state-of-the-art losses ($\alpha_{\mathrm{WG}}=0.3$ dB/cm), we vary the effective nonlinear phase shift $\varphi$ from the ideal $\pi$ to $\pi/100$ and attempt to train QPNNs of different sizes to act as BSAs (see the Supplementary Information S4 for exemplary training traces, cf. Figs. 2a-c). For each network size and effective nonlinearity, we attempt to train 200 QPNNs, showing the results in Fig. 3.

Figure 3: Performance of realistic QPNN-based BSAs with sub-optimal $(\varphi\lesssim\pi)$ nonlinearities and state-of-the-art ($\alpha_{\mathrm{WG}}=0.3$ dB/cm) losses. The unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ of (a) 2-, (b) 3-, and (c) 4-layer networks is shown with respect to the effective nonlinear phase shift $\varphi$, showing both offline (solid black curves, shaded grey regions) and in situ (colored symbols) trained networks, and the loss limit (dashed line), as in Fig. 2. In situ results include the average of all successfully-trained QPNNs (circles) and the best-case, where triangles (error bars) show the mean (95% confidence intervals) of a beta distribution fit to the maximal unconditional fidelity plateau (see the Supplementary Information S3, S4 for statistical analysis details and an example of this plateau). The (d) unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$, (e) conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, and (f) computational basis probability $\mathcal{P}^{(\mathrm{cb})}$ are plotted for each $\varphi$ denoted on the colorbar, for networks of up to 7 layers. All means (triangles) and 95% confidence intervals (error bars) were determined in the same manner as the best-case in situ results of (a)-(c). Connecting dotted lines serve only as a visual aid.

Figs. 3a-c depict the highly non-trivial dependence of $\mathcal{F}^{(\mathrm{unc})}$ on the effective nonlinearity $\varphi$. When the QPNN is trained offline, $\mathcal{F}^{(\mathrm{unc})}$ increases monotonically with $\varphi$, as would be the case for a quantum-optical Fredkin gate-based BSA [41, 42, 43] (see the Supplementary Information S4 for additional information). Conversely, a QPNN can be trained to account for the weak nonlinearity, in which case it can vastly outperform this expectation. Considering a 2-layer network (Fig. 3a), we observe a strikingly different $\varphi$ dependence when comparing the best-case in situ trained networks (triangles; see the Supplementary Information S3 for statistical analysis information) to both the average of all successful in situ training cases (circles) and networks trained offline. We observe that networks trained in situ can reach the loss limit with sub-optimal nonlinearities, in addition to fabrication imperfections. Specifically, we observe optimal performance of 2-layer QPNNs when $\varphi=\pi/2$ in addition to $\pi$. Moreover, as can be seen in Fig. 3a, near-optimal performance is reached for a domain of $\varphi$ centered at $\pi/2$, providing a pathway to robust QPNN-based BSAs without the need for a perfect Kerr nonlinearity. It must be noted, however, that operating with weaker nonlinearities decreases the probability that the QPNN converges at the loss limit during training, as shown in the Supplementary Information S4.

For all $\varphi$, a QPNN trained in situ learns how to account for weak nonlinearities and thus approach the loss limit. These capabilities improve as redundancies are added via an increase in network size, as is evident across Figs. 3a-c and summarized in Fig. 3d. By adding a single additional (lossy) layer, QPNNs were trained to within 1.13% of the unconditional fidelity achieved with the ideal nonlinearity, 0.951 (0.950) at $\varphi=\pi$, and within 1.15% of the loss limit, 0.952, for all $\varphi\geq\pi/4$. Even networks with nonlinearities as weak as $\varphi=\pi/10$ approach the loss limit at 6 layers, in contrast to the case of $\pi/100$, where $\mathcal{F}^{(\mathrm{unc})}$ increases only to 0.528 (0.528) at 7 layers from 0.499 (0.498) at 2 layers, essentially acting as a linear-optical BSA [11]. Trainability also improves with increased network size, as it becomes easier for the QPNN to find optimal solutions, such that the average unconditional fidelity achieved during in situ training approaches the maximum plateau.

In Figs. 3e and f, the conditional fidelity and computational basis probability are shown as a function of $L\leq 7$, for differing $\varphi$. In contrast to the case where photon losses were varied (cf. Fig. 2), we observe that the behavior of $\mathcal{F}^{(\mathrm{con})}$ strongly depends on $\varphi$. While QPNNs with $\varphi=\pi$ and $\pi/2$ operate with $\mathcal{F}^{(\mathrm{con})}\geq 0.9999$ (0.9998) for all $L\leq 7$, networks with nonlinearities near $\pi/4$ and $3\pi/4$ require at least 3 layers to reach this level, while at $\pi/10$, 6 layers are needed. For all nonlinearities and network sizes, $\mathcal{P}^{(\mathrm{cb})}$ is within 0.009 (0.031) of loss-limited performance, as seen in Fig. 3f, and as expected for a QPNN suffering from state-of-the-art losses (cf. Figs. 2d-f and i). Altogether, this demonstrates that for each combination of fabrication imperfections and effective nonlinearity, there exists an optimal network size that maximizes $\mathcal{F}^{(\mathrm{unc})}$. While adding layers will always tend to increase $\mathcal{F}^{(\mathrm{con})}$, a balance must be struck with the exponential decrease in $\mathcal{P}^{(\mathrm{cb})}$. In the Supplementary Information S5, we demonstrate a QPNN trained to generate Greenberger-Horne-Zeilinger states, and show that this remains true beyond the BSA application.

V Discussion

We have shown that high-fidelity operation is possible in realistic quantum photonic neural networks based on non-ideal Kerr nonlinearities. Since propagation through these networks leads to inevitable photon loss, their unconditional fidelity ceiling tends to decrease with increasing size. While this loss limit is unavoidable, these networks are able to learn to manage additional errors from non-uniform losses and directional coupler splitting ratio variations, often demonstrating increased fidelity with the addition of imperfect layers. Crucially, we have shown that weak nonlinearities, which are mere fractions of the ideal, are sufficient for near-optimal network performance. Even as these sub-optimal nonlinearities are realized [31, 32], the desired phase change will likely be accompanied by wave-packet distortions [44, 45], and although complex solutions based on dynamically coupled cavities have been proposed [33, 34], it remains an open question whether a QPNN may instead learn to overcome these distortions in much the same way it does fabrication imperfections. Already in the work presented here, QPNNs offer a fascinating view of the intricate balance between loss, imperfect photon routing, and weak nonlinearity, which we have unraveled to demonstrate how each combination leads to an optimal network geometry. Understanding and respecting this balance will be important in the near future, as realistic QPNNs are designed and fabricated.

It is now clear why QPNNs far outperform linear-optical networks. Even with a weak $\pi/4$ effective nonlinearity, they can learn to surpass the 0.5 unconditional fidelity possible with perfect linear optics [11], achieving $\mathcal{F}^{(\mathrm{unc})}=0.820$ (0.809) at 2 layers (see Figs. 1b and 3a), which grows to 0.951 (0.949) with an additional layer (see Fig. 3b). At 6 layers, loss-limited operation, $\mathcal{F}^{(\mathrm{unc})}=0.891$ (0.890), can be achieved with nonlinearities as weak as $\pi/10$. Returning to Fig. 1c, which summarizes the success rate of operating $N$ BSAs in series, as would be necessary to connect quantum repeater nodes by entanglement swapping [13, 14], the performance benefits offered by QPNNs become more apparent. While 10 consecutive perfect linear-optical BSAs have a success rate of just 0.1%, 6-layer, $\pi/10$ nonlinearity QPNNs reach 31.5%, and 3-layer, $\pi/4$ networks achieve 60.5%.
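These series success rates are consistent with compounding the per-node unconditional fidelity over $N$ independent nodes, a back-of-the-envelope check sketched below (the independence and compounding assumptions are ours, not a statement from the paper):

```python
# Assumed compounding of the per-node unconditional fidelity F over N nodes
# operated in series: success rate ~ F**N.
N = 10
for label, f_unc in [("perfect linear optics", 0.5),
                     ("6-layer, pi/10 QPNN", 0.891),
                     ("3-layer, pi/4 QPNN", 0.951)]:
    print(f"{label}: {f_unc**N:.1%}")  # ~0.1%, ~31.5%, ~60.5%
```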

Preconditioning the success of each QPNN-based BSA on the detection of 2 photons in the correct ports, as would be the case for generating cluster states from fusion gates [46], allows the much higher conditional fidelities to be leveraged. While $\mathcal{F}^{(\mathrm{con})}$ for a perfect linear-optical ten-BSA sequence remains at 1, realistic QPNNs with $\pi/4$ (3 layers) and $\pi/10$ (6 layers) nonlinearities reach 0.999 (0.997) and 0.99999 (0.99996), respectively. Given that these conditional fidelities are all near-unity, the rather large variations in $\mathcal{F}^{(\mathrm{unc})}$ seen above can be attributed to the operational rate of the circuits (cf. Eq. 6), which is $315\times$ improved when the perfect linear-optical BSAs are replaced by even 6-layer, $\pi/10$ QPNNs. Hence, imperfect QPNNs are likely to play a key role in emerging large-scale quantum technologies.

VI Methods

Modeling Fabrication Imperfections

An ideal MZI, as displayed in the inset to Fig. 1a, can be described by a $2\times 2$ matrix,

T^{(\mathrm{ideal})} = \frac{1}{2}\begin{pmatrix}1&-i\\ -i&1\end{pmatrix}\begin{pmatrix}e^{i2\theta}&0\\ 0&1\end{pmatrix}\begin{pmatrix}1&i\\ i&1\end{pmatrix}\begin{pmatrix}e^{i\phi}&0\\ 0&1\end{pmatrix}
= e^{i\theta}\begin{pmatrix}e^{i\phi}\cos{\theta}&-\sin{\theta}\\ e^{i\phi}\sin{\theta}&\cos{\theta}\end{pmatrix}, (7)

as is commonly found in the literature [12, 17], up to the arrangement of components specified here. To model a realistic MZI, we include two types of imperfections: photon loss due to propagation and an imperfect splitting ratio of the nominally 50:50 DCs. Imperfect phase shifter calibration is neglected, as a QPNN trained in situ would intrinsically learn the phase shifts that account for these errors. A photonic element of length $\ell$ introduces the probability $\alpha$ that a photon is lost via propagation through it. By Eq. 1, $\alpha$ depends on the propagation losses per unit length, $\alpha_{\mathrm{WG}}$, selected from a normal distribution with a width of 6.67% of the mean, corresponding to state-of-the-art experimental results for SOI [26]. For each MZI, an individual $\alpha$ is computed, then applied through multiplication by the $2\times 2$ matrix,

\begin{pmatrix}\sqrt{1-\alpha}&0\\ 0&\sqrt{1-\alpha}\end{pmatrix}. (8)

In the Supplementary Information S1, further details are given for the inclusion of these non-uniform losses, including the characteristic lengths ($\ell$) of each photonic element, and how the lack of unitarity is dealt with in the simulations. Similarly, each imperfect DC has an individual transmittance $t$ that is taken from a normal distribution centered at 0.5 with a standard deviation of 0.0508, matching experimental results of a broadband DC fabricated for SOI platforms [19]. For a given $t$, the corresponding $2\times 2$ transformation of the DC is,

\begin{pmatrix}\sqrt{t}&\pm i\sqrt{1-t}\\ \pm i\sqrt{1-t}&\sqrt{t}\end{pmatrix}. (9)

Altogether, these result in a $2\times 2$ transformation describing a realistic MZI,

T^{(\mathrm{real})} = \begin{pmatrix}\sqrt{1-\alpha}&0\\ 0&\sqrt{1-\alpha}\end{pmatrix}\begin{pmatrix}\sqrt{t_{2}}&-i\sqrt{1-t_{2}}\\ -i\sqrt{1-t_{2}}&\sqrt{t_{2}}\end{pmatrix}\begin{pmatrix}e^{i2\theta}&0\\ 0&1\end{pmatrix}\begin{pmatrix}\sqrt{t_{1}}&i\sqrt{1-t_{1}}\\ i\sqrt{1-t_{1}}&\sqrt{t_{1}}\end{pmatrix}\begin{pmatrix}e^{i\phi}&0\\ 0&1\end{pmatrix}
= \sqrt{1-\alpha}\begin{pmatrix}\sqrt{t_{1}t_{2}}e^{i2\theta}e^{i\phi}+\sqrt{(1-t_{1})(1-t_{2})}e^{i\phi}&i\sqrt{t_{1}(1-t_{2})}e^{i2\theta}-i\sqrt{t_{2}(1-t_{1})}\\ -i\sqrt{t_{2}(1-t_{1})}e^{i2\theta}e^{i\phi}+i\sqrt{t_{1}(1-t_{2})}e^{i\phi}&\sqrt{(1-t_{1})(1-t_{2})}e^{i2\theta}+\sqrt{t_{1}t_{2}}\end{pmatrix}. (10)

In the Supplementary Information S1, we analyze the regimes in $\alpha_{\mathrm{WG}}$ and $L$ where the imperfect DC splitting ratios are dominant, and vice versa.
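A direct transcription of Eq. 10 as a sketch (the function name and defaults are our own); setting $t_1=t_2=0.5$ and $\alpha=0$ recovers the ideal MZI of Eq. 7:

```python
# 2x2 spatial-mode transformation of a realistic MZI, Eq. (10).
import numpy as np

def mzi_realistic(phi, theta, t1=0.5, t2=0.5, alpha=0.0):
    """Realistic MZI with coupler transmittances t1, t2, loss probability alpha,
    and phase-shifter settings (phi, theta)."""
    dc1 = np.array([[np.sqrt(t1), 1j * np.sqrt(1 - t1)],
                    [1j * np.sqrt(1 - t1), np.sqrt(t1)]])
    dc2 = np.array([[np.sqrt(t2), -1j * np.sqrt(1 - t2)],
                    [-1j * np.sqrt(1 - t2), np.sqrt(t2)]])
    inner = np.diag([np.exp(1j * 2 * theta), 1.0])   # internal phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])         # external phase shifter
    return np.sqrt(1 - alpha) * dc2 @ inner @ dc1 @ outer
```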

Network Optimization & Training Processes

A QPNN is trained to perform a mapping between a set of $K$ input-output state pairs $\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\to\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$. For the QPNN-based BSA, the training set is provided in the computational basis in Fig. 1a. Since dual-rail encoding is applied, $\left|0\right\rangle$ ($\left|1\right\rangle$) in the computational basis is equivalent to $\left|10\right\rangle$ ($\left|01\right\rangle$) in the Fock basis for the two spatial modes that realize the photonic qubit.
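As an example of this encoding, the training pair highlighted in Fig. 1a, $\left|\Phi^{+}\right\rangle\to\left|00\right\rangle$, can be written in terms of the occupations of the four spatial modes; the dictionary-of-amplitudes representation below is only illustrative:

```python
# Dual-rail encoding: each two-qubit computational-basis state corresponds to a
# Fock occupation pattern of the four spatial modes (qubit A: modes 1-2,
# qubit B: modes 3-4).
import numpy as np

ket00 = (1, 0, 1, 0)  # |0>_A |0>_B
ket11 = (0, 1, 0, 1)  # |1>_A |1>_B

# |Phi+> = (|00> + |11>)/sqrt(2), the input highlighted in Fig. 1a
phi_plus = {ket00: 1 / np.sqrt(2), ket11: 1 / np.sqrt(2)}
target = {ket00: 1.0}  # ideal BSA output |00> for this input
```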

The unconditional infidelity of the network, $\mathcal{C}^{(\mathrm{unc})}=1-\mathcal{F}^{(\mathrm{unc})}$ (see Eqs. 4 and 5 for $\mathcal{F}^{(\mathrm{unc})}$), is minimized to facilitate the optimization process. The variational parameters, $\{\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\}$ for each layer in the network, are initialized randomly. Then, the local, gradient-free BOBYQA nonlinear optimization algorithm [35] (available from the NLopt library [36]) is applied until the absolute change in infidelity is less than a threshold chosen empirically based on the available computational resources. This algorithm constructs a quadratic approximation to the infidelity and thus does not require an analytical gradient. Gradient-free optimization was deemed appropriate since it is unlikely that the internal state of the network, as would be necessary for backpropagation methods, would be accessible during in situ training [7].
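A minimal training-loop sketch along these lines, assuming an `infidelity(x)` callable that maps the flattened phase settings to $\mathcal{C}^{(\mathrm{unc})}$ (measured on hardware, or computed via Eq. 5 in simulation); the bounds and tolerance are our illustrative choices, not the values used in the paper:

```python
# Gradient-free in situ training sketch using NLopt's BOBYQA algorithm.
import numpy as np
import nlopt

def train_qpnn(infidelity, n_params, ftol_abs=1e-6, seed=None):
    rng = np.random.default_rng(seed)
    opt = nlopt.opt(nlopt.LN_BOBYQA, n_params)
    opt.set_min_objective(lambda x, grad: infidelity(x))  # gradient is unused
    opt.set_lower_bounds(np.zeros(n_params))               # assumed phase range
    opt.set_upper_bounds(2 * np.pi * np.ones(n_params))
    opt.set_ftol_abs(ftol_abs)                 # stop on small absolute change
    x0 = rng.uniform(0, 2 * np.pi, n_params)   # random initialization
    x_opt = opt.optimize(x0)
    return x_opt, opt.last_optimum_value()
```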

In contrast to in situ training, as described in the main text, offline training was conducted by training a QPNN with idealized components, then adding fabrication imperfections and, if necessary, adjusting the effective nonlinearity (cf. Sec. IV). Because of the loss and DC splitting ratio variations, such imperfections were added to an idealized solution in 50 (200) repetitions for Fig. 2 (Fig. 3), matching the number of in situ trials conducted. From these results, an in situ trial was deemed successful if it achieved an optimized unconditional infidelity at or below the worst-case of offline training (mean minus standard deviation). Only successful optimization trials were considered for further analysis. Similarly, the loss limit is computed by adding imperfections to an idealized solution; however, the losses are assumed to be completely uniform at $\alpha_{\mathrm{WG}}$, and the DC splitting ratios are all 50:50.

All simulations were conducted on the Frontenac Platform computing cluster offered by the Centre for Advanced Computing at Queen’s University. The accompanying code was written in Python (version 3.10.2) using Numpy (version 1.22.2) and NLopt (version 2.6.1). Cython (version 0.29.30) was used to translate performance-sensitive operations to C to improve computation runtime. In the Supplementary Information S1, we identify where computational complexity arises when constructing the system transfer function.

Conditional Measures

As for the unconditional fidelity, the conditional fidelity can be found by projecting the actual output state, $\left|\psi_{\mathrm{out,act}}^{(i)}\right\rangle=\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$, onto the computational basis, $\mathrm{CB}$, and finding its overlap with the ideal output $\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$. Averaging over all $K$ input-output pairs, this is written as,

\mathcal{F}^{(\mathrm{con})}=\frac{1}{K}\sum_{i=1}^{K}\left|\left\langle\psi_{\mathrm{out}}^{(i)}\right|A^{(i)}\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}, (11)

where,

A^{(i)}=\left[\sum_{\left|x\right\rangle\in\mathrm{CB}}\left|\left\langle x\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}\right]^{-\frac{1}{2}}, (12)

normalizes the $i^{\mathrm{th}}$ state $\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$ to the computational basis. Similarly, the probability of measuring an output in the computational basis is

\mathcal{P}^{(\mathrm{cb})}=\frac{1}{K}\sum_{i=1}^{K}\sum_{\left|x\right\rangle\in\mathrm{CB}}\left|\left\langle x\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}. (13)

The $i^{\mathrm{th}}$ terms of Eqs. 11 and 13 can be multiplied to yield Eq. 4, which follows simply from the fact that the $i^{\mathrm{th}}$ term of Eq. 13 can be expressed as $\left(A^{(i)}\right)^{-2}$.
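A sketch of Eqs. 11-13 for Fock-basis state vectors, assuming the indices of the computational-basis states, `cb_idx`, are known. The per-pair quantities computed inside the loop multiply to the $i^{\mathrm{th}}$ term of Eq. 4, as stated above:

```python
# Conditional fidelity F^(con) and computational-basis probability P^(cb).
import numpy as np

def conditional_measures(S, psi_in, psi_out, cb_idx):
    """Return (F^(con), P^(cb)) of Eqs. (11) and (13) for K column-stacked pairs."""
    f_con, p_cb = [], []
    for i in range(psi_in.shape[1]):
        out = S @ psi_in[:, i]
        p_i = np.sum(np.abs(out[cb_idx]) ** 2)               # i-th term of Eq. (13)
        A_i = 1.0 / np.sqrt(p_i)                              # Eq. (12)
        f_i = np.abs(np.vdot(psi_out[:, i], A_i * out)) ** 2  # i-th term of Eq. (11)
        f_con.append(f_i)
        p_cb.append(p_i)
    return np.mean(f_con), np.mean(p_cb)
```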

VII Acknowledgements

This research is supported by the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute. The authors thank N.R.H. Pedersen for his insights into linear meshes, and gratefully acknowledge support by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Foundation for Innovation (CFI), and Queen’s University.

VIII Author Contributions

N.R. and J.C. conceived the project, which they developed along with J.E. J.E. was responsible for designing and performing all simulations and analysis, with supervision from B.S. and N.R. All authors discussed the results and shared in the writing and editing responsibilities for the manuscript.

IX Additional Information

Supplementary Information accompanies the paper.

Competing Interests: The authors declare no competing interests.

References