
Realistic quantum photonic neural networks

Jacob Ewaniuk ([email protected]), Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6
Jacques Carolan, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
Bhavin J. Shastri, Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6 and Vector Institute, Toronto, Ontario, Canada M5G 1M1
Nir Rotenberg, Department of Physics, Engineering Physics & Astronomy, 64 Bader Lane, Queen's University, Kingston, Ontario, Canada K7L 3N6
Abstract

Quantum photonic neural networks are variational photonic circuits that can be trained to implement high-fidelity quantum operations. However, work to date has assumed idealized components, including a perfect $\pi$ Kerr nonlinearity. Here, we investigate the limitations of realistic quantum photonic neural networks that suffer from fabrication imperfections, which lead to photon loss and imperfect routing, as well as weak nonlinearities, and we show that they can learn to overcome most of these errors. Using the example of a Bell-state analyzer, we demonstrate that there is an optimal network size that balances imperfections against the ability to compensate for lacking nonlinearities. With a sub-optimal $\pi/10$ effective Kerr nonlinearity, we show that a network fabricated with current state-of-the-art processes can achieve an unconditional fidelity of 0.891, which increases to 0.999999 if it is possible to precondition success on the detection of a photon in each logical photonic qubit. Our results provide a guide to the construction of viable, brain-inspired quantum photonic devices for emerging quantum technologies.

I Introduction

Quantum neural networks, brain-inspired quantum circuits, harness artificial intelligence to enhance quantum information processing. When driven with light, quantum photonic neural networks (QPNNs) leverage the strengths of mature photonic platforms [1], including multiplexing, low latency, and ultra-low operational powers already being exploited by conventional neural networks [2] and linear-optical quantum processors [3]. This allows QPNNs to perform quantum state tomography [4], act as quantum simulators [5, 6], process [7] or reduce the noise [8] of quantum states, or speed up tasks normally carried out by classical neural networks, such as image recognition [9] and natural language processing [10].

An example of a QPNN circuit, a two-layer network trained to act as a Bell-state analyzer (BSA), is shown in Fig. 1a.

Figure 1: A realistic QPNN-based BSA. (a) An exemplary two-layer QPNN consisting of meshes $(\mathbf{U})$ of parameterized Mach-Zehnder interferometers (inset) separated by single-site nonlinearities $(\boldsymbol{\Sigma})$. The network features two dual-rail encoded qubits, one where a single photon occupies the upper two spatial modes, the other in the lower two modes. Here, realistic losses (0.3 dB/cm) and errors in the 50:50 directional couplers (5.08%), as well as a weak $\pi/4$ nonlinear phase shift, are assumed. The network was trained to act as a BSA according to the truth table shown, with a resultant unconditional fidelity of 0.825. As an example, the network is colored to portray the propagation of the photons through the network when the $\left|\Phi^{+}\right\rangle$ Bell-state is incident. The colors represent the probabilities that there are zero, one, or two photons in each spatial mode at each part of the network (colorbar), showing the evolution of the state as it propagates through the circuit. For this example, there is an 82.5% chance of measuring the correct $\left|00\right\rangle$ target state. (b) Probabilities of measuring a state $\left|\psi_{\mathrm{out}}\right\rangle$ when a state $\left|\psi_{\mathrm{in}}\right\rangle$ is fed into the network shown in (a). (c) Comparison between the success rates of ideal linear-optical and realistic QPNN-based BSAs when up to ten are operated in series. The linear-optical BSA has a maximal unconditional fidelity of 0.5 [11] and is compared to realistic QPNNs with varying numbers of layers and effective nonlinear phase shifts $(\varphi)$, as explained in the main text. The purple marker highlights the success rate for the network shown in (a).

Here, the connectivity and activation function for the network are provided by linear, rectangular interferometer meshes $(\mathbf{U})$ [12] and single-site optical nonlinearities $(\boldsymbol{\Sigma})$, respectively. A BSA can distinguish between, or create, all four highly entangled Bell-states, and the addition of this nonlinearity ideally increases the success probability of the circuit to unity [7] from the 0.5 possible solely with linear optics in the absence of ancillary photons [11]. The operation performed by the QPNN in this example is therefore crucial to entanglement swapping [13] and hence provides a route toward a deterministic quantum repeater node [14], a vital component of a future quantum internet [15].

Quantum photonic circuits are not ideal, and here we report on the performance of realistic, imperfect QPNNs. Specifically, we consider how propagation losses and imperfect optical nonlinearities affect the fidelity of the QPNN, using the example of a BSA to benchmark our results. As shown in Figs. 1b and c, we find that even realistic networks with weak nonlinearities can vastly outperform those based on (ideal) linear optics. In Fig. 1c, we observe that using state-of-the-art waveguide fabrication (as described in Sec. II) and a perfect $(\pi)$ two-photon nonlinearity, 10 BSA nodes made from 2-layer QPNNs can be applied in series with a success rate of 72%. Surprisingly, this rate only decreases to 61% for a much weaker $(\pi/4)$ two-photon nonlinearity if a third (lossy) layer is added. Moreover, if each operation is conditioned on the measurement of two photons, the conditional success rate of 10 nodes becomes 99.99999% and 99.9% for $\pi$ and $\pi/4$, respectively. In what follows, we unravel the dependence on loss, effective nonlinearity, and network size, providing a methodology for the design of optimal QPNNs with realistic components.

II Network Architecture & Nonidealities

The architecture of a realistic QPNN is the same as that of an ideal network, and is thus designed to operate on dual-rail encoded photonic qubits [7]. Each layer consists of a mesh of tunable Mach-Zehnder interferometers (MZIs) with two controllable phase shifters $(\phi,\theta)$, as shown in the inset to Fig. 1a. The interferometer mesh can be programmed to perform an arbitrary linear unitary transformation $\mathbf{U}(\boldsymbol{\phi},\boldsymbol{\theta})$ on the spatial modes of the photons [12]. Single-site nonlinearities, of strength $\varphi$, are placed between consecutive layers. In the Supplementary Information S1, we provide further details on the construction of the system transfer function.

The components of linear photonic networks are not perfect, and various techniques have been developed to mitigate the effects of these imperfections. Specifically, both imperfect splitting ratios of the directional couplers (DCs) that form the MZIs and imperfectly calibrated phase shifters lead to errors that can be mitigated by optimizing the circuits after fabrication [16, 17, 18]. In contrast, we account for these errors and those due to imbalanced photon loss or imperfect nonlinearities by training the variational parameters $\{\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\}$ in situ, as would be done on-chip, post-fabrication.

We model a realistic linear mesh by allowing each element to suffer from slightly different imperfections, resulting in unbalanced, photon-path-dependent errors. We define the transmittance of each DC as $t$, randomly selected from a normal distribution with a mean of 50% and standard deviation of 5.08%, matching experimental results of a broadband DC fabricated for silicon-on-insulator (SOI) platforms [19]. Likewise, propagation losses, where photons are scattered out of the circuit due to, for example, surface roughness, or are absorbed by the waveguides, are parameterized by

\alpha=1-10^{-\frac{\alpha_{\mathrm{WG}}\ell}{10}}, (1)

for an element of length $\ell$ and propagation losses per unit length $\alpha_{\mathrm{WG}}$. These losses depend on the platform upon which the photonic circuit is constructed [20, 21, 22, 23, 24, 25, 26], which additionally determines the size of each photonic element. In our analysis, we select $\alpha_{\mathrm{WG}}$ from a normal distribution with a standard deviation of 6.67% of the mean, as is the case for current state-of-the-art photonic circuits built on SOI, which suffer from $\alpha_{\mathrm{WG}}=0.3\pm 0.02$ dB/cm at 1550 nm [26]. More information on the inclusion of fabrication imperfections can be found in Sec. VI and the Supplementary Information S1.
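As an illustration of this error model, the following sketch (not the authors' code; the function names and element length are our own) samples a directional-coupler transmittance and an element loss probability per Eq. 1, using the SOI values quoted above.

```python
# Illustrative sketch of the imperfection sampling described above.
import numpy as np

rng = np.random.default_rng()

def sample_dc_transmittance(mean=0.5, std=0.0508):
    """Transmittance t of a nominally 50:50 directional coupler [19]."""
    return rng.normal(mean, std)

def sample_element_loss(length_cm, alpha_wg_mean=0.3, rel_std=0.0667):
    """Loss probability alpha of an element of length `length_cm` (Eq. 1),
    with alpha_WG drawn from a normal distribution (0.3 +/- 0.02 dB/cm [26])."""
    alpha_wg = rng.normal(alpha_wg_mean, rel_std * alpha_wg_mean)  # dB/cm
    return 1.0 - 10.0 ** (-alpha_wg * length_cm / 10.0)

# e.g. a hypothetical 300-um-long element on SOI:
t = sample_dc_transmittance()
alpha = sample_element_loss(0.03)
```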

In our architecture, as in previous realizations [7], a Kerr nonlinearity, resolved in the Fock basis as,

\boldsymbol{\Sigma}\left(\varphi\right)=\sum_{n}\exp\left[in(n-1)\frac{\varphi}{2}\right]\left|n\right\rangle\left\langle n\right|, (2)

is assumed, where in the ideal case $\varphi=\pi$, such that a single photon passing through experiences no phase change while two photons undergo a $\pi$ phase change. To date, a $\pi$ Kerr nonlinearity has yet to be observed at the single-photon level; however, this efficiency has been reached by other nonlinearities, such as those based on electromagnetically induced transparency [27] or the saturation [28] of atoms. While these enable neural networks capable of quantum state tomography [29] or image recognition [30], respectively, neither is scalable nor compatible with most quantum information processing applications, since they lead to photon loss. It is, however, likely that in the near future, efficiencies approaching $\pi$ will be demonstrated, either through the coherent chiral scattering of photons from single quantum emitters [31] or using integrated nanophotonic cavities designed to address this need [32, 33, 34]. Hence, we consider single-site Kerr nonlinearities, but also examine the performance of the QPNN in the realistic scenario where $\varphi\lesssim\pi$.
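A minimal sketch of Eq. 2 in a truncated Fock basis, assuming a NumPy matrix representation and a photon-number cutoff `n_max` (our choices, not specified in the text):

```python
# Single-site Kerr phase operator of Eq. (2), truncated at n_max photons.
import numpy as np

def kerr_nonlinearity(phi, n_max=2):
    """Diagonal operator exp[i n(n-1) phi/2] |n><n| for n = 0..n_max."""
    n = np.arange(n_max + 1)
    return np.diag(np.exp(1j * n * (n - 1) * phi / 2))

# For phi = pi, one photon acquires no phase while two photons acquire pi:
print(np.round(np.angle(np.diag(kerr_nonlinearity(np.pi))) / np.pi, 3))  # [0. 0. 1.]
```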

In sum, the total transfer function for an $L$-layer QPNN is given by,

\mathbf{S}=\mathbf{U}\left(\boldsymbol{\phi}_{L},\boldsymbol{\theta}_{L}\right)\cdot\prod_{i=1}^{L-1}\boldsymbol{\Sigma}\left(\varphi\right)\cdot\mathbf{U}\left(\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\right). (3)
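Schematically, Eq. 3 is an alternating composition of mesh and nonlinearity layers. The sketch below assumes hypothetical helpers `layer_unitary` and `single_site_nonlinearity` that return the corresponding operators already lifted to the multi-photon Fock space (see the Supplementary Information S1 for how that lifting is actually constructed):

```python
# Schematic composition of Eq. (3); helper functions are hypothetical.
def qpnn_transfer(phis, thetas, varphi, layer_unitary, single_site_nonlinearity):
    """Compose S = U_L * prod_{i=1}^{L-1} Sigma(varphi) U_i for an L-layer QPNN.

    phis, thetas : length-L lists of phase-shifter settings, one set per layer.
    """
    L = len(phis)
    S = layer_unitary(phis[0], thetas[0])          # U_1 acts first
    for i in range(1, L):
        S = layer_unitary(phis[i], thetas[i]) @ single_site_nonlinearity(varphi) @ S
    return S
```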

This transfer function will act on any input state to produce an actual output state $\left|\psi_{\mathrm{out,act}}^{(i)}\right\rangle=\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$, which is compared to the ideal output state $\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$ to determine the unconditional fidelity for that input-output pair,

\mathcal{F}^{(\mathrm{unc})}_{i}=\left|\left\langle\psi_{\mathrm{out}}^{(i)}\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}. (4)

The total unconditional fidelity, the chance that the network provides the correct output state for any given input state without preconditions, is then found by averaging over all $K$ input-output pairs according to

\mathcal{F}^{(\mathrm{unc})}=\frac{1}{K}\sum_{i=1}^{K}\mathcal{F}^{(\mathrm{unc})}_{i}. (5)

Conversely, we can calculate the unconditional infidelity (i.e., the network error) according to $\mathcal{C}^{(\mathrm{unc})}=1-\mathcal{F}^{(\mathrm{unc})}$, which we minimize in training the QPNN, using the local, gradient-free BOBYQA nonlinear optimization algorithm [35], as available in the NLopt library [36]. Further details on network optimization are provided in Sec. VI.
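For concreteness, a minimal sketch of Eqs. 4 and 5 and the cost $\mathcal{C}^{(\mathrm{unc})}$, assuming the transfer function `S` and the $K$ training states are held as Fock-basis NumPy arrays; in an experiment these overlaps would instead be estimated from measured detection statistics:

```python
# Unconditional fidelity (Eqs. 4-5) and infidelity for column-stacked state pairs.
import numpy as np

def unconditional_fidelity(S, psi_in, psi_out):
    """F^(unc): overlap of S|psi_in^(i)> with the ideal |psi_out^(i)>, averaged
    over the K training pairs stored as columns of psi_in and psi_out."""
    K = psi_in.shape[1]
    fids = [np.abs(np.vdot(psi_out[:, i], S @ psi_in[:, i])) ** 2 for i in range(K)]
    return np.mean(fids)

def unconditional_infidelity(S, psi_in, psi_out):
    """Training cost C^(unc) = 1 - F^(unc)."""
    return 1.0 - unconditional_fidelity(S, psi_in, psi_out)
```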

The success of the network may be conditioned on the detection of photons only in ports that abide by the dual-rail encoding scheme, as in this case we know that a logical output was produced. For the BSA shown in Fig. 1a, this means that a single photon was detected in one of the top two modes and the other in the bottom two modes. We call this the conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, each $i^{\mathrm{th}}$ term of which is related to the unconditional fidelity through the probability $\mathcal{P}^{(\mathrm{cb})}$ that the network operation results in a computational basis state (that is, no photons are lost and each logical photonic qubit contains a single photon at the output) by,

\mathcal{F}_{i}^{(\mathrm{unc})}=\mathcal{F}_{i}^{(\mathrm{con})}\mathcal{P}_{i}^{(\mathrm{cb})}. (6)

In Sec. VI, we provide the expressions used to calculate these conditional measures. For a network operation that requires detection, such as the BSA (unlike, for example, the realization of a quantum logic gate for quantum computation), $\mathcal{P}^{(\mathrm{cb})}$ gives the probability that the network successfully performed its task, while $\mathcal{F}^{(\mathrm{con})}$ provides the quality of the result; in practice, one may optimize either the conditional or unconditional infidelity, depending on the task under consideration. In the following, we train solely on the unconditional infidelity; however, we provide results from optimizing the conditional infidelity in the Supplementary Information S2.

III Correcting Imperfect Linear Interferometer Meshes

We begin to consider the effects of imperfections on QPNNs by holding the nonlinearity at the ideal value ($\varphi=\pi$) but introducing DC splitting ratio variations and photon loss as described above. The resultant unconditional infidelity of a BSA for 2- to 6-layer QPNNs with losses ranging from 0.001 to 3 dB/cm, as a function of training iterations, is shown in Figs. 2a-c.

Figure 2: Performance of a QPNN-based BSA suffering from fabrication imperfections. The unconditional infidelity $\mathcal{C}^{(\mathrm{unc})}$ of (a) 2-, (b) 4-, and (c) 6-layer networks is shown as a function of the training iteration for increasingly lossy networks. In each pane, the results of 50 optimization trials are displayed, with clear plateaus visible in $\mathcal{C}^{(\mathrm{unc})}$ that increase with the losses. In each case, only trials that result in infidelity at or below those achieved by offline training (colored ticks in (a)-(c), shaded regions in (d)-(f)) are considered successful (shaded blue region shows an example for 0.001 dB/cm). The unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ of (d) 2-, (e) 4-, and (f) 6-layer networks is plotted with respect to the average losses $\alpha_{\mathrm{WG}}$, with colored symbols (shaded regions) corresponding to the mean (95% confidence interval) of a logarithmic normal distribution fitted to the successful trials of (a)-(c) (see the Supplementary Information S3 for more details). These points are seen to lie on the (dashed) loss limit curve, where the performance of the network is only limited by uniform photon loss (assumes perfect DCs; see Sec. VI for additional details), in contrast to networks that are trained offline (solid black curves and shaded grey regions), demonstrating the ability of QPNNs to learn to overcome imperfections. (g) Unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$, (h) conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, and (i) computational basis probability $\mathcal{P}^{(\mathrm{cb})}$, as a function of $L$ for the QPNNs trained in situ, where the mean (symbols) and 95% confidence intervals (shaded regions in (g), (i), error bars in (h)) are determined via the same method as (d)-(f).

Each case is repeated 50 times, resulting in plateaus of $\mathcal{C}^{(\mathrm{unc})}$ that increase in value for increasing losses, as expected. Interestingly, for 2-layer BSAs, we observe a large spread in the final $\mathcal{C}^{(\mathrm{unc})}$, particularly for low-loss networks (cf. blue and purple curves in Fig. 2a), indicating that the final performance of the QPNN is largely dictated by imbalance due to imperfect DCs. Adding more layers to the network, as in Figs. 2b and c, reduces this spread, showing that larger QPNNs may learn to correct for these errors and more often reach optimal performance.

This is reflected in Figs. 2d-f, which show the unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ as a function of waveguide loss for different sized networks. Here, we compare the in situ trained QPNNs of Figs. 2a-c, denoted by the symbols, to the case where the network is trained offline. Offline training means that a perfect network was trained, and losses and DC errors were subsequently added to the solution. This was repeated 50 times for each $\alpha_{\mathrm{WG}}$, selecting different random imperfections at each repetition, with the mean given by the solid-black curve and standard deviation by the grey region. When trained in situ, the QPNN learns to overcome these imperfections, as seen by the convergence toward loss-limited performance (see Sec. VI for more information on network training and the loss limit). This is more apparent for larger networks, where the fidelity of networks trained offline drops significantly due to increased losses and DC errors, while those trained in situ maintain and even increase $\mathcal{F}^{(\mathrm{unc})}$.

The balance between fabrication imperfections and network size, as a function of losses, is summarized in Fig. 2g. Here, we observe that for state-of-the-art losses ($\alpha_{\mathrm{WG}}=0.3$ dB/cm) or worse, the unconditional fidelity decreases as expected when more layers are added to the network. When $\alpha_{\mathrm{WG}}=0.3$ dB/cm, $\mathcal{F}^{(\mathrm{unc})}\geq 0.905$ (0.904), where the bracketed result is the lower bound of the 95% confidence interval, even for a 6-layer QPNN, demonstrating that high-efficiency performance is possible on realistic state-of-the-art systems. Conversely, a more complex evolution is seen in Fig. 2g for 0.01 dB/cm losses or less, where there exists an optimal network size other than 2 layers. Losses at 0.01 dB/cm are similar to those of the silicon nitride platform at 1550 nm, reported as low as 0.007 dB/cm [37], but more typically near 0.01 dB/cm [38, 39, 40]. In this low-loss case ($\alpha_{\mathrm{WG}}=0.01$ dB/cm), $\mathcal{F}^{(\mathrm{unc})}$ first increases from 0.993 (0.949) to 0.998 (0.997) by adding two layers to the base size, as the network is better able to account for imperfections. The unconditional fidelity then only slightly decreases to 0.996 (0.996) as the network grows to 7 layers. That is, near-deterministic QPNN-based quantum elements such as BSAs will be realistic in the near future as platform losses continue to decrease.

The situation is even more promising if the success of the network is preconditioned on detection in the computational basis, as shown in Figs. 2h and i. Here, we present $\mathcal{F}^{(\mathrm{con})}$ and $\mathcal{P}^{(\mathrm{cb})}$ for different sized QPNNs and for differing $\alpha_{\mathrm{WG}}$. Even for extremely lossy networks, where $\alpha_{\mathrm{WG}}=2$ dB/cm, $\mathcal{F}^{(\mathrm{con})}$ remains above 0.9999 (0.9888) for all $L\leq 7$, while for state-of-the-art losses this conditional fidelity does not drop below 0.999999 (0.999784) for $3\leq L\leq 7$, as we observe in Fig. 2h. In fact, as shown in Fig. 2i, it is mainly the rate at which the QPNN produces a logical output that is affected by an increase in network size, showing the potential of even lossy networks if fault-tolerant protocols are used.

IV Embracing Weak Nonlinear Interactions

Having studied the effect of fabrication errors on network performance, we now turn to the consequences of sub-optimal nonlinearities. Assuming state-of-the-art losses ($\alpha_{\mathrm{WG}}=0.3$ dB/cm), we vary the effective nonlinear phase shift $\varphi$ from the ideal $\pi$ to $\pi/100$ and attempt to train QPNNs of different sizes to act as BSAs (see the Supplementary Information S4 for exemplary training traces, cf. Figs. 2a-c). For each network size and effective nonlinearity, we attempt to train 200 QPNNs, showing the results in Fig. 3.

Figure 3: Performance of realistic QPNN-based BSAs with sub-optimal $(\varphi\lesssim\pi)$ nonlinearities and state-of-the-art ($\alpha_{\mathrm{WG}}=0.3$ dB/cm) losses. The unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$ of (a) 2-, (b) 3-, and (c) 4-layer networks is shown with respect to the effective nonlinear phase shift $\varphi$, showing both offline (solid black curves, shaded grey regions) and in situ (colored symbols) trained networks, and the loss limit (dashed line), as in Fig. 2. In situ results include the average of all successfully-trained QPNNs (circles) and the best-case, where triangles (error bars) show the mean (95% confidence intervals) of a beta distribution fit to the maximal unconditional fidelity plateau (see the Supplementary Information S3, S4 for statistical analysis details and an example of this plateau). The (d) unconditional fidelity $\mathcal{F}^{(\mathrm{unc})}$, (e) conditional fidelity $\mathcal{F}^{(\mathrm{con})}$, and (f) computational basis probability $\mathcal{P}^{(\mathrm{cb})}$ are plotted for each $\varphi$ denoted on the colorbar, for networks of up to 7 layers. All means (triangles) and 95% confidence intervals (error bars) were determined in the same manner as the best-case in situ results of (a)-(c). Connecting dotted lines serve only as a visual aid.

Figs. 3a-c depict the highly non-trivial dependence of $\mathcal{F}^{(\mathrm{unc})}$ on the effective nonlinearity $\varphi$. When the QPNN is trained offline, $\mathcal{F}^{(\mathrm{unc})}$ increases monotonically with $\varphi$, as would be the case for a quantum-optical Fredkin gate-based BSA [41, 42, 43] (see the Supplementary Information S4 for additional information). Conversely, a QPNN can be trained to account for the weak nonlinearity, in which case it can vastly outperform this expectation. Considering a 2-layer network (Fig. 3a), we observe a strikingly different $\varphi$ dependence when comparing the best-case in situ trained networks (triangles; see the Supplementary Information S3 for statistical analysis information) to both the average of all successful in situ training cases (circles) and networks trained offline. We observe that networks trained in situ can reach the loss limit with sub-optimal nonlinearities, in addition to fabrication imperfections. Specifically, we observe optimal performance of 2-layer QPNNs when $\varphi=\pi/2$ in addition to $\pi$. Moreover, as can be seen in Fig. 3a, near-optimal performance is reached for a domain of $\varphi$ centered at $\pi/2$, providing a pathway to robust QPNN-based BSAs without the need for a perfect Kerr nonlinearity. It must be noted, however, that operating with weaker nonlinearities decreases the probability that the QPNN converges at the loss limit during training, as shown in the Supplementary Information S4.

For all $\varphi$, a QPNN trained in situ learns how to account for weak nonlinearities and thus approach the loss limit. These capabilities improve as redundancies are added via an increase in network size, as is evident across Figs. 3a-c and summarized in Fig. 3d. By adding a single additional (lossy) layer, QPNNs were trained to within 1.13% of the unconditional fidelity achieved with the ideal nonlinearity, 0.951 (0.950) at $\varphi=\pi$, and within 1.15% of the loss limit, 0.952, for all $\varphi\geq\pi/4$. Even networks with nonlinearities as weak as $\varphi=\pi/10$ approach the loss limit at 6 layers, in contrast to the case of $\pi/100$, where $\mathcal{F}^{(\mathrm{unc})}$ increases only to 0.528 (0.528) at 7 layers from 0.499 (0.498) at 2 layers, essentially acting as a linear-optical BSA [11]. Trainability also improves with increased network size, as it becomes easier for the QPNN to find optimal solutions, such that the average unconditional fidelity achieved during in situ training approaches the maximum plateau.

In Figs. 3e and f, the conditional fidelity and computational basis probability are shown as a function of $L\leq 7$, for differing $\varphi$. In contrast to the case where photon losses were varied (cf. Fig. 2), we observe that the behavior of $\mathcal{F}^{(\mathrm{con})}$ strongly depends on $\varphi$. While QPNNs with $\varphi=\pi$ and $\pi/2$ operate with $\mathcal{F}^{(\mathrm{con})}\geq 0.9999$ (0.9998) for all $L\leq 7$, networks with nonlinearities near $\pi/4$ and $3\pi/4$ require at least 3 layers to reach this level, while at $\pi/10$, 6 layers are needed. For all nonlinearities and network sizes, $\mathcal{P}^{(\mathrm{cb})}$ is within 0.009 (0.031) of loss-limited performance, as seen in Fig. 3f, and as expected for a QPNN suffering from state-of-the-art losses (cf. Figs. 2d-f and i). Altogether, this demonstrates that for each combination of fabrication imperfections and effective nonlinearity, there exists an optimal network size that maximizes $\mathcal{F}^{(\mathrm{unc})}$. While adding layers will always tend to increase $\mathcal{F}^{(\mathrm{con})}$, a balance must be struck with the exponential decrease in $\mathcal{P}^{(\mathrm{cb})}$. In the Supplementary Information S5, we demonstrate a QPNN trained to generate Greenberger-Horne-Zeilinger states, and show that this remains true beyond the BSA application.

V Discussion

We have shown that high-fidelity operation is possible in realistic quantum photonic neural networks based on non-ideal Kerr nonlinearities. Since propagation through these networks leads to inevitable photon loss, their unconditional fidelity ceiling tends to decrease with increasing size. While this loss limit is unavoidable, these networks are able to learn to manage additional errors from non-uniform losses and directional coupler splitting ratio variations, often demonstrating increased fidelity with the addition of imperfect layers. Crucially, we have shown that weak nonlinearities, which are mere fractions of the ideal, are sufficient for near-optimal network performance. Even as these sub-optimal nonlinearities are realized [31, 32], the desired phase change will likely be accompanied by wave-packet distortions [44, 45], and although complex solutions based on dynamically coupled cavities have been proposed [33, 34], it remains an open question whether a QPNN may instead learn to overcome these distortions in much the same way it does fabrication imperfections. Already in the work presented here, QPNNs offer a fascinating view of the intricate balance between loss, imperfect photon routing, and weak nonlinearity, which we have unraveled to demonstrate how each combination leads to an optimal network geometry. Understanding and respecting this balance will be important in the near future, as realistic QPNNs are designed and fabricated.

It is now clear why QPNNs far outperform linear-optical networks. Even with a weak $\pi/4$ effective nonlinearity, they can learn to surpass the 0.5 unconditional fidelity possible with perfect linear optics [11], achieving $\mathcal{F}^{(\mathrm{unc})}=0.820$ (0.809) at 2 layers (see Figs. 1b and 3a), which grows to 0.951 (0.949) with an additional layer (see Fig. 3b). At 6 layers, loss-limited operation, $\mathcal{F}^{(\mathrm{unc})}=0.891$ (0.890), can be achieved with nonlinearities as weak as $\pi/10$. Returning to Fig. 1c, which summarizes the success rate of operating $N$ BSAs in series, as would be necessary to connect quantum repeater nodes by entanglement swapping [13, 14], the performance benefits offered by QPNNs become more apparent. While 10 consecutive perfect linear-optical BSAs have a success rate of just 0.1%, 6-layer, $\pi/10$ nonlinearity QPNNs reach 31.5%, and 3-layer, $\pi/4$ networks achieve 60.5%.
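These series success rates are consistent with compounding the per-node unconditional fidelity over $N$ independent nodes, a back-of-the-envelope check sketched below (the independence and compounding assumptions are ours, not a statement from the paper):

```python
# Assumed compounding of the per-node unconditional fidelity F over N nodes
# operated in series: success rate ~ F**N.
N = 10
for label, f_unc in [("perfect linear optics", 0.5),
                     ("6-layer, pi/10 QPNN", 0.891),
                     ("3-layer, pi/4 QPNN", 0.951)]:
    print(f"{label}: {f_unc**N:.1%}")  # ~0.1%, ~31.5%, ~60.5%
```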

Preconditioning the success of each QPNN-based BSA on the detection of 2 photons in the correct ports, as would be the case for generating cluster states from fusion gates [46], allows the much higher conditional fidelities to be leveraged. While $\mathcal{F}^{(\mathrm{con})}$ for a perfect linear-optical ten-BSA sequence remains at 1, realistic QPNNs with $\pi/4$ (3 layers) and $\pi/10$ (6 layers) nonlinearities reach 0.999 (0.997) and 0.99999 (0.99996), respectively. Given that these conditional fidelities are all near-unity, the rather large variations in $\mathcal{F}^{(\mathrm{unc})}$ seen above can be attributed to the operational rate of the circuits (cf. Eq. 6), which is $315\times$ improved when the perfect linear-optical BSAs are replaced by even 6-layer, $\pi/10$ QPNNs. Hence, imperfect QPNNs are likely to play a key role in emerging large-scale quantum technologies.

VI Methods

Modeling Fabrication Imperfections

An ideal MZI, as displayed in the inset to Fig. 1a, can be described by a $2\times 2$ matrix,

T^{(\mathrm{ideal})} = \frac{1}{2}\begin{pmatrix}1&-i\\ -i&1\end{pmatrix}\begin{pmatrix}e^{i2\theta}&0\\ 0&1\end{pmatrix}\begin{pmatrix}1&i\\ i&1\end{pmatrix}\begin{pmatrix}e^{i\phi}&0\\ 0&1\end{pmatrix}
= e^{i\theta}\begin{pmatrix}e^{i\phi}\cos{\theta}&-\sin{\theta}\\ e^{i\phi}\sin{\theta}&\cos{\theta}\end{pmatrix}, (7)

as is commonly found in the literature [12, 17], up to the arrangement of components specified here. To model a realistic MZI, we include two types of imperfections: photon loss due to propagation and an imperfect splitting ratio of the nominally 50:50 DCs. Imperfect phase shifter calibration is neglected, as a QPNN trained in situ would intrinsically learn the phase shifts that account for these errors. A photonic element of length $\ell$ introduces the probability $\alpha$ that a photon is lost via propagation through it. By Eq. 1, $\alpha$ depends on the propagation losses per unit length, $\alpha_{\mathrm{WG}}$, selected from a normal distribution with a width of 6.67% of the mean, corresponding to state-of-the-art experimental results for SOI [26]. For each MZI, an individual $\alpha$ is computed, then applied through multiplication by the $2\times 2$ matrix,

\begin{pmatrix}\sqrt{1-\alpha}&0\\ 0&\sqrt{1-\alpha}\end{pmatrix}. (8)

In the Supplementary Information S1, further details are given for the inclusion of these non-uniform losses, including the characteristic lengths ($\ell$) of each photonic element, and how the lack of unitarity is dealt with in the simulations. Similarly, each imperfect DC has an individual transmittance $t$ that is taken from a normal distribution centered at 0.5 with a standard deviation of 0.0508, matching experimental results of a broadband DC fabricated for SOI platforms [19]. For a given $t$, the corresponding $2\times 2$ transformation of the DC is,

\begin{pmatrix}\sqrt{t}&\pm i\sqrt{1-t}\\ \pm i\sqrt{1-t}&\sqrt{t}\end{pmatrix}. (9)

Altogether, these result in a $2\times 2$ transformation describing a realistic MZI,

T^{(\mathrm{real})} = \begin{pmatrix}\sqrt{1-\alpha}&0\\ 0&\sqrt{1-\alpha}\end{pmatrix}\begin{pmatrix}\sqrt{t_{2}}&-i\sqrt{1-t_{2}}\\ -i\sqrt{1-t_{2}}&\sqrt{t_{2}}\end{pmatrix}\begin{pmatrix}e^{i2\theta}&0\\ 0&1\end{pmatrix}\begin{pmatrix}\sqrt{t_{1}}&i\sqrt{1-t_{1}}\\ i\sqrt{1-t_{1}}&\sqrt{t_{1}}\end{pmatrix}\begin{pmatrix}e^{i\phi}&0\\ 0&1\end{pmatrix}
= \sqrt{1-\alpha}\begin{pmatrix}\sqrt{t_{1}t_{2}}e^{i2\theta}e^{i\phi}+\sqrt{(1-t_{1})(1-t_{2})}e^{i\phi}&i\sqrt{t_{1}(1-t_{2})}e^{i2\theta}-i\sqrt{t_{2}(1-t_{1})}\\ -i\sqrt{t_{2}(1-t_{1})}e^{i2\theta}e^{i\phi}+i\sqrt{t_{1}(1-t_{2})}e^{i\phi}&\sqrt{(1-t_{1})(1-t_{2})}e^{i2\theta}+\sqrt{t_{1}t_{2}}\end{pmatrix}. (10)

In the Supplementary Information S1, we analyze the regimes in $\alpha_{\mathrm{WG}}$ and $L$ where the imperfect DC splitting ratios are dominant, and vice versa.
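A direct transcription of Eq. 10 as a sketch (the function name and defaults are our own); setting $t_1=t_2=0.5$ and $\alpha=0$ recovers the ideal MZI of Eq. 7:

```python
# 2x2 spatial-mode transformation of a realistic MZI, Eq. (10).
import numpy as np

def mzi_realistic(phi, theta, t1=0.5, t2=0.5, alpha=0.0):
    """Realistic MZI with coupler transmittances t1, t2, loss probability alpha,
    and phase-shifter settings (phi, theta)."""
    dc1 = np.array([[np.sqrt(t1), 1j * np.sqrt(1 - t1)],
                    [1j * np.sqrt(1 - t1), np.sqrt(t1)]])
    dc2 = np.array([[np.sqrt(t2), -1j * np.sqrt(1 - t2)],
                    [-1j * np.sqrt(1 - t2), np.sqrt(t2)]])
    inner = np.diag([np.exp(1j * 2 * theta), 1.0])   # internal phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])         # external phase shifter
    return np.sqrt(1 - alpha) * dc2 @ inner @ dc1 @ outer
```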

Network Optimization & Training Processes

A QPNN is trained to perform a mapping between a set of $K$ input-output state pairs $\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\to\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$. For the QPNN-based BSA, the training set is provided in the computational basis in Fig. 1a. Since dual-rail encoding is applied, $\left|0\right\rangle$ ($\left|1\right\rangle$) in the computational basis is equivalent to $\left|10\right\rangle$ ($\left|01\right\rangle$) in the Fock basis for the two spatial modes that realize the photonic qubit.
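As an example of this encoding, the training pair highlighted in Fig. 1a, $\left|\Phi^{+}\right\rangle\to\left|00\right\rangle$, can be written in terms of the occupations of the four spatial modes; the dictionary-of-amplitudes representation below is only illustrative:

```python
# Dual-rail encoding: each two-qubit computational-basis state corresponds to a
# Fock occupation pattern of the four spatial modes (qubit A: modes 1-2,
# qubit B: modes 3-4).
import numpy as np

ket00 = (1, 0, 1, 0)  # |0>_A |0>_B
ket11 = (0, 1, 0, 1)  # |1>_A |1>_B

# |Phi+> = (|00> + |11>)/sqrt(2), the input highlighted in Fig. 1a
phi_plus = {ket00: 1 / np.sqrt(2), ket11: 1 / np.sqrt(2)}
target = {ket00: 1.0}  # ideal BSA output |00> for this input
```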

The unconditional infidelity of the network, $\mathcal{C}^{(\mathrm{unc})}=1-\mathcal{F}^{(\mathrm{unc})}$ (see Eqs. 4 and 5 for $\mathcal{F}^{(\mathrm{unc})}$), is minimized to facilitate the optimization process. The variational parameters, $\{\boldsymbol{\phi}_{i},\boldsymbol{\theta}_{i}\}$ for each layer in the network, are initialized randomly. Then, the local, gradient-free BOBYQA nonlinear optimization algorithm [35] (available from the NLopt library [36]) is applied until the absolute change in infidelity is less than a threshold chosen empirically based on the available computational resources. This algorithm constructs a quadratic approximation to the infidelity and thus does not require an analytical gradient. Gradient-free optimization was deemed appropriate since it is unlikely that the internal state of the network, as would be necessary for backpropagation methods, would be accessible during in situ training [7].
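A minimal training-loop sketch along these lines, assuming an `infidelity(x)` callable that maps the flattened phase settings to $\mathcal{C}^{(\mathrm{unc})}$ (measured on hardware, or computed via Eq. 5 in simulation); the bounds and tolerance are our illustrative choices, not the values used in the paper:

```python
# Gradient-free in situ training sketch using NLopt's BOBYQA algorithm.
import numpy as np
import nlopt

def train_qpnn(infidelity, n_params, ftol_abs=1e-6, seed=None):
    rng = np.random.default_rng(seed)
    opt = nlopt.opt(nlopt.LN_BOBYQA, n_params)
    opt.set_min_objective(lambda x, grad: infidelity(x))  # gradient is unused
    opt.set_lower_bounds(np.zeros(n_params))               # assumed phase range
    opt.set_upper_bounds(2 * np.pi * np.ones(n_params))
    opt.set_ftol_abs(ftol_abs)                 # stop on small absolute change
    x0 = rng.uniform(0, 2 * np.pi, n_params)   # random initialization
    x_opt = opt.optimize(x0)
    return x_opt, opt.last_optimum_value()
```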

In contrast to in situ training, as described in the main text, offline training was conducted by training a QPNN with idealized components, then adding fabrication imperfections and, if necessary, adjusting the effective nonlinearity (cf. Sec. IV). Because of the loss and DC splitting ratio variations, such imperfections were added to an idealized solution in 50 (200) repetitions for Fig. 2 (Fig. 3), matching the number of in situ trials conducted. From these results, an in situ trial was deemed successful if it achieved an optimized unconditional infidelity at or below the worst-case of offline training (mean minus standard deviation). Only successful optimization trials were considered for further analysis. Similarly, the loss limit is computed by adding imperfections to an idealized solution; however, the losses are assumed to be completely uniform at $\alpha_{\mathrm{WG}}$, and the DC splitting ratios are all 50:50.

All simulations were conducted on the Frontenac Platform computing cluster offered by the Centre for Advanced Computing at Queen’s University. The accompanying code was written in Python (version 3.10.2) using Numpy (version 1.22.2) and NLopt (version 2.6.1). Cython (version 0.29.30) was used to translate performance-sensitive operations to C to improve computation runtime. In the Supplementary Information S1, we identify where computational complexity arises when constructing the system transfer function.

Conditional Measures

As for the unconditional fidelity, the conditional fidelity can be found by projecting the actual output state, $\left|\psi_{\mathrm{out,act}}^{(i)}\right\rangle=\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$, onto the computational basis, $\mathrm{CB}$, and finding its overlap with the ideal output $\left|\psi_{\mathrm{out}}^{(i)}\right\rangle$. Averaging over all $K$ input-output pairs, this is written as,

\mathcal{F}^{(\mathrm{con})}=\frac{1}{K}\sum_{i=1}^{K}\left|\left\langle\psi_{\mathrm{out}}^{(i)}\right|A^{(i)}\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}, (11)

where,

A^{(i)}=\left[\sum_{\left|x\right\rangle\in\mathrm{CB}}\left|\left\langle x\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}\right]^{-\frac{1}{2}}, (12)

normalizes the $i^{\mathrm{th}}$ state $\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle$ to the computational basis. Similarly, the probability of measuring an output in the computational basis is

\mathcal{P}^{(\mathrm{cb})}=\frac{1}{K}\sum_{i=1}^{K}\sum_{\left|x\right\rangle\in\mathrm{CB}}\left|\left\langle x\right|\mathbf{S}\left|\psi_{\mathrm{in}}^{(i)}\right\rangle\right|^{2}. (13)

The $i^{\mathrm{th}}$ terms of Eqs. 11 and 13 can be multiplied to yield Eq. 4, which follows simply from the fact that the $i^{\mathrm{th}}$ term of Eq. 13 can be expressed as $\left(A^{(i)}\right)^{-2}$.
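A sketch of Eqs. 11-13 for Fock-basis state vectors, assuming the indices of the computational-basis states, `cb_idx`, are known. The per-pair quantities computed inside the loop multiply to the $i^{\mathrm{th}}$ term of Eq. 4, as stated above:

```python
# Conditional fidelity F^(con) and computational-basis probability P^(cb).
import numpy as np

def conditional_measures(S, psi_in, psi_out, cb_idx):
    """Return (F^(con), P^(cb)) of Eqs. (11) and (13) for K column-stacked pairs."""
    f_con, p_cb = [], []
    for i in range(psi_in.shape[1]):
        out = S @ psi_in[:, i]
        p_i = np.sum(np.abs(out[cb_idx]) ** 2)               # i-th term of Eq. (13)
        A_i = 1.0 / np.sqrt(p_i)                              # Eq. (12)
        f_i = np.abs(np.vdot(psi_out[:, i], A_i * out)) ** 2  # i-th term of Eq. (11)
        f_con.append(f_i)
        p_cb.append(p_i)
    return np.mean(f_con), np.mean(p_cb)
```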

VII Acknowledgements

This research is supported by the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute. The authors thank N.R.H. Pedersen for his insights into linear meshes, and gratefully acknowledge support by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Foundation for Innovation (CFI), and Queen’s University.

VIII Author Contributions

N.R. and J.C. conceived the project, which they developed along with J.E. J.E. was responsible for designing and performing all simulations and analysis, with supervision from B.S. and N.R. All authors discussed the results and shared in the writing and editing responsibilities for the manuscript.

IX Additional Information

Supplementary Information accompanies the paper.

Competing Interests: The authors declare no competing interests.

References