† These authors contributed equally to this work.

All-optical Neural Network Quantum State Tomography

Ying Zuo, Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Chenfeng Cao, Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Ningping Cao, Department of Mathematics & Statistics, University of Guelph, Guelph N1G 2W1 ON, Canada; Institute for Quantum Computing, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Xuanying Lai, Department of Physics, The University of Texas at Dallas, Richardson, Texas 75080, USA
Bei Zeng, Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Shengwang Du, Department of Physics, The University of Texas at Dallas, Richardson, Texas 75080, USA

Abstract

Quantum state tomography (QST) is a crucial ingredient for almost all aspects of experimental quantum information processing. As an analog of the “imaging” technique in quantum settings, QST is inherently a data science problem, to which machine learning techniques, notably neural networks, have been applied extensively. Here, we build and demonstrate an all-optical neural network (AONN) for photonic polarization qubit QST. The AONN is equipped with built-in optical nonlinear activation functions based on electromagnetically induced transparency. The experimental results show that our AONN can determine the phase parameter of the qubit state accurately. As optics is highly desired for quantum interconnections, our AONN-QST may contribute to the realization of all-optical quantum networks and inspire ideas that combine artificial optical intelligence with quantum information studies.


I Introduction

Quantum state tomography (QST) is a standard process of reconstructing the quantum information of an unknown quantum state through measurements on its copies. QST is used to verify state preparation, examine state properties such as correlations, and calibrate experimental systems. It is a crucial part of almost all aspects of experimental quantum information processing, including quantum computing, quantum metrology, and quantum communication Bouchard et al. (2019); D’Ariano et al. (2003); Leonhardt (1995); Thew et al. (2002); Lvovsky and Raymer (2009); Rambach et al. (2021).

As an analog of the “imaging” technique in quantum settings, QST is inherently a data science problem. Given limited copies of an unknown state $\rho$, we can extract its information via QST. QST is essentially an inverse problem, and such information recovery tasks are well suited to machine learning. Quantum learning theory indicates that $\Theta\left(2^{2n}/\epsilon^{2}\right)$ copies of $\rho$ are necessary and sufficient to learn $\rho$ up to trace distance $\epsilon$ O’Donnell and Wright (2016). Although this tremendous resource requirement makes full-state QST impractical for large-scale systems, several weaker quantum learning models (e.g., probably approximately correct learning Aaronson (2007), online learning Aaronson et al. (2019), shadow tomography Aaronson (2019); Huang et al. (2020)) can exponentially reduce the resources needed to learn some 2-outcome measurement expectation values or “shadows”.

The artificial neural network (NN), a powerful machine learning algorithm for fitting a specific function, has been widely used for solving quantum information problems, such as quantum optimal control Niu et al. (2019); An et al. (2020), quantum maximum entropy estimation Cao et al. (2021), and Hamiltonian reconstruction Cao et al. (2020). NNs have also been widely applied to QST, for example to efficiently recover the information of local-Hamiltonian ground states from local measurements Xin et al. (2019), perform tomography on highly entangled states with large system size Torlai et al. (2018), mitigate state-preparation-and-measurement (SPAM) errors in experiments Palmieri et al. (2020), and improve the state fidelity Quek et al. (2018); Ahmed et al. (2020a). Generative models with neural networks can also perform QST at dramatically lower cost Carrasquilla et al. (2019); Ahmed et al. (2020b).

Figure 1: Schematics of all-optical neural network based quantum state tomography.

In this work, we demonstrate neural network QST with an all-optical neural network (AONN). Several optical implementations of fully connected neural network hardware have been proposed and demonstrated recently Wetzstein et al. (2020); Shastri et al. (2021); Woods and Naughton (2012); Shen et al. (2017a); Lin et al. (2018); Zuo et al. (2019). Optical computing takes advantage of the bosonic wave nature of light: superposition and interference give rise to its intrinsic parallel computing ability. Meanwhile, light is the fastest information carrier in nature. The AONN is promising for next-generation artificial intelligence hardware that provides high energy efficiency, low crosstalk, light-speed processing, and massive parallelism. Compared with their electronic counterparts, AONNs are ideal for handling visual signals and information that are naturally generated and coded in light, as in image recognition and vehicular automation. However, most AONN demonstrations are still restricted to linear computation because of the lack of suitable nonlinearity at low light levels for a large number of optical neurons Woods and Naughton (2012); Shen et al. (2017a); Lin et al. (2018). Without nonlinear activation functions, an AONN is always equivalent to a single-layer structure, which cannot be applied to “real” deep machine learning. This problem remained unsolved until recently, when optical nonlinearities based on electromagnetically induced transparency (EIT) Zuo et al. (2019, 2021), phase-change materials Feldmann et al. (2019), and saturated absorption Guo et al. (2021); Ryou et al. (2021) were implemented to realize artificial optical neurons for AONNs.

Figure 1 illustrates a general scheme of AONN QST. First, we collect the training data set from known quantum states { $|\psi_{j}\rangle$ } and the corresponding local measurements { $M_{j}$ }. Second, we train a neural network under supervised learning, with nonlinear activation functions in its hidden neurons, to obtain the optimal network parameters. Third, we use the trained network parameters to configure the AONN and perform some fine adjustment to optimize the hardware performance. Finally, we feed measurement data sets to the trained AONN to reconstruct unknown quantum states. To validate this scheme, in the following sections we start from a general discussion of QST with a computer-simulated NN and then describe our AONN experimental approach.

II NN for QST

We consider a general $n$-qubit space with the set of Pauli operators (with the all-identity term removed) defined as

$$P=\{\sigma_{i_{1}}^{(1)}\otimes\cdots\otimes\sigma^{(n)}_{i_{n}}\,|\,\sigma^{(k)}_{i_{k}}\in\mathcal{P},\ \sum_{k=1}^{n}i_{k}\neq 0\}, \tag{1}$$

where $\mathcal{P}=\{\sigma_{0}=I,\sigma_{1}=X,\sigma_{2}=Y,\sigma_{3}=Z\}$. Every term in $P$ is specified by its index $(i_{1},i_{2},\cdots,i_{n})$. Measuring every element in $P$ performs a QST for any $n$-qubit quantum state $\rho$. For instance, when $n=1$, we need to measure all three Paulis $X,Y,Z$ for QST. Clearly, the cardinality of $P$, which is $4^{n}-1$, grows exponentially with $n$. When $\rho$ is a pure state, one may use techniques to reduce the number of measurements for $n>1$. Compressed sensing is one of the standard techniques for recovering low-rank quantum states from randomly sampled Pauli operators Gross et al. (2010); Flammia et al. (2012).
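As a small illustration of Eq. (1), the following Python sketch enumerates the index tuples $(i_{1},\cdots,i_{n})$ and builds the corresponding tensor products. The names `PAULIS` and `pauli_set` are hypothetical helpers introduced only for this example.

```python
import itertools
import numpy as np

# Single-qubit Paulis, indexed as in the text: 0 -> I, 1 -> X, 2 -> Y, 3 -> Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_set(n):
    """Return {index tuple: matrix} for every n-qubit Pauli string except the identity."""
    ops = {}
    for idx in itertools.product(range(4), repeat=n):
        if sum(idx) == 0:  # skip the all-identity term, as in Eq. (1)
            continue
        mat = np.array([[1.0 + 0j]])
        for i in idx:
            mat = np.kron(mat, PAULIS[i])
        ops[idx] = mat
    return ops

for n in (1, 2, 3):
    print(n, len(pauli_set(n)))  # 3, 15, 63 operators, i.e. 4**n - 1
```

The count $4^{n}-1$ makes the exponential growth explicit: already for three qubits a full tomography needs 63 Pauli expectation values.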

When $\rho$ is a pure state, it can be written as a ket

$$\ket{\psi}=\sum_{k=1}^{2^{n}}a_{k}\ket{\phi_{k}}, \tag{2}$$

where { $\ket{\phi_{k}}$ } is the computational basis, and the amplitudes $a_{k}\in\mathbb{C}$ are normalized (i.e., $\sum_{k=1}^{2^{n}}(a_{k,\text{r}}^{2}+a_{k,\text{im}}^{2})=1$, where $a_{k,\text{r}}\in\mathbb{R}$ and $a_{k,\text{im}}\in\mathbb{R}$ are the real and imaginary parts of $a_{k}$, respectively). The expectation values of the operators in $P$ are $\vec{c}=\operatorname{tr}(\rho\cdot P)=(\operatorname{tr}(\rho P_{1}),\operatorname{tr}(\rho P_{2}),\cdots,\operatorname{tr}(\rho P_{4^{n}-1}))$. For a single-qubit pure state $\alpha|0\rangle+\beta|1\rangle$, the density matrix can be expressed as

$$\rho=\frac{1}{2}(I+\vec{c}\cdot\vec{\sigma}), \tag{3}$$

where $\vec{\sigma}=\{X,Y,Z\}$ , and $\vec{c}=(\operatorname{tr}(\rho X),\operatorname{tr}(\rho Y),\operatorname{tr}(\rho Z))\equiv(\langle X\rangle,\langle Y\rangle,\langle Z\rangle)$ .
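The single-qubit relation in Eq. (3) can be checked numerically. The sketch below is a minimal illustration; the amplitudes `alpha` and `beta` and the helper names `bloch_vector` and `rho_from_bloch` are arbitrary choices for this example, not values from the experiment.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """Expectation values (<X>, <Y>, <Z>) = (tr(rho X), tr(rho Y), tr(rho Z))."""
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

def rho_from_bloch(c):
    """Rebuild the density matrix as (I + c . sigma) / 2."""
    return 0.5 * (I2 + c[0] * X + c[1] * Y + c[2] * Z)

# Example pure state alpha|0> + beta|1> (arbitrary normalized amplitudes).
alpha, beta = 0.6, 0.8j
psi = np.array([alpha, beta])
rho = np.outer(psi, psi.conj())

c = bloch_vector(rho)
rho_rec = rho_from_bloch(c)
print(c)                          # Bloch vector of the example state
print(np.allclose(rho, rho_rec))  # the three Paulis fix the state completely
```

For a pure state the vector $\vec{c}$ has unit length, which is why the three expectation values are not independent.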

For $n>1$, we consider compressed sensing to reduce the number of measured operators. To perform compressed sensing, one randomly samples a set $P^{m}=\{P_{1},\cdots,P_{m}\}$ of $m$ Pauli operators from $P$, then uses $\vec{c}=\operatorname{tr}(\rho\cdot P^{m})=(\operatorname{tr}(\rho P_{1}),\operatorname{tr}(\rho P_{2}),\cdots,\operatorname{tr}(\rho P_{m}))$ to recover the unknown state $\rho$ or, more precisely, the parameters of $\rho$. This can be treated as a regression problem: estimating the function that maps $\vec{c}$ to the parameters of $\rho$ (e.g., $a_{k,\text{r}}$ and $a_{k,\text{im}}$).

NNs are excellent tools for solving regression problems. In the compressed-sensing-inspired QST, the expectation values $\vec{c}$ from the randomly sampled $P^{m}$ are the inputs to the network, and ($a_{k,\text{r}}$, $a_{k,\text{im}}$) are the outputs. Note that there are different types of NNs with various structures and training procedures. Following the law of parsimony (Occam’s razor), we use the simplest type of NN in this letter: fully connected, feed-forward NNs. That is, the neurons in adjacent layers are fully connected with one another, and information only passes forward through the network. The supervised training process compares the ideal outputs $(a_{k,\text{r}},a_{k,\text{im}})$ with the current NN outputs and updates the parameters of the NN to minimize their difference.
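To make the input/output convention concrete, the sketch below builds one training pair for $n=2$ and $m=10$: the inputs are the expectation values of randomly sampled Pauli strings, and the targets are the real and imaginary parts of the amplitudes. It is only an illustration under stated assumptions: the helper names are hypothetical, the random-state sampling is our choice, and the global phase is fixed by making $a_{1}$ real and non-negative so that $2\cdot 2^{n}-1$ numbers remain.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_string(idx):
    """Tensor product of single-qubit Paulis selected by the index tuple."""
    mat = np.array([[1.0 + 0j]])
    for i in idx:
        mat = np.kron(mat, PAULIS[i])
    return mat

def training_pair(n, sampled, rng):
    """Return (inputs c, targets) for one random n-qubit pure state."""
    dim = 2 ** n
    a = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    a /= np.linalg.norm(a)
    a *= np.exp(-1j * np.angle(a[0]))  # assumption: fix global phase, a_1 real and >= 0
    c = np.array([np.real(np.vdot(a, pauli_string(idx) @ a)) for idx in sampled])
    target = np.concatenate([a.real, a.imag[1:]])  # drop Im(a_1), which is zero
    return c, target

rng = np.random.default_rng(0)
n, m = 2, 10
all_idx = [idx for idx in itertools.product(range(4), repeat=n) if sum(idx) > 0]
chosen = rng.choice(len(all_idx), size=m, replace=False)
sampled = [all_idx[k] for k in chosen]

c, target = training_pair(n, sampled, rng)
print(c.shape, target.shape)  # (10,) inputs and (7,) targets
```

Repeating `training_pair` with the same `sampled` list would produce a supervised data set for one fixed choice of measured operators.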

We numerically trained computer-based NNs with nonlinear activation functions for 1-qubit, 2-qubit, and 3-qubit QST. For the 1-qubit system, the number of sampled operators is $m\in\{1,2,3\}$; for the 2-qubit system, $m\in\{6,8,10,12\}$; for the 3-qubit system, $m\in\{20,25,30,35,40\}$. Plainly, $m$ equals the number of input neurons, and $n$ determines the number of output neurons. For each $m$, three sets of Pauli operators have been sampled and tested. Figure 2 plots the average fidelities (green bars) of the 2-qubit and 3-qubit cases as functions of the number of randomly sampled Paulis. For the single-qubit system, the fidelity reaches 99.99% with 3 Paulis. For the 2-qubit system, the fidelity reaches 99.9% with 10 randomly sampled Paulis [Fig. 2(a)]. For the 3-qubit system, a fidelity higher than 99.9% requires more than 35 randomly sampled Paulis [Fig. 2(b)]. Details of the training can be found in Appendix A.

Theoretically, a pure state $\rho$ is said to be Uniquely Determined among All states (UDA) by a set of operators $F$ if no other state, pure or mixed, has the same expectation values when measuring $F$ Chen et al. (2013). In Ref. Ma et al. (2016), the authors discovered two sets of Pauli operators, $P_{\text{2-UDA}}$ and $P_{\text{3-UDA}}$, that are UDA for all 2-qubit and 3-qubit pure states, respectively. (See Appendix B for the particular sets $P_{\text{2-UDA}}$ and $P_{\text{3-UDA}}$.) That is, they are special Pauli operator sets for which the map between the measured state $\rho$ and its expectation values is bijective. We likewise apply NNs to these two sets of UDA operators and obtain prediction fidelities of 99.9% for the 2-qubit case and 99.3% for the 3-qubit case (red triangles in Figure 2).

We remark that our UDA scheme is not readily scalable to larger systems. However, there exist protocols with better scalability, e.g., compressed sensing Gross et al. (2010) and shadow tomography Aaronson (2019); Huang et al. (2020), in which NNs can also naturally be used to enhance the protocols. In addition, our NN-based scheme can be adapted to quantum tomography in an optical system by taking physical constraints into account, as discussed in detail in the next section.

III AONN-QST experiment

In this first proof-of-principle experimental demonstration, we implement the single-qubit space with light polarization, i.e., horizontal polarization $|H\rangle=|0\rangle$ and vertical polarization $|V\rangle=|1\rangle$. Instead of performing a full QST, we focus on determining the phase parameter $\theta$ of a pure state $|\psi\rangle=\frac{1}{\sqrt{2}}(|H\rangle+e^{i\theta}|V\rangle)$. The experimental AONN-QST setup is displayed in Fig. 3. In conventional QST, an arbitrary polarization state can be reconstructed by measuring the expectation values of the three Pauli operators. Figure 3(a) illustrates such an optical measurement setup. A laser beam passes through a polarization beam splitter (PBS₁) and becomes horizontally polarized ($|H\rangle$). The target state $|\psi\rangle=\frac{1}{\sqrt{2}}(|H\rangle+e^{i\theta}|V\rangle)$ is prepared by passing this horizontally polarized light through a half-wave plate (HWP₁) and a quarter-wave plate (QWP₁). The fast axis of HWP₁ is aligned at an angle $\pi/4-\theta/2$ to the horizontal direction, and the fast axis of QWP₁ is aligned at an angle $\pi/4$ to the horizontal direction. $\langle X\rangle$, $\langle Y\rangle$, and $\langle Z\rangle$ are obtained by sending the polarization qubit state to the measurement units II, IV, and III shown in Fig. 3(a), respectively. To determine $\langle Z\rangle$, we send the polarization qubit directly to PBS₂, which projects $|H\rangle$ and $|V\rangle$ onto two photodetectors in measurement unit III. The normalized differential output of these two photodetectors gives the value of $\langle Z\rangle$. The same setup can also be used to determine $\langle X\rangle$ or $\langle Y\rangle$ by placing HWP₂ or QWP₂ before PBS₂, as shown in units II and IV, respectively (see Appendix C for details).
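The normalized differential detection described above can be mimicked numerically. The following sketch is an idealized model (lossless projections, no noise) rather than a description of the actual optics; the measurement bases for $X$ and $Y$ are written directly as state vectors instead of modelling the wave plates, and all names are hypothetical.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)  # |H> = |0>
V = np.array([0, 1], dtype=complex)  # |V> = |1>

# Eigenbases (+1 state, -1 state) of the three Pauli operators.
BASES = {
    "X": ((H + V) / np.sqrt(2), (H - V) / np.sqrt(2)),
    "Y": ((H + 1j * V) / np.sqrt(2), (H - 1j * V) / np.sqrt(2)),
    "Z": (H, V),
}

def differential_signal(psi, plus, minus):
    """(I_plus - I_minus) / (I_plus + I_minus) for two ideal photodetectors."""
    i_plus = abs(np.vdot(plus, psi)) ** 2
    i_minus = abs(np.vdot(minus, psi)) ** 2
    return (i_plus - i_minus) / (i_plus + i_minus)

def pauli_expectations(theta):
    """(<X>, <Y>, <Z>) for |psi> = (|H> + exp(i theta)|V>) / sqrt(2)."""
    psi = (H + np.exp(1j * theta) * V) / np.sqrt(2)
    return tuple(differential_signal(psi, *BASES[k]) for k in "XYZ")

x, y, z = pauli_expectations(0.7)
print(x, y, z)  # equals (cos 0.7, sin 0.7, 0) up to rounding
```

In this ideal model the target state gives $\langle X\rangle=\cos\theta$, $\langle Y\rangle=\sin\theta$, and $\langle Z\rangle=0$, so the phase $\theta$ is fully encoded in the first two expectation values.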

We obtain a data set { $M_{i}$ }= $\{|\phi_{i}\rangle:1-\langle X\rangle_{i},1-\langle Y\rangle_{i},1-\langle Z\rangle_{i}\}$ by varying the phase $\theta\in[0,\pi/2]$ in the qubit state $|\psi\rangle=\frac{1}{\sqrt{2}}(|H\rangle+e^{i\theta}|V\rangle)$ and use it to train our AONN in Fig. 3(b). The AONN comprises an input layer of 3 neurons, a hidden layer of 20 neurons, and a single-neuron output layer Zuo et al. (2019, 2021). Figure 3(b) shows the optical layout of the AONN, and its network structure diagram is displayed in Fig. 3(c). The three coupling laser beams in the optical input layer are generated by a spatial light modulator (SLM1), lenses L2 and L3, and an aperture, as shown in unit I of Fig. 3(b). SLM1 is divided into 3 parts, and each part is encoded with a sine phase pattern $m\pi\sin(\frac{2\pi}{T_{mi}}i+\frac{2\pi}{T_{mj}}j)$, where $m$ is the modulation depth, $T_{mi}$ and $T_{mj}$ are the modulation periods along the $x$ and $y$ directions, and $i$ and $j$ are the pixel numbers along the $x$ and $y$ directions. The sine phase pattern encoded on SLM1 diffracts each beam into separated orders at the focal plane of lens L2, and the aperture acts as a filter that keeps only the zero-order beam, whose intensity is determined by the modulation depth $m$. Thus, three beams with the designed intensities are generated, collimated, and propagate to SLM2 through lens L3. These weighted beams, as the input vector, are incident on SLM2, which diffracts each beam into 20 directions with designed weights (see Appendix D for the algorithm to calculate the pattern encoded on SLM2). A Fourier lens L4 performs linear summation of the beams diffracted into the same direction and forms 20 spots on its front focal plane. Thus, the combination of SLM2 and L4 completes the first linear operation W₁ and generates the input to the hidden layer.
We then image these 20 spots with lenses L5 and L6 onto laser-cooled ⁸⁵Rb atoms in a two-dimensional (2D) magneto-optical trap (MOT) Metcalf and van der Straten (2003); Zhang et al. (2012), where this 20-spot coupling beam pattern spatially modulates the transparency of the atomic medium through electromagnetically induced transparency (EIT) Harris (1997); Fleischhauer et al. (2005). A relatively weak collimated probe beam counter-propagates through the MOT, and its spatial transmission is nonlinearly controlled by the 20-spot coupling beam pattern. The nonlinear optical activation functions are thus realized with EIT in cold atoms and follow Equation 7. The image of the probe beam transmission pattern formed by lenses L6 and L8 becomes the output of the 20 hidden neurons. SLM3 and Fourier lens L9 perform the second linear matrix operation W₂, and the output is recorded by a camera. The technical details of our AONN are described in Refs. Zuo et al. (2019, 2021).
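The way the modulation depth $m$ sets the zero-order intensity can be illustrated with a short numerical sketch. It assumes an ideal phase-only SLM under uniform illumination and an aperture that passes only the undiffracted order; the grid size and periods are arbitrary example values, and the helper names are hypothetical. Under these assumptions the zero-order power fraction of a sinusoidal phase grating is $J_{0}(m\pi)^{2}$, which the code compares against a series evaluation of the Bessel function.

```python
import math
import numpy as np

def sine_phase_pattern(shape, depth, period_i, period_j):
    """Phase m*pi*sin(2*pi*i/T_i + 2*pi*j/T_j) on a pixel grid (i along x, j along y)."""
    i, j = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    return depth * np.pi * np.sin(2 * np.pi * i / period_i + 2 * np.pi * j / period_j)

def zero_order_fraction(phase):
    """Fraction of power left in the undiffracted (zero) order of exp(i*phase)."""
    field = np.exp(1j * phase)
    return abs(field.mean()) ** 2

def bessel_j0(x, terms=30):
    """Power-series evaluation of the Bessel function J0(x)."""
    return sum((-1) ** k * (x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

for depth in (0.0, 0.1, 0.3, 0.5):
    phase = sine_phase_pattern((64, 64), depth, period_i=8, period_j=16)
    print(depth, zero_order_fraction(phase), bessel_j0(depth * math.pi) ** 2)
```

For small depths the transmitted zero-order fraction falls smoothly from 1 as $m$ grows, which is what allows each of the three input beams to be weighted independently.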

In this work, because we encode data in light energy, the AONN can only handle positive values: the inputs, outputs, linear matrix elements, and inputs/outputs of the nonlinear activation functions are all positive Zuo et al. (2019, 2021). Meanwhile, the EIT optical nonlinear activation functions are increasing and convex. Therefore, the AONN is only able to perform regression tasks on increasing and convex functions. To match the AONN constraints, we apply a transform to the input variables, e.g., $\langle X\rangle$ to $1-\langle X\rangle$, so that all input values to the AONN nodes are positive. We add these conditions to the NN to simulate the AONN performance. We find that this specific AONN fails to describe the whole range of nonmonotonic functions. For the first proof-of-principle experimental demonstration, we therefore apply the AONN only to single-qubit QST with phase $\theta$ within $[0,\pi/2]$. It is surprising that such a positive-valued AONN is still able to perform some types of QST.
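The restriction to increasing and convex functions follows from the structure of the network and can be checked on a toy model. The sketch below uses the 3-20-1 layout from the text with random non-negative weights; the activation `x ** 2` is only a hypothetical stand-in that is increasing and convex for non-negative arguments, not the measured EIT response.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.uniform(0.0, 1.0, size=(20, 3))  # non-negative weights, input -> hidden
W2 = rng.uniform(0.0, 1.0, size=(1, 20))  # non-negative weights, hidden -> output

def activation(x):
    # Hypothetical stand-in: increasing and convex for x >= 0 (not the EIT curve).
    return x ** 2

def positive_net(x):
    """Output of a 3-20-1 network with non-negative weights and inputs."""
    return (W2 @ activation(W1 @ x)).item()

x = rng.uniform(0.0, 1.0, size=3)
y = rng.uniform(0.0, 1.0, size=3)
dx = rng.uniform(0.0, 1.0, size=3)

# Increasing any input never lowers the output.
print(positive_net(x + dx) >= positive_net(x))
# Midpoint convexity: f((x + y) / 2) <= (f(x) + f(y)) / 2.
print(positive_net((x + y) / 2) <= (positive_net(x) + positive_net(y)) / 2)
```

Because non-negative weighted sums and increasing convex activations both preserve monotonicity and convexity, no choice of weights in such a network can fit a nonmonotonic target over its whole range.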

To train the AONN, we prepared the training data set { $M_{i}$ } from 23 phase values drawn from a uniform distribution $\theta_{j}\sim U(0,\pi/2)$, corresponding to the optical polarization states $\{\rho_{j}=\mathcal{N}(|\phi_{j}\rangle\langle\phi_{j}|)\}$, where $\mathcal{N}$ is the noise channel in the experiment, and measured the Pauli expectation values $\langle X\rangle$, $\langle Y\rangle$, and $\langle Z\rangle$. In a similar way, we prepared a test set with 32 independent data samples.

In addition to optical quantum states, we sample data from the IBM quantum computer ibmq_ourense ibm (2020) and carry out the same AONN training for comparison. The quantum circuit that prepares $|\psi\rangle=(|H\rangle+e^{i\theta}|V\rangle)/\sqrt{2}$ applies a Hadamard gate to the initial state $|H\rangle$, followed by an RZ rotation gate. On ibmq_ourense, we uniformly sample 158 data points as the training set and 50 data points as the test set. The optical tomography data and the IBMQ tomography data are each used to train an NN, giving two trained NNs. Details of training the AONN can be found in Appendix E.
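The state-preparation circuit can be verified with plain matrix algebra. The sketch below assumes the common convention $R_{Z}(\theta)=\exp(-i\theta Z/2)$ and does not call any quantum SDK; the function names are hypothetical.

```python
import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def rz(theta):
    """RZ rotation, assuming the convention RZ(theta) = exp(-i * theta * Z / 2)."""
    return np.array([[np.exp(-0.5j * theta), 0], [0, np.exp(0.5j * theta)]])

def prepared_state(theta):
    """Apply a Hadamard gate and then RZ(theta) to |0> (= |H>)."""
    ket0 = np.array([1, 0], dtype=complex)
    return rz(theta) @ HADAMARD @ ket0

def target_state(theta):
    """(|H> + exp(i theta)|V>) / sqrt(2)."""
    return np.array([1, np.exp(1j * theta)]) / np.sqrt(2)

theta = 0.9
overlap = abs(np.vdot(target_state(theta), prepared_state(theta)))
print(overlap)  # 1 up to rounding: the states agree up to a global phase
```

The circuit output differs from the target only by the unobservable global phase $e^{-i\theta/2}$, so the measured Pauli expectation values are identical.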

Figure 4 shows the AONN state reconstruction results using neural network models trained separately on the AONN QST training set and the IBM quantum computer training set. The theoretical value is calculated from $\langle X\rangle$ directly. With the AONN configured according to the training results, we send a set of input vectors to the system. An example of the real and imaginary parts of the density matrix is shown in Fig. 4(a); this experimentally measured state is predicted by the model trained on the AONN QST data. The example input vector for the AONN model is $(\langle X\rangle,\langle Y\rangle,\langle Z\rangle)=(0.440,0.898,0)$, and the state predicted by the AONN experiment is $\theta=1.195$ and $\rho=\begin{pmatrix}0.5&0.1852-0.46457i\\ 0.1852+0.46457i&0.5\end{pmatrix}$, which is close to the theoretical value $\theta=1.1152$ and the NN-predicted value $\theta=1.1532$. This state is also marked with a yellow triangle in Fig. 4(b1). The experimental results are shown in Fig. 4(b). The theoretical values, NN-predicted values, and experimental AONN-predicted values agree closely for both the optics data training [Fig. 4(b1)] and the IBMQ data training [Fig. 4(b2)]. The results suggest that our positive-valued AONN with EIT nonlinear activation functions is capable of qubit QST.
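For the state family considered here, $\langle X\rangle=\cos\theta$, so the theoretical phase follows from a single inverse cosine, and the density matrix follows from $\theta$. The sketch below reproduces the theoretical value quoted above for $\langle X\rangle=0.440$; the function names are hypothetical, and the inversion is only valid for $\theta\in[0,\pi]$, which covers the experimental range $[0,\pi/2]$.

```python
import numpy as np

def phase_from_x(x_expect):
    """theta = arccos(<X>) for |psi> = (|H> + exp(i theta)|V>) / sqrt(2)."""
    return float(np.arccos(np.clip(x_expect, -1.0, 1.0)))

def density_matrix(theta):
    """rho = |psi><psi| for the equal-weight state with relative phase theta."""
    off = 0.5 * np.exp(-1j * theta)
    return np.array([[0.5, off], [np.conj(off), 0.5]])

theta_theory = phase_from_x(0.440)
print(round(theta_theory, 4))  # 1.1152, the theoretical value quoted in the text

rho = density_matrix(1.195)    # density matrix for the AONN-predicted phase
print(np.round(rho, 4))
```

The off-diagonal element $\tfrac{1}{2}e^{-i\theta}$ carries the phase, which is why the real and imaginary parts plotted in Fig. 4(a) are sufficient to read off $\theta$.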

IV Discussion and conclusion

While most demonstrations of optical neural networks have used classification tasks to verify their feasibility Lin et al. (2018); Feldmann et al. (2019); Shen et al. (2017b), we performed the first AONN-QST. To accomplish regression tasks, a nonlinear function is essential whenever the relation between the input vector and the output vector cannot be expressed linearly. The tunable EIT nonlinear optical activation functions in our AONN offer opportunities for performing regression tasks with certain functions. Although our AONN is limited in that its linear operation matrix elements are all positive, it has the potential to solve some QST problems, as we have demonstrated here for single-qubit tomography. We believe that the next generation of complex-valued AONNs, which encode data in both the amplitude and the phase of light, may become a more powerful tool for full QST in a much larger $n$-qubit space.

The optical quantum network Kimble (2008) has been brought to the fore by the reduced decoherence and high speed of photons. Recently, apart from the generation of optical quantum states Gu et al. (2019) and optical quantum communication over long distances Yu et al. (2020), multiple state-of-the-art experiments on optical quantum interfaces that store Li et al. (2020) and distribute entanglement Choi et al. (2010); Pu et al. (2018) have been demonstrated. In all of these, QST is essential not only for characterizing the generation and preservation of quantum states but also, potentially, for verifying the entanglement distributed across the whole network. We believe that our all-optical setup of integrated AONN-QST adds one more building block to the all-optical quantum network. We acknowledge the use of IBM Quantum services for this work.

References

Bouchard et al. (2019) Frédéric Bouchard, Felix Hufnagel, Dominik Koutnỳ, Aazad Abbas, Alicia Sit, Khabat Heshami, Robert Fickler, and Ebrahim Karimi, “Quantum process tomography of a high-dimensional quantum communication channel,” Quantum 3, 138 (2019).
D’Ariano et al. (2003) G Mauro D’Ariano, Matteo GA Paris, and Massimiliano F Sacchi, “Quantum tomography,” Advances in Imaging and Electron Physics 128, 206–309 (2003).
Leonhardt (1995) Ulf Leonhardt, “Quantum-state tomography and discrete Wigner function,” Phys. Rev. Lett. 74, 4101 (1995).
Thew et al. (2002) RT Thew, Kae Nemoto, Andrew G White, and William J Munro, “Qudit quantum-state tomography,” Phys. Rev. A 66, 012303 (2002).
Lvovsky and Raymer (2009) Alexander I Lvovsky and Michael G Raymer, “Continuous-variable optical quantum-state tomography,” Rev. Mod. Phys. 81, 299 (2009).
Rambach et al. (2021) Markus Rambach, Mahdi Qaryan, Michael Kewming, Christopher Ferrie, Andrew G. White, and Jacquiline Romero, “Robust and efficient high-dimensional quantum state tomography,” Phys. Rev. Lett. 126, 100402 (2021).
O’Donnell and Wright (2016) Ryan O’Donnell and John Wright, “Efficient quantum tomography,” in Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (2016) pp. 899–912.
Aaronson (2007) Scott Aaronson, “The learnability of quantum states,” Proc. R. Soc. A. 463, 3089–3114 (2007).
Aaronson et al. (2019) Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak, “Online learning of quantum states,” J. Stat. Mech. 2019, 124019 (2019).
Aaronson (2019) Scott Aaronson, “Shadow tomography of quantum states,” SIAM Journal on Computing 49, STOC18–368 (2019).
Huang et al. (2020) Hsin-Yuan Huang, Richard Kueng, and John Preskill, “Predicting many properties of a quantum system from very few measurements,” Nature Physics 16, 1050–1057 (2020).
Niu et al. (2019) Murphy Yuezhen Niu, Sergio Boixo, Vadim N Smelyanskiy, and Hartmut Neven, “Universal quantum control through deep reinforcement learning,” npj Quantum Inf. 5, 1–8 (2019).
An et al. (2020) Zheng An, Qi-Kai He, Hai-Jing Song, and DL Zhou, “High dimensional quantum optimal control with reinforcement learning,” arXiv:2007.00838 (2020).
Cao et al. (2021) Ningping Cao, Jie Xie, Aonan Zhang, Shi-Yao Hou, Lijian Zhang, and Bei Zeng, “Neural networks for quantum inverse problems,” arXiv:2005.01540 (2021).
Cao et al. (2020) Chenfeng Cao, Shi-Yao Hou, Ningping Cao, and Bei Zeng, “Supervised learning in Hamiltonian reconstruction from local measurements on eigenstates,” J. Phys.: Condens. Matter 33, 064002 (2020).
Xin et al. (2019) Tao Xin, Sirui Lu, Ningping Cao, Galit Anikeeva, Dawei Lu, Jun Li, Guilu Long, and Bei Zeng, “Local-measurement-based quantum state tomography via neural networks,” npj Quantum Inf. 5, 1–8 (2019).
Torlai et al. (2018) Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, Matthias Troyer, Roger Melko, and Giuseppe Carleo, “Neural-network quantum state tomography,” Nat. Phys. 14, 447 (2018).
Palmieri et al. (2020) Adriano Macarone Palmieri, Egor Kovlakov, Federico Bianchi, Dmitry Yudin, Stanislav Straupe, Jacob D Biamonte, and Sergei Kulik, “Experimental neural network enhanced quantum tomography,” npj Quantum Inf. 6, 1–5 (2020).
Quek et al. (2018) Yihui Quek, Stanislav Fort, and Hui Khoon Ng, “Adaptive quantum state tomography with neural networks,” arXiv:1812.06693 (2018).
Ahmed et al. (2020a) Shahnawaz Ahmed, Carlos Sánchez Muñoz, Franco Nori, and Anton Frisk Kockum, “Classification and reconstruction of optical quantum states with deep neural networks,” arXiv:2012.02185 (2020a).
Carrasquilla et al. (2019) Juan Carrasquilla, Giacomo Torlai, Roger G Melko, and Leandro Aolita, “Reconstructing quantum states with generative models,” Nat. Mach. Intell. 1, 155–161 (2019).
Ahmed et al. (2020b) Shahnawaz Ahmed, Carlos Sánchez Muñoz, Franco Nori, and Anton Frisk Kockum, “Quantum state tomography with conditional generative adversarial networks,” arXiv:2008.03240 (2020b).
Wetzstein et al. (2020) Gordon Wetzstein, Aydogan Ozcan, Sylvain Gigan, Shanhui Fan, Dirk Englund, Marin Soljačić, Cornelia Denz, David AB Miller, and Demetri Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588, 39–47 (2020).
Shastri et al. (2021) Bhavin J Shastri, Alexander N Tait, T Ferreira de Lima, Wolfram HP Pernice, Harish Bhaskaran, C David Wright, and Paul R Prucnal, “Photonics for artificial intelligence and neuromorphic computing,” Nat. Photonics 15, 102–114 (2021).
Woods and Naughton (2012) Damien Woods and Thomas J Naughton, “Photonic neural networks,” Nat. Phys. 8, 257–259 (2012).
Shen et al. (2017a) Yichen Shen, Nicholas C Harris, Scott Skirlo, Mihika Prabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun, Shijie Zhao, Hugo Larochelle, Dirk Englund, et al., “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441 (2017a).
Lin et al. (2018) Xing Lin, Yair Rivenson, Nezih T Yardimci, Muhammed Veli, Yi Luo, Mona Jarrahi, and Aydogan Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018).
Zuo et al. (2019) Ying Zuo, Bohan Li, Yujun Zhao, Yue Jiang, You-Chiuan Chen, Peng Chen, Gyu-Boong Jo, Junwei Liu, and Shengwang Du, “All-optical neural network with nonlinear activation functions,” Optica 6, 1132–1137 (2019).
Zuo et al. (2021) Ying Zuo, Yujun Zhao, You-Chiuan Chen, Shengwang Du, and Junwei Liu, “Scalability of all-optical neural networks based on spatial light modulators,” Phys. Rev. Applied 15, 054034 (2021).
Feldmann et al. (2019) J Feldmann, N Youngblood, C David Wright, H Bhaskaran, and WHP Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569, 208–214 (2019).
Guo et al. (2021) Xianxin Guo, Thomas D. Barrett, Zhiming M. Wang, and A. I. Lvovsky, “Backpropagation through nonlinear units for the all-optical training of neural networks,” Photon. Res. 9, B71–B80 (2021).
Ryou et al. (2021) Albert Ryou, James Whitehead, Maksym Zhelyeznyakov, Paul Anderson, Cem Keskin, Michal Bajcsy, and Arka Majumdar, “Free-space optical neural network based on thermal atomic nonlinearity,” arXiv:2102.04464 (2021).
Gross et al. (2010) David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert, “Quantum state tomography via compressed sensing,” Phys. Rev. Lett. 105, 150401 (2010).
Flammia et al. (2012) Steven T Flammia, David Gross, Yi-Kai Liu, and Jens Eisert, “Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators,” New J. Phys. 14, 095022 (2012).
Chen et al. (2013) Jianxin Chen, Hillary Dawkins, Zhengfeng Ji, Nathaniel Johnston, David Kribs, Frederic Shultz, and Bei Zeng, “Uniqueness of quantum states compatible with given measurement results,” Phys. Rev. A 88, 012109 (2013).
Ma et al. (2016) Xian Ma, Tyler Jackson, Hui Zhou, Jianxin Chen, Dawei Lu, Michael D Mazurek, Kent AG Fisher, Xinhua Peng, David Kribs, Kevin J Resch, et al., “Pure-state tomography with the expectation value of Pauli operators,” Phys. Rev. A 93, 032140 (2016).
Metcalf and van der Straten (2003) Harold J Metcalf and Peter van der Straten, “Laser cooling and trapping of atoms,” J. Opt. Soc. Am. B 20, 887–908 (2003).
Zhang et al. (2012) Shanchao Zhang, J. F. Chen, Chang Liu, Shuyu Zhou, M. M. T. Loy, G. K. L. Wong, and Shengwang Du, “A dark-line two-dimensional magneto-optical trap of ⁸⁵Rb atoms with high optical depth,” Rev. Sci. Instrum. 83, 073102 (2012).
Harris (1997) Stephen E Harris, “Electromagnetically induced transparency,” Phys. Today 50(7), 36–42 (1997).
Fleischhauer et al. (2005) Michael Fleischhauer, Atac Imamoglu, and Jonathan P Marangos, “Electromagnetically induced transparency: Optics in coherent media,” Rev. Mod. Phys. 77, 633 (2005).
ibm (2020) 5-qubit backend: IBM Q team, “IBM Q 5 Ourense backend specification V1.3.5,” retrieved from https://quantum-computing.ibm.com (2020).
Shen et al. (2017b) Yichen Shen, Nicholas C. Harris, Scott Skirlo, Mihika Prabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun, Shijie Zhao, Hugo Larochelle, Dirk Englund, and Marin Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017b).
Kimble (2008) H. J. Kimble, “The quantum internet,” Nature 453, 1023–1030 (2008), arXiv:0806.4195.
Gu et al. (2019) Zhenjie Gu, Ce Yang, and J. F. Chen, “Characterization of the photon-number state of a narrow-band single photon generated from a cold atomic cloud,” Opt. Commun. 439, 206–209 (2019).
Yu et al. (2020) Yong Yu, Fei Ma, Xi Yu Luo, Bo Jing, Peng Fei Sun, Ren Zhou Fang, Chao Wei Yang, Hui Liu, Ming Yang Zheng, Xiu Ping Xie, Wei Jun Zhang, Li Xing You, Zhen Wang, Teng Yun Chen, Qiang Zhang, Xiao Hui Bao, and Jian Wei Pan, “Entanglement of two quantum memories via fibres over dozens of kilometres,” Nature 578, 240–245 (2020), arXiv:1903.11284.
Li et al. (2020) C. Li, N. Jiang, Y. K. Wu, W. Chang, Y. F. Pu, S. Zhang, and L. M. Duan, “Quantum communication between multiplexed atomic quantum memories,” Phys. Rev. Lett. 124, 1–6 (2020), arXiv:1909.02185.
Choi et al. (2010) K. S. Choi, A. Goban, S. B. Papp, S. J. Van Enk, and H. J. Kimble, “Entanglement of spin waves among four quantum memories,” Nature 468, 412–418 (2010), arXiv:1007.1664.
Pu et al. (2018) Yunfei Pu, Yukai Wu, Nan Jiang, Wei Chang, Chang Li, Sheng Zhang, and Luming Duan, “Experimental entanglement of 25 individually accessible atomic quantum interfaces,” Sci. Adv. 4, eaar3931 (2018).
Xu et al. (2015) Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv:1505.00853 (2015).
Di Leonardo et al. (2007) Roberto Di Leonardo, Francesca Ianni, and Giancarlo Ruocco, “Computer generation of optimal holograms for optical trap arrays,” Opt. Express 15, 1913–1922 (2007).
Nogrette et al. (2014) Florence Nogrette, Henning Labuhn, Sylvain Ravets, Daniel Barredo, Lucas Béguin, Aline Vernier, Thierry Lahaye, and Antoine Browaeys, “Single-atom trapping in holographic 2d arrays of microtraps with arbitrary geometries,” Phys. Rev. X 4, 021034 (2014).

Appendix A Training and testing the NN

The input layer is determined by the sampled Pauli operators: the number of input neurons equals $m$ . The number of output-layer neurons equals the number of independent amplitude parameters (see Eq. 2), denoted $d$ ; e.g., $d=2\cdot 2-1=3$ for 1 qubit, $2\cdot 2^{2}-1=7$ for 2 qubits, and $2\cdot 2^{3}-1=15$ for 3 qubits. Each network has four hidden layers, each containing $32\cdot d$ neurons. The activation function between layers is the widely used Leaky ReLU Xu et al. (2015), a modified version of the classic nonlinear activation function ReLU. The networks are fully connected and feed-forward, the optimizer is Adam, and the loss function is the mean square error

MSE(\phi)=E_{\phi}[(\hat{\phi}-\phi)^{2}],

(4)

for all tested examples.

For each 2-qubit case, 20,000 pairs of training data are used; the corresponding number is 150,000 for each 3-qubit case. The number of iterations is 300 for all networks.
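The architecture described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the training code used for the results: the function names, the weight initialization, the Leaky ReLU slope of 0.01, and the linear (activation-free) output layer are all assumptions, since the text does not specify them.

```python
import random

def num_outputs(n_qubits):
    """Number of output neurons d = 2 * 2**n - 1, as stated in the text."""
    return 2 * 2 ** n_qubits - 1

def layer_sizes(m, n_qubits, hidden_layers=4, width_factor=32):
    """m inputs, four hidden layers of 32*d neurons each, d outputs."""
    d = num_outputs(n_qubits)
    return [m] + [width_factor * d] * hidden_layers + [d]

def leaky_relu(x, slope=0.01):
    # slope=0.01 is an assumed value; the text does not state it
    return x if x > 0 else slope * x

def init_params(sizes, seed=0):
    """Random weights and zero biases (assumed initialization scheme)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (1.0 / n_in) ** 0.5
        W = [[rng.gauss(0, scale) for _ in range(n_in)] for _ in range(n_out)]
        params.append((W, [0.0] * n_out))
    return params

def forward(params, x):
    """Fully connected feed-forward pass; the last layer is left linear (assumption)."""
    for k, (W, b) in enumerate(params):
        z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        is_last = k == len(params) - 1
        x = z if is_last else [leaky_relu(v) for v in z]
    return x

def mse(pred, target):
    """Mean square error between predicted and true parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

For a single qubit with three measured operators, `layer_sizes(3, 1)` gives one input layer of 3 neurons, four hidden layers of 96 neurons, and 3 outputs.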

Appendix B UDA operator sets

The UDA operator sets Ma et al. (2016) for 2 and 3 qubits are as follows:

$P_{\text{2-UDA}}=\{IX,IY,IZ,XI,YX,YY,YZ,ZX,ZY,ZZ\},$

and

$P_{\text{3-UDA}}=\{IIX,IIY,IIZ,IXI,IXX,IXY,IYI,IYX,IYY,IZI,XIZ,XXX,XXY,XYX,XYY,XZX,XZY,YXX,YXY,YXZ,YYX,YYY,YYZ,YZI,ZII,ZXZ,ZYZ,ZZX,ZZY,ZZZ\}.$
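To make the role of these operator sets concrete, the following sketch computes the expectation values $\langle\psi|P|\psi\rangle$ for every Pauli string in the 2-qubit set, which is the kind of vector fed to the input layer. It is a plain-Python illustration; the helper names (`kron`, `pauli_matrix`, `expectation`) and the Bell-state example are assumptions introduced here, not part of the original work.

```python
# Single-qubit Pauli matrices as nested lists
PAULI = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

# 2-qubit UDA operator set, copied from the list above
P_2_UDA = ["IX", "IY", "IZ", "XI", "YX", "YY", "YZ", "ZX", "ZY", "ZZ"]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def pauli_matrix(label):
    """Tensor product of single-qubit Paulis, e.g. 'YZ' -> Y (x) Z."""
    M = [[1]]
    for ch in label:
        M = kron(M, PAULI[ch])
    return M

def expectation(label, psi):
    """Return <psi|P|psi> for a normalized state vector psi."""
    M = pauli_matrix(label)
    n = len(psi)
    M_psi = [sum(M[r][c] * psi[c] for c in range(n)) for r in range(n)]
    return sum(complex(p).conjugate() * q for p, q in zip(psi, M_psi)).real

# Example state (an assumption for illustration): (|00> + |11>) / sqrt(2)
bell = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
inputs = [expectation(label, bell) for label in P_2_UDA]
```

Here `inputs` has one entry per operator in `P_2_UDA`, in the same order as the list.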

Appendix C Polarization analysis and Poincaré sphere

In describing polarizations, we use the convention of the $xyz$ coordinate system in which we look toward the light source, so that the $z$-axis is along the direction of the light momentum. Thus the North Pole represents $\sigma_{+}$ , i.e., left-circular polarization, and the South Pole represents $\sigma_{-}$ , i.e., right-circular polarization. $|H\rangle$ ( $|\nearrow\rangle=\frac{1}{\sqrt{2}}(|H\rangle+|V\rangle)$ ) and $|V\rangle$ ( $|\searrow\rangle=\frac{1}{\sqrt{2}}(|H\rangle-|V\rangle)$ ) are located at the intersections of $S_{1}$ ( $S_{3}$ ) with the equator of the Poincaré sphere. The state $|\psi\rangle=\frac{1}{\sqrt{2}}(|H\rangle+e^{i\theta}|V\rangle)$ lies on the circle where the $S_{2}S_{3}$ plane intersects the sphere, where $\theta$ is the angle between $|\psi\rangle$ and $|\nearrow\rangle$ .

The initial state $|H\rangle$ is prepared by passing the laser through a polarization beam splitter (PBS). We then set the fast axis of the HWP (pink line in Fig. 5) at an angle of $\alpha$ to $|H\rangle$ , so that $|H\rangle$ rotates about the fast axis of the HWP by $180^{\circ}$ to $|\psi_{0}\rangle$ (the light orange point in Fig. 1), at an angle of $\theta$ to $S_{3}$ . Finally, the fast axis of the QWP (the green dotted axis in Fig. 5) is set along the direction of $S_{3}$ , so that $|\psi_{0}\rangle$ rotates about $S_{3}$ by $90^{\circ}$ to $|\psi\rangle$ (the dark orange point in Fig. 1), at an angle of $\theta$ to $S_{3}$ . Since an angle in real space is half of the corresponding angle on the Poincaré sphere, in the experiment we rotate the fast axis of the HWP to an angle of $\frac{\alpha}{2}$ to the positive $x$-axis, in the range from ${\frac{\pi}{8}}$ to ${\frac{\pi}{4}}$ , and set the fast axis of the QWP at an angle of $\frac{\pi}{4}$ to the positive $x$-axis.

The measurement of $\langle Z\rangle$ is shown in III of Fig. 3(a). The laser in polarization state $|\psi\rangle$ is incident on a PBS, and the powers at its two output ports are measured. The probability of $|H\rangle$ is thus the transmittance $T$ and the probability of $|V\rangle$ is the reflectivity $R$. Since the eigenvalues of $|H\rangle$ and $|V\rangle$ are $+1$ and $-1$, respectively, $\langle Z\rangle=T-R$ . For $\langle X\rangle$ and $\langle Y\rangle$ , we want the measurement results to take the same form $T-R$, which means that the input states $|\nearrow\rangle$ and $|L\rangle$ should be mapped to $|H\rangle$ in these two measurements, respectively. It is clear from the Poincaré sphere that a HWP with its fast axis at an angle of $\frac{\pi}{4}$ to $|H\rangle$ and a QWP with its fast axis along the direction of $S_{3}$ , respectively, rotate the input state as required. Therefore, as shown in II of Fig. 3(a), we insert a HWP at an angle of $\frac{\pi}{8}$ to the positive $x$-axis in front of the PBS to measure $\langle X\rangle$ , and we replace the HWP with a QWP at an angle of $\frac{\pi}{4}$ to the positive $x$-axis to measure $\langle Y\rangle$ , as shown in IV of Fig. 3(a).
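The three measurement settings can be checked with a short Jones-calculus sketch. It assumes ideal wave plates and an ideal PBS, drops global phases, and fixes one sign convention for the quarter-wave retardation. With the opposite convention the sign of the $\langle Y\rangle$ readout flips. The function names are illustrative and do not come from the experiment's software.

```python
import cmath
import math

def hwp(angle):
    """Jones matrix of an ideal half-wave plate, fast axis at `angle` to the x-axis."""
    c, s = math.cos(2 * angle), math.sin(2 * angle)
    return [[c, s], [s, -c]]

def qwp(angle):
    """Jones matrix of an ideal quarter-wave plate (assumed sign convention)."""
    c, s = math.cos(angle), math.sin(angle)
    off = (1 - 1j) * s * c
    return [[c * c + 1j * s * s, off], [off, s * s + 1j * c * c]]

def apply(M, v):
    """Apply a 2x2 Jones matrix to a Jones vector (H, V)."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def t_minus_r(v):
    """PBS readout: transmitted (H) minus reflected (V) power fraction."""
    return abs(v[0]) ** 2 - abs(v[1]) ** 2

def measure_xyz(theta):
    """Simulated <X>, <Y>, <Z> for |psi> = (|H> + exp(i*theta)|V>) / sqrt(2)."""
    psi = [1 / math.sqrt(2), cmath.exp(1j * theta) / math.sqrt(2)]
    x = t_minus_r(apply(hwp(math.pi / 8), psi))  # HWP at pi/8, then PBS
    y = t_minus_r(apply(qwp(math.pi / 4), psi))  # QWP at pi/4, then PBS
    z = t_minus_r(psi)                           # PBS only
    return x, y, z
```

Under these assumptions the readouts reduce to $\langle X\rangle=\cos\theta$, $\langle Y\rangle=\sin\theta$, and $\langle Z\rangle=0$, so the phase $\theta$ is encoded entirely in the first two values.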

Appendix D Implementation of weighted beam generation

The SLMs used in the experiment perform the input generation and two linear operations. For input generation, the first SLM (SLM1) is divided into rectangular parts of equal size. Each part generates one element of the input vector, with a value ranging from 0 to 1. Without an aperture, two lenses form a 4-f system and exactly the same intensity distribution is imaged onto the second SLM. We apply a blazed grating and a sine modulation to the SLM, of the form

\phi_{grating}(i,j)=\frac{2\pi}{T_{i}}i+\frac{2\pi}{T_{j}}j

(5)

\phi_{mod}(i,j)=m\pi\sin\left(\frac{2\pi}{T_{mi}}i+\frac{2\pi}{T_{mj}}j\right)

(6)

$\phi_{grating}(i,j)$ and $\phi_{mod}(i,j)$ denote the grating phase and the modulation phase applied to the SLM. $i,j$ are the pixel indices along the two directions, and $T_{i}$ and $T_{j}$ are the corresponding line spacings. $m$ is the modulation depth, and $T_{mi}$ and $T_{mj}$ are the periods of the sine function along the two directions. On the focal plane, the modulation splits the beam into multiple beams. The intensity of the beam at the position of the unmodulated beam is determined by the modulation depth $m$ ; thus, the input can be modulated.
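The two phase patterns, and the way the modulation depth controls the remaining beam power, can be illustrated as follows. The sketch reads $T_{mi}$ and $T_{mj}$ as periods along the two pixel directions. The closed-form power fraction $J_{0}(m\pi)^{2}$ is the textbook result for an ideal, continuous sinusoidal phase grating (Jacobi-Anger expansion). It is an assumption introduced here and not a calibration taken from the experiment, where pixelation and SLM response would modify it.

```python
import math

def grating_phase(i, j, T_i, T_j):
    """Blazed-grating phase at pixel (i, j), Eq. (5)."""
    return 2 * math.pi * i / T_i + 2 * math.pi * j / T_j

def modulation_phase(i, j, m, T_mi, T_mj):
    """Sine-modulation phase at pixel (i, j) with modulation depth m, Eq. (6)."""
    return m * math.pi * math.sin(2 * math.pi * i / T_mi + 2 * math.pi * j / T_mj)

def bessel_j0(a, steps=2000):
    """J0(a) = (1/pi) * integral_0^pi cos(a sin t) dt, evaluated with the midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(a * math.sin((k + 0.5) * h)) for k in range(steps))
    return total * h / math.pi

def zero_order_fraction(m):
    """Power fraction left at the unmodulated beam position (ideal-grating assumption)."""
    return bessel_j0(m * math.pi) ** 2
```

In this idealized model, `zero_order_fraction(0)` equals 1 and the fraction falls monotonically to zero near $m\approx 0.765$, so a depth in that interval would cover the input range from 1 down to 0.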

The linear operations are performed by SLM2 and SLM3. Each SLM is divided into multiple parts; each part receives an output from the previous layer and splits it into multiple beams on the focal plane. We use an iterative algorithm to obtain optical holograms that generate multiple weighted spots; the weighted Gerchberg-Saxton (GSW) algorithm Di Leonardo et al. (2007); Nogrette et al. (2014) is one frequently used algorithm. We made several modifications to the GSW algorithm to achieve better performance. Experimentally, the output beam intensity distribution differs from the computed one. The difference can arise from imperfections in the optical path and the SLM surface and from inaccuracy in the incident intensity, and it is hard to simulate all of these imperfections. We therefore apply an adaptive iteration that uses the experimentally measured intensity $I_{n}$ , instead of the numerically calculated amplitude, to perform the iteration after the FFT. That is, we measure the intensity distribution with a camera; if the difference between the camera-captured intensity and the target intensity $I_{t}$ is sufficiently small, we load this pattern onto the SLM. Otherwise, we replace the amplitude with $g_{n}\sqrt{I_{t}}$ , where $g_{n}=\frac{\sqrt{I_{t}}}{\sqrt{I_{n}}}g_{n-1}$ , and perform the inverse FFT. To avoid over-compensation, we add a parameter $a$ to adjust the feedback as $g_{n}=a\frac{\sqrt{I_{t}}}{\sqrt{I_{n}}}g_{n-1}+(1-a)$ , where $a$ is in the range $(0,1)$ .
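The feedback step of this adaptive iteration can be sketched per spot as below. Only the gain update and the amplitude constraint are shown; the FFT propagation, the camera readout, and the other modifications to GSW mentioned above are omitted. The function names, the default $a=0.5$, and the relative tolerance used as a stopping test are assumptions for illustration.

```python
import math

def update_gain(g_prev, I_target, I_measured, a=0.5):
    """Damped feedback g_n = a * sqrt(I_t / I_n) * g_{n-1} + (1 - a), one value per spot.

    Measured intensities must be strictly positive.
    """
    return [a * math.sqrt(i_t / i_n) * g + (1 - a)
            for g, i_t, i_n in zip(g_prev, I_target, I_measured)]

def constrained_amplitude(g, I_target):
    """Amplitude g_n * sqrt(I_t) imposed on each spot before the inverse FFT."""
    return [g_k * math.sqrt(i_t) for g_k, i_t in zip(g, I_target)]

def close_enough(I_target, I_measured, rel_tol=0.02):
    """Assumed stopping test: every spot within rel_tol of its target intensity."""
    return all(abs(i_n - i_t) <= rel_tol * i_t
               for i_t, i_n in zip(I_target, I_measured))
```

A spot that is measured too weak receives a gain above 1 and a spot measured too strong receives a gain below 1, with $a$ limiting how strongly a single camera frame can change the hologram.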

Appendix E Training Optical Neural Network

In the optical neural network tomography scheme, we input $\{1-\langle\sigma_{x}\rangle,1-\langle\sigma_{y}\rangle,1-\langle\sigma_{z}\rangle\}$ to the optical neural network instead of $\{\langle\sigma_{x}\rangle,\langle\sigma_{y}\rangle,\langle\sigma_{z}\rangle\}$ because the non-linear activation function

I_{p}^{out}=f(I_{c})=I_{p}^{in}e^{-OD\frac{4\gamma_{12}\gamma_{13}}{\Omega_{c}^{2}+4\gamma_{12}\gamma_{13}}}

(7)

is convex. We need to make sure the target regression function is convex as well.
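A minimal numerical sketch of the input encoding and of Eq. (7) is given below. It takes $\Omega_{c}^{2}$ directly as an argument, since the proportionality constant between the coupling intensity $I_{c}$ and $\Omega_{c}^{2}$ is not given in this appendix. The function names and any parameter values passed in are hypothetical.

```python
import math

def encode_inputs(sx, sy, sz):
    """Map expectation values in [-1, 1] to the network inputs 1 - <sigma>."""
    return [1 - sx, 1 - sy, 1 - sz]

def eit_activation(omega_c_sq, I_p_in, OD, gamma12, gamma13):
    """Probe output intensity of Eq. (7) for a given squared coupling Rabi frequency."""
    k = 4 * gamma12 * gamma13
    return I_p_in * math.exp(-OD * k / (omega_c_sq + k))
```

With no coupling light the probe is attenuated to $I_{p}^{in}e^{-OD}$, and the transmission rises monotonically toward $I_{p}^{in}$ as $\Omega_{c}^{2}$ grows.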

In our experiments, the optical neural network is trained via classical simulation. The optimizer is Adam, the learning rate is 0.002, and the loss function is again the mean square error (Eq. 4), both for data sampled from our optical measurements and for data from the IBM Ourense quantum computer. In both cases, 10000 iterations suffice for convergence, as shown in Fig. 6.
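For readers unfamiliar with the optimizer, the following is a generic Adam loop with the learning rate and iteration count quoted above, minimizing a mean square error on toy linear data. It is not the simulation used to train the optical network: the toy data, the two-parameter model, and the default values of `beta1`, `beta2`, and `eps` are assumptions chosen only to show the update rule.

```python
import math

def adam_minimize(grad_fn, params, lr=0.002, steps=10000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam updates with bias-corrected first and second moments."""
    params = list(params)
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for t in range(1, steps + 1):
        g = grad_fn(params)
        for k in range(len(params)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy regression data (hypothetical): y = 0.5 * x + 0.2
xs = [i / 10 for i in range(11)]
ys = [0.5 * x + 0.2 for x in xs]

def mse_grad(p):
    """Gradient of the mean square error of the model w * x + b."""
    w, b = p
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return [gw, gb]

w_fit, b_fit = adam_minimize(mse_grad, [0.0, 0.0])
```

On this toy problem the fitted `w_fit` and `b_fit` end up close to the generating values 0.5 and 0.2.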