Bidirectional information flow quantum state tomography
Abstract
The exact reconstruction of many-body quantum systems is one of the major challenges in modern physics, because the exponential complexity of high-dimensional quantum many-body systems is impractical to overcome. Recently, machine learning techniques have been widely used to promote quantum information research, and quantum state tomography has also been developed with neural network generative models. In this paper we propose a quantum state tomography method based on a Bidirectional Gated Recurrent Unit (BiGRU) neural network to learn and reconstruct both easy and hard quantum states. With this method we are able to use fewer measurement samples to reconstruct these quantum states and obtain high fidelity.
I Introduction
As a fundamental research topic in quantum information processing, quantum state tomography (QST) has long been a central concern. As a data-driven problem, QST aims at obtaining as much information as possible about a quantum system and reconstructing the quantum state density matrix through effective quantum state measurements. In fact, exact QST is impractical Häffner et al. (2005), especially for high-dimensional quantum many-body systems: describing a generic quantum many-body system requires exponentially many parameters. Even for small-scale quantum systems, tomography still requires a lot of resources. Therefore, we want to reconstruct the quantum state as accurately as possible from as few measurement samples as possible. In other words, we want to capture the associations among a limited number of quantum state measurement samples and extract more information to serve tomography.
After years of investigation, several key techniques have been developed for QST, including, but not limited to, compressed sensing tomography Gross et al. (2010); Qi Yin (2018), permutationally invariant tomography Tóth et al. (2010); Moroder et al. (2012) and tomographic schemes based on tensor networks Baumgratz et al. (2013). In recent years, quantum machine learning has attracted a lot of attention in quantum computing Biamonte et al. (2017); Situ et al. (2020); He et al. (2021), and machine learning has made great progress in assisting the study of quantum physics problems Carleo and Troyer (2017); Hartmann and Carleo (2019); Deng et al. (2017); Cai and Liu (2018); Fournier et al. (2020); Tang-Shi Yao (2019). Meanwhile, QST driven by neural network generative models has also received widespread attention and relevant research results have appeared. For example, based on a probabilistic undirected graph model, Torlai et al. Torlai et al. (2018) used a restricted Boltzmann machine (RBM) QST method to learn the amplitude and phase of a quantum state and reconstruct the quantum state wavefunction. Based on the powerful autoregressive model of Recurrent Neural Networks (RNNs), Carrasquilla et al. Carrasquilla et al. (2019) introduced RNN-QST, which uses samples from informationally complete (IC) positive-operator valued measures (POVMs) to reconstruct quantum states with high classical fidelity. There is also a transformer Vaswani et al. (2017) QST method based on an attention-mechanism generative network, which reconstructs the mixed-state density matrix of a noisy quantum state Cha et al. (2020). Moreover, there are other QST methods driven by generative models Ahmed et al. (2020); Luchnikov et al. (2019). All of these methods demonstrate that machine learning techniques can effectively deal with specific quantum states.
In this paper, we propose to use a Bidirectional Recurrent Neural Network (BiRNN) generative model based on contextual semantics to perform QST. By slicing quantum state measurement samples into time-series information flows, we are able to make full use of the contextual semantics of these records to perform QST with this bidirectional neural network. At the same time, we propose a network training criterion for early stopping, which helps one to find a better training model effectively using fewer training samples. These methods enable us to use fewer quantum state measurement samples than the RNN Carrasquilla et al. (2019) and AQT Cha et al. (2020) neural-network tomography methods to achieve over 99% classical fidelity on the GHZ state. We test our method on “easy quantum states” and “hard quantum states” that have different sampling difficulty Rocchetto et al. (2018). Finally, we briefly discuss why this QST method can effectively process some specific quantum states.
II Method
Our method belongs to unsupervised machine learning. The training samples are produced based on the IC-POVMs mentioned in Carrasquilla et al. (2019). The IC-POVM operators which we use here are the Pauli-4 operators. Each $N$-qubit quantum state measurement sample is denoted by $\mathbf{a}=(a_1,a_2,\dots,a_N)$. We slice each of them into $N$ parts as a time-series information flow, i.e., $\mathbf{a}=(a_1,a_2,\dots,a_N)$, where $a_i\in\{1,2,3,4\}$. The probability distribution corresponding to the Pauli-4 IC-POVM is denoted as $P(\mathbf{a})$ with $\mathbf{a}\in\{1,2,3,4\}^N$ and $\sum_{\mathbf{a}}P(\mathbf{a})=1$.
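As a minimal illustration of this encoding (our own sketch, not the authors' code), each measurement record can be turned into a one-hot time series with one time slice per qubit, so that a recurrent network can read it site by site:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: encode a batch of Pauli-4 POVM outcome records as a
# one-hot time series, one time slice per qubit.
def encode_povm_samples(samples: torch.Tensor, num_outcomes: int = 4) -> torch.Tensor:
    """samples: (batch, N) integers in {0,...,3} -> (batch, N, 4) one-hot floats."""
    return F.one_hot(samples.long(), num_classes=num_outcomes).float()

# Example: four hypothetical measurement records of a 6-qubit state.
raw = torch.randint(0, 4, (4, 6))
x = encode_povm_samples(raw)        # shape (4, 6, 4)
```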
II.1 Bidirectional Recurrent Neural Networks
Recurrent neural networks can effectively process time-series data in natural language processing (NLP) problems Sutskever et al. (2014); Wu et al. (2016); Chiu et al. (2018), and were extended to a bidirectional model by Schuster and Paliwal Schuster and Paliwal (1997). A bidirectional network can be trained without the limitation of using information only up to a preset future frame. In our experiments, we use a Bidirectional Gated Recurrent Unit (BiGRU) neural network, a model that combines the BiRNN architecture with the gated recurrent unit Cho et al. (2014). BiRNNs have been shown to effectively capture contextual semantic relationships in natural language processing, and the GRU alleviates the vanishing and exploding gradient problems of plain RNNs. Furthermore, the GRU uses fewer network parameters and is easier to train than the long short-term memory (LSTM) network Hochreiter and Schmidhuber (1997).

In order to perform QST with a neural network using fewer quantum state measurement samples, we need to make better use of the limited training samples. Following the general bidirectional model, we divide the training samples into a forward information flow and a backward information flow. Given a batch of quantum state measurement samples $\{\mathbf{a}\}$, we slice each sample according to the time series $\mathbf{a}=(a_1,a_2,\dots,a_N)$, $a_t\in\{1,2,3,4\}$. The training procedure of this bidirectional network unfolded over time can be summarized as follows:
1) Forward flow training: receiving the input data one time slice at a time from $t=1$ to $N$, the hidden layer outputs the forward hidden states $\overrightarrow{h}_t=\overrightarrow{f}(a_t,\overrightarrow{h}_{t-1})$.
2) Backward flow training: receiving the time-series data from $t=N$ down to $1$, the hidden layer outputs the backward hidden states $\overleftarrow{h}_t=\overleftarrow{f}(a_t,\overleftarrow{h}_{t+1})$.
3) Consolidate information flow: the network output at each time step is obtained from the joint bidirectional information flow, $y_t=g(\overrightarrow{h}_t,\overleftarrow{h}_t)$.
This procedure is a general bidirectional model framework, and the activation functions take different forms for different recurrent units. See Fig. 1 for the framework of a general BiRNN over three time steps. The model is trained by minimizing the negative log-likelihood loss function over the training set $D$,
$$\mathcal{L}(\theta)=-\frac{1}{|D|}\sum_{\mathbf{a}\in D}\log P_{\theta}(\mathbf{a}),$$
where $\theta$ is the set of model parameters.
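The sketch below illustrates one plausible realization of this setup; it is our own simplified reading and not the authors' exact architecture. A three-layer BiGRU reads the one-hot POVM time series in both directions, a linear head joins the forward and backward hidden states into per-site logits over the four Pauli-4 outcomes, and the loss is the negative log-likelihood of the observed outcomes. How the bidirectional context is masked to yield a properly normalized generative distribution is an implementation detail not reproduced here.

```python
import torch
import torch.nn as nn

class BiGRUTomography(nn.Module):
    """Hypothetical BiGRU model for per-site outcome distributions."""
    def __init__(self, num_outcomes: int = 4, hidden: int = 100, layers: int = 3):
        super().__init__()
        self.gru = nn.GRU(input_size=num_outcomes, hidden_size=hidden,
                          num_layers=layers, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_outcomes)   # joins both flows

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.gru(x)          # (batch, N, 2 * hidden)
        return self.head(h)         # per-site logits, (batch, N, num_outcomes)


def nll_loss(model: BiGRUTomography, samples: torch.Tensor) -> torch.Tensor:
    """samples: (batch, N) integer POVM outcomes in {0,...,3}."""
    x = torch.nn.functional.one_hot(samples.long(), 4).float()
    logp = torch.log_softmax(model(x), dim=-1)
    picked = logp.gather(-1, samples.long().unsqueeze(-1)).squeeze(-1)
    return -picked.sum(dim=1).mean()   # negative log-likelihood per sample
```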
II.2 Easy and hard quantum states
Rocchetto et al. Rocchetto et al. (2018) proposed to classify quantum states according to the hardness of sampling the probability distribution of measurement results in a given basis. The Greenberger-Horne-Zeilinger (GHZ) state, discussed in many articles, is a highly non-classical state specified by $|\mathrm{GHZ}_N\rangle=\frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes N}+|1\rangle^{\otimes N}\right)$. As a simple quantum pure state, the GHZ state is widely used in quantum communication protocols. This kind of state can be sampled easily even in a large quantum system, and we learn and reconstruct it in this paper.
We consider a kind of random pure states as hard states, which are generated by normalizing a $2^N$-dimensional complex vector drawn from the unit sphere according to the Haar measure Rocchetto et al. (2018). Such hard states cannot be prepared easily on any realistic quantum computing device with a polynomially large circuit, which means that quantum measurement samples can only be obtained for few-qubit states. Up to now, the performance of reconstructing the probability distributions Rocchetto et al. (2018) or the quantum state wavefunctions Torlai et al. (2018) of these states has not been very satisfactory.
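A minimal sketch of generating such a hard state (our own illustration, assuming the standard construction of Haar-random pure states from complex Gaussian vectors):

```python
import numpy as np

# Draw a 2**n-dimensional complex Gaussian vector and normalize it to the
# unit sphere; the result is a Haar-random pure state.
def haar_random_state(n_qubits, seed=None):
    rng = np.random.default_rng(seed)
    dim = 2 ** n_qubits
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

psi = haar_random_state(6)   # a 6-qubit random pure state, as in Sec. III
```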

II.3 Learning standards
Quantum fidelity is a comprehensive measure of quantum state reconstruction. Let $\rho$ and $\sigma$ be two $N$-qubit quantum states; the quantum fidelity between them is defined as
$$F_q(\rho,\sigma)=\left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^{2}.$$
We say that $\sigma$ is a good representation of $\rho$ if the quantum fidelity $F_q(\rho,\sigma)\geq 1-\epsilon$ for a small $\epsilon$. However, due to the exponentially large dimension of the density-matrix representation, we can only calculate the quantum fidelity for reconstructions of small quantum systems. For large many-body quantum systems, we use the classical fidelity to evaluate the reconstruction performance, which is defined as
$$F_c(P,Q)=\sum_{\mathbf{a}}\sqrt{P(\mathbf{a})\,Q(\mathbf{a})}.$$
$F_c$ is a standard measure of proximity between two distributions. Here $P(\mathbf{a})$ and $Q(\mathbf{a})$ are the exact probability and the probability generated by the BiGRU corresponding to the same quantum state measurement sample $\mathbf{a}$, respectively. Theoretically, $F_c=1$ only if $P=Q$. In general, $F_c$ serves as an upper bound of the quantum fidelity $F_q$, meaning that a small error in $F_c$ will be amplified in $F_q$ Carrasquilla et al. (2019).
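A minimal sketch of the classical fidelity as defined above, assuming both distributions are given as arrays over the same set of POVM outcomes:

```python
import numpy as np

# Classical fidelity: sum_a sqrt(P(a) * Q(a)) between the exact distribution P
# and the model distribution Q.
def classical_fidelity(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sum(np.sqrt(p * q)))

p = np.array([0.5, 0.25, 0.25])
print(classical_fidelity(p, p))   # identical distributions give 1.0
```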
The stopping criterion is a difficult problem in training generative models. In this paper, we observe that a large number of training epochs leads to the overfitting problem: even if the loss function continues to decrease, the fidelity may decrease. Therefore, we cannot stop the training according to the loss function itself. Especially for the tomography of an unknown quantum state, how to use the fewest quantum state measurement samples and decide when to stop the network training to obtain the best fidelity is the key issue. We propose a simple but effective metric for the stopping criterion, the degree of the loss function's fluctuation, $\Delta_k=|L_k-L_{k-1}|$ with $k\geq 1$, i.e., the difference between the losses of two consecutive epochs. This metric reflects the training quality of the network and helps us to find the best quantum state reconstruction model with the fewest training samples. The detailed procedure is to select the range of $T$ consecutive epochs with the least total fluctuation $\sum_k\Delta_k$ and to choose the trained network with the smallest loss within this range, where $T$ is an appropriate value, e.g. 50.
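The following sketch shows one way to implement this stopping rule; the symbol T and the index bookkeeping are our own choices, not the authors' notation.

```python
# Find the window of T consecutive epochs with the smallest total loss
# fluctuation and keep the checkpoint with the smallest loss inside it.
def select_checkpoint(losses, T=50):
    if len(losses) <= T:                       # not enough epochs for a full window
        return min(range(len(losses)), key=lambda k: losses[k])
    deltas = [abs(losses[k] - losses[k - 1]) for k in range(1, len(losses))]
    best_start, best_fluct = 0, float("inf")
    for start in range(len(deltas) - T + 1):
        fluct = sum(deltas[start:start + T])   # total fluctuation in this window
        if fluct < best_fluct:
            best_start, best_fluct = start, fluct
    window = range(best_start + 1, best_start + T + 1)   # epochs covered by the window
    return min(window, key=lambda k: losses[k])           # epoch whose model to keep
```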
III Numerical results
The deep-learning framework PyTorch is used in our numerical experiments. We use a three-layer BiGRU in all experiments. The training is performed using backpropagation and the Adam optimizer Kingma and Ba (2014), with an initial learning rate of 0.001. The classical fidelity is calculated from 50000 samples drawn from the trained network.
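A minimal training-loop sketch matching the stated setup (Adam, initial learning rate 0.001); it relies on the illustrative `BiGRUTomography` model and `nll_loss` sketched in Sec. II, and `train_samples` is a hypothetical (num_samples, N) integer tensor of Pauli-4 outcomes.

```python
import torch

def train(model, train_samples, nll_loss, epochs=200, batch_size=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial learning rate 0.001
    losses = []
    for _ in range(epochs):
        perm = torch.randperm(train_samples.shape[0])
        total = 0.0
        for i in range(0, len(perm), batch_size):
            batch = train_samples[perm[i:i + batch_size]]
            loss = nll_loss(model, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item() * batch.shape[0]
        losses.append(total / train_samples.shape[0])      # per-sample epoch loss
    return losses   # feed to the stopping rule sketched in Sec. II.3
```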
We now turn our attention to the reconstruction of the GHZ state. Fig. 2 shows the numerical results of reconstructing the 10-qubit GHZ state using different numbers of training samples. We find that no matter how large the number of training samples is, the training model gradually overfits as the number of training iterations increases. Even if we exceed the number of training samples used by Carrasquilla et al. in Carrasquilla et al. (2019), as shown in Fig. 2d, overfitting still occurs and results in poor reconstruction fidelity. In other words, preventing overfitting is the key point in this neural-network QST technique. As far as we know, referencing the degree of the loss function's fluctuation can assist us in finding better models.

In order to test our method, we reconstruct GHZ states with system sizes of up to 50 qubits. The necessary number of training samples and the classical fidelities are shown in Fig. 3. Compared with the state-of-the-art RNN-QST Carrasquilla et al. (2019) and AQT Cha et al. (2020) methods, our BiGRU-QST method uses almost the fewest measurement samples to achieve over 99% fidelity. We can also see a linear growth of the number of training samples with respect to the number of qubits.

When we use the BiGRU to perform QST on the hard state, we find a limitation of this method. In our experiments, we prepare synthetic datasets mimicking experimental measurements from the exact 6-qubit distributions of the GHZ state, the W state, the Product state and the hard state. The W state is written as $|W\rangle=\frac{1}{\sqrt{N}}\left(|10\cdots0\rangle+|01\cdots0\rangle+\cdots+|00\cdots1\rangle\right)$ and the Product state is a fully polarized state in which every qubit is in the same computational-basis state. The probability distribution of the measurement results of a 6-qubit quantum system under the Pauli-4 measurement operators has $4^6=4096$ probability values. The sorted probability distributions of these four states are shown in Fig. 4. Due to the relatively simple structure of the W state, GHZ state and Product state, their distributions contain only 41, 17 and 59 different values, respectively. In other words, these few probability values carry most of the quantum information and directly affect the tomographic fidelity. In the numerical results, we investigate the classical fidelity $F_c$ as a function of the number of training samples; the reconstruction results are shown in Fig. 5. We find that $F_c$ of the Product state, W state and GHZ state quickly reaches 99% with few measurement samples, while $F_c$ of the hard state increases much more slowly because the probability values of its measurement distribution are all different. In short, we obtain an approximate rule for QST methods driven by these time-series generative models: the number of different probability values strongly affects the reconstruction results, and the efficiency of QST is better when the number of different measurement probability values is smaller.
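As a small illustration of the quantity this rule relies on (our own sketch), the number of distinct values in a measurement distribution can be counted up to a numerical tolerance:

```python
import numpy as np

# Count the distinct values in a measurement probability distribution.
# A 6-qubit Pauli-4 distribution has 4**6 = 4096 entries, but for GHZ, W and
# product states only a handful of distinct values occur.
def num_distinct_probabilities(p: np.ndarray, decimals: int = 10) -> int:
    return int(np.unique(np.round(p, decimals)).size)
```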

IV Conclusion
In summary, the purpose of quantum state tomography is to reconstruct a quantum state as completely as possible from limited quantum state measurement samples. In practice, it is impractical to perform QST on large quantum systems due to the exponential “curse of dimensionality” inherent in the description of quantum states, so alleviating these scaling issues is what we pursue. QST driven by neural networks helps us process specific quantum states and achieve high-fidelity reconstructions with only a small number of samples. In our work, we proposed a BiGRU neural network as the generative model, an architecture that effectively captures contextual associations in natural language processing. For QST with the BiGRU, we encoded quantum state measurement samples into a bidirectional information flow, used the BiGRU to capture the internal correlations of the quantum information, and relied on an early-stopping technique to achieve better results than other neural-network QST methods known to us.
We showed through numerical experiments that these time-series generative models are capable of representing states whose probability distributions have very few distinct values, in particular the Product state, GHZ state and W state. The BiGRU can use very few measurement samples to realize QST with high fidelity, over 99%. On the other hand, the reconstruction of hard states with many different measurement probability values needs more samples. Although hard-state QST shows no significant reduction in measurement samples, the method can still be effectively used for QST, because the fidelity increases as the number of samples increases, which means the method is reliable.
We have given a brief discussion and experimental results showing that easy quantum states with few distinct probability values can be reconstructed well by the neural-network method. To obtain more general conclusions, more experimental verification and theoretical explanation are needed. As further work, in order to improve the versatility of this method in quantum state tomography, we will investigate QST further and find the relationship between the number of distinct probability values and the necessary number of measurement samples. Furthermore, we will also study the characteristics of the probability distribution which may have a direct impact on neural-network-based tomography.
Acknowledgements.
The authors are thankful to the anonymous referees and editor for their comments and suggestions that greatly helped to improve the quality of the manuscript. We thank Dr. Zhiming He for discussions that improved the presentation of some figures. This work is supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515011204). S.Z. acknowledges support in part from the National Natural Science Foundation of China (No. 61602532).
References
- Häffner et al. (2005) H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-al-kar, M. Chwalla, T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, et al., Nature 438, 643 (2005), URL https://doi.org/10.1038/nature04279.
- Gross et al. (2010) D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Phys. Rev. Lett. 105, 150401 (2010), URL https://link.aps.org/doi/10.1103/PhysRevLett.105.150401.
- Qi Yin (2018) Q. Yin, G.-Y. Xiang, C.-F. Li, and G.-C. Guo, Chinese Physics Letters 35, 070302 (2018), URL http://cpl.iphy.ac.cn/EN/abstract/article_71189.shtml.
- Tóth et al. (2010) G. Tóth, W. Wieczorek, D. Gross, R. Krischek, C. Schwemmer, and H. Weinfurter, Phys. Rev. Lett. 105, 250403 (2010), URL https://link.aps.org/doi/10.1103/PhysRevLett.105.250403.
- Moroder et al. (2012) T. Moroder, P. Hyllus, G. Tóth, C. Schwemmer, A. Niggebaum, S. Gaile, O. Gühne, and H. Weinfurter, New Journal of Physics 14, 105001 (2012), URL https://doi.org/10.1088/1367-2630/14/10/105001.
- Baumgratz et al. (2013) T. Baumgratz, D. Gross, M. Cramer, and M. B. Plenio, Phys. Rev. Lett. 111, 020401 (2013), URL https://link.aps.org/doi/10.1103/PhysRevLett.111.020401.
- Biamonte et al. (2017) J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
- Situ et al. (2020) H. Situ, Z. He, Y. Wang, L. Li, and S. Zheng, Information Sciences 538, 193 (2020).
- He et al. (2021) Z. He, L. Li, S. Zheng, Y. Li, and H. Situ, New Journal of Physics, to appear, doi: 10.1088/1367-2630/abe0ae (2021).
- Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
- Hartmann and Carleo (2019) M. J. Hartmann and G. Carleo, Phys. Rev. Lett. 122, 250502 (2019), URL https://link.aps.org/doi/10.1103/PhysRevLett.122.250502.
- Deng et al. (2017) D.-L. Deng, X. Li, and S. Das Sarma, Phys. Rev. X 7, 021021 (2017), URL https://link.aps.org/doi/10.1103/PhysRevX.7.021021.
- Cai and Liu (2018) Z. Cai and J. Liu, Phys. Rev. B 97, 035116 (2018), URL https://link.aps.org/doi/10.1103/PhysRevB.97.035116.
- Fournier et al. (2020) R. Fournier, L. Wang, O. V. Yazyev, and Q. Wu, Phys. Rev. Lett. 124, 056401 (2020), URL https://link.aps.org/doi/10.1103/PhysRevLett.124.056401.
- Tang-Shi Yao (2019) T.-S. Yao, C.-Y. Tang, et al., Chinese Physics Letters 36, 068101 (2019), URL http://cpl.iphy.ac.cn/EN/abstract/article_105329.shtml.
- Torlai et al. (2018) G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics 14, 447 (2018), URL https://doi.org/10.1038/s41567-018-0048-5.
- Carrasquilla et al. (2019) J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Nature Machine Intelligence 1, 200 (2019), URL https://doi.org/10.1038/s42256-019-0045-0.
- Vaswani et al. (2017) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, arXiv:1706.03762v5 (2017).
- Cha et al. (2020) P. Cha, P. Ginsparg, F. Wu, J. Carrasquilla, P. L. McMahon, and E.-A. Kim, arXiv:2006.12469v1 (2020).
- Ahmed et al. (2020) S. Ahmed, C. Sanchez Munoz, F. Nori, and A. Frisk Kockum, arXiv:2008.03240v1 (2020).
- Luchnikov et al. (2019) I. A. Luchnikov, A. Ryzhov, P.-J. Stas, S. N. Filippov, and H. Ouerdane, Entropy 21, 1091 (2019), ISSN 1099-4300, URL http://dx.doi.org/10.3390/e21111091.
- Rocchetto et al. (2018) A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo, and S. Severini, npj Quantum Information 4, 28 (2018), URL https://doi.org/10.1038/s41534-018-0077-z.
- Sutskever et al. (2014) I. Sutskever, O. Vinyals, and Q. V. Le, in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (MIT Press, Cambridge, MA, USA, 2014), NIPS’14, p. 3104–3112.
- Wu et al. (2016) Y. Wu, M. Schuster, Z. Chen, Q. Le, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, et al., arXiv:1609.08144v2 (2016).
- Chiu et al. (2018) C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, E. Gonina, et al., in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018), pp. 4774–4778.
- Schuster and Paliwal (1997) M. Schuster and K. K. Paliwal, IEEE Transactions on Signal Processing 45, 2673 (1997).
- Cho et al. (2014) K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, arXiv:1406.1078v3 (2014).
- Hochreiter and Schmidhuber (1997) S. Hochreiter and J. Schmidhuber, Neural Computation 9, 1735 (1997), URL https://doi.org/10.1162/neco.1997.9.8.1735.
- Kingma and Ba (2014) D. Kingma and J. Ba, arXiv:1412.6980v9 (2014).