Parametric t-Stochastic Neighbor Embedding With Quantum Neural Network

Yoshiaki Kawase Graduate School of Engineering Science, Osaka University
1-3 Machikaneyama, Toyonaka, Osaka 560-831, Japan [email protected] Kosuke Mitarai [email protected] Graduate School of Engineering Science, Osaka University
1-3 Machikaneyama, Toyonaka, Osaka 560-831, Japan Center for Quantum Information and Quantum Biology, Osaka University, Japan. JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan. Keisuke Fujii [email protected] Graduate School of Engineering Science, Osaka University
1-3 Machikaneyama, Toyonaka, Osaka 560-831, Japan Center for Quantum Information and Quantum Biology, Osaka University, Japan. RIKEN Center for Quantum Computing, Wako Saitama 351-0198, Japan.

Abstract

t-Stochastic Neighbor Embedding (t-SNE) is a non-parametric data visualization method in classical machine learning. It maps the data from the high-dimensional space into a low-dimensional space, especially a two-dimensional plane, while maintaining the relationship, or similarities, between the surrounding points. In t-SNE, the initial position of the low-dimensional data is randomly determined, and the visualization is achieved by moving the low-dimensional data to minimize a cost function. Its variant called parametric t-SNE uses neural networks for this mapping. In this paper, we propose to use quantum neural networks for parametric t-SNE to reflect the characteristics of high-dimensional quantum data on low-dimensional data. We use fidelity-based metrics instead of Euclidean distance in calculating high-dimensional data similarity. We visualize both classical (Iris dataset) and quantum (time-depending Hamiltonian dynamics) data for classification tasks. Since this method allows us to represent a quantum dataset in a higher dimensional Hilbert space by a quantum dataset in a lower dimension while keeping their similarity, the proposed method can also be used to compress quantum data for further quantum machine learning.

^†^†preprint: APS/123-QED

I Introduction

Visualization of high-dimensional data is an important subfield of machine learning. It allows us to intuitively interpret the data and understand possible patterns in them. As visualization often involves mapping of the original data to a low, typically two or three, dimensional space, the techniques for visualizations are also useful for compression of data or preprocessing before applying another machine learning techniques. Prototypical examples of visualization techniques include t-stochastic neighbour embedding (t-SNE) [1].

Such techniques have been proved to be useful also for machine learning of quantum states. Ref. [2] applied various visualization methods to detect quantum phase transitions in the Hubbard model, where they generated states by quantum Monte Carlo simulations. They have found that t-SNE is the most promising technique in this type of the application. Ref. [3] applied t-SNE to visualize quantum states represented as a matrix product states. They successfully visualized quantum phase transitions in spin models such as transverse field Ising model.

However, the application of machine learning techniques running on classical computers for this purpose is intrinsically limited to the case where the target quantum states have efficient classical representations. The use of quantum computers can provide advantage in widening the range of quantum states that can be used as inputs. In fact, in Ref. [4], identification of topological quantum phase has been proposed by using a clustering algorithm using a quantum computer. This approach is motivated by the fact that there is a family of quantum states that are useful for machine learning of classical data but cannot be efficiently represented by classical computers [5].

Here, we propose to use quantum neural networks (QNNs), which use parametric quantum circuit to construct a machine learning model, for visualization of quantum states. Our proposal is based on parametric t-SNE [6], which is a visualization technique where we employ neural networks to map high-dimensional data into low-dimensional space. The mapping is optimized so as to keep the similarity of the data points, which is defined from the distance in the respective spaces, unchanged. Our idea is to use QNNs instead of the classical neural network. This allows us to directly use quantum states as inputs, which may be useful for studying complex quantum systems and certain machine learning problems that are hard classically. The similarity of the quantum states can be defined from fildelity(-like) measures. Our method optimizes the parameters in a QNN so that the respective quantum states are mapped to low-dimensional points, which are defined as expectation values of certain observables at the output of the QNN, while maitaining the similarity among the points.

We also conduct numerical verification of our proposed method. First, we use the Iris flower dataset [7] to test if it can be successfully applied to classical data. Second, we visualize the quantum states time-evolved under the transverse-field Ising Hamiltonian. In the visualization of quantum data, we could not effectively generate a visualization with distinct clusters when using default cost function usually utilized for parametric t-SNE. To deal with this problem, we introduce a hyper-parameter into the cost function to adapt the scaling of the low-dimensional data. With these successful demonstrations, we believe that the proposed method would be a powerful tool to visualize and analyse both classical and quantum datasets.

This paper is organized as follows. In section II, we briefly review the t-SNE. In section III, we describe our proposed method. In section IV, we perform numerical experiments to verify our proposed method. In section V, we describe the conclusion and future work.

II Background

II.1 t-Stochastic Neighbor Embedding

The t-Stochastic Neighbor Embedding (t-SNE) [1], also called non-parametric t-SNE, is a classical machine learning method for visualizing high-dimensional data. The idea of t-SNE is to map data points in the original high-dimensional space to points in a low-dimensional space while keeping the similarity among the points. The map is determined by minimizing the KL-divergence between the similarity of data distributions in the high- and low- dimensional space.

In detail, the t-SNE defines the similarity between a high-dimensional datapoint $x_{i}$ and another datapoint $x_{j}$ by the following joint probability [1].

\displaystyle p_{ij}=\frac{p_{i|j}+p_{j|i}}{2N},

(1)

where,

	$\displaystyle p_{j\|i}$	$\displaystyle=$	$\displaystyle\frac{\exp{\left(-\\|x_{i}-x_{j}\\|^{2}/2\sigma_{i}^{2}\right)}}{\sum_{k\neq i}\exp{\left(-\\|x_{i}-x_{k}\\|^{2}/2\sigma_{i}^{2}\right)}},$		(2)
	$\displaystyle p_{ii}$	$\displaystyle=$	$\displaystyle 0,$

and $\sigma_{i}$ are parameters determined from the following quantity called perplexity of data $x_{i}$ .

\displaystyle Perp_{i}=2^{-\sum_{j}p_{j|i}\log_{2}{p_{j|i}}}.

The value of $\sigma_{i}$ is set so as to make $Perp_{i}$ a user-specified value (typically between $5$ and $50$ ). The similarity between the low-dimensional data points $y_{i}$ and $y_{j}$ is defined by the following equation using Student t-distribution with one degree of freedom [1].

	$\displaystyle q_{ij}$	$\displaystyle=$	$\displaystyle\frac{\left(1+\\|y_{i}-y_{j}\\|^{2}\right)^{-1}}{\sum_{k}\sum_{l\neq k}\left(1+\\|y_{k}-y_{l}\\|^{2}\right)^{-1}},$
	$\displaystyle q_{ii}$	$\displaystyle=$	$\displaystyle 0.$

The t-SNE determines a low-dimensional point $y_{i}$ corresponding to a datapoint $x_{i}$ by iteratively minimizing the cost function $C(\{y_{i}\})$ defined as the Kullback-Leibler (KL) divergence between a joint probability distribution in the high- and low- dimensional data

\displaystyle C(\{y_{i}\})=\sum_{i}\sum_{j}p_{ij}\log{\frac{p_{ij}}{q_{ij}}}.

(3)

The gradient of the cost function for $y_{i}$ is described as

\displaystyle\frac{\partial C}{\partial y_{i}}=4\sum_{j}(p_{ij}-q_{ij})(y_{i}-y_{j})(1+\|y_{i}-y_{j}\|^{2})^{-1}.

(4)

In optimizing the cost function, each $y_{i}$ is initially placed in a random position and moved to minimize the cost function.

II.2 Parametric t-SNE

Parametric t-SNE [6] is a variant of t-SNE which discards the direct optimization of the low-dimensional points $\{y_{i}\}$ but uses neural network to map $x_{i}$ to $y_{i}$ . In this method, each low-dimensional point $y_{i}$ is generated by $y_{i}=f(x_{i}|\theta)$ , where $f(x_{i}|\theta)$ is an output of the neural network with an input $x_{i}$ and network weight $\theta$ . We optimize $\theta$ to minimize the cost function Eq. (3). An important distinction between the t-SNE and the parametric t-SNE is that the latter can easily generate a low-dimensional point for a new input as we explicitly construct the mapping from $x_{i}$ to $y_{i}$ .

II.3 Application of t-SNE for quantum systems

Recently, t-SNE has also been applied to the field of quantum physics by Yuan et al. [3]. They considered the visualization of quantum phase transitions by applying t-SNE to (approximate) ground states $\ket{\psi_{i}}$ of certain Hamiltonians $H_{i}$ . Their approach is to use,

\displaystyle p_{j|i}=\frac{\exp{\left(-d(\psi_{i},\psi_{j})^{2}/2\sigma_{i}^{2}\right)}}{\sum_{k\neq i}\exp{\left(-d(\psi_{i},\psi_{k})^{2}/2\sigma_{i}^{2}\right)}},

(5)

instead of Eq. (2), where the distance $d(\psi_{i},\psi_{j})$ is defined by the negative logarithmic fidelity

\displaystyle d(\psi_{i},\psi_{j})=-\log\left(\left|\braket{\psi_{i}}{\psi_{j}}\right|\right).

(6)

It has been shown that this approach can successfully visualize and identify quantum phase transitions of one-dimensional spin chains. In Ref. [3], they also considered the visualization of classical data by using the so-called “quantum feature map” via t-SNE. We share the idea with Ref. [3] in the sense of visualization of quantum states via t-SNE. However, our approach differs essentially from Ref. [3] in that a parameterized quantum model is employed as low dimensional data of parameteric t-SNE.

III Parametric t-SNE with quantum circuits

In this work, we propose using quantum circuits to construct the parametric model $f(x|\theta)$ to generate low-dimensional data. The procedure of our proposed method for visualizing classical data is shown in Fig. 1 (a). More concretely, using a parameterized unitary circuit $U(x,\theta)$ that depends on both of a $d$ -dimensional input $x$ and trainable parameters $\theta$ , we generate a $d^{\prime}$ -dimensional datapoint $y$ by expectation values of $d^{\prime}$ observables $\{O_{\mu}\}_{i=1}^{d^{\prime}}$ . Then, we minimize the cost function defined in Eq. (3) by optimizing the parameters $\theta$ . The concrete algorithm for classical inputs is as follows:

1.

Compute $p_{ij}$ for all pairs from $\{x_{i}\}_{i=1}^{N}$ .
2.

For all $\{x_{i}\}_{i=1}^{N}$ and $\{O_{\mu}\}_{\mu=1}^{d^{\prime}}$ , evaluate $y_{\mu}(x_{i},\theta)=\bra{0}U^{\dagger}(x_{i},\theta)O_{\mu}U(x_{i},\theta)\ket{0}$ using a quantum computer.
3.

Compute $q_{ij}$ for all pairs from $y_{\mu}(x_{i},\theta)$ on a classical computer.
4.

Compute $C(\{y_{\mu}(x_{i},\theta)\}_{i=1}^{N})$ on a classical computer.
5.

Update $\theta$ so that $C$ is minimized.

After the convergence, we expect the above protocol to find a quantum circuit $U(x,\theta)$ which can map $x_{i}$ to a low-dimensional space while preserving the similarity based on distances among the data.

An interesting extension of the above protocol is to take a quantum dataset consisting of quantum states $\{\ket{\psi_{i}}\}_{i=1}^{N}$ as the input data. We depict the protocol in Fig. 1 (b). Here we can think of several ways to define the similarity of quantum data in a high dimensional space. One possible way is to generate high-dimensional classical data by measuring a set of observables $\{P_{\nu}\}_{\nu=1}^{d}$ . Then their expectation values

\displaystyle\{\langle P_{\nu}\rangle\}_{\nu=1}^{d}

(7)

are used as the high-dimensional data. Another possibility is to use a distance function of two quantum states defined via fidelity

\displaystyle d(|\psi_{i}\rangle,|\psi_{k}\rangle)=\sqrt{1-\left|\braket{\psi_{i}}{\psi_{k}}\right|^{2}}.

(8)

Then similarity of two quantum states are defined as follows:

\displaystyle p_{j|i}=\frac{\exp{\left(-d(|\psi_{i}\rangle,|\psi_{j}\rangle)^{2}/2\sigma_{i}^{2}\right)}}{\sum_{k\neq i}\exp{\left(-d(|\psi_{i}\rangle,|\psi_{k}\rangle)^{2}/2\sigma_{i}^{2}\right)}}.

(9)

This choice enables us to readily calculate $d(\psi_{i},\psi_{k})$ on a quantum computer by using standard techniques for overlap measurement such as swap test [8].

On the other hand, for the low-dimensional space, we can use $U(\theta)$ that only depends on the trainable parameters $\theta$ at the step 2 in the classical input case since we do not have classical input $x$ . For the cost function $C$ , we can use the same formulation as above, that is, we measure $y_{\mu}(\psi_{i},\theta)=\bra{\psi_{i}}U^{\dagger}(\theta)O_{\mu}U(\theta)\ket{\psi_{i}}$ for a certain set of observables $\{O_{\mu}\}$ and compute $C(\{y_{\mu}(\psi_{i},\theta)\}_{i=1}^{N})$ .

Note that if we employ quantum feature map,

\displaystyle|\psi_{i}\rangle=U_{\rm in}(x_{i})|0\rangle,

(10)

the similarity of classical input data $\{x_{i}\}$ in high dimensional space can also be calculated via the fidelities of the quantum states.

The possible advantage of the proposed method solely depends on the fact that there are quantum circuits that are hard to simulate classically [9, 10, 11]. This means that we may be able to construct the map $f(x|\theta)$ which cannot be expressed by neural networks. In fact, it is proven that, when using a very specific dataset, there is a machine learning task with rigorous quantum advantage [5]. However, usefulness of quantum circuits for modeling a practical classical data are, in general, still in question. Since one of our approaches assumes that the high-dimensional data is quantum data, we expect that the quantum model has an advantage in reproducing its similarity in low dimensions. We leave such a research direction as a important future direction to explore.

Refer to caption — Figure 1: Figures (a) and (b) show the procedure for visualizing classical and quantum data, respectively. We visualize classical and quantum data by mapping the high-dimensional data to the low-dimensional one. The mapping is performed by the parametrized quantum circuit trained by minimizing a cost function so as to maintain the similarity of surrounding points. The similarity of high-dimensional data is defined by either expectation values, fidelities, or classical data. The similarity of low-dimensional data is determined by expectation values, and we plot them in a two-dimensional plane.

IV Numerical Experiments

Here, we perform the numerical simulations of the proposed method. First, let us describe the tools used in the experiments. We use qulacs [12] to simulate a quantum circuit. We make a python wrapper to work with PyTorch[13] to use the loss function and optimizer. Our optimization is performed by Sharpness-Aware Minimization (SAM) [14, 15] with Adam[16] as the base optimizer. The overview of SAM is to consider the cost added the l2-regularization term, find the gradient at the point where that cost is the highest in the neighborhood, and descend from the current point according to that gradient.

IV.1 Visualizing Classical Data

IV.1.1 Application of parametric t-SNE with quantum circuits

We visualize the Iris flower dataset [7] with our proposed method. The dataset contains three classes and consists of four features. We normalize each feature between $-1$ and $+1$ . The construction of our ansatz $U(x,\theta)$ and observables $\{O_{\mu}\}$ for the low dimensional space are as follows. Let us define $R_{y}(\theta)=e^{i\theta\hat{Y}/2}$ and

\displaystyle U_{\mathrm{in}}(x_{i})=R_{y}(x_{i,1})\otimes R_{y}(x_{i,2})\otimes\ldots\otimes R_{y}(x_{i,d}),

where $x_{i,j}$ denotes the $j$ th element of $i$ th data $x_{i}$ . Also, let us define $U_{1}$ and $U_{2}$ as shown in Fig. 2. We construct $U(x,\theta)$ as,

U(x,\theta)=\prod_{k=0}^{L-1}\left[U_{2}(\theta_{9+16k:16+16k})U_{1}(\theta_{1+16k:8+16k})\right]U_{\mathrm{in}}(x),

(11)

where $\theta_{i:j}$ denotes $(j-i+1)$ dimensional vector containing $i$ th to $j$ th elements of $\theta$ . Specifically, we set the circuit depth $L=4$ . As for the output, we set $d^{\prime}=2$ and $\{O_{\mu}\}=\{X_{2},X_{3}\}$ to visualize the Iris dataset on two-dimensional plane.

We perform the simulation under the above settings and plot the results in Fig. 3. This figure shows that the data is clustered for each type of Iris flower. The left figure (a) shows the visualization by the classical machine learning method of t-SNE, the middle figure (b) by the proposed method with similarity based on Euclidean distance. In both two cases, the data are clustered by Iris flower varieties.

IV.1.2 Visualization with infidelity distance measure

Next, we show that the proposed method can be correctly visualized by calculating the similarity of quantum states. To this end, we will use, as a test case, a set of quantum states generated by a quantum feature map from the Iris dataset, before running the method on a truly physically meaningful quantum state. We perform the parametric t-SNE using quantum features $\ket{\psi_{i}}$ defined as

\ket{\psi_{i}}=U_{\mathrm{in}}(x_{i})\ket{0},

(12)

and the cost function associated with the distance measure defined in Eq. (8). We show the result in Fig. 3 (c), which implies that this protocol also work.

IV.2 Visualization of Quantum Data

IV.2.1 Visualization based on Observables

In this section, we perform the visualization of quantum states. As an example, let us consider the time-dependent two-body transverse field Ising model, which is employed in quantum annealing [17]. We prepare quantum states time-evolved under the following Hamiltonian.

\displaystyle H=(1-t/\tau)\sum_{i}h_{i}X_{i}+(t/\tau)\sum_{i<j}J_{ij}Z_{i}Z_{j},

(13)

where $\tau$ denotes the total simulation time of the Hamiltonian. We perform the simulation under the Trotter decomposition[18, 19]. We set the number of qubits as four, $\tau=40$ , time step $\Delta t=0.01$ , $h_{i}=1$ , and $J_{ij}$ as uniform random number between $-1$ and $0$ or $0$ and $+1$ . We prepare $100$ samples each with positive or negative $J_{ij}$ .

As explained before, we consider the two methods to compute similarity of the quantum states in high-dimensional space. The first method is to consider the expectation values of input quantum states as high-dimensional data

\displaystyle x_{i}=(\langle\psi_{i,\mathrm{in}}|X_{1}|\psi_{i,\mathrm{in}}\rangle,\langle\psi_{i,\mathrm{in}}|X_{2}|\psi_{i,\mathrm{in}}\rangle,\ldots,\langle\psi_{i,\mathrm{in}}|X_{n}|\psi_{i,\mathrm{in}}\rangle),

and calculate the similarity between the high-dimensional data points as we explained in Sec. II.1. The low-dimensional data $\{y_{i}\}$ is defined similarly by the following equation using a certain constant value $a$ :

\displaystyle y_{i}=a(\langle\psi_{i,\mathrm{out}}|X_{2}|\psi_{i,\mathrm{out}}\rangle,\langle\psi_{i,\mathrm{out}}|X_{3}|\psi_{i,\mathrm{out}}\rangle).

(14)

The constant value $a$ is a hyper-parameter to adjust the scale of low-dimensional data. We set the parameters $a=1$ and circuit depth $L=8$ . In this numerical experiment, we visualize the quantum states for every $1000$ trotter steps. The result is shown in Fig. 4. In this figure, we can see that two clusters corresponding to the sign of the coupling constants are formed.

IV.2.2 Visualization with infidelity metric

The second method is to calculate the similarity of the data in high-dimensional space via infidelity of two different quantum states. The similarity between low-dimensional data is calculated in the same way as the previous case, except for the setting of hyper-parameter $a=10$ . Fig. 5 shows the process of the dynamics for every $1000$ trotter steps. This figure also shows the two clusters corresponding to the sign of the coupling constants.

IV.2.3 The effect of multiplying a constant value

When we visualize the quantum data, we sometimes get undesirable visualization figures, which can be avoided by appropriately tuning the hyperparameter $a$ . For example, we plot three examples of wrong choices of $a$ in Fig. 6. In Fig. 6 (a), the data is arranged in an almost straight line, despite the two-dimensional plot. In figures (b) and (c), it is impossible to determine that these data belong to different classes without the colors in the plots, since each class is colored to make the figures easier to understand. One possible reason is that the cost function optimization is not sufficient to converge to a local minimum.

We examine whether the optimization converges near a local minimum. To this end, we visualize the loss contours and optimization trajectories in two dimensional plane. In Ref. [20], they develop a method to visualize loss landscape and optimization trajectories with the loss contours. They use the visualization method to investigate the relationship between loss landscape and trainability or generalization for NNs. Before we describe the detail method, let us define $\theta_{i}$ as a vector of trainable parameters at $i$ th epoch. We consider the matrix $M$ consisting of $(\theta_{i}-\theta_{n})$ for all $i$ , where the $n$ is the last epoch. Specifically, the $M$ is written by

\displaystyle M=[\theta_{0}-\theta_{n};\theta_{1}-\theta_{n};\ldots;\theta_{n-1}-\theta_{n}].

(15)

The matrix $M$ is transformed by principal component analysis [21], and the $1$ st and $2$ nd principal component vectors are used as the axis of the visualization. Along the two principal component vectors, the loss contours and optimization trajectories are plotted on two-dimensional plane. We use this visualization method to tune the hyperparameter $a$ and to confirm our optimization works appropriately.

We show the visualization of the loss landscapes in Fig. 6 (d), (e), and (f). From these figures, the optimization is sufficiently done to reach the local minima. Despite the cost function being well optimized, we have not been able to reflect enough features on the low-dimensional data to discriminate different clusters. This is attributed to the wrong design of the cost function, where the similarity is not appropriately defined. For example, in Fig. 6 (a), the parameter $a$ for adjusting the scale is multiplied by $1$ instead of by $10$ in Fig. 5 (a). In Fig. 6 (b) and (c), both high-dimensional and low-dimensional data are multiplied by $10$ in similarity calculation, instead of by $1$ in Fig. 4 (c) and (d). These results suggests that when calculating similarity it is necessary to adjust the scale of the high- and low- dimensional data so that the cost function adequately reflects the characteristics of the given data.

V Conclusion and Future works

In this work, we have visualized classical and quantum data by quantum neural network. Specifically, we employed a parameterized quantum circuit as a quantum model to generate low-dimensional data, which is trained so that the similarity of the high-dimensional data is maintained. In the quantum case, the similarity of the quantum states is calculated in two ways: measuring multiple observables to obtain high-dimensional classical data from quantum states, and calculating distance of two quantum states directly from fidelity.

We performed numerical simulation of two-dimensional visualization of Iris dataset as the case of the classical inputs and quantum states evolved under Hamiltonian dynamics as the case of the quantum inputs. It was found that the proposed method worked well for both classical and quantum inputs, with appropriate low-dimensional visualization. Specifically, in the case of quantum data, it is difficult to visualize the data well unless the constant factor is multiplied on the similarity of the low-dimensional data. This is probably because the similarity between the quantum states takes too small value. While we treated this constant factor as a hyperparameter, it can be trainable parameters as $a_{i}$ for each low-dimensional data $y_{i}$ for further improvement. At least, by doing so, the performance of non-parametric t-SNE can be guaranteed in the proposed model with a two-qubit trivial circuit. This implies that the proposed method is expected to work on a relatively small quantum device. For more complex data visualization, we can improve the representation capability of quantum circuits.

One of the other possibilities using our proposed method is to compress quantum data by defining the low-dimensional data by quantum states and the similarity by a fidelity-based metric. While real quantum data, such as a set of outputs from a quantum algorithm, live in a large Hilbert space, such quantum data can be mapped into a smaller Hilbert space with a fewer number of qubits with keeping their similarity. Then quantum machine learning algorithms can be further applied on such a compressed quantum data set. Further investigation in this direction is an intriguing future issue. The other future task is to construct a model including a decoder, which decodes data from compressed data, such as Variational AutoEncoder [22] in classical machine learning. It may be possible to create a quantum state with certain desired properties from the classical data in the middle layer.

Acknowledgements.

K.M. is supported by JST PRESTO Grant No. JPMJPR2019 and JSPS KAKENHI Grant No. 20K22330. K.F. is supported by JST ERATO Grant No. JPMJER1601 and JST CREST Grant No. JPMJCR1673. This work is supported by MEXT Quantum Leap Flagship Program (MEXTQLEAP) Grant No. JPMXS0118067394 and JPMXS0120319794. We also acknowledge support from JSTCOI-NEXT program.

References

Van der Maaten and Hinton [2008] L. Van der Maaten and G. Hinton, Visualizing data using t-sne., Journal of machine learning research 9 (2008).
Ch’ng et al. [2018] K. Ch’ng, N. Vazquez, and E. Khatami, Unsupervised machine learning account of magnetic transitions in the hubbard model, Phys. Rev. E 97, 013306 (2018).
Yang et al. [2021] Y. Yang, Z.-Z. Sun, S.-J. Ran, and G. Su, Visualizing quantum phases and identifying quantum phase transitions by nonlinear dimensional reduction, Phys. Rev. B 103, 075106 (2021).
N. Okada et al. [2022] K. N. Okada, K. Osaki, K. Mitarai, and K. Fujii, Identification of topological phases using classically-optimized variational quantum eigensolver, arXiv preprint arXiv:2202.02909 (2022).
Liu et al. [2021] Y. Liu, S. Arunachalam, and K. Temme, A rigorous and robust quantum speed-up in supervised machine learning, Nature Physics 17, 1013–1017 (2021).
Van Der Maaten [2009] L. Van Der Maaten, Learning a parametric embedding by preserving local structure, in Artificial Intelligence and Statistics (PMLR, 2009) pp. 384–391.
Dua and Graff [2017] D. Dua and C. Graff, UCI machine learning repository (2017).
Buhrman et al. [2001] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Quantum fingerprinting, Phys. Rev. Lett. 87, 167902 (2001).
Arute et al. [2019] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Quantum supremacy using a programmable superconducting processor, Nature 574, 505 (2019).
Aaronson and Arkhipov [2010] S. Aaronson and A. Arkhipov, The computational complexity of linear optics (2010), arXiv:1011.3245 [quant-ph] .
Bremner et al. [2011] M. J. Bremner, R. Jozsa, and D. J. Shepherd, Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 467, 459 (2011), https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.2010.0301 .
Suzuki et al. [2021] Y. Suzuki, Y. Kawase, Y. Masumura, Y. Hiraga, M. Nakadai, J. Chen, K. M. Nakanishi, K. Mitarai, R. Imai, S. Tamiya, et al., Qulacs: a fast and versatile quantum circuit simulator for research purpose, Quantum 5, 559 (2021).
Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, Pytorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019) pp. 8024–8035.
Foret et al. [2020] P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, Sharpness-aware minimization for efficiently improving generalization, arXiv preprint arXiv:2010.01412 (2020).
davda54 [2020] davda54, unofficial implementation of sharpness-aware minimization algorithm for pytorch (2020).
Kingma and Ba [2014] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
Kadowaki and Nishimori [1998] T. Kadowaki and H. Nishimori, Quantum annealing in the transverse ising model, Physical Review E 58, 5355 (1998).
Trotter [1959] H. F. Trotter, On the product of semi-groups of operators, Proceedings of the American Mathematical Society 10, 545 (1959).
Suzuki [1976] M. Suzuki, Generalized trotter’s formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems, Communications in Mathematical Physics 51, 183 (1976).
Li et al. [2017] H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, Visualizing the loss landscape of neural nets, arXiv preprint arXiv:1712.09913 (2017).
Pearson [1901] K. Pearson, Liii. on lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin philosophical magazine and journal of science 2, 559 (1901).
Kingma and Welling [2013] D. P. Kingma and M. Welling, Auto-encoding variational bayes, arXiv preprint arXiv:1312.6114 (2013).