Quantum inspired K-means algorithm using matrix product states
Abstract
The matrix product state has become the algorithm of choice when studying one-dimensional interacting quantum many-body systems, since it has been demonstrated to explore the most relevant portion of the exponentially large quantum Hilbert space and to find accurate solutions. Here we propose a quantum inspired K-means clustering algorithm which first maps the classical data into quantum states represented as matrix product states, and then minimizes the loss function using the variational matrix product states method in the enlarged space. We demonstrate the performance of this algorithm by applying it to several commonly used machine learning datasets, and show that it reaches higher prediction accuracies and is less likely to be trapped in local minima than the classical K-means algorithm.
I Introduction
The past few years have witnessed a growing interest in the intersection between quantum physics and machine learning. On the one hand, machine learning tools have been used to solve quantum problems, such as phase recognition Carrasquilla and Melko (2017); Zhang et al. (2018); Ch’ng et al. (2018); Van Nieuwenburg et al. (2017); Broecker et al. (2017); Ch’Ng et al. (2017); Zhang and Kim (2017); Dong et al. (2019), quantum state tomography Torlai et al. (2018); Torlai and Melko (2018); Rocchetto et al. (2018); Quek et al. (2018); Carrasquilla et al. (2019), solving quantum many-body problems Arsenault et al. (2014, 2015); Torlai and Melko (2016); Amin et al. (2018); Liu et al. (2017); Huang and Wang (2017); Aoki and Kobayashi (2016); Carleo and Troyer (2017); Nomura et al. (2017); Czischek et al. (2018), and non-Markovian quantum dynamics Luchnikov et al. (2020). Connections have also been drawn between quantum many-body algorithms and neural network algorithms Bény (2013); Mehta and Schwab (2014); Lin et al. (2017); Deng et al. (2017). On the other hand, tools from quantum many-body physics, especially tensor network states (TNS) algorithms, have been used to solve classical machine learning problems. TNS algorithms have been extensively used to study quantum many-body systems in the last two decades. In particular, Matrix Product States (MPS), a one-dimensional variant of TNS, have achieved great success and become a standard numerical tool for one-dimensional strongly correlated quantum many-body systems due to their high efficiency and accuracy Schollwöck (2011). Applications of MPS to machine learning tasks include classification problems Stoudenmire and Schwab (2016); Stoudenmire (2018); Sun et al. (2020), generative modeling Han et al. (2018), and sequence-to-sequence modeling Guo et al. (2018), where it has been shown that MPS-based algorithms can achieve a learning precision close to or even better than state-of-the-art machine learning algorithms. Tensor network states based machine learning algorithms have also been applied to quantum process tomography Guo et al. (2020); Torlai et al. (2020).
In this work, we apply matrix product states to the clustering task, an elementary machine learning task which separates unlabeled data into distinct and non-overlapping clusters. A standard algorithm for clustering is the K-means algorithm, which divides the data into different classes by minimizing the variance of the data in each class MacQueen et al. (1967). The most popular K-means clustering algorithm, Lloyd’s algorithm, starts from random centroids and then iteratively divides the data into classes according to their distances to the centroids and recomputes the center of each cluster Lloyd (1982). This algorithm has become the standard algorithm for clustering due to its simplicity and efficiency, and we will refer to it as the classical K-means algorithm in the rest of this work. However, it is known that in some cases this algorithm results in bad clustering with high probability, even for randomized initialization. Various efforts have been made to improve the accuracy of Lloyd’s algorithm, such as a clever way of initialization Arthur and Vassilvitskii (2006); Ostrovsky et al. (2013); Celebi et al. (2013); Bachem et al. (2016), or a smoother objective loss function Zhang et al. (1999); Xu and Lange (2019); De Amorim and Mirkin (2012); Chakraborty and Das (2017).
Similar to other MPS-based machine learning algorithms, the basic idea of this work is to first map the classical data into quantum states which live in an exponentially large Hilbert space, and then explore this enlarged space in a numerically efficient way to find the optimal solution. In the context of the clustering problem, the hope is that by exploring a much larger space, the algorithm is less likely to result in bad clustering. This paper is organized as follows. In Sec. II, we describe our method in detail. In Sec. III, we apply our method to several machine learning datasets, showing that it results in a high learning accuracy and is less likely to be trapped in local minima. We conclude in Sec. IV.
II Method
II.1 Classical K-means algorithm
Before we introduce our method, we first briefly review the standard classical K-means algorithm Lloyd (1982). For $n$ input vectors labeled as $\vec{x}_i$ ($1 \leq i \leq n$), the classical K-means algorithm works as follows: 1) initialize $k$ random vectors $\vec{c}_j$ ($1 \leq j \leq k$) as the initial centroids; 2) divide all the data into $k$ clusters according to their distances to the centroids. Concretely, for each data vector $\vec{x}_i$ one computes
$d_i^j = \left\| \vec{x}_i - \vec{c}_j \right\|_2 , \qquad (1)$
where $\| \vec{v} \|_2$ is the standard $l_2$-norm of a vector $\vec{v}$. Then $\vec{x}_i$ is assigned to the $j$-th cluster with $j = \arg\min_{j'} d_i^{j'}$. After this step, the data vectors are divided into $k$ non-overlapping clusters. 3) Compute the new center of each cluster by minimizing its variance independently, namely by minimizing the following loss function
$\mathcal{L}_j = \sum_{l=1}^{n_j} \left\| \vec{x}_l^{\,j} - \vec{c}_j \right\|_2^2 , \qquad (2)$
where $n_j$ denotes the size of the $j$-th cluster, and $\vec{x}_l^{\,j}$ denotes the $l$-th vector inside the $j$-th cluster. By repeating steps 2) and 3), one can often quickly converge to a local minimum. In the prediction stage, given a new input vector $\vec{x}$, one simply computes the distance between $\vec{x}$ and each centroid $\vec{c}_j$, and then $\vec{x}$ is assigned to the $j$-th cluster with the smallest distance.
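As a concrete illustration of the two alternating steps above, the following is a minimal NumPy sketch of Lloyd's algorithm; the function name and the choice of initializing the centroids from randomly chosen data points are ours and are meant for illustration only.

```python
import numpy as np

def classical_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: X has shape (n, d); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # step 1): pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # step 2): assign every point to its nearest centroid, cf. Eq. (1)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # step 3): move each centroid to the mean of its cluster, which minimizes Eq. (2)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```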
II.2 Quantum K-means algorithm
The above K-means algorithm can be straightforwardly converted into a quantum machine learning algorithm, which we describe as follows and refer to as the quantum K-means algorithm. In this case we assume that there is a list of input quantum states labeled as $|x_i\rangle$ ($1 \leq i \leq n$). The centroids are denoted as $|c_j\rangle$ ($1 \leq j \leq k$) and are initialized using parametric quantum circuits Mitarai et al. (2018); Liu et al. (2020)
$|c_j\rangle = \hat{U}_j(\vec{\theta}_j) \, |0\rangle , \qquad (3)$
where $\hat{U}_j(\vec{\theta}_j)$ denotes the $j$-th parametric quantum circuit with an array of parameters denoted by $\vec{\theta}_j$. The distance between $|x_i\rangle$ and $|c_j\rangle$ can be defined as
$d_i^j = 1 - \left| \langle x_i | c_j \rangle \right|^2 , \qquad (4)$
where the second term on the right hand side of the above equation can be efficiently computed with a quantum computer using the SWAP test technique Buhrman et al. (2001). Similarly, the loss function for the $j$-th cluster can be defined as
$\mathcal{L}_j = \sum_{l=1}^{n_j} \left( 1 - \left| \langle x_l^j | c_j \rangle \right|^2 \right) . \qquad (5)$
We note that the gradient of $\mathcal{L}_j$ with respect to the circuit parameters can also be computed with a quantum computer using the parameter-shift rule Mitarai et al. (2018)
$\frac{\partial \mathcal{L}_j}{\partial \theta_j^m} = \frac{1}{2} \left[ \mathcal{L}_j\!\left( \theta_j^m + \frac{\pi}{2} \right) - \mathcal{L}_j\!\left( \theta_j^m - \frac{\pi}{2} \right) \right] , \qquad (6)$
where $\theta_j^m \pm \frac{\pi}{2}$ means to shift the $m$-th parameter of $\vec{\theta}_j$ by $\pm\frac{\pi}{2}$. Implementing the SWAP test on current quantum computers is not an easy task since it requires the three-qubit controlled-SWAP gate. In this work we will not go any further into this quantum machine learning algorithm, but will instead focus on the quantum inspired classical algorithm described below.
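For reference, the quantity estimated by the SWAP test can also be evaluated exactly from statevectors on a classical computer; below is a minimal NumPy sketch of the distance in Eq. (4) under this assumption (the function name is ours, and no quantum-circuit library is involved).

```python
import numpy as np

def swap_test_distance(x_state, c_state):
    """Distance of Eq. (4) for normalized statevectors.

    A SWAP test on quantum hardware estimates |<x|c>|^2; here we simply
    compute the overlap exactly."""
    overlap = np.vdot(x_state, c_state)   # <x|c>, with the first argument conjugated
    return 1.0 - np.abs(overlap) ** 2

# tiny usage example with two random normalized two-qubit states
rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)
c = rng.normal(size=4) + 1j * rng.normal(size=4)
print(swap_test_distance(x / np.linalg.norm(x), c / np.linalg.norm(c)))
```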
II.3 Quantum inspired K-means algorithm
Now we describe our quantum inspired K-means algorithm in detail. Similar to the quantum K-means algorithm, the inputs are assumed to be a list of quantum states $|x_i\rangle$. However, here we also assume that each quantum state can be parameterized as an MPS
$|x_i\rangle = \sum_{\sigma_1, \sigma_2, \dots, \sigma_L} A_{i,1}^{\sigma_1} A_{i,2}^{\sigma_2} \cdots A_{i,L}^{\sigma_L} \, |\sigma_1 \sigma_2 \cdots \sigma_L\rangle , \qquad (7)$
where $L$ is the number of qubits required by the corresponding quantum state $|x_i\rangle$, $\sigma_l \in \{0, 1\}$, and each $A_{i,l}^{\sigma_l}$ is a $D_{l-1} \times D_l$ matrix. The size of an MPS is characterized by an integer $D$, referred to as the bond dimension, defined as

$D = \max_{l} D_l . \qquad (8)$
We note that for a vector $\vec{x}$ with elements $x_l$, one can map it into a separable quantum state such that each element $x_l$ is mapped into a single-qubit state $|\phi(x_l)\rangle$. This corresponds to a separable MPS of $|x\rangle$ with bond dimension $D = 1$, with elements
$A_l^{\sigma_l = 0} = \cos\!\left( \frac{\pi}{2} x_l \right) , \quad A_l^{\sigma_l = 1} = \sin\!\left( \frac{\pi}{2} x_l \right) . \qquad (9)$
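A minimal sketch of this classical-to-quantum feature map, assuming the trigonometric single-qubit encoding written in Eq. (9); each entry of the data vector becomes a $1 \times 2 \times 1$ tensor (bond dimension $D = 1$), and the function name is ours.

```python
import numpy as np

def encode_vector_as_mps(x):
    """Map a real vector to a product-state MPS following Eq. (9).

    Every entry x_l becomes a tensor of shape (1, 2, 1):
    (left bond, physical index sigma_l in {0, 1}, right bond)."""
    tensors = []
    for x_l in x:
        site = np.array([np.cos(np.pi * x_l / 2.0),   # amplitude on |0>
                         np.sin(np.pi * x_l / 2.0)])  # amplitude on |1>
        tensors.append(site.reshape(1, 2, 1))
    return tensors
```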
The centroids are then assumed to be MPSs with a fixed bond dimension $D$, written as
$|c_j\rangle = \sum_{\sigma_1, \sigma_2, \dots, \sigma_L} C_{j,1}^{\sigma_1} C_{j,2}^{\sigma_2} \cdots C_{j,L}^{\sigma_L} \, |\sigma_1 \sigma_2 \cdots \sigma_L\rangle . \qquad (10)$
The distance between two MPSs is defined as
$d_i^j = 1 - \left| \langle x_i | c_j \rangle \right|^2 , \qquad (11)$
and similarly, the loss function for the $j$-th cluster can be defined as
$\mathcal{L}_j = \sum_{l=1}^{n_j} \left( 1 - \left| \langle x_l^j | c_j \rangle \right|^2 \right) , \qquad (12)$
with the normalization condition $\langle c_j | c_j \rangle = 1$. The above cost function can be minimized by setting the gradient with respect to each tensor to $0$, with the normalization enforced through a Lagrange multiplier $\lambda$, namely

$\frac{\partial}{\partial \left( C_{j,m}^{\sigma_m} \right)^{*}} \left[ \mathcal{L}_j + \lambda \left( \langle c_j | c_j \rangle - 1 \right) \right] = 0 . \qquad (13)$
The above equation can be reduced to a set of algebraic equations. To see this, we first define (neglecting the cluster label $j$, since each cluster can be minimized independently) the left and right environments

$\mathcal{E}^{\rm L}_{l,m} = \sum_{\sigma_1, \dots, \sigma_{m-1}} \left( C_{1}^{\sigma_1} \cdots C_{m-1}^{\sigma_{m-1}} \right)^{*} A_{l,1}^{\sigma_1} \cdots A_{l,m-1}^{\sigma_{m-1}} , \qquad (14)$

$\mathcal{E}^{\rm R}_{l,m} = \sum_{\sigma_{m+1}, \dots, \sigma_{L}} \left( C_{m+1}^{\sigma_{m+1}} \cdots C_{L}^{\sigma_L} \right)^{*} A_{l,m+1}^{\sigma_{m+1}} \cdots A_{l,L}^{\sigma_L} , \qquad (15)$

$\mathcal{N}^{\rm L}_{m} = \sum_{\sigma_1, \dots, \sigma_{m-1}} \left( C_{1}^{\sigma_1} \cdots C_{m-1}^{\sigma_{m-1}} \right)^{*} C_{1}^{\sigma_1} \cdots C_{m-1}^{\sigma_{m-1}} , \qquad (16)$

$\mathcal{N}^{\rm R}_{m} = \sum_{\sigma_{m+1}, \dots, \sigma_{L}} \left( C_{m+1}^{\sigma_{m+1}} \cdots C_{L}^{\sigma_L} \right)^{*} C_{m+1}^{\sigma_{m+1}} \cdots C_{L}^{\sigma_L} , \qquad (17)$

with $\mathcal{E}^{\rm L}_{l,1} = \mathcal{E}^{\rm R}_{l,L} = \mathcal{N}^{\rm L}_{1} = \mathcal{N}^{\rm R}_{L} = 1$. With these definitions, Eq. (13) for the tensor on site $m$ can be written as

$\sum_{l=1}^{n_j} \langle x_l | c \rangle \, \mathcal{E}^{\rm L}_{l,m} \, A_{l,m}^{\sigma_m} \, \mathcal{E}^{\rm R}_{l,m} = \lambda \, \mathcal{N}^{\rm L}_{m} \, C_{m}^{\sigma_m} \, \mathcal{N}^{\rm R}_{m} , \qquad (18)$

where the virtual indices of the environments are contracted with the corresponding bond indices of $A_{l,m}^{\sigma_m}$ and $C_{m}^{\sigma_m}$. Interestingly, if we keep $|c\rangle$ in a mixed-canonical form centered at site $m$ Schollwöck (2011), then $\mathcal{N}^{\rm L}_{m} = \mathbb{1}$ and $\mathcal{N}^{\rm R}_{m} = \mathbb{1}$, such that the above equation reduces to

$\sum_{l=1}^{n_j} \langle x_l | c \rangle \, \mathcal{E}^{\rm L}_{l,m} \, A_{l,m}^{\sigma_m} \, \mathcal{E}^{\rm R}_{l,m} = \lambda \, C_{m}^{\sigma_m} . \qquad (19)$

Since $\langle x_l | c \rangle$ is itself linear in $C_{m}^{\sigma_m}$, Eq. (19) is an eigenvalue-type equation for the local tensor, which can be solved with standard linear algebra routines.
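To make the local update concrete, the following is a schematic NumPy sketch of the overlap in Eq. (11) and of a single-site step in the spirit of Eq. (19), written for the case where the data states are the product-state MPSs of Eq. (9). The tensor-shape conventions, function names, and the choice of solving the local problem through the dominant eigenvector of the accumulated Hermitian matrix are ours; the canonicalization bookkeeping of a full sweep (shifting the canonical center with a QR decomposition after each update) is omitted.

```python
import numpy as np

def mps_overlap(x_tensors, c_tensors):
    """<x|c> for two MPSs given as lists of (Dl, 2, Dr) tensors."""
    env = np.ones((1, 1))                       # open bonds: (bond of x, bond of c)
    for A, C in zip(x_tensors, c_tensors):
        env = np.einsum('ab,asc,bsd->cd', env, A.conj(), C)
    return env[0, 0]

def mps_distance(x_tensors, c_tensors):
    """Distance of Eq. (11): 1 - |<x|c>|^2, assuming both states are normalized."""
    return 1.0 - abs(mps_overlap(x_tensors, c_tensors)) ** 2

def local_update(m, x_mps_list, c_tensors):
    """Single-site update of the centroid tensor on site m, cf. Eq. (19).

    x_mps_list : data states, each a list of (1, 2, 1) product-state tensors.
    c_tensors  : centroid MPS, a list of (Dl, 2, Dr) tensors, assumed to be in
                 mixed-canonical form centered at site m."""
    Dl, d, Dr = c_tensors[m].shape
    M = np.zeros((Dl * d * Dr, Dl * d * Dr), dtype=complex)
    for x_tensors in x_mps_list:
        # left environment of <c|x_l>, contracted up to (but excluding) site m
        envL = np.ones((1, 1))                  # (bond of x, bond of c)
        for A, C in zip(x_tensors[:m], c_tensors[:m]):
            envL = np.einsum('ab,asc,bsd->cd', envL, A, C.conj())
        # right environment of <c|x_l>, contracted from the last site down to m+1
        envR = np.ones((1, 1))
        for A, C in zip(reversed(x_tensors[m + 1:]), reversed(c_tensors[m + 1:])):
            envR = np.einsum('asc,bsd,cd->ab', A, C.conj(), envR)
        # effective single-site vector: environments contracted with the data tensor
        v = np.einsum('xa,xsy,yb->asb', envL, x_tensors[m], envR).reshape(-1)
        M += np.outer(v, v.conj())
    # the optimal local tensor is the dominant eigenvector of the Hermitian matrix M
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1].reshape(Dl, d, Dr)
```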
III Applications
TABLE 1. The machine learning datasets used in this work.

| Name | Points | Dimensions | Classes |
| --- | --- | --- | --- |
| Breast | 683 | 9 | 2 |
| Ionosphere | 351 | 34 | 2 |
| Wine | 178 | 13 | 3 |
| Yeast | 1484 | 8 | 10 |
| E-coli | 336 | 7 | 8 |
To demonstrate our method, we apply it to several commonly used machine learning datasets for clustering tasks, as shown in TABLE 1. For each dataset, we randomly pick a portion of the data as the training data and use the remaining data as the testing data. To ensure a fair comparison, we always use the same splitting of training and testing data for the classical K-means algorithm and our quantum inspired K-means algorithm. Since the K-means algorithm is based on the distance measurement, if the scale of different dimensions differs too much, a small number of variables may exert an excessively high influence on the whole cost, suppressing the effect of the remaining variables. To avoid this effect, we divide each dimension of the data by its largest value in the training set, namely
$\tilde{x}_{i,l} = \frac{x_{i,l}}{x_l^{\max}} , \qquad (20)$
with $x_l^{\max} = \max_{1 \leq i \leq n} x_{i,l}$, where $n$ is the total number of training data. As a result, we have $\tilde{x}_{i,l} \leq 1$ for all $i$ and $l$. In the prediction stage, we also divide each dimension of the testing data by $x_l^{\max}$. We note that this procedure may produce elements larger than $1$ in the testing set, which may not be well distinguished by our quantum inspired algorithm since our encoding in Eq. 9 is a periodic function. However, our numerical results show that we can still obtain high precision results with our method despite those elements.
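A small sketch of this preprocessing step (Eq. (20)); the array and function names are ours, and the same per-feature factors obtained from the training set are reused for the testing data.

```python
import numpy as np

def rescale_features(train_X, test_X):
    """Divide each feature by its largest value in the training set, cf. Eq. (20)."""
    scale = train_X.max(axis=0).astype(float)
    scale[scale == 0] = 1.0          # guard against all-zero features
    # test entries may occasionally exceed 1, as noted in the text
    return train_X / scale, test_X / scale
```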
In the following, we compare our quantum inspired K-means algorithm with the classical K-means algorithm in terms of learning accuracy and the likelihood of getting trapped in local minima. In TABLE 2, we show the accuracies for different datasets, defined as the number of correctly predicted data divided by the total number of testing data. Since it is possible for both algorithms to be trapped in local minima, we run each algorithm several times with randomly initialized centroids and pick the run with the highest accuracy. For our quantum inspired K-means algorithm, we also tune the bond dimension to $D = 8$ and $D = 15$ to see the effect of different bond dimensions. We can see from TABLE 2 that with $D = 8$, our quantum inspired algorithm already performs as well as or better than the classical algorithm in all the cases, and for $D = 15$ we get even higher accuracies.
TABLE 2. Prediction accuracies (in %) on the testing data.

| Dataset | Classical K-means | Quantum inspired K-means (D = 8) | Quantum inspired K-means (D = 15) |
| --- | --- | --- | --- |
| Breast | 99.27 | 100 | 100 |
| Ionosphere | 60.56 | 60.56 | 63.38 |
| Wine | 100 | 100 | 100 |
| Yeast | 45.45 | 47.47 | 48.48 |
| E-coli | 85.29 | 89.71 | none |
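For completeness, a minimal sketch of how a prediction accuracy like the ones in TABLE 2 can be evaluated for a clustering, assuming that each learned cluster is identified with the ground-truth class that occurs most often inside it (a majority-vote matching adopted here purely for illustration); the function name is ours.

```python
import numpy as np

def clustering_accuracy(pred_clusters, true_labels):
    """Fraction of points whose true class equals the majority class of their cluster."""
    pred_clusters = np.asarray(pred_clusters)
    true_labels = np.asarray(true_labels)
    correct = 0
    for j in np.unique(pred_clusters):
        members = true_labels[pred_clusters == j]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()                 # points agreeing with the majority class
    return correct / len(true_labels)
```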
A well-known drawback of the classical K-means algorithm is that it can easily be trapped in local minima. Here we also check the likelihood of our algorithm being trapped in local minima. For this purpose, we take the Wine and Breast datasets as examples and run both algorithms repeatedly with randomly initialized centroids. We show the distribution of the number of runs against the resulting accuracies as a histogram in Fig. 1. We can see that our quantum inspired K-means algorithm has a higher probability of reaching a high accuracy. For example, for the Wine dataset, a larger number of runs reach high accuracies with the quantum inspired K-means algorithm, for both bond dimensions, than with the classical K-means algorithm.
[Fig. 1: Histograms of the number of runs versus prediction accuracy for the Wine and Breast datasets, comparing the classical K-means algorithm with the quantum inspired K-means algorithm.]
Another way to assess the quality of the learned centroids is to compute the distances between the centroids in both cases. It has already been shown in Sun et al. (2020) that the classical to quantum encoding in Eq. 9 can result in better separated input data, and thus, when appropriate learning algorithms are used, it is more likely to reach a higher prediction accuracy. In the classical case, the Euclidean distance between two different centroids $\vec{c}_a$ and $\vec{c}_b$ is
$d_{a,b} = \left\| \vec{c}_a - \vec{c}_b \right\|_2 , \qquad (21)$
where $\| \cdot \|_2$ means the $l_2$-norm of a vector. Similarly, in the quantum inspired case, the distance between two different centroids $|c_a\rangle$ and $|c_b\rangle$ is
$d_{a,b} = 1 - \left| \langle c_a | c_b \rangle \right|^2 . \qquad (22)$
Since $|c_j\rangle$ lives in a much larger space than $\vec{c}_j$, the centroids are more likely to be close to orthogonal to each other in the quantum inspired case, that is, the matrix formed by the overlaps $\langle c_a | c_b \rangle$ is closer to a diagonal matrix. As a result, it is easier to assign a new data point to the correct cluster. To show this clearly, we directly show the distance matrices formed by $d_{a,b}$ in both cases, based on the optimized centroids for the E-coli dataset, in Fig. 2. We can see clearly that in the classical case the centroids have large overlaps with one another, while in the quantum inspired case the centroids are almost orthogonal to each other.
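As an illustration, the two distance matrices compared in Fig. 2 can be assembled with a small helper like the following; the function name is ours, and the overlap-based distance refers to the `mps_distance` helper sketched in Sec. II.3.

```python
import numpy as np

def centroid_distance_matrix(centroids, distance_fn):
    """Pairwise distance matrix between k centroids for a given distance function."""
    k = len(centroids)
    return np.array([[distance_fn(centroids[a], centroids[b]) for b in range(k)]
                     for a in range(k)])

# Eq. (21): Euclidean distances between classical centroids
# d_classical = centroid_distance_matrix(classical_centroids,
#                                        lambda u, v: np.linalg.norm(u - v))
# Eq. (22): overlap-based distances between MPS centroids
# d_quantum = centroid_distance_matrix(mps_centroids, mps_distance)
```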
[Fig. 2: Distance matrices between the optimized centroids for the E-coli dataset, in the classical case and in the quantum inspired case.]
Lastly, we compare the convergence rate towards the minima for the two algorithms. In Fig. 3, we show the loss values as a function of the number of minimization steps. In the classical case, the loss values are computed after each K-means iteration, as shown in Fig. 3(a); each K-means iteration is counted as one minimization step in this case. In the quantum inspired case, within each K-means iteration we use a single variational MPS sweep, in which the local single-site cost is minimized from the left boundary to the right boundary and then back from the right boundary to the left boundary Schollwöck (2011); each local minimization is counted as one minimization step in this case. In Fig. 3(b), we show the loss values for the quantum inspired algorithm. The loss values in Fig. 3 are normalized against the initial loss value, and the number of minimization steps (K-means iterations) required by each algorithm to reach its final loss value can be read off from the figure.
[Fig. 3: Normalized loss values as a function of the number of minimization steps, for (a) the classical K-means algorithm and (b) the quantum inspired K-means algorithm.]
IV Conclusion
In this paper, we propose a quantum inspired K-means clustering algorithm based on matrix product states. By mapping the input data into a much larger quantum Hilbert space and clustering the data in the enlarged space, we show that the learning algorithm can reach a much higher precision. We demonstrate our algorithm by applying it to several commonly used machine learning datasets. Our results show that our algorithm is advantageous over the classical K-means algorithm in that 1) it can reach a higher prediction accuracy and 2) it is less likely to be trapped in local minima. We also show that, compared to the classical K-means algorithm, the centroids learned with our algorithm are better separated, which could be one reason why it makes more precise predictions. Finally, we show the loss values as a function of the number of minimization steps to help visualize the convergence of both algorithms.
Looking ahead, we point out at least two directions that could follow from this work: 1) the classical K-means algorithm is well studied and there exist various techniques to improve the learning accuracy on top of the standard K-means algorithm; in future works those techniques could also be adapted to our quantum inspired algorithm to further increase the learning accuracy. 2) As discussed in Sec. II, our quantum inspired algorithm has a one-to-one correspondence with a pure quantum machine learning algorithm which could readily be executed on a quantum computer, with the most significant difference being that in the quantum inspired algorithm the quantum state is represented using matrix product states, which can be efficiently stored on a classical computer. In the near future it would be interesting to carry out the quantum K-means algorithm on a quantum computer with applications to real world clustering problems. Other than that, one could also explore a two-site variant of the quantum inspired algorithm such that the bond dimension can be dynamically adjusted. Moreover, in this work we use variational MPS sweeps in which the local cost on each site is minimized iteratively; it would also be interesting to compare this approach with gradient-based minimization methods.
V Acknowledgement
Y. Shang thanks the support of the National Key Research and Development Program of China under Grant No. 2016YFB1000902, the National Research Foundation of China (Grants No. 61872352 and No. 61472412), and the Program for Creative Research Group of the National Natural Science Foundation of China (Grant No. 61621003); C. Guo acknowledges support from the National Natural Science Foundation of China under Grant No. 11805279.
References
- Carrasquilla and Melko (2017) J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
- Zhang et al. (2018) P. Zhang, H. Shen, and H. Zhai, Physical Review Letters 120, 066401 (2018).
- Ch’ng et al. (2018) K. Ch’ng, N. Vazquez, and E. Khatami, Physical Review E 97, 013306 (2018).
- Van Nieuwenburg et al. (2017) E. P. Van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017).
- Broecker et al. (2017) P. Broecker, J. Carrasquilla, R. G. Melko, and S. Trebst, Scientific Reports 7, 1 (2017).
- Ch’Ng et al. (2017) K. Ch’Ng, J. Carrasquilla, R. G. Melko, and E. Khatami, Physical Review X 7, 031038 (2017).
- Zhang and Kim (2017) Y. Zhang and E.-A. Kim, Physical Review Letters 118, 216401 (2017).
- Dong et al. (2019) X.-Y. Dong, F. Pollmann, X.-F. Zhang, et al., Physical Review B 99, 121104 (2019).
- Torlai et al. (2018) G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics 14, 447 (2018).
- Torlai and Melko (2018) G. Torlai and R. G. Melko, Physical Review Letters 120, 240503 (2018).
- Rocchetto et al. (2018) A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo, and S. Severini, npj Quantum Information 4, 28 (2018).
- Quek et al. (2018) Y. Quek, S. Fort, and H. K. Ng, arXiv preprint arXiv:1812.06693 (2018).
- Carrasquilla et al. (2019) J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Nature Machine Intelligence 1, 155 (2019).
- Arsenault et al. (2014) L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, and A. J. Millis, Physical Review B 90, 155136 (2014).
- Arsenault et al. (2015) L.-F. Arsenault, O. A. von Lilienfeld, and A. J. Millis, arXiv preprint arXiv:1506.08858 (2015).
- Torlai and Melko (2016) G. Torlai and R. G. Melko, Physical Review B 94, 165134 (2016).
- Amin et al. (2018) M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Physical Review X 8, 021050 (2018).
- Liu et al. (2017) J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Physical Review B 95, 041101 (2017).
- Huang and Wang (2017) L. Huang and L. Wang, Physical Review B 95, 035105 (2017).
- Aoki and Kobayashi (2016) K.-I. Aoki and T. Kobayashi, Modern Physics Letters B 30, 1650401 (2016).
- Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
- Nomura et al. (2017) Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, Physical Review B 96, 205152 (2017).
- Czischek et al. (2018) S. Czischek, M. Gärttner, and T. Gasenzer, Physical Review B 98, 024311 (2018).
- Luchnikov et al. (2020) I. A. Luchnikov, S. V. Vintskevich, D. A. Grigoriev, and S. N. Filippov, Physical Review Letters 124, 140502 (2020).
- Bény (2013) C. Bény, arXiv preprint arXiv:1301.3124 (2013).
- Mehta and Schwab (2014) P. Mehta and D. J. Schwab, arXiv preprint arXiv:1410.3831 (2014).
- Lin et al. (2017) H. W. Lin, M. Tegmark, and D. Rolnick, Journal of Statistical Physics 168, 1223 (2017).
- Deng et al. (2017) D.-L. Deng, X. Li, and S. D. Sarma, Physical Review X 7, 021021 (2017).
- Schollwöck (2011) U. Schollwöck, Annals of Physics 326, 96 (2011).
- Stoudenmire and Schwab (2016) E. Stoudenmire and D. J. Schwab, in Advances in Neural Information Processing Systems (2016) pp. 4799–4807.
- Stoudenmire (2018) E. M. Stoudenmire, Quantum Science and Technology 3, 034003 (2018).
- Sun et al. (2020) Z.-Z. Sun, C. Peng, D. Liu, S.-J. Ran, and G. Su, Physical Review B 101, 075135 (2020).
- Han et al. (2018) Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, Physical Review X 8, 031012 (2018).
- Guo et al. (2018) C. Guo, Z. Jie, W. Lu, and D. Poletti, Physical Review E 98, 042114 (2018).
- Guo et al. (2020) C. Guo, K. Modi, and D. Poletti, arXiv preprint arXiv:2004.11038 (2020).
- Torlai et al. (2020) G. Torlai, C. J. Wood, A. Acharya, G. Carleo, J. Carrasquilla, and L. Aolita, arXiv preprint arXiv:2006.02424 (2020).
- MacQueen et al. (1967) J. MacQueen et al., in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1 (Oakland, CA, USA, 1967) pp. 281–297.
- Lloyd (1982) S. Lloyd, IEEE Transactions on Information Theory 28, 129 (1982).
- Arthur and Vassilvitskii (2006) D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, Tech. Rep. (Stanford, 2006).
- Ostrovsky et al. (2013) R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy, Journal of the ACM (JACM) 59, 1 (2013).
- Celebi et al. (2013) M. E. Celebi, H. A. Kingravi, and P. A. Vela, Expert Systems with Applications 40, 200 (2013).
- Bachem et al. (2016) O. Bachem, M. Lucic, H. Hassani, and A. Krause, in Advances in Neural Information Processing Systems (2016) pp. 55–63.
- Zhang et al. (1999) B. Zhang, M. Hsu, and U. Dayal, Hewlett-Packard Labs Technical Report HPL-1999-124 55 (1999).
- Xu and Lange (2019) J. Xu and K. Lange, in International Conference on Machine Learning (2019) pp. 6921–6931.
- De Amorim and Mirkin (2012) R. C. De Amorim and B. Mirkin, Pattern Recognition 45, 1061 (2012).
- Chakraborty and Das (2017) S. Chakraborty and S. Das, Pattern Recognition Letters 100, 67 (2017).
- Mitarai et al. (2018) K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018).
- Liu et al. (2020) Y. Liu, D. Wang, S. Xue, A. Huang, X. Fu, X. Qiang, P. Xu, H.-L. Huang, M. Deng, C. Guo, et al., Physical Review A 101, 052316 (2020).
- Buhrman et al. (2001) H. Buhrman, R. Cleve, J. Watrous, and R. De Wolf, Physical Review Letters 87, 167902 (2001).