
Quantum inspired K-means algorithm using matrix product states

Xiao Shi Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China    Yun Shang [email protected] Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China NCMIS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China    Chu Guo [email protected] Department of Physics and Synergetic Innovation Center for Quantum Effects and Applications, Hunan Normal University, Changsha 410081, China
Abstract

The matrix product state has become the algorithm of choice for studying one-dimensional interacting quantum many-body systems, as it is able to explore the most relevant portion of the exponentially large quantum Hilbert space and find accurate solutions. Here we propose a quantum inspired K-means clustering algorithm which first maps the classical data into quantum states represented as matrix product states, and then minimizes the loss function using the variational matrix product states method in the enlarged space. We demonstrate the performance of this algorithm by applying it to several commonly used machine learning datasets, and show that it reaches higher prediction accuracies and is less likely to be trapped in local minima than the classical K-means algorithm.

I Introduction

The past few years have witnessed a growing interest in the intersection between quantum physics and machine learning. On one side, machine learning tools have been used to solve quantum problems, such as phase recognition Carrasquilla and Melko (2017); Zhang et al. (2018); Ch’ng et al. (2018); Van Nieuwenburg et al. (2017); Broecker et al. (2017); Ch’Ng et al. (2017); Zhang and Kim (2017); Dong et al. (2019), quantum state tomography Torlai et al. (2018); Torlai and Melko (2018); Rocchetto et al. (2018); Quek et al. (2018); Carrasquilla et al. (2019), solving quantum many-body problems Arsenault et al. (2014, 2015); Torlai and Melko (2016); Amin et al. (2018); Liu et al. (2017); Huang and Wang (2017); Aoki and Kobayashi (2016); Carleo and Troyer (2017); Nomura et al. (2017); Czischek et al. (2018), and non-Markovian quantum dynamics Luchnikov et al. (2020). Connections have also been drawn between quantum many-body algorithms and neural network algorithms Bény (2013); Mehta and Schwab (2014); Lin et al. (2017); Deng et al. (2017). On the other side, tools from quantum many-body physics, especially tensor network states (TNS) algorithms, are used to solve classical machine learning problems. Tensor network states algorithms have been extensively used to study quantum many-body systems over the last two decades. In particular, the Matrix Product State (MPS), a one-dimensional variant of TNS, has achieved great success and become a standard numerical tool for one-dimensional strongly correlated quantum many-body systems due to its high efficiency and accuracy Schollwöck (2011). Applications of MPS to machine learning tasks include classification problems Stoudenmire and Schwab (2016); Stoudenmire (2018); Sun et al. (2020), generative modeling Han et al. (2018), and sequence to sequence modeling Guo et al. (2018), where it is shown that MPS based algorithms can achieve a learning precision close to or even better than state of the art machine learning algorithms. Tensor network states based machine learning algorithms have also been applied to quantum process tomography Guo et al. (2020); Torlai et al. (2020).

In this work, we apply matrix product states to the clustering task, an elementary machine learning task that separates unlabeled data into distinct and non-overlapping clusters. A standard algorithm for clustering is the K-means algorithm, which divides the data into $k$ different classes by minimizing the variance of the data in each class MacQueen et al. (1967). The most popular K-means clustering algorithm, Lloyd’s algorithm, starts from $k$ random centroids, and then iteratively divides the data into $k$ classes according to their distances to the centroids and recomputes the center within each cluster Lloyd (1982). This algorithm has become a standard algorithm for clustering due to its simplicity and efficiency, and we will refer to it as the classical K-means algorithm in the rest of this work. However, it is known that in some cases this algorithm results in bad clustering with high probability, even for randomized initialization. Various efforts have been made to improve the accuracy of Lloyd’s algorithm, such as a clever way of initialization Arthur and Vassilvitskii (2006); Ostrovsky et al. (2013); Celebi et al. (2013); Bachem et al. (2016), or a smoother objective loss function Zhang et al. (1999); Xu and Lange (2019); De Amorim and Mirkin (2012); Chakraborty and Das (2017).

Similar to other MPS-based machine learning algorithms, the basic idea of this work is to first map the classical data into quantum states which live in an exponentially large Hilbert space, and then explore this enlarged space in a numerically efficient way to find the optimal solution. In the context of the clustering problem, the hope is that by exploring a much larger space, the algorithm is less likely to result in bad clustering. This paper is organized as follows. In Sec. II, we describe our method in detail. In Sec. III, we apply our method to several machine learning datasets, showing that it reaches a high learning accuracy and is less likely to be trapped in local minima. We conclude in Sec. IV.

II Method

II.1 Classical K-means algorithm

Before we introduce our method, we first briefly review the standard classical K-means algorithm Lloyd (1982). For $N$ input vectors labeled as $\vec{x}_{n}$ ($1\leq n\leq N$), the classical K-means algorithm works as follows: 1) Initialize $k$ random vectors $\vec{\mathcal{O}}_{m}$ ($1\leq m\leq k$) as the initial centroids. 2) Divide all the data into $k$ classes according to their distances to the centroids, denoted as $d^{C}$. Concretely, for each data vector $\vec{x}_{n}$, one computes

\displaystyle d^{C}_{n,m}=|\vec{x}_{n}-\vec{\mathcal{O}}_{m}|^{2}, (1)

where $|\vec{v}|=\sqrt{\sum_{j}v_{j}^{2}}$ is the standard 2-norm of a vector $\vec{v}$. Then $\vec{x}_{n}$ is assigned to the $j$-th cluster with $j=\operatornamewithlimits{argmin}_{m}d^{C}_{n,m}$. After this step, the $N$ data vectors are divided into $k$ non-overlapping clusters. 3) Compute the new center of each class by minimizing its variance independently, namely by minimizing the following loss function

\displaystyle f^{C}(\vec{\mathcal{O}}_{m})=\sum_{n_{j}=1}^{N_{j}}|\vec{x}^{j}_{n_{j}}-\vec{\mathcal{O}}_{m}|^{2}=\sum_{n_{j}=1}^{N_{j}}d^{C}_{n_{j},m}, (2)

where $N_{j}$ denotes the size of the $j$-th cluster, and $\vec{x}^{j}_{n_{j}}$ denotes the $n_{j}$-th vector inside the $j$-th cluster. By repeating steps 2) and 3), one often quickly converges to a local minimum. In the prediction stage, given a new input vector $\vec{x}$, one simply computes the distance $d^{C}(\vec{x},\vec{\mathcal{O}}_{m})$ between $\vec{x}$ and each centroid $\vec{\mathcal{O}}_{m}$, and then $\vec{x}$ is assigned to the $j$-th cluster with $j=\operatornamewithlimits{argmin}_{m}d^{C}(\vec{x},\vec{\mathcal{O}}_{m})$.
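For concreteness, the following is a minimal sketch of the classical K-means procedure above in Python/NumPy; the function name, the choice of initial centroids from the data points, and the convergence test are our own illustrative choices, not part of the original algorithm specification.

import numpy as np

def classical_kmeans(X, k, n_iters=100, seed=0):
    """X: (N, L) array of data vectors; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1): pick k random data points as the initial centroids O_m.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Step 2): squared Euclidean distances d^C_{n,m} of Eq. 1, then assign
        # every data vector to its closest centroid (argmin over m).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Step 3): the minimizer of the per-cluster variance, Eq. 2, is the mean.
        new_centroids = np.array([
            X[labels == m].mean(axis=0) if np.any(labels == m) else centroids[m]
            for m in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels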

II.2 Quantum K-means algorithm

The above K-means algorithm can be straightforwardly converted into a quantum machine learning algorithm, which we describe as follows and refer to as the quantum K-means algorithm. In this case we assume that there is a list of input quantum states labeled as $|x_{n}\rangle$. The centers are denoted as $|\mathcal{O}_{m}\rangle$ and are initialized using parametric quantum circuits Mitarai et al. (2018); Liu et al. (2020)

\displaystyle|\mathcal{O}_{m}\rangle=\mathcal{C}_{m}(\vec{\theta})|0\rangle^{\otimes L}, (3)

where $\mathcal{C}_{m}(\vec{\theta})$ denotes the $m$-th parametric quantum circuit with an array of parameters denoted by $\vec{\theta}$. The distance between $|x_{n}\rangle$ and $|\mathcal{O}_{m}\rangle$ can be defined as

\displaystyle d^{Q}_{n,m}=1-|\langle x_{n}|\mathcal{O}_{m}\rangle|^{2}, (4)

where the second term on the right hand side of the above equation can be efficiently computed with a quantum computer using the SWAP test technique Buhrman et al. (2001). Similarly, the loss function for the $m$-th cluster can be defined as

\displaystyle f^{Q}(|\mathcal{O}_{m}\rangle)=\sum_{n_{j}=1}^{N_{j}}\left(1-|\langle x_{n_{j}}^{j}|\mathcal{O}_{m}\rangle|^{2}\right)=\sum_{n_{j}=1}^{N_{j}}d^{Q}_{n_{j},m}. (5)

We note that the gradient of $f^{Q}(|\mathcal{O}_{m}\rangle)$ can also be computed with a quantum computer using Mitarai et al. (2018)

\displaystyle\frac{\partial f^{Q}(|\mathcal{O}_{m}\rangle)}{\partial\theta_{i}}=\frac{1}{2}f^{Q}(|\mathcal{O}_{m}\rangle^{-})-\frac{1}{2}f^{Q}(|\mathcal{O}_{m}\rangle^{+}), (6)

where $|\mathcal{O}_{m}\rangle^{\pm}$ means shifting the $i$-th parameter $\theta_{i}$ in $\vec{\theta}$ by $\pm\frac{\pi}{2}$. Implementing the SWAP test on current quantum computers is not an easy task since it requires the three-qubit controlled-SWAP (Fredkin) gate. In this work we will not go any further into this quantum machine learning algorithm but will instead focus on the quantum inspired classical algorithm described below.
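Although we do not pursue the quantum algorithm further, the following classical state-vector simulation sketches its ingredients. It assumes, purely for illustration, a simple product ansatz of single-qubit Ry rotations for the centroid state; the ansatz, the helper names and the gate convention are our own assumptions and not the circuit of Eq. 3.

import numpy as np

def centroid_state(theta):
    """|O(theta)> = tensor_j [cos(theta_j/2)|0> + sin(theta_j/2)|1>] as a dense vector."""
    psi = np.array([1.0])
    for t in theta:
        psi = np.kron(psi, np.array([np.cos(t / 2), np.sin(t / 2)]))
    return psi

def dQ(x_state, theta):
    """Fidelity distance of Eq. 4; on hardware |<x|O>|^2 would come from a SWAP test."""
    return 1.0 - abs(np.vdot(x_state, centroid_state(theta))) ** 2

def loss(theta, cluster_states):
    """Eq. 5: sum of fidelity distances over the states assigned to this cluster."""
    return sum(dQ(x, theta) for x in cluster_states)

def parameter_shift_grad(theta, cluster_states):
    """Parameter-shift gradient (cf. Eq. 6). With the convention Ry(t) = exp(-i t Y / 2)
    used here, the gradient is [loss(theta_i + pi/2) - loss(theta_i - pi/2)] / 2; the
    ordering of the two shifted evaluations depends on the circuit convention."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += np.pi / 2
        minus[i] -= np.pi / 2
        grad[i] = 0.5 * (loss(plus, cluster_states) - loss(minus, cluster_states))
    return grad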

II.3 Quantum inspired K-means algorithm

Now we will describe our quantum inspired K-means algorithm in detail. Similar to the quantum K-means algorithm, the inputs are assumed to be a list of quantum states. However, here we also assume that each quantum state $|x_{n}\rangle$ can be parameterized as an MPS $X_{n}^{\vec{\sigma}}$

\displaystyle X_{n}^{\vec{\sigma}}=\sum_{a_{1},a_{2},\dots,a_{L+1}}X^{\sigma_{1}}_{n,a_{1},a_{2}}X^{\sigma_{2}}_{n,a_{2},a_{3}}\dots X^{\sigma_{L}}_{n,a_{L},a_{L+1}}, (7)

where $L$ is the number of qubits required by the corresponding quantum state $|x_{n}\rangle$. The size of an MPS is characterized by an integer $D$, referred to as the bond dimension, defined as

\displaystyle D=\max_{1\leq l\leq L}\left(\dim(a_{l})\right). (8)

We note that for a vector $\vec{x}_{n}$ with $L$ elements $\left(x_{n,1},x_{n,2},\dots,x_{n,L}\right)$, one can map it into a separable quantum state $|x_{n}\rangle$ such that each element $x_{n,l}$ is mapped into the single-qubit state $\cos(\frac{\pi}{2}x_{n,l})|0\rangle+\sin(\frac{\pi}{2}x_{n,l})|1\rangle$. This corresponds to a separable MPS $X_{n}^{\vec{\sigma}}$ with $D=1$ whose tensor at site $l$ has elements

\displaystyle X_{n,1,1}^{0}=\cos(\frac{\pi}{2}x_{n,l}),\quad X_{n,1,1}^{1}=\sin(\frac{\pi}{2}x_{n,l}). (9)
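A minimal sketch of this data-to-MPS encoding follows, storing each MPS as a Python list of rank-3 NumPy tensors in a (left bond, physical, right bond) layout; the layout and the helper name are our own conventions.

import numpy as np

def encode_vector_as_mps(x):
    """Map a length-L vector with entries in [0, 1] to a D=1 MPS (list of (1, 2, 1) tensors)."""
    mps = []
    for xl in x:
        tensor = np.zeros((1, 2, 1))               # (left bond, physical sigma_l, right bond)
        tensor[0, 0, 0] = np.cos(np.pi * xl / 2)   # amplitude of |0>, Eq. 9
        tensor[0, 1, 0] = np.sin(np.pi * xl / 2)   # amplitude of |1>, Eq. 9
        mps.append(tensor)
    return mps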

The centroids are then assumed to be MPSs with a fixed bond dimension $D$, written as $\mathcal{O}_{m}^{\vec{\sigma}}$

\displaystyle\mathcal{O}_{m}^{\vec{\sigma}}=\sum_{b_{1},b_{2},\dots,b_{L+1}}M^{\sigma_{1}}_{m,b_{1},b_{2}}M^{\sigma_{2}}_{m,b_{2},b_{3}}\dots M^{\sigma_{L}}_{m,b_{L},b_{L+1}}. (10)

The distance between two MPSs is defined as

\displaystyle d^{M}_{n,m}=\left(X_{n}^{\vec{\sigma},\dagger}-\mathcal{O}_{m}^{\vec{\sigma},\dagger}\right)\left(X_{n}^{\vec{\sigma}}-\mathcal{O}_{m}^{\vec{\sigma}}\right), (11)

and similarly, the loss function for the $m$-th cluster can be defined as

\displaystyle f^{M}(\mathcal{O}_{m}^{\vec{\sigma}})=\sum_{n_{j}=1}^{N_{j}}d^{M}_{n_{j},m}, (12)

with the normalization condition $\mathcal{O}_{m}^{\vec{\sigma},\dagger}\mathcal{O}_{m}^{\vec{\sigma}}=1$. The above cost function can be minimized by setting the gradient with respect to each tensor $M_{m,b_{l},b_{l+1}}^{\sigma_{l}}$ to $0$, namely

\displaystyle\frac{\partial f^{M}(\mathcal{O}_{m}^{\vec{\sigma}})}{\partial M^{\sigma_{l}}_{m,b_{l},b_{l+1}}}=0. (13)

The above equation can be reduced to a set of algebraic equations. To see this, we first define

\displaystyle L_{b_{l}}^{b_{l}^{\prime}}=\sum_{b_{l-1},b_{l-1}^{\prime},\sigma_{l-1}}L_{b_{l-1}}^{b_{l-1}^{\prime}}M^{\sigma_{l-1}}_{b_{l-1},b_{l}}M^{\sigma_{l-1}}_{b_{l-1}^{\prime},b_{l}^{\prime}}; (14)
\displaystyle R_{b_{l+1}}^{b_{l+1}^{\prime}}=\sum_{b_{l+2},b_{l+2}^{\prime},\sigma_{l+1}}R_{b_{l+2}}^{b_{l+2}^{\prime}}M^{\sigma_{l+1}}_{b_{l+1},b_{l+2}}M^{\sigma_{l+1}}_{b_{l+1}^{\prime},b_{l+2}^{\prime}}; (15)
\displaystyle A_{n,b_{l}}^{a_{l}}=\sum_{b_{l-1},a_{l-1},\sigma_{l-1}}A_{n,b_{l-1}}^{a_{l-1}}M^{\sigma_{l-1}}_{b_{l-1},b_{l}}X^{\sigma_{l-1}}_{n,a_{l-1},a_{l}}; (16)
\displaystyle B_{n,b_{l+1}}^{a_{l+1}}=\sum_{b_{l+2},a_{l+2},\sigma_{l+1}}B_{n,b_{l+2}}^{a_{l+2}}M^{\sigma_{l+1}}_{b_{l+1},b_{l+2}}X^{\sigma_{l+1}}_{n,a_{l+1},a_{l+2}}, (17)

with the boundary conditions $L_{b_{1}}^{b_{1}^{\prime}}=R_{b_{L+1}}^{b_{L+1}^{\prime}}=A_{n,b_{1}}^{a_{1}}=B_{n,b_{L+1}}^{a_{L+1}}=1$. With these definitions, Eq. 13 can be written as

\displaystyle\sum_{b_{l},b_{l+1}}L_{b_{l}}^{b_{l}^{\prime}}M^{\sigma_{l}}_{b_{l},b_{l+1}}R_{b_{l+1}}^{b_{l+1}^{\prime}}=\sum_{n=1}^{N}\sum_{a_{l},a_{l+1}}A_{n,b_{l}^{\prime}}^{a_{l}}X^{\sigma_{l}}_{n,a_{l},a_{l+1}}B_{n,b_{l+1}^{\prime}}^{a_{l+1}}, (18)

where $N$ denotes the number of data vectors assigned to this cluster, and we have neglected the label $m$ for the cluster since each cluster can be minimized independently. Interestingly, if we keep $\mathcal{O}_{m}^{\vec{\sigma}}$ in a mixed-canonical form with the orthogonality center at site $l$ Schollwöck (2011), then $L_{b_{l}}^{b_{l}^{\prime}}=\delta_{b_{l},b_{l}^{\prime}}$ and $R_{b_{l+1}}^{b_{l+1}^{\prime}}=\delta_{b_{l+1},b_{l+1}^{\prime}}$, such that the above equation reduces to

\displaystyle M^{\sigma_{l}}_{b_{l},b_{l+1}}=\sum_{n=1}^{N}\sum_{a_{l},a_{l+1}}A_{n,b_{l}}^{a_{l}}X^{\sigma_{l}}_{n,a_{l},a_{l+1}}B_{n,b_{l+1}}^{a_{l+1}}. (19)
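The single-site update of Eq. 19 can be sketched as follows, assuming the centroid MPS is kept in mixed-canonical form with the orthogonality center at site $l$ (so that the $L$ and $R$ environments are identities) and using the same (left bond, physical, right bond) tensor layout as the encoding sketch above; after the contraction the center tensor is renormalized so that the normalization condition on the centroid is preserved. The helper names are our own, and in practice the environments would be cached and updated during a sweep rather than recomputed from scratch.

import numpy as np

def left_env(centroid, data, l):
    """A_{n}^{a_l}_{b_l}: contraction of sites 0..l-1 of the centroid with one data MPS."""
    A = np.ones((1, 1))
    for M, X in zip(centroid[:l], data[:l]):
        # sum over the previous bonds and the physical index sigma, cf. Eq. 16
        A = np.einsum('ba,bsc,asd->cd', A, M, X)
    return A

def right_env(centroid, data, l):
    """B_{n}^{a_{l+1}}_{b_{l+1}}: contraction of sites l+1..L-1, cf. Eq. 17."""
    B = np.ones((1, 1))
    for M, X in zip(reversed(centroid[l + 1:]), reversed(data[l + 1:])):
        B = np.einsum('ba,csb,dsa->cd', B, M, X)
    return B

def update_site(centroid, cluster_data, l):
    """Replace the center tensor M_l by the (renormalized) sum over the cluster, Eq. 19."""
    M_new = np.zeros_like(centroid[l])
    for X_mps in cluster_data:
        A = left_env(centroid, X_mps, l)
        B = right_env(centroid, X_mps, l)
        # M[b_l, sigma, b_{l+1}] += A[b_l, a_l] X_l[a_l, sigma, a_{l+1}] B[b_{l+1}, a_{l+1}]
        M_new += np.einsum('ba,asc,dc->bsd', A, X_mps[l], B)
    centroid[l] = M_new / np.linalg.norm(M_new)  # keep the centroid normalized
    return centroid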

III Applications

Table 1: Datasets used for comparison between the classical K-means algorithm and our quantum inspired algorithm.
Name        Points   Dimensions   Classes
Breast      683      9            2
Ionosphere  351      34           2
Wine        178      13           3
Yeast       1484     8            10
E-coli      336      7            8

To demonstrate our method, we apply it to several commonly used machine learning datasets for clustering tasks, as shown in TABLE 1. For each dataset, we randomly pick 80% of the data as the training data, and the remaining 20% as the testing data. To ensure a fair comparison, we always use the same splitting of training and testing data for the classical K-means algorithm and our quantum inspired K-means algorithm. Since the K-means algorithm is based on a distance measure, if the scales of different feature dimensions differ too much, a small number of features may exert an excessively high influence on the whole cost and suppress the effect of the remaining features. To avoid this effect, we divide each dimension of the data by its largest value in the training set, namely

\displaystyle x_{n,j}\leftarrow\frac{x_{n,j}}{A^{\max}_{j}}, (20)

with $A^{\max}_{j}=\max_{1\leq n\leq N_{\rm train}}x_{n,j}$, where $N_{\rm train}$ is the total number of training data. As a result, we have $0\leq x_{n,j}\leq 1$ for all $1\leq n\leq N_{\rm train}$ and $1\leq j\leq L$. In the prediction stage, we also divide each dimension of the testing data by $A^{\max}_{j}$. We note that this procedure may produce elements larger than $1$ in the testing set, which may not be well distinguished by our quantum inspired algorithm since the encoding in Eq. 9 is periodic and therefore does not faithfully distinguish values outside the interval $[0,1]$. However, our numerical results show that we still obtain high precision results with our method despite those elements.
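A small sketch of this per-feature rescaling, where the maxima $A^{\max}_{j}$ are computed on the training split only and then reused for the test split; the function names are illustrative.

import numpy as np

def fit_max_scaler(X_train):
    """Largest value of every feature over the training data (A^max_j), Eq. 20."""
    return X_train.max(axis=0)

def apply_max_scaler(X, A_max):
    """Divide every feature by its training-set maximum; test data may exceed 1."""
    return X / A_max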

In the following, we compare our quantum inspired K-means algorithm with the classical K-means algorithm in terms of learning accuracy and the likelihood of getting trapped in local minima. In TABLE 2, we show the accuracies for different datasets, defined as the number of correctly predicted data points divided by the total number of testing data. Since it is possible for both algorithms to be trapped in local minima, we run the same simulation for each algorithm 50 times with randomly initialized centroids and pick the run with the highest accuracy. For our quantum inspired K-means algorithm, we also tune the bond dimension $D$ to be 8 and 15 to see the effect of different bond dimensions. We can see from TABLE 2 that with $D=8$, our quantum inspired algorithm already performs as well as or better than the classical algorithm in all cases, and for $D=15$ we obtain even higher accuracies.

Table 2: Comparison of accuracies on the testing data between the classical K-means algorithm and the quantum inspired K-means algorithm. For the E-coli dataset, the entry for $D=15$ is absent since the dimension of each data point is too small to support this bond dimension.
            Classical K-means   Quantum inspired K-means
                                D = 8      D = 15
Breast      99.27%              100%       100%
Ionosphere  60.56%              60.56%     63.38%
Wine        100%                100%       100%
Yeast       45.45%              47.47%     48.48%
E-coli      85.29%              89.71%     none

A well-known drawback of the classical K-means algorithm is that it can easily be trapped in local minima. Here we also check the likelihood of our algorithm being trapped in local minima. For this purpose, we take the Wine and Breast datasets as examples and run the same simulation with 100 randomly initialized centroids for each algorithm. We show the distribution of the number of cases against different accuracies as a histogram in Fig. 1. We can see that our quantum inspired K-means algorithm has a higher probability of reaching a higher accuracy. For example, for the Wine dataset, there are 50 cases out of 100 with accuracies beyond 0.9 for the classical K-means algorithm, while for the quantum inspired K-means algorithm with a bond dimension $D=8$ there are 53 cases, and for $D=15$ there are 60 cases.

Figure 1: Frequency as a function of the accuracy for the Wine dataset (a), and the Breast dataset (b). The red, blue and yellow columns correspond to the classical K-means algorithm, and the quantum inspired K-means algorithm with $D=8$ and $D=15$, respectively. The y-axis is the number of instances out of 100 cases with an accuracy within the interval indicated by the x-axis.

Another way to assess the quality of the learned centers is to compute the distances between the centroids in both cases. It has already been shown in Sun et al. (2020) that the classical to quantum encoding in Eq. 9 can result in better separated input data; thus, when appropriate learning algorithms are used, it is more likely to obtain a higher prediction accuracy. In the classical case, the Euclidean distance between two different centers $\vec{\mathcal{O}}_{i}$ and $\vec{\mathcal{O}}_{j}$ is

\displaystyle D_{i,j}=|\vec{\mathcal{O}}_{i}-\vec{\mathcal{O}}_{j}|, (21)

where $|\vec{v}|$ means the 2-norm of the vector $\vec{v}$. Similarly, in the quantum inspired case, the separation between two different centers $\mathcal{O}_{i}^{\vec{\sigma}}$ and $\mathcal{O}_{j}^{\vec{\sigma}}$ is measured by the squared overlap

\displaystyle D^{\vec{\sigma}}_{i,j}=|\langle\mathcal{O}_{i}^{\vec{\sigma}}|\mathcal{O}_{j}^{\vec{\sigma}}\rangle|^{2}. (22)

Since $\mathcal{O}_{i}^{\vec{\sigma}}$ lives in a much larger space than $\vec{\mathcal{O}}_{i}$, it is possible that the centroids are more likely to be orthogonal to each other in the quantum inspired case, that is, the $k\times k$ matrix formed by $D^{\vec{\sigma}}_{i,j}$ is closer to the identity matrix. As a result, it is easier to assign a new data point to the correct category. To show this clearly, we directly show the matrices formed by $D_{i,j}$ and $D^{\vec{\sigma}}_{i,j}$ based on the optimized centroids for the E-coli dataset in Fig. 2. We can see clearly that in the classical case the centroids have large overlaps with one another, while in the quantum inspired case the centers are almost orthogonal to each other.
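A minimal sketch of how the overlap matrix of Eq. 22 can be evaluated from the learned centroid MPSs, using the same tensor layout as the earlier sketches; the helper names are our own, and for the real-valued encoding used here complex conjugation can be dropped.

import numpy as np

def mps_inner(mps_a, mps_b):
    """<A|B> for two (real) MPSs by sweeping the transfer matrix from left to right."""
    E = np.ones((1, 1))
    for Ta, Tb in zip(mps_a, mps_b):
        E = np.einsum('ab,asc,bsd->cd', E, Ta, Tb)
    return E[0, 0]

def centroid_overlaps(centroids):
    """k x k matrix of squared overlaps |<O_i|O_j>|^2 between the learned centroids."""
    k = len(centroids)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            D[i, j] = abs(mps_inner(centroids[i], centroids[j])) ** 2
    return D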

Figure 2: (a) Distances between the learned centroids for the classical K-means algorithm. (b) Squared overlaps $D^{\vec{\sigma}}_{i,j}$ between the learned centroids for the quantum inspired K-means algorithm. Here the E-coli dataset is used.

Lastly, we compare the convergence rate towards minima for the two algorithms. In Fig. 3, we show the loss values as a function of the minimization steps. In the classical case, the loss values are computed after each K-means iteration, as shown in Fig. 3(a); each K-means iteration is counted as one minimization step in this case. In the quantum inspired case, in each K-means iteration we use a single variational MPS sweep, in which the local on-site loss is first minimized from the left boundary to the right boundary and then back from the right boundary to the left boundary Schollwöck (2011). Each local minimization is counted as one minimization step in the quantum inspired case. In Fig. 3(b), we show the loss values in this case. We have taken the initial loss value as 1 in Fig. 3 and the remaining loss values are normalized against the initial value. We can see that the classical K-means algorithm takes 7 minimization steps (7 K-means iterations) to reach a final loss value of 0.316, while the quantum inspired K-means algorithm uses 12 minimization steps (2 K-means iterations) to reach a final loss value of 0.091.

Figure 3: (a) Loss value as a function of K-means iterations for the classical K-means algorithm. (b) Loss value as a function of minimization steps for the quantum inspired K-means algorithm. In each K-means iteration we use 1 variational MPS sweep, which contains $2(L-1)$ local minimization steps.

IV Conclusion

In this paper, we propose a quantum inspired K-means clustering algorithm based on matrix product states. By mapping the input data into a much larger quantum Hilbert space and clustering the data in the enlarged space, we show that the learning algorithm can reach a much higher precision. We demonstrate our algorithm by applying it to several commonly used machine learning datasets. Our results show that our algorithm is advantageous over the classical K-means algorithm in that 1) it reaches a higher prediction accuracy and 2) it is less likely to be trapped in local minima. We show that, compared to the classical K-means algorithm, the centroids learned with our algorithm are better separated, which could be one reason why it makes more precise predictions. We also show the loss values as a function of the number of minimization steps to better visualize the convergence of both algorithms.

Finally, we point out at least two directions that follow from this work: 1) the classical K-means algorithm is well studied and there exist various techniques to improve the learning accuracy on top of the standard K-means algorithm; in future works those techniques could also be adapted into our quantum inspired algorithm to further increase the learning accuracy. 2) As discussed in Sec. II, our quantum inspired algorithm has a one-to-one correspondence with a pure quantum machine learning algorithm which could readily be executed on a quantum computer, the most significant difference being that in the quantum inspired algorithm the quantum state is represented using matrix product states which can be efficiently stored on a classical computer. In the near future it would be interesting to carry out the quantum K-means algorithm on a quantum computer with applications to real world clustering problems. Other than that, one could also explore the two-site variant of the quantum inspired algorithm such that the bond dimension $D$ could be dynamically adjusted. Moreover, in this work we use variational MPS sweeps in which the local loss on each site is minimized iteratively; it would also be interesting to compare this approach with gradient-based minimization methods.

V Acknowledgement

Y. Shang thanks the support of the National Key Research and Development Program of China under grant 2016YFB1000902, the National Research Foundation of China (Grant No. 61872352, 61472412), and the Program for Creative Research Group of the National Natural Science Foundation of China (Grant No. 61621003); C. Guo acknowledges support from the National Natural Science Foundation of China under Grant No. 11805279.

References

  • Carrasquilla and Melko (2017) J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
  • Zhang et al. (2018) P. Zhang, H. Shen,  and H. Zhai, Physical review letters 120, 066401 (2018).
  • Ch’ng et al. (2018) K. Ch’ng, N. Vazquez,  and E. Khatami, Physical Review E 97, 013306 (2018).
  • Van Nieuwenburg et al. (2017) E. P. Van Nieuwenburg, Y.-H. Liu,  and S. D. Huber, Nature Physics 13, 435 (2017).
  • Broecker et al. (2017) P. Broecker, J. Carrasquilla, R. G. Melko,  and S. Trebst, Scientific reports 7, 1 (2017).
  • Ch’Ng et al. (2017) K. Ch’Ng, J. Carrasquilla, R. G. Melko,  and E. Khatami, Physical Review X 7, 031038 (2017).
  • Zhang and Kim (2017) Y. Zhang and E.-A. Kim, Physical review letters 118, 216401 (2017).
  • Dong et al. (2019) X.-Y. Dong, F. Pollmann, X.-F. Zhang, et al., Physical Review B 99, 121104 (2019).
  • Torlai et al. (2018) G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko,  and G. Carleo, Nature Physics 14, 447 (2018).
  • Torlai and Melko (2018) G. Torlai and R. G. Melko, Physical review letters 120, 240503 (2018).
  • Rocchetto et al. (2018) A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo,  and S. Severini, npj Quantum Information 4, 28 (2018).
  • Quek et al. (2018) Y. Quek, S. Fort,  and H. K. Ng, arXiv preprint arXiv:1812.06693  (2018).
  • Carrasquilla et al. (2019) J. Carrasquilla, G. Torlai, R. G. Melko,  and L. Aolita, Nature Machine Intelligence 1, 155 (2019).
  • Arsenault et al. (2014) L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld,  and A. J. Millis, Physical Review B 90, 155136 (2014).
  • Arsenault et al. (2015) L.-F. Arsenault, O. A. von Lilienfeld,  and A. J. Millis, arXiv preprint arXiv:1506.08858  (2015).
  • Torlai and Melko (2016) G. Torlai and R. G. Melko, Physical Review B 94, 165134 (2016).
  • Amin et al. (2018) M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy,  and R. Melko, Physical Review X 8, 021050 (2018).
  • Liu et al. (2017) J. Liu, Y. Qi, Z. Y. Meng,  and L. Fu, Physical Review B 95, 041101 (2017).
  • Huang and Wang (2017) L. Huang and L. Wang, Physical Review B 95, 035105 (2017).
  • Aoki and Kobayashi (2016) K.-I. Aoki and T. Kobayashi, Modern Physics Letters B 30, 1650401 (2016).
  • Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
  • Nomura et al. (2017) Y. Nomura, A. S. Darmawan, Y. Yamaji,  and M. Imada, Physical Review B 96, 205152 (2017).
  • Czischek et al. (2018) S. Czischek, M. Gärttner,  and T. Gasenzer, Physical Review B 98, 024311 (2018).
  • Luchnikov et al. (2020) I. A. Luchnikov, S. V. Vintskevich, D. A. Grigoriev,  and S. N. Filippov, Phys. Rev. Lett. 124, 140502 (2020).
  • Bény (2013) C. Bény, arXiv preprint arXiv:1301.3124  (2013).
  • Mehta and Schwab (2014) P. Mehta and D. J. Schwab, arXiv preprint arXiv:1410.3831  (2014).
  • Lin et al. (2017) H. W. Lin, M. Tegmark,  and D. Rolnick, Journal of Statistical Physics 168, 1223 (2017).
  • Deng et al. (2017) D.-L. Deng, X. Li,  and S. D. Sarma, Physical Review X 7, 021021 (2017).
  • Schollwöck (2011) U. Schollwöck, Annals of Physics 326, 96 (2011).
  • Stoudenmire and Schwab (2016) E. Stoudenmire and D. J. Schwab, in Advances in Neural Information Processing Systems (2016) pp. 4799–4807.
  • Stoudenmire (2018) E. M. Stoudenmire, Quantum Science and Technology 3, 034003 (2018).
  • Sun et al. (2020) Z.-Z. Sun, C. Peng, D. Liu, S.-J. Ran,  and G. Su, Physical Review B 101, 075135 (2020).
  • Han et al. (2018) Z.-Y. Han, J. Wang, H. Fan, L. Wang,  and P. Zhang, Physical Review X 8, 031012 (2018).
  • Guo et al. (2018) C. Guo, Z. Jie, W. Lu,  and D. Poletti, Physical Review E 98, 042114 (2018).
  • Guo et al. (2020) C. Guo, K. Modi,  and D. Poletti, arXiv preprint arXiv:2004.11038  (2020).
  • Torlai et al. (2020) G. Torlai, C. J. Wood, A. Acharya, G. Carleo, J. Carrasquilla,  and L. Aolita, arXiv preprint arXiv:2006.02424  (2020).
  • MacQueen et al. (1967) J. MacQueen et al., in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1 (Oakland, CA, USA, 1967) pp. 281–297.
  • Lloyd (1982) S. Lloyd, IEEE transactions on information theory 28, 129 (1982).
  • Arthur and Vassilvitskii (2006) D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, Tech. Rep. (Stanford, 2006).
  • Ostrovsky et al. (2013) R. Ostrovsky, Y. Rabani, L. J. Schulman,  and C. Swamy, Journal of the ACM (JACM) 59, 1 (2013).
  • Celebi et al. (2013) M. E. Celebi, H. A. Kingravi,  and P. A. Vela, Expert systems with applications 40, 200 (2013).
  • Bachem et al. (2016) O. Bachem, M. Lucic, H. Hassani,  and A. Krause, in Advances in neural information processing systems (2016) pp. 55–63.
  • Zhang et al. (1999) B. Zhang, M. Hsu,  and U. Dayal, Hewlett-Packard Labs Technical Report HPL-1999-124 55 (1999).
  • Xu and Lange (2019) J. Xu and K. Lange, in International Conference on Machine Learning (2019) pp. 6921–6931.
  • De Amorim and Mirkin (2012) R. C. De Amorim and B. Mirkin, Pattern Recognition 45, 1061 (2012).
  • Chakraborty and Das (2017) S. Chakraborty and S. Das, Pattern Recognition Letters 100, 67 (2017).
  • Mitarai et al. (2018) K. Mitarai, M. Negoro, M. Kitagawa,  and K. Fujii, Physical Review A 98, 032309 (2018).
  • Liu et al. (2020) Y. Liu, D. Wang, S. Xue, A. Huang, X. Fu, X. Qiang, P. Xu, H.-L. Huang, M. Deng, C. Guo, et al., Physical Review A 101, 052316 (2020).
  • Buhrman et al. (2001) H. Buhrman, R. Cleve, J. Watrous,  and R. De Wolf, Physical Review Letters 87, 167902 (2001).