

Unsupervised learning of topological phase transitions using the Calinski-Harabaz index

Jielin Wang College of information and computer, Taiyuan University of Technology, Taiyuan 030024, China    Wanzhou Zhang College of Physics and Optoelectronics, Taiyuan University of Technology, Taiyuan 030024, China    Tian Hua College of information and computer, Taiyuan University of Technology, Taiyuan 030024, China    Tzu-Chieh Wei C.N. Yang Institute for Theoretical Physics and Department of Physics and Astronomy, State University of New York at Stony Brook, Stony Brook, NY 11794-3840, USA
Abstract

Machine learning methods have recently been applied to learning phases of matter and the transitions between them. Of particular interest are topological phase transitions, such as in the XY model, whose critical points can be difficult to obtain with unsupervised learning methods such as principal component analysis. Recently, the authors of [Nature Physics 15, 790 (2019)] employed the diffusion map method for identifying topological order and were able to determine the Berezinskii-Kosterlitz-Thouless (BKT) phase transition of the XY model, specifically via the intersection of the average cluster distance $\bar{D}$ and the within-cluster dispersion $\bar{\sigma}$ (as the clusters go from being well separated to mixing together). However, it is sometimes not easy to find the intersection if $\bar{D}$ or $\bar{\sigma}$ does not change much because of the topological constraint. In this work, we propose to use the Calinski-Harabaz ($ch$) index, defined roughly as the ratio $\bar{D}/\bar{\sigma}$, to determine the critical points, at which the $ch$ index reaches a maximum or minimum value, or jumps sharply.

We examine the $ch$ index in several statistical models, including ones that contain a BKT phase transition. For the Ising model, the peaks of $ch$ or of its components are consistent with the position of the specific-heat maximum. For the XY model, on both the square and honeycomb lattices, our results show that the peaks of the $ch$ index converge over a range of the Gaussian-kernel parameter $\varepsilon/\varepsilon_{0}$.

We also examine the generalized XY model with $q=2$ and $q=8$ and study the phase transitions using the fractional $1/2$-vortex or $1/8$-vortex constraint, respectively. The global phase diagram can be obtained by our method, which requires neither the configuration labels needed by supervised learning nor the crossing of the two curves $\bar{D}$ and $\bar{\sigma}$. Our method is thus useful for both topological and non-topological phase transitions, can achieve accuracy as good as the supervised learning methods previously used in these models, and may be used for searching for phases in experimental data.

pacs:
05.70.Fh, 64.60.ah, 64.60.De,64.60.at,89.20.Ff

I Introduction

Exploring phases of matter and phase transitions is a traditional but still active research direction in statistical physics pt ; qpt , partly due to new phases of matter that have been uncovered. In recent years, this field has been revived by the introduction of artificial intelligence and machine learning methods for recognizing phases and transitions roger1 . Among the various methods, supervised learning trains a network on labeled data generated by Monte Carlo methods in various classical systems, such as the Ising model and its variants roger1 ; longising ; smising , XY models hu ; mlpxy ; rogerxy ; Wessel with BKT phase transitions xykt ; bkt , Dzyaloshinskii-Moriya ferromagnets dm , skyrmions sky , Potts models Potts and percolation models mlpxy . Properties of quantum systems, such as the energy spectrum, entanglement spectrum confu1 , or configurations in imaginary time fermi ; xuefeng , are also used as inputs to the neural networks for training. On the other hand, unsupervised learning methods can provide unbiased classification, as they do not need prior knowledge of the system to classify phases and locate the phase transition. These include principal component analysis wang1 , autoencoders unp , t-SNE unsup1 , and clustering with quantum mechanics tony . A key feature of these methods is that the sought-after properties (such as a phase transition) are determined by examining indicators extracted from the scatter plot in the reduced space. Beyond equilibrium statistical physics, non-equilibrium and dynamical properties dy ; nonequi2 ; out2 have also been obtained by machine learning methods. In addition, there are other newly developed schemes applied to phases of matter, such as adversarial neural networks adv , the confusion method and its extension dcn , the super-resolving method rs , and even applications to experimental data ex1 ; ex2 . Other developments in this area can be found in Refs. topo-inv ; z2spin ; tp2 ; mbl ; jgliu ; ml ; selfmap ; tool ; spinor ; conf ; order ; exp and the review article rev .

It is well known that in traditional continuous phase transitions a global symmetry is broken and the transitions can be described by Landau theory. Topological phase transitions, however, involve no broken symmetry, and it is therefore of great interest to understand how such transitions emerge and how to locate the transition points. Recent developments in machine learning offer new tools for this endeavor hu ; rogerxy ; Wessel ; mlpxy , based on supervised learning. However, the learning-by-confusion scheme, when applied to the XY model, predicts a transition temperature set by the location of the specific-heat maximum Wessel . In Ref. rogerxy , it was found that significant feature engineering of the raw spin states is needed to relate vortex configurations to the transition. Moreover, a single-hidden-layer fully connected network (FCN) could not correctly learn the phases of the XY model, whereas a convolutional neural network (CNN) was successfully employed to learn the BKT transition rogerxy , and the classification was later extended to the generalized XY model mlpxy . For unsupervised learning with the PCA method, however, it was claimed to be difficult to identify the transition hu . Recently, Rodriguez-Nieva and Scheurer nature used the diffusion map method, invented by R. R. Coifman and S. Lafon lafon and related to quantum clustering tony , and devised an unsupervised learning scheme for identifying topological order that successfully locates the BKT transition. They divided the configurations into several topological sectors with different winding numbers, then calculated the eigenvectors $\Psi$ and eigenvalues $\lambda$ of the so-called diffusion matrix $P$ (see Sec. II.1). From the intersection of the average cluster distance $\bar{D}$ and the within-cluster dispersion $\bar{\sigma}$, or equivalently the intersection of $\Delta\lambda$ (the jump in the eigenvalues) and $\sigma_{\lambda}$ (the standard deviation of the eigenvalues), the phase transition of the XY model on the square lattice can be obtained very well.

Our motivation in this study is to examine whether the diffusion map (DM) method of Rodriguez-Nieva and Scheurer (the RNS method) is suitable beyond the pure XY model, for instance for the generalized XY model. We find that the DM method does work for some topological phase transitions, but it fails to locate the phase transition in the generalized XY model in other regimes. Specifically, the RNS determination of the transition point relies on the intersection of two curves (the average cluster distance $\bar{D}$ and the within-cluster dispersion $\bar{\sigma}$), and in the $q=8$ generalized XY model, as illustrated in Sec. V, we cannot find such an intersection: the thermal fluctuations are not strong enough, and the scatter-plot clusters with different winding numbers do not mix close to $T_{c}$, i.e., $\bar{D}$ does not decrease substantially. The question then arises: are there other quantities that can be used to locate the phase transition points?

In this work, we mainly use the Calinski-Harabaz ($ch$) index CH , defined by $ch_{t}/ch_{b}$ and related to the ratio $\bar{D}/\bar{\sigma}$; even when the variation of $\bar{D}$ is negligible due to strong topological constraints, the variation of $\bar{\sigma}$ itself can help determine the phase transition point. We also use the Silhouette coefficient ($sc$) lunkuo or its components for comparison.

The outline of this work is as follows. Sec. II reviews the DM method, defines the indices $ch$ and $sc$ and their components, and discusses their advantages and the prior knowledge required to use them. Sec. III shows the signatures of $ch$ and $sc$ for the two-dimensional Ising model, using configurations generated with the Swendsen-Wang algorithm SW . In Sec. IV, the critical points of the XY model on the square and honeycomb lattices are obtained using the DM method. In Sec. V, for the $q=2$ and $q=8$ generalized XY models, the global phase diagrams are obtained by the DM method assisted by the PCA or k-PCA methods. Other technical points are discussed in Sec. VI, namely the effect of keeping higher dimensions in the k-means step and how to find the hyperparameter $\varepsilon/\varepsilon_{0}$ automatically. Concluding comments are made in Sec. VII. In Appendix A, the simplest example, the 1D XY model, is discussed, and a comparison between PCA and k-PCA is shown in Appendix B. Finally, a total of 11 indices used in unsupervised learning are listed in Appendix C.

II Methods

II.1 Review of the diffusion map method

Here, we explain the DM method of Rodriguez-Nieva and Scheurer nature . Assume that we have $m$ configurations $\{x_{l}\}$, where $l=1,\dots,m$. The connectivity between $x_{l}$ and $x_{l^{\prime}}$ is given by the Gaussian kernel

K_{\varepsilon}(x_{l},x_{l^{\prime}})=\exp\left(-\frac{||x_{l}-x_{l^{\prime}}||^{2}}{2N\varepsilon}\right). (1)

The normalization of $K_{\varepsilon}(x_{l},x_{l^{\prime}})$ is performed by summing over $l^{\prime}$,

P_{l,l^{\prime}}=\frac{K_{\varepsilon}(x_{l},x_{l^{\prime}})}{z_{l}},\qquad z_{l}=\sum_{l^{\prime}=1}^{m}K_{\varepsilon}(x_{l},x_{l^{\prime}}). (2)

We can also perform the normalization along the direction of $l$ (i.e., summing over $l$). The eigenvalue equation of the diffusion matrix $P_{l,l^{\prime}}$ is $P\psi_{k}=\lambda_{k}\psi_{k}$, where the $\psi_{k}$ are the right eigenvectors with corresponding eigenvalues $\lambda_{k}\leq 1$, for $k=0,1,\dots,m-1$.
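For concreteness, the following is a minimal numerical sketch of this construction (not the authors' code): the function name diffusion_map, the choice of representing each XY configuration by its flattened $(\cos\theta,\sin\theta)$ components, and the use of NumPy's dense eigensolver are all our assumptions.

```python
import numpy as np

def diffusion_map(X, eps):
    """Sketch of Eqs. (1)-(2): X has one row per configuration; here we assume
    each row stores the flattened (cos theta, sin theta) components, so the
    number of spins is N = X.shape[1] // 2.  Returns the eigenvalues and right
    eigenvectors of P, sorted by decreasing eigenvalue."""
    N = X.shape[1] // 2                                         # assumed spin count
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # ||x_l - x_l'||^2
    K = np.exp(-sq / (2.0 * N * eps))                           # Gaussian kernel, Eq. (1)
    P = K / K.sum(axis=1, keepdims=True)                        # row normalization, Eq. (2)
    lam, psi = np.linalg.eig(P)                                 # P psi_k = lambda_k psi_k
    order = np.argsort(-lam.real)
    return lam.real[order], psi.real[:, order]
```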

To find the phase transition point, Ref. nature and its earlier preprint version prep use two different approaches:
(a) The intersection of the mean distance between cluster centers $\overline{D}/2n$, where $n$ is the number of clusters (topological sectors), and the dispersion $\overline{\sigma}$ of the data points within each cluster; both quantities are obtained from the scatter plot.

After obtaining the eigenvectors of the $P$ matrix given by

\Phi:=[(\psi)_{1},(\psi)_{2},\ldots,(\psi)_{m-1}], (3)

the authors project them onto a two-dimensional space, namely a two-column matrix; $\overline{D}/2n$ and the dispersion $\overline{\sigma}$ can then be obtained from the scatter plot of the two column vectors or their modified version. The detailed application to the one-dimensional XY model is reproduced in Appendix A.
(b) The intersection of the mean fluctuation $\sigma_{\lambda}$ and the eigenvalue gap $\Delta\lambda$, where

\sigma_{\lambda}=\frac{1}{n}\sum_{k=0}^{n-1}(\lambda_{k}-\overline{\lambda})^{2},\qquad\overline{\lambda}=\frac{1}{n}\sum_{k=0}^{n-1}\lambda_{k}, (4)

and the gap of eigenvalues between the topological sectors $n$ and $n-1$,

\Delta\lambda=\lambda_{n}-\lambda_{n-1}. (5)
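A small sketch of how these two quantities could be evaluated from the eigenvalue spectrum, under our reading of Eqs. (4)-(5); the helper name eigenvalue_indicators is ours.

```python
import numpy as np

def eigenvalue_indicators(lam, n):
    """lam: eigenvalues of P; n: number of topological sectors.  Returns
    sigma_lambda of Eq. (4) and Delta-lambda of Eq. (5), with the eigenvalues
    sorted in decreasing order (lambda_0 is the largest)."""
    lam = np.sort(np.asarray(lam))[::-1]
    lam_bar = lam[:n].mean()
    sigma_lam = np.mean((lam[:n] - lam_bar) ** 2)  # Eq. (4)
    delta_lam = lam[n] - lam[n - 1]                # Eq. (5) as written; |Delta-lambda| is the gap
    return sigma_lam, delta_lam
```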

II.2 The indices ch and sc

We propose to use indices instead of intersections. Based on the two leading eigenvectors $\Psi_{0}$ and $\Psi_{1}$ of PCA or kernel-PCA (kPCA), or the second and third eigenvectors $\Psi_{1}$ and $\Psi_{2}$ of the DM method, and for a manually chosen number of clusters $k$, the $ch$ index is given by the following ratio:

ch=\frac{ch_{t}}{ch_{b}}=\frac{tr(B_{k})}{tr(W_{k})}\times\frac{N-k}{k-1}, (6)

where $B_{k}$ is the between-group dispersion matrix and $W_{k}$ is the within-cluster dispersion matrix, defined as follows:

\begin{array}{l}W_{k}=\sum_{q=1}^{k}\sum_{x\in C_{q}}(x-c_{q})(x-c_{q})^{T},\\ B_{k}=\sum_{q}n_{q}(c_{q}-c)(c_{q}-c)^{T},\end{array} (7)

where $N$ is the number of data points, $C_{q}$ is the set of points in cluster $q$, $c_{q}$ is the center of cluster $q$, $c$ denotes the average of all cluster centers $\{c_{q}\}$, and $n_{q}$ is the number of points in cluster $q$. The $sc$ index of the $i$-th sample is

sc(i)=\frac{b(i)-a(i)}{\max(b(i),a(i))}, (8)

where $a(i)$ is the mean distance between sample $i$ and all other data points in the same cluster, and $b(i)$ is the smallest mean dissimilarity of point $i$ to any other cluster $C$ ($C\neq C_{i}$), expressed as the mean distance from $i$ to all points in $C$. The mean $sc=\sum sc(i)/N$ over all points is a measure of how tightly grouped the points in the clusters are. Sometimes $sc_{a}=\sum a(i)/N$ or $sc_{b}=\sum b(i)/N$ are also very useful 11index .
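The two indices are straightforward to evaluate once the scatter points are clustered. Below is a minimal sketch following Eqs. (6)-(8); it assumes scikit-learn for k-means and for the mean Silhouette score, the split of $ch$ into $ch_{t}$ and $ch_{b}$ shown in the comments is one possible reading of Eq. (6), and the function name is ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def ch_sc_indices(points, k=2, seed=0):
    """Cluster the scatter points (e.g. the two leading eigenvectors) with
    k-means, then evaluate ch (Eqs. 6-7) and the mean sc (Eq. 8)."""
    points = np.asarray(points, dtype=float)
    N = len(points)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
    centers = np.array([points[labels == q].mean(axis=0) for q in range(k)])
    c = centers.mean(axis=0)                          # average of the cluster centers
    n_q = np.bincount(labels, minlength=k)
    tr_B = (n_q[:, None] * (centers - c) ** 2).sum()  # tr(B_k)
    tr_W = sum(((points[labels == q] - centers[q]) ** 2).sum() for q in range(k))  # tr(W_k)
    ch_t, ch_b = tr_B * (N - k), tr_W * (k - 1)       # one reading of the split in Eq. (6)
    return ch_t / ch_b, ch_t, ch_b, silhouette_score(points, labels)
```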

The performance of a total of 11 indices is shown in Appendix C; the indices $ch$, $dn$, $pbm$ and $Ii$ all give reasonable results when applied to the $q=8$ generalized XY model. Here, we choose two representative indices, $ch$ and $sc$, as examples.

Figure 1: (Color online) Flow chart and main steps of the RNS method and our modifications. The main difference begins after the DM step: (a) the clustering algorithm in the RNS method, (b) the intersections of $\Delta\lambda$ and $\sigma_{\lambda}$, and (c) our method using $ch$.

II.3 Advantages and prior knowledge required for using ch

Fig. 1 shows the flowchart of the RNS method and the places of our modifications. Depending on whether or not we are considering topology, we perform dimensional reduction by the DM method or by PCA (k-PCA), respectively, to obtain the features of the data, i.e., the eigenvectors of the diffusion matrix $P$ or of the covariance matrix. Then we apply k-means to cluster the two-dimensional or higher-dimensional scatter points. If we do not choose “$ch$”, then the intersections of $\Delta\lambda$ and $\sigma_{\lambda}$, or of $\bar{D}/2n$ and $\bar{\sigma}$ nature ; prep , can be used instead.

To assess whether the clustering is good or bad, if “$ch$” is chosen, the peaks of $ch$ or of its components $ch_{t}$ and $ch_{b}$ are used directly. For topological phase transitions, the index $ch$ or its components $ch_{t}$ and $ch_{b}$ can give signatures of the transition when the intersection used in the RNS method is not easy to determine or does not exist.

The index $ch$ can also be applied to non-topological phase transitions, such as the order-disorder transition of the Ising model. This requires no prior knowledge except the configurations generated by, e.g., Monte Carlo methods or taken from real experiments.

For topological phase transitions, such as in the XY and generalized XY models, although this type of unsupervised learning is not analogous to supervised learning (with, e.g., fully connected or convolutional neural networks), it still needs some information about the configurations, namely the topological winding numbers and the number of possible phases. However, a winding-number label is not a phase label, so essentially this method is still an unsupervised learning method.

III The 2D Ising model

To test the ability to locate the phase transition point $T_{c}$, we calculate $ch$ for the simplest model, i.e., the Ising model,

H=-J\sum_{\langle i,j\rangle}S_{i}S_{j}, (9)

where $J$ is the ferromagnetic interaction between a pair of nearest-neighbor spins, and $S_{i}=\pm 1$. Unsupervised learning of the Ising model has been studied before (see, e.g., Refs. wang1 ; unp ).

We use a two-dimensional $64\times 64$ lattice, generate $N_{s}=2000$ samples for each temperature $T_{i}$, and analyze them by PCA, using the scatter data of the two leading eigenvectors $\Psi_{0}$ and $\Psi_{1}$. We then calculate $ch$ ($k=2$) and $sc$ ($k=2$) for each $T_{i}$ according to Eqs. (6) and (8).
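A hedged sketch of this per-temperature analysis is given below. It assumes scikit-learn's PCA, KMeans and calinski_harabasz_score (older releases spell the latter calinski_harabaz_score); note that scikit-learn's score takes the overall data mean as the reference center, which coincides with Eq. (7) only for equally populated clusters. The function name and the choice to fit PCA on all temperatures jointly are our assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def ch_curve(samples_by_T, k=2):
    """samples_by_T: list of (Ns, L*L) arrays of +/-1 Ising spins, one array
    per temperature.  Project onto the two leading PCA components and return
    ch(T) for each temperature; the peak (or sharp jump) estimates T_c."""
    pca = PCA(n_components=2).fit(np.vstack(samples_by_T).astype(float))
    ch = []
    for X in samples_by_T:
        Y = pca.transform(X.astype(float))                      # scatter in (Psi_0, Psi_1)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Y)
        ch.append(calinski_harabasz_score(Y, labels))
    return np.array(ch)
```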

Figure 2 shows the main results for the Ising model. In Fig. 2(a), $sc$ itself and $sc_{b}$ have a sharp decrease whereas $sc_{a}$ has peaks around $T_{c}$ (here we re-scale the results so the maximum value is 1). In Figs. 2(b) and (c), the $ch_{b}$ index is peaked around 2.3; $ch_{t}$ and $ch$ also have a sharp jump at the phase transition point.

To understand these results, the scatter plots of $\Psi_{1}$ and $\Psi_{2}$ are shown in Figs. 2(d)-(f) for temperatures $T=1.5$, 2.3 and 2.9, respectively. At low temperature, the clusters identified by the two colors are separated from each other in the reduced space, and they finally mix together at high temperatures.

In general, the $ch$ index is large when the clusters are well separated and the points in each cluster are tightly aggregated. From this viewpoint, in the low-temperature phase the $ch$ index becomes large because the all-up and all-down states can be easily distinguished if the analysis is successful.

The index $ch$ and its components perform well in detecting the transition, with the peak or jump located at $T=2.3$. Around the phase transition point, configurations possess properties of both phases (paramagnetic and ferromagnetic) and the fluctuations are largest there.

Comparing to traditional Monte Carlo results for the same size $L=64$, shown in Figs. 2(g)-(i), we find that the specific-heat peak is located at around 2.295, consistent with our $ch$ and $sc$ results given the temperature spacing of 0.01. For reference, the thermodynamic-limit transition $T_{c}=2.269$ is marked.


Figure 2: For the Ising model with size $L=64$: (a) $sc$, $sc_{a}$, $sc_{b}$; (b) $ch_{t}$ and $ch_{b}$, presented in the normalized range $[0,1]$; (c) $ch$; (d)-(f) scatter plots of $\Psi_{1}$ and $\Psi_{2}$ at three typical temperatures $T<T_{c}$, $T\approx T_{c}$ and $T>T_{c}$, respectively, where the different colors are labels generated by k-means; (g)-(i) Monte Carlo results for $m^{2}$, $C_{v}$, and a zoomed view of $C_{v}$. The peaks are at 2.295, consistent with 2.30.

It should be noted that if we simulate the Ising model by the Metropolis Monte Carlo algorithm metro , which flips one spin at a time, then in the low-temperature limit almost all spins remain in the initial state (e.g., all spins up), and the scatter plot simply shows one group. This is not wrong, because the states still obey the Boltzmann distribution, and it is the reason for the small $ch$ at low temperatures when the Metropolis algorithm is used to generate the configurations; the signature at the phase transition point can nevertheless still be observed. Using the Swendsen-Wang algorithm SW instead, i.e., a global-update Monte Carlo method, the low-temperature states are all-up or all-down independently of the initial state.
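For reference, a minimal single-spin-flip Metropolis sweep of the Ising model (the standard textbook update, not the authors' specific implementation) looks as follows; $J=1$ and periodic boundaries are assumed.

```python
import numpy as np

def metropolis_sweep(spins, T, rng):
    """One Metropolis sweep of the 2D Ising model: L*L single-spin-flip
    attempts with acceptance probability min(1, exp(-dE/T))."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn                  # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1
    return spins
```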

IV The 2D XY model

The Hamiltonian of the classical XY model is given by

H=-J\sum_{\langle i,j\rangle}\cos(\theta_{i}-\theta_{j}),

where $\left\langle i,j\right\rangle$ denotes a nearest-neighbor pair of sites $i$ and $j$, and $\theta\in(0,2\pi]$ is a classical variable defined at each site describing the angle of the spin direction in a two-dimensional spin plane. The sum in the Hamiltonian is over nearest-neighbor pairs on the square lattice ($L\times L$) with periodic boundary conditions; other lattices can also be considered.

IV.1 The 2D XY model on square lattices

Now we analyze the first example, i.e., the two-dimensional XY model on the square lattice. First, we generate configurations with five fixed winding-number pairs, $\nu=(\nu_{x},\nu_{y})=(0,0)$, $(0,1)$, $(1,0)$, $(0,-1)$ and $(-1,0)$. For each fixed winding-number pair, it should be noted that, in the two-dimensional geometry, the winding-number component $\nu_{x}=1$ means that the spins in every row form a winding number of 1, not just the spins of one randomly selected row. Cluster simulation algorithms wf ; SW are not suitable for updating the spins because global flips easily break the topological winding number; therefore the Metropolis algorithm is used, rotating each spin vector by a very small step at a time so as to preserve the winding number.
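A minimal sketch of the constrained setup is shown below: a uniform-twist initialization with a prescribed winding-number pair and a helper that measures the winding of each line of spins (used to reject updates that would change it). The specific row/column convention and the function names are our assumptions, not necessarily those of the original simulations.

```python
import numpy as np

def init_winding(L, nu_x, nu_y):
    """L x L XY configuration in which every row winds nu_x times along x and
    every column winds nu_y times along y (one possible convention)."""
    x, y = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return (2.0 * np.pi * (nu_x * x + nu_y * y) / L) % (2.0 * np.pi)

def saw(d):
    """Map angle differences into [-pi, pi)."""
    return (d + np.pi) % (2.0 * np.pi) - np.pi

def winding(theta, axis=0):
    """Winding number of each line of spins along the given axis (periodic
    boundaries); a small Metropolis rotation is accepted only if this is unchanged."""
    d = saw(np.roll(theta, -1, axis=axis) - theta)
    return np.rint(d.sum(axis=axis) / (2.0 * np.pi)).astype(int)
```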

For each topological sector, we generate $m=500$ configurations. Combining all configurations $\{x_{l},l=1,\cdots,5m\}$ from the five sectors, we construct the kernel $K_{\varepsilon}$. The matrix elements $K_{\varepsilon}(x_{l},x_{l^{\prime}})$ are defined in Eq. (1), and the normalized matrix $P_{\varepsilon}(x_{l},x_{l^{\prime}})$ is obtained, obeying the eigenvalue equation $P\Psi_{k}=\lambda_{k}\Psi_{k}$. Using the scatter plot of the second and third leading eigenvectors $\Psi_{1}$ and $\Psi_{2}$ of $P_{\varepsilon}$, the $ch$ index is obtained and displayed in Figs. 3(a)-(d).

There is a parameter $\varepsilon$ in the above matrices, and in order to draw consistent conclusions we need to make sure the results converge over a finite range of $\varepsilon/\varepsilon_{0}$. We see that this is indeed the case for different values of $\varepsilon/\varepsilon_{0}$ in Figs. 3(a)-(d). For $\varepsilon/\varepsilon_{0}=2.5$, 3, 3.5 and 4, $T_{c}$ is less than 0.9; it becomes 0.93 for $\varepsilon/\varepsilon_{0}=4.5$, 5, 5.5, 6, 6.5 and 7. Finally, the peaks move left and deviate from 0.93 again when $\varepsilon/\varepsilon_{0}=8$ or larger. It should be noted that 10,000 or more Monte Carlo steps are used here in order to reach equilibrium.

The estimated points are labeled by the circles in Fig. 3(e), and they all lie near or on the red line representing the most recent estimate of the critical temperature, $T_{c}=0.8935$ kao . The intersections of $\bar{D}$ and $\bar{\sigma}$ are labeled by the gray regions in the critical regime $T_{c}=0.9\pm 0.1$ from Ref. nature . It appears that it is easier to use $ch$ to locate the phase transition, as we only need to identify the peak location; the results from Ref. nature have greater uncertainty than those obtained using the $ch$ index.

The histogram of our estimated $T_{c}$ is shown in Fig. 3(f), which helps to determine the hyperparameter $\varepsilon/\varepsilon_{0}$ (see Sec. VI).

Figure 3: (Color online) $ch$ index for the XY model on the two-dimensional square lattice of size $L=32$, for various values of $\varepsilon/\varepsilon_{0}$: (a) 2.5, 3, 3.5; (b) 4, 4.5, 5; (c) 5.5, 6, 6.5; (d) 7, 8, 9. (e) Comparison of the estimated $T_{c}$ vs. $\varepsilon/\varepsilon_{0}$ using $ch$ with Ref. nature and the latest MC result kao . (f) Histogram of $T_{c}$ obtained using various $\varepsilon/\varepsilon_{0}$. (g)-(i) $ch$ index for the XY model on two-dimensional square lattices of sizes $L=8$ and $L=16$, and the scaling of the estimated critical points.

In Figs. 3(g)-(i), the finite-size effect on $T_{c}$ is checked using $ch$ for two smaller sizes, $L=8$ and $L=16$, with $\varepsilon/\varepsilon_{0}=5.5$, 6 and 6.5. Combining the estimated $T_{c}(L)$ for $L=8,16,32$, we use two different extrapolations (linear and exponential) to obtain $T_{c}(\infty)$ in the thermodynamic limit. The results are 0.88(5) and 0.89(5), respectively.
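The precise extrapolation forms are not spelled out above; a hedged sketch of two common choices (linear in $1/L$ and exponential in $L$, which are our assumptions) is:

```python
import numpy as np
from scipy.optimize import curve_fit

def extrapolate_tc(L, Tc):
    """L, Tc: arrays of system sizes and the corresponding ch-peak positions.
    Returns (linear-in-1/L, exponential-in-L) estimates of Tc(infinity)."""
    L, Tc = np.asarray(L, float), np.asarray(Tc, float)
    tc_lin = np.polyfit(1.0 / L, Tc, 1)[1]          # intercept as 1/L -> 0
    f = lambda x, t_inf, a, b: t_inf + a * np.exp(-b * x)
    tc_exp = curve_fit(f, L, Tc, p0=[Tc[-1], Tc[0] - Tc[-1], 0.1])[0][0]
    return tc_lin, tc_exp
```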

IV.2 2D XY model on honeycomb lattices

Here, we study the second example, i.e., the pure XY model on the honeycomb lattice, whose critical point is known exactly, $T_{c}=1/\sqrt{2}\approx 0.707$ 0.7 .

The geometry of the honeycomb lattice is equivalent to the $8\times 8$ brick-wall lattice shown in Fig. 4, where every spin has three nearest-neighbor spins. To initialize a configuration with fixed winding number $(\nu_{x},\nu_{y})=(0,1)$, the spins connected by solid gray lines are defined as forming $\nu_{y}=1$ in the vertical direction. Specifically, if we start from the spin at position (2,1), go left to (1,1), then to (1,2), $\cdots$, (2,8), and return to (2,1) through the red dashed lines, which connect the boundary sites for periodic boundary conditions, the spins sweep an angle of $2\pi$ counter-clockwise.

It should be noted that two spins such as (1,1) and (2,1), connected by the horizontal gray lines, contribute to the winding number $\nu_{y}$ and also to $\nu_{x}$. This poses a problem when fixing $\nu_{x}$ and $\nu_{y}$ independently. Fortunately, it can be solved: in the first row, labeled by 1 in the vertical ($y$) direction, the angles obey $\theta_{(2,1)}-\theta_{(1,1)}=-(\theta_{(1,1)}-\theta_{(2,1)})\approx-(\theta_{(3,1)}-\theta_{(2,1)})$. During the simulation, small fluctuations are allowed as long as they do not break the winding number.

Figure 4: (Color online) An $8\times 8$ honeycomb lattice equivalent to a brick-wall lattice. The gray solid lines connect the spins forming the winding number in the $y$-direction, i.e., $(\nu_{x},\nu_{y})=(0,1)$.

In Fig. 5, using configurations constrained to five topological sectors on $32\times 32$ lattices, we find that the peaks are located at the exact value $T_{c}=\frac{1}{\sqrt{2}}\approx 0.7$ for several values of $\varepsilon/\varepsilon_{0}$ in the interval [2,6] at steps of 0.5. We also calculate the intersections of $\Delta\lambda$ and $\sigma_{\lambda}$ at $\varepsilon/\varepsilon_{0}=5$, 6 and 7. The intersection also reaches 0.7 when $\varepsilon/\varepsilon_{0}=7$, but not for 5 and 6. This indicates that $ch$ performs better than the intersection method, which may not locate the transition with high confidence.


Figure 5: (Color online) Locating the phase transition point of the pure XY model on the honeycomb lattice: (a) the intersections of $\Delta\lambda$ and $\sigma_{\lambda}$; (b) the $ch$ index; (c) comparison between (a), (b) and the exact result $\sqrt{2}/2$.
Figure 6: (Color online) (a) The phase diagram of the $q=2$ GXY model, containing the $N$, $F$ and $P$ phases, obtained by calculating the $ch$ index; the dashed lines are from Ref. q2mc . (b) $N$-$P$ transition at fixed $\Delta=0$: $ch$ vs. $T$. (c) $N$-$P$ transition at $\Delta=0.1$: $ch$ vs. $T$. The peaks are closer to the known values when the half-vortex constraint is used, with $L$ up to 64, than when only integer vortices are used.

V 2D Generalized XY (GXY) models

Here, we apply our method to the generalized XY models q8mc ; q2mc , whose Hamiltonian is given by

H=-\sum_{\langle i,j\rangle}[\Delta\cos(\theta_{i}-\theta_{j})+(1-\Delta)\cos q(\theta_{i}-\theta_{j})],

where $\Delta$ is the relative weight of the pure XY term, and $q$ is an integer parameter that can drive the system to form a nematic phase. For both $\Delta=0$ and $\Delta=1$ the model reduces to the pure XY model (upon redefining $q\theta$ as $\theta$ in the first case), and hence the transition temperature is identical to that of the pure XY model. Such a redefinition, however, is not possible for intermediate $\Delta$. The phase diagrams of the GXY models depend on the integer parameter $q$ and have been explored extensively.

V.1 q=2 GXY model

The $q=2$ GXY model has a richer phase diagram than the XY model, with an additional new phase mlpxy . Thus, it is a good candidate for testing our method away from the pure XY limit. The phase diagram on the $\Delta$-$T$ plane is illustrated in Fig. 6(a), where we also show the results from our unsupervised method and those from PCA for comparison. The symbols $N$, $F$, $P$ represent the nematic, ferromagnetic and paramagnetic phases, respectively. The dashed lines are data from the MC simulations q2mc , mainly for $L=50$ and up to $L=300$. The color indicates the value of the $ch$ index obtained from simulations with system size $L=32$. We now discuss the $F$-$P$, $N$-$F$ and $N$-$P$ transitions in turn.

(i) The $F$-$P$ phase transition: Interestingly, the positions of the $ch$ peaks are consistent with the dashed line of the phase boundary in the whole region $\Delta>0.4$. The index $ch$ performs very well where $\Delta$ is away from the pure XY limit. The essential nature of the $F$-$P$ phase transition is still BKT.

(ii) The $N$-$P$ phase transition: In the regime $\Delta=0$, $ch$ peaks around $T=0.7$, which is 0.2 below 0.9. This discrepancy is likely due to the half-vorticity in the $N$ phase. We can improve the result by restricting the configurations to the half-winding numbers $\nu_{x(y)}=1/2$ as the topological constraint in our Monte Carlo simulation. The half-vortex physics has been discussed in Ref. q2mc .

Thus, we only consider $(\nu_{x},\nu_{y})$ in the four combinations $(\pm 1/2,0)$ and $(0,\pm 1/2)$. To form $(\nu_{x},\nu_{y})=(1/2,0)$, the angle difference between a pair of spins located at the leftmost and rightmost boundaries is fixed to $\pi$, and we assign each spin using Eq. (14). With the half-vortex constraint, our results, illustrated by the red symbols in Fig. 6(a), move closer to the dashed lines of the $N$-$P$ transition than the results using the integer-vortex constraint.

Figs. 6(b) and (c) show the details at $\Delta=0$ and $\Delta=0.1$, respectively. For $L=64$, we generate two groups of data; the peaks nearly converge in the interval $[0.82,0.85]$, closer to 0.89 than the result from the integer-vortex constraint. For $\Delta=0.1$, the half-vortex constraint gives better results, at $T_{c}=0.8$.

(iii) The $N$-$F$ phase transition: Since the topological constraint obscures the peaks (features) of the distribution of spin directions in the $F$ and $P$ phases, we use configurations without any constraint in this case. The results are labeled by the purple symbols with the legend '$pca$, $L32$'.

The behavior of $ch$ depends on the structure of the sample points: sometimes $ch$ shows a sharp jump at the phase transition point, as in the Ising model discussed above, and sometimes it has a local minimum there. Taking the $q=2$ GXY model at $T=0.5$ as an example, in Fig. 7 both $\ln(ch_{t})$ and $\ln(ch_{b})$ increase as functions of $\Delta$. However, the difference $\alpha=\ln(ch_{t})-\ln(ch_{b})=-\mbox{gap}$ first decreases and then increases around $\Delta_{2}=0.22$, where the gap is positive.

Figure 7: For the $q=2$ GXY model, $ch$, $ch_{t}$, $ch_{b}$ on a log scale vs. $\Delta$ without any constraint. $ch$ is obtained by applying k-PCA and k-means.

According to the following equation:

ch=\exp(\ln(ch_{t})-\ln(ch_{b}))=\exp(-\mbox{gap}), (10)

clearly, $\min(ch)\Rightarrow\max(\mbox{gap})$, and the local minimum of $ch$ is located at $\Delta_{2}$.

Figure 8: The distributions of the spin angles of the $q=2$ GXY model in the $N$ ($\Delta=0.1$) and $F$ ($\Delta=0.3$) phases and at the critical point $\Delta=0.2$, (a1)-(a3) with and (b1)-(b3) without constraints, respectively.

For the $N$-$F$ phase transition in Fig. 6(a), PCA performs better than the DM method. We now give a physical discussion of this difference, since it is related to the scope of the proposed method and thus deserves a more detailed explanation.

The reasons why PCA performs better for the $N$-$F$ transition can be explained as follows:

(i) The $N$-$F$ phase transition is not a topological phase transition q8mc , whereas the DM method used here is designed to detect topological phase transitions. It is still interesting to examine the $ch$ of the DM method when Ising configurations without any topological constraint are used as input. In Fig. 9, using k-PCA and the DM method on exactly the same configurations, signatures of the phase transition emerge and the results are consistent at $T_{c}=2.3$; however, the signature from the k-PCA method is clearer.

(ii) From the viewpoint of the data, it is also understandable that PCA is stronger. Without any topological constraint, in the nematic ($N$) phase the spins prefer two dominant directions and their histograms show a double-peak structure, whereas in the ferromagnetic ($F$) phase one main direction remains and a single peak emerges. Figs. 8(b1)-(b3) show typical histograms in the three phases; the feature difference between the phases is obvious.

However, with a topological constraint, the spin directions are mainly distributed according to the winding numbers. For example, the spin angles in each row are $\theta_{x}=2\pi x/L$ plus fluctuations, where $x$ is the site index and $L$ the total number of sites in the row. Therefore, the distribution spreads over almost the whole range $[0,2\pi]$, as shown in Figs. 8(a1)-(a3). The feature difference of the spin angles disappears when the constraint is imposed, and the DM method therefore cannot extract the transition points from constrained configurations.


Figure 9: (Color online) The value of $ch_{b}$ obtained by kernel PCA (k-PCA) and by the DM method for exactly the same configurations of the $64\times 64$ Ising model without any constraint.
Figure 10: (Color online) (a) Phase diagram (with the $N$, $F$, $F_{2}$, $P$ phases) of the $q=8$ GXY model and results from calculating the $ch$ index, whose values are represented by colors. The dashed blue lines are MC results q8mc . (b) The curves of $sc$, $sc_{a}$, $sc_{b}$ vs. $T$ at $\Delta=0.5$. (c) and (d) The curves of $ch$, $ch_{t}$ and $ch_{b}$ vs. $T$ at $\Delta=0.5$ (note that $T_{c}=0.5$). (e)-(g) Scatter plots of $\Psi_{1}$ and $\Psi_{2}$ at $T=0.1$, 0.5 and 1.0. (h)-(j) Zoom-in view for a fixed group of data.

V.2 q=8 GXY model

For the $q=8$ GXY model q8mc , Fig. 10(a) shows the global phase diagram based on the values of the $ch$ index. The phases $N$, $F_{2}$, $F$, $P$ are obtained, and the corresponding spin-angle distributions are also shown.

In the new $F_{2}$ phase, the distribution of the spin vectors has 8 peaks but is dominated by 4 possible angles. The 'X'-shaped dashed lines are from Refs. q2mc ; q8mc . The orange color represents the values of $ch$ from the DM method.

Let us first discuss the $F$($F_{2}$)-$P$ transition. Clearly, for the $F$-$P$ transition in the range $0.5<\Delta<1$, the peak positions of $ch$ align well with the dashed line. At the center of the 'X', the $F_{2}$-$P$ transition is also consistent with the MC result, i.e., $T_{c}=0.5$.

However, we could not use the intersection of the average cluster distance $\bar{D}$ and the within-cluster dispersion $\bar{\sigma}$ as described in Ref. nature , because $\bar{D}$ does not vary much and there is no intersection, as shown in Figs. 10(e)-(g); the five colors therein represent the five typical topological sectors. Fortunately, when zooming in, as in Figs. 10(h)-(j), we find that near the transition region the shape of a given cluster shrinks because the data points gather closer together, and hence the within-cluster dispersion $\bar{\sigma}$ becomes smaller. The index $ch=\frac{ch_{t}}{ch_{b}}$ thus develops a peak around $T_{c}=0.5$, as shown in Fig. 10(c), with $ch_{t}$ and $ch_{b}$ displayed in Fig. 10(d). In contrast, Fig. 10(b) shows that $sc$ cannot be used to signal the transition temperature $T_{c}$, because it evaluates the difference $sc_{b}-sc_{a}$ but $sc_{a}\ll sc_{b}$, despite the fact that $sc_{a}$ has a local minimum.

It should be mentioned that for the $N$-$P$ transition, neither the integer- nor the half-vortex constraint is suitable; instead, the $\nu_{x(y)}=1/8$ vortex constraint is needed when generating the configurations. Moreover, after comparison, we find that kPCA works better if we use the angles $\theta_{i}$ as its input (PCA does not work well). The kernel used for kPCA is a radial basis function kernel $\exp(-\gamma||x/L-y/L||^{2})$, where $L=32$ is the system size, $x$ and $y$ are the configurations $\{\theta_{i}\}$, and $\gamma=1$ is the default value gp .
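A minimal sketch of this kPCA-based analysis, assuming scikit-learn's KernelPCA: rescaling the input angles by $L$ reproduces the kernel $\exp(-\gamma||x/L-y/L||^{2})$ with the library's standard RBF kernel. The function name and the subsequent k-means + $ch$ step mirror the earlier sections.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def ch_from_kpca(thetas, L=32, k=2, gamma=1.0):
    """thetas: (M, L*L) array of spin angles at one (Delta, T) point.
    Two-component kernel PCA with an RBF kernel on the rescaled angles,
    followed by k-means and the ch index."""
    Y = KernelPCA(n_components=2, kernel="rbf", gamma=gamma).fit_transform(thetas / L)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(Y)
    return calinski_harabasz_score(Y, labels)
```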

Further details of the $F$-$P$, $F_{2}$-$F$, $N$-$F_{2}$ and $N$-$P$ transitions are discussed below.

Figure 11: The distributions of the spin angles of the $q=8$ GXY model in the $N$, $F_{2}$ and $F$ phases, (a1)-(a3) with and (b1)-(b3) without constraints, respectively.

For the $q=8$ GXY model, Fig. 11 shows the distributions of the spin vectors in the (a1) $N$, (a2) $F_{2}$ and (a3) $F$ phases with the integer-vortex constraint. The distributions in the $N$ and $F_{2}$ phases show no obvious difference. To distinguish the $N$ and $F_{2}$ phases, the constraint is removed; the resulting distributions are shown in Figs. 11(b1)-(b3).

Figure 12: (Color online) (a) $F$-$P$: $ch$ vs. $T$ at fixed $\Delta$ in the interval $[0.5,1]$; the peaks are at 0.5, 0.6, 0.7, 0.8, 0.9 and 1, respectively. (b) $F_{2}$-$F$: $ch$ vs. $T$, with $T_{c}=0.2$ for $\Delta=0.8$, together with the numerator $ch_{t}$ and denominator $ch_{b}$ of the $ch$ index in Eq. (6). (c) $N$-$F_{2}$: $ch$ vs. $\Delta$ for $T=0.2$, 0.3 and 0.4. (d) $N$-$P$: using the $\frac{1}{8}$-vortex constraint and kPCA, $ch$ vs. $T$, with $T_{c}$ about 0.9.

Figs. 12(a)-(d) show the details of the $F$-$P$, $F_{2}$-$F$, $N$-$F_{2}$ and $N$-$P$ transitions, respectively. For the $F$-$P$ transition in Fig. 12(a), fixing $\Delta$ in the interval $[0.5,1]$ at steps of 0.1, the peaks of $ch$ are located at 0.5, 0.6, 0.7, 0.8, 0.9 and 1, respectively, completely matching the dashed lines in the global phase diagram of Fig. 10.

In Fig. 12(b), for the $F_{2}$-$F$ transition at fixed $\Delta=0.8$, the peaks of $ch$ are located at 0.2 for $L=8$, 16, 32, 48 and 64 (other values of $\Delta$ are not shown). In Fig. 12(c), fixing $T=0.2$, 0.3 and 0.4, the positions $\Delta$ of the local minima of $ch$ are located at 0.2, 0.3 and 0.4, respectively.

In Fig. 12(d), for $\Delta=0$, using the $1/8$-vortex constraint gives the most accurate critical point: the peak position is at 0.9, closer to 1 than the value 0.7 obtained with the integer-vortex constraint. The error bars are obtained by the bootstrap method, using 400 configurations randomly chosen from the total of 2000, with a total of 20 bins.

VI Other technical modifications

In the sections above, when using the k-means method we kept two output eigenvectors of the diffusion matrix $P$ and then computed the values of the $ch$ index. The conclusion is that using the two leading vectors gives the best accuracy. It would also be desirable to determine the hyperparameter $\varepsilon/\varepsilon_{0}$ automatically. In this section, we address these issues for the completeness of our method.


Figure 13: $ch$ index as a function of $T$ for various numbers of retained eigenvectors $\Delta d$: (a) $\Delta d=3$, (b) $\Delta d=4$, (c) $\Delta d=5$, for the Ising model. $ch$ vs. $T$ of the GXY model at fixed $\Delta=0.5$ with (d) $\Delta d=3$ and 4, (e) $\Delta d=5$, (f) $\Delta d=6$ vectors considered, respectively. The results are best when using two vectors, $\Delta d=2$.

Here, the effect of higher-dimensional features in the k-means step is discussed, namely whether the accuracy is enhanced by including more features. The answer is that when more eigenvector dimensions are retained, the result may deviate from the correct critical point or even predict a wrong one.

Let the covariance matrix of the PCA method be $C$, with eigenvectors $\{\Psi_{d}\}$, where $d=1,\cdots,d_{max}$ ($\Delta d=d_{max}$). For the Ising model in Sec. III, the first and second vectors are used, i.e., $d_{max}=2$ or $\Delta d=2$. Here, we consider $\Delta d=3$, 4 and 5. A difference is found between the results for two-dimensional vectors and for higher-dimensional data, as shown in Figs. 13(a)-(c): the estimated peak positions are 2.31, 2.31 and 2.33, respectively. The result $T_{c}=2.3$ for $\Delta d=2$ is the closest to the peak of the specific heat.

For the generalized XY model ($q=8$, $\Delta=0.5$) with the DM method, the vectors with indices $d=2,\cdots,d_{max}$ ($\Delta d=d_{max}-1$) are used, i.e., the first vector is ignored. In Fig. 13(d), with $\Delta d=3$ or 4, the peak position is $T_{c}=0.5$, the same as for $\Delta d=2$. However, if higher-dimensional data are used, such as $\Delta d=5$ and 6, the $ch$ score does not yield the correct signature of the phase transition, as shown in Figs. 13(e) and (f).

Another issue discussed here is whether one can design an automatic method to determine the hyperparameter $\varepsilon/\varepsilon_{0}$. Since its value has so far been chosen by trying many reasonable values, one may wish to determine its optimal value automatically.

To the best of our knowledge, there is at present no such automatic determination of the hyperparameter $\varepsilon/\varepsilon_{0}$. Here we devise a simple analysis based on the statistics (histogram) of $T_{c}$ obtained with various $\varepsilon/\varepsilon_{0}$:

(i) Calculate $ch(T)$ as a function of temperature $T$ for various $\varepsilon/\varepsilon_{0}$ in a range from $\varepsilon^{min}/\varepsilon_{0}$ to $\varepsilon^{max}/\varepsilon_{0}$, using the configurations with topological constraints;

(ii) Store the positions of the $ch$ peaks, i.e., $T_{c}$, and the number of times each appears, as shown in Figs. 3(e) and (f). Sometimes the peaks may not be obvious and a plateau emerges; in that case the two ends of the plateau are stored.

The histogram of $T_{c}$ vs. $\varepsilon/\varepsilon_{0}$ can then be used to give the best estimate of the transition. Accordingly, corresponding to the highest histogram position $T_{c}=0.96$, the values $4\leq\varepsilon/\varepsilon_{0}\leq 6$ are acceptable.
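One hedged way to turn steps (i)-(ii) into code is sketched below; the helper name select_epsilon, the automatic binning, and the tie-breaking are our choices rather than a prescription from the text.

```python
import numpy as np

def select_epsilon(tc_estimates):
    """tc_estimates: dict mapping eps/eps0 to the list of ch-peak positions
    (or plateau ends) found at that value.  Histogram all estimates and keep
    the eps/eps0 values whose peaks fall in the most populated bin."""
    all_tc = np.concatenate([np.atleast_1d(v) for v in tc_estimates.values()])
    counts, edges = np.histogram(all_tc, bins="auto")
    lo, hi = edges[counts.argmax()], edges[counts.argmax() + 1]
    accepted = sorted(e for e, v in tc_estimates.items()
                      if np.any((np.atleast_1d(v) >= lo) & (np.atleast_1d(v) <= hi)))
    return 0.5 * (lo + hi), accepted   # best Tc estimate and acceptable eps/eps0 values
```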

Another possible approach is to calculate a location-dependent $\sigma$ for each data point instead of selecting a single scaling parameter localsigma . The kernel between a pair of points can then be written as

\omega(x_{i},x_{j})=\exp\left(-\frac{||x_{i}-x_{j}||^{2}}{\sigma_{i}\sigma_{j}}\right), (11)

where $\sigma_{i}$ and $\sigma_{j}$ are the local scale parameters for $x_{i}$ and $x_{j}$, respectively. The local scale $\sigma_{i}$ is determined by the local statistics of the neighborhood of point $x_{i}$; for example, it can be set as

\sigma_{i}=||x_{i}-x_{K}||, (12)

where $x_{K}$ is the $K$-th nearest neighbor of $x_{i}$. However, $K$ is itself a hyperparameter to be chosen. Comparing Eq. (11) with Eq. (1), $2N\varepsilon=\sigma_{i}\sigma_{j}$; the resulting $\varepsilon/\varepsilon_{0}\approx 30$ is several times larger than the acceptable regime.
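A sketch of this self-tuning kernel localsigma follows; the value of $K$ is a hyperparameter (a common default in the self-tuning clustering literature is $K=7$), and the function name is ours.

```python
import numpy as np

def self_tuning_kernel(X, K=7):
    """Kernel of Eqs. (11)-(12): sigma_i is the distance from x_i to its
    K-th nearest neighbour, estimated from the local statistics of the data."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    sigma = np.sort(d, axis=1)[:, K]                            # column 0 is the self-distance
    return np.exp(-d ** 2 / np.outer(sigma, sigma))
```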

VII Discussion and Conclusion

In summary, we have used the Calinski-Harabaz ($ch$) index to successfully locate the phase transitions of a few classical statistical models, including the Ising, XY and generalized XY models, from scatter data. Our approach combines the advantages of the PCA and DM methods, since the scatter data can be based either on the first two leading eigenvectors of PCA or kernel-PCA, or on the second and third eigenvectors of the DM method.

The advantages of using the $ch$ index are fewer steps, wider applicability and better convergence. After k-means is applied to the eigenvectors of the diffusion matrix $P$, using the $ch$ index we do not need to maximize the visibility of the clusters as proposed in the RNS method. For some phase transitions it may not be easy to find the intersection of $\bar{D}$ and $\bar{\sigma}$; in Figs. 10(e)-(g), the $ch$ index can still capture the signatures contained in both quantities. In Fig. 5, for the pure XY model on the honeycomb lattice, the exact critical point is at $T_{c}=0.707$; using $ch$, for $\varepsilon/\varepsilon_{0}$ in the interval [2,6], the estimated values of $T_{c}$ are all close to 0.7.

In the pure XY limit, we have verified that $ch$ can locate the phase transition of the XY model on both the square and honeycomb lattices, similarly to the DM method of Rodriguez-Nieva and Scheurer. For the $q=2$ GXY model, the values of the $ch$ index over the whole phase diagram match the boundary between the ferromagnetic and paramagnetic phases very well, even away from the pure XY limit. Close to the $N$-$P$ transition, the accuracy can be improved by using the $1/2$-vortex constraint in generating the Monte Carlo configurations. For the $q=8$ GXY model, the intersection of $\bar{D}$ and $\bar{\sigma}$ cannot be used to locate the transition point, e.g., at $\Delta=0.5$, but the $ch$ index still works because it incorporates the fluctuation of the samples within each cluster. Moreover, the $N$-$P$ transition can also be identified by using the $1/8$-vortex constraint.

We have also compared the results from other indices (see Table 1), but we find that the $ch$ index works best overall for the models considered in this work. In the future, it will be desirable to systematically study the utility of the various indices in models of statistical physics that exhibit transitions of different natures.

Applying machine learning to physics may lead to new methods for studying unknown phases. For the Ising model and similar models the transition point is well known, which helps to benchmark our proposed method.

If we obtain first-hand data from experiments and do not know the details of the Hamiltonian, our unsupervised method can help deduce the number of possible classes (phases) in the data, and the $ch$ index can then identify the phase transition points; the $ch$-index approach is therefore very useful for non-topological phase transitions. For the XY model and Hamiltonians similar to the GXY model, topological phase transition points were obtained with the $ch$ index without using the traditional measurements of correlations q8mc and spin stiffness kao . Of course, some reasonable prior knowledge is needed, such as the possible winding numbers. The idea of the DM can also be applied to topological quantum systems (see Refs. band ; dff ); in principle, topological quantum phases and the transitions between them may be probed using the $ch$ or $sc$ indices.

Acknowledgements.
We thank J. F. Rodriguez-Nieva, M. S. Scheurer, Y. Z. You and T. Chotibut for valuable discussions on simulations. We also thank Tony C. Scott for his help. TCW is supported by the National Science Foundation under Grant No. PHY 1915165. J. Wang and H. Tian are supported by NSFC under Grant No. 51901152. W. Zhang is supported by the Science Foundation for Youths of Shanxi Province under Grant No. 201901D211082.

References

  • (1) H. E. Stanley, Introduction to Phase Transitions and Critical Phenomena (Oxford University Press, New York, 1971).
  • (2) S. Sachdev, Quantum Phase Transitions, (Cambridge University Press, UK, 1999).
  • (3) J. Carrasquilla and R. Melko, Machine learning phases of matter, Nat. Phys. 13, 431 (2017).
  • (4) K. I. Aoki, T. Kobayashi, Restricted Boltzmann Machines for the Long Range Ising Models, Mod. Phys. Lett. B 30, 1651401 (2016).
  • (5) D. Kim and D. Kim, Smallest neural network to learn the Ising criticality, Phys. Rev. E 98, 022138 (2018).
  • (6) W. Hu, R. R. P. Singh, and R. T. Scalettar, Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination, Phys. Rev. E 95, 062122 (2017).
  • (7) M. Beach, A. Golubeva and R. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Phys. Rev. B 97, 045207 (2018).
  • (8) P. Suchsland and S. Wessel, Parameter diagnostics of phases and phase transition learning by neural networks, Phys. Rev. B 97, 174435 (2018).
  • (9) W. Zhang, J. Liu and T.-C. Wei, Machine learning of phase transitions in the percolation and XY models, Phys. Rev. E 99, 032142 (2019).
  • (10) J. M. Kosterlitz and D. J. Thouless, Ordering, metastability and phase transitions in two-dimensional systems, J. Phys. C 6, 1181 (1973).
  • (11) V. L. Berezinskii, Destruction of long-range order in one-dimensional and two-dimensional systems possessing a continuous symmetry group. II. Quantum systems, Sov. J. Exp. Theor. Phys. 34, 610 (1972).
  • (12) V. Singh and J. Han, Application of machine learning to two-dimensional Dzyaloshinskii-Moriya ferromagnets, Phys. Rev. B 99, 174426, (2019).
  • (13) I. A. Iakovlev, O. M. Sotnikov and V. V. Mazurenko Supervised learning approach for recognizing magnetic skyrmion phases, Phys. Rev. B 98, 174411 (2018).
  • (14) C. Li, D. Tan and F. Jiang, Applications of neural networks to the studies of phase transitions of two-dimensional Potts models, Ann. Phys. 391, 312 (2018).
  • (15) E. van Nieuwenburg, Y. Liu and S. Huber, Learning phase transitions by confusion, Nat. Phys. 13, 435 (2017).
  • (16) P. Broecker, J. Carrasquilla, R. Melko and S. Trebst, Machine learning quantum phases of matter beyond the fermion sign problem, Sci. Rep. 7 8823 (2017).
  • (17) X. Dong, F. Pollmann and X. Zhang, Machine learning of quantum phase transitions, Phys. Rev. B 99, 121104 (2018).
  • (18) L. Wang, Discovering Phase Transitions with Unsupervised Learning, Phys. Rev. B 94, 195105, (2016).
  • (19) S. J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Phys. Rev. E 96, 022140 (2017).
  • (20) K. Ch’ng, N. Vazquez and E. Khatami, Unsupervised machine learning account of magnetic transitions in the Hubbard model, Phys. Rev. E 97, 13306 (2018).
  • (21) T. C. Scott, M. Therani, and X. M. Wang, Data Clustering with Quantum Mechanics, Mathematics, 5, 5 (2017).
  • (22) E. van Nieuwenburg, E. Bairey and G. Refael, Learning phase transitions from dynamics, Phys. Rev. B 98, 060301 (2018).
  • (23) J. Venderley, V. Khemani and E. Kim, Machine learning out-of-equilibrium phases of matter, Phys. Rev. Lett. 120, 257204 (2018).
  • (24) C. Casert, T. Vieijra, J. Nys and J. Ryckebusch, Interpretable machine learning for inferring the phase boundaries in a nonequilibrium system, Phys. Rev. E 99, 023304 (2019).
  • (25) P. Huembeli, A. Dauphin and P. Wittek, Identifying quantum phase transitions with adversarial neural networks, Phys. Rev. B 97, 134109 (2018).
  • (26) Y. Liu and E. P. L. van Nieuwenburg, Discriminative Cooperative Networks for Detecting Phase Transitions, Phys. Rev. Lett. 120, 176401 (2018).
  • (27) S. Efthymiou, M. J. S. Beach, and R. G. Melko, Super-resolving the Ising model with convolutional neural networks, Phys. Rev. B 99, 075113 (2019).
  • (28) W. Lian et al. Machine Learning Topological Phases with a Solid-State Quantum Simulator, Phys. Rev. Lett. 122, 210503 (2019).
  • (29) A. Bohrdt et al., Classifying snapshots of the doped Hubbard model with machine learning, Nat. Phys. 15, 921 (2019).
  • (30) P. Zhang, H. Shen and H. Zhai, Machine Learning Topological Invariants with Neural Networks, Phys. Rev. Lett. 120, 066401 (2017).
  • (31) Y. Zhang, R. Melko and E. Kim, Machine learning Z2 quantum spin liquids with quasiparticle statistics, Phys. Rev. B 96, 245119 (2017).
  • (32) Y. Zhang and E. A. Kim, Quantum Loop Topography for Machine Learning, Phys. Rev. Lett. 118, 216401 (2017).
  • (33) Y. Hsu, X. Li, D. Deng and S. Das Sarma, Machine Learning Many-Body Localization: Search for the Elusive Nonergodic Metal, Phys. Rev. Lett. 121, 245701 (2018).
  • (34) Z. Cai and J. Liu Approximating quantum many-body wave functions using artificial neural networks, Phys. Rev. B 97,035116 (2018).
  • (35) R. Vargas-Hernández, J. Sous, M. Berciu and R. Krems, Extrapolating Quantum Observables with Machine Learning: Inferring Multiple Phase Transitions from Properties of a Single Phase, Phys. Rev. Lett. 121, 255702 (2018).
  • (36) A. Shirinyan, V. Kozin, J. Hellsvik and M. Pereiro, Self-organizing maps as a method for detecting phase transitions and phase identification, Phys. Rev. B 99, 041108 (2019).
  • (37) C. Giannetti, B. Lucini and D. Vadacchino, Machine Learning as a universal tool for quantitative investigations of phase transitions, Nucl. Phys. 944 114639 (2019).
  • (38) J. Greitemann, K. Liu and L. Pollet Probing hidden spin order with interpretable machine learning, Phys. Rev. B 99, 060404 (2019)
  • (39) S. S. Lee and B. J. Kim, Confusion scheme in machine learning detects double phase transitions and quasi-long-range order, Phys. Rev. E 99, 043308 (2019).
  • (40) P. Ponte and R. G. Melko, Kernel methods for interpretable machine learning of order parameters, Phys. Rev. B 96, 205146 (2017).
  • (41) Z. Li, M. Luo, and X. Wan, Extracting critical exponents by finite-size scaling with convolutional neural networks, Phys. Rev. B 99, 075418 (2019).
  • (42) Giuseppe Carleo et al., Machine learning and the physical sciences, Rev. Mod. Phys. 91, 045002 (2019).
  • (43) J. F. Rodriguez-Nieva and M. S. Scheurer, Identifying topological order through unsupervised machine learning, Nat. Phys. 15, 790 (2019).
  • (44) R. R Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, PNAS. 102 (21), 7426(2005).
  • (45) T. Caliński and J. Harabasz, A dendrite method for cluster analysis, Commun. Stat. - Theory Methods 3, (1974).
  • (46) P. J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20, 53 (1987).
  • (47) R. H. Swendsen, J. S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations, Phys. Rev. Lett. 58 86 (1987).
  • (48) J. F. Rodriguez-Nieva and M. S. Scheurer, arXiv:1805.05961v1.
  • (49) Y. Liu, Z. Li, H. Xiong, X. Gao, J. Wu and S. Wu, Understanding and Enhancement of Internal Clustering Validation Measures, in IEEE Transactions on Cybernetics, 43, 3, (2013).
  • (50) N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of State Calculations by Fast Computing Machines, J. Chem. Phys. 21 1087 (1953).
  • (51) U. Wolff, Collective Monte Carlo Updating for Spin Systems, Phys. Rev. Lett. 62, 361 (1989).
  • (52) Y. Hsieh, Y. J. Kao and A. W. Sandvik. Finite-size scaling method for the Berezinskii-Kosterlitz-Thouless transition, J Stat. Mech-Theory 2013, P09001 (2013).
  • (53) B. Nienhuis, Exact Critical Point and Critical Exponents of O(nn) Models in Two Dimensions, Phys. Rev. Lett. 49, 1062 (1982); Y. Deng, T. Garoni, W. Guo, H. Blöte and A. Sokal, Cluster simulations of loop models on two- dimensional lattices, Phys. Rev. Lett. 98, 120601 (2007).
  • (54) D. M. Hübscher and S. Wessel, Stiffness jump in the generalized XY model on the square lattice, Phys. Rev. E 87, 062112 (2013).
  • (55) G. A. Canova, Y. Levin, and J. J. Arenzon, Competing nematic interactions in a generalized XY model in two and three dimensions, Phys. Rev. E 94, 032140 (2016).
  • (56) F. Pedregosa, G. Varoquaux, A. Gramfort et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res. 12, 2825 (2011).
  • (57) L. Zelnik-Manor and P. Perona, Self-Tuning Spectral Clustering, Adv. Neural Inf. Process. Syst. 17, 1601 (2004); G. Mishne and I. Cohen, Multiscale Anomaly Detection Using Diffusion Maps, IEEE J. Sel. Topics Signal Process. 7, 111 (2013).
  • (58) M. S. Scheurer and R.-J. Slager, Unsupervised machine learning and band topology, Phys. Rev. Lett. 124, 226401 (2020).
  • (59) Y. Long, J. Ren and H. Chen, Unsupervised Manifold Clustering of Topological Phononics, Phys. Rev. Lett. 124, 185501 (2020).

Appendix A The 1D XY model

The Hamiltonian of the pure XY model reads

H=-J\,\sum_{i=1}^{N}\cos(\theta_{i}-\theta_{i+1})   (13)

where J is the coupling strength between the nearest-neighbor pair of spins \langle i,i+1\rangle; throughout this work we set J=1 for simplicity and use the periodic boundary condition \theta_{N+1}=\theta_{1}. Following Ref. nature, we generate the spin configurations according to the following,

\theta^{(l)}_{i}=2\pi\nu^{(l)}i/N+\delta\theta^{(l)}_{i}+\overline{\theta}^{(l)},   (14)

where \nu is the winding number and l labels the configuration \{\theta_{i}\}, with i running from 1 to the total length N. The first term 2\pi\nu^{(l)}i/N sets the winding number \nu=\sum_{i}\Delta_{i}/2\pi, where \Delta_{i}=\theta_{i+1}-\theta_{i} is folded into the range [-\pi,\pi) by the so-called saw function rogerxy, i.e., \Delta_{i} is replaced by \Delta_{i}\pm 2\pi whenever it falls outside this range. The term \delta\theta^{(l)}_{i} is a Gaussian fluctuation and \overline{\theta}^{(l)} is drawn randomly from [0, 2\pi).
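
As a concrete illustration of Eq. (14) and of the saw-function definition of \nu, a minimal Python sketch is given below. The Gaussian width \sigma_{\theta} and the uniform draw of \overline{\theta} are our assumptions for illustration (\sigma_{\theta}=0.3 is the value quoted in Appendix B), and the function names are ours, not taken from the code of Ref. nature.

```python
import numpy as np

def saw(delta):
    """Fold phase differences into [-pi, pi) (the 'saw' function)."""
    return (delta + np.pi) % (2.0 * np.pi) - np.pi

def make_config(N, nu, sigma_theta=0.3, rng=None):
    """Generate one 1D XY configuration with winding number nu, as in Eq. (14)."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, N + 1)
    theta = 2.0 * np.pi * nu * i / N               # winding term
    theta += rng.normal(0.0, sigma_theta, size=N)  # Gaussian fluctuation delta_theta_i
    theta += rng.uniform(0.0, 2.0 * np.pi)         # global random phase theta_bar
    return theta % (2.0 * np.pi)

def winding_number(theta):
    """nu = sum_i Delta_i / (2 pi), with Delta_i folded into [-pi, pi)."""
    delta = saw(np.diff(theta, append=theta[:1]))  # periodic boundary theta_{N+1} = theta_1
    return int(np.rint(delta.sum() / (2.0 * np.pi)))

# sanity check: the recovered winding number matches the input nu
for nu in (0, 1):
    assert winding_number(make_config(N=256, nu=nu)) == nu
```

The extra term of Eq. (15) below can be included in the same way by adding \eta^{(l)}[1-\cos(2\pi i/N)] before taking the modulus.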

We consider two types of generated configurations, with winding numbers \nu=0 and \nu=1. The histogram of the first diffusion-map component \psi_{1} is shown in Fig. 14(a); since \psi_{1} is a vector of size m\times 1, its entries sit at \pm 1 once \psi_{1} is rescaled by \sqrt{m}. Fig. 14(b) shows the largest 20 eigenvalues of the transition probability matrix P_{l,l'} in Eq. (2); the two largest eigenvalues are equal to unity. Following Ref. nature, we also test configurations with seven topological sectors, generated according to the following equation,

\theta^{(l)}_{i}=2\pi\nu^{(l)}i/N+\delta\theta^{(l)}_{i}+\overline{\theta}^{(l)}+\eta^{(l)}[1-\cos(2\pi i/N)],   (15)

where \nu=\{0,\pm 1,\pm 2,\pm 3\}. We find that \psi_{1} takes seven distinct values ranging from -0.03 to 0.03, corresponding to the seven winding numbers marked by the symbols in Fig. 14(c). The eigenvalues \lambda_{k} vs. k are shown in Fig. 14(d). Clearly, a plateau of eigenvalues appears for k\leq 7.

Figure 14: (Color online) (a) Histogram count of \psi_{1}\sqrt{m}. (b) The largest 20 eigenvalues of the matrix P. (c) \nu vs. \psi_{1} for the configurations with seven topological sectors. (d) The largest 30 eigenvalues of the matrix P. In (b) and (d), the leading 2 and 7 eigenvalues, respectively, stand out from the rest.
Figure 15: (Color online) Left column: the distributions of \lambda_{k} vs. \varepsilon/\varepsilon_{0} of the 2D XY model for three typical temperatures, (a) T=0.3, (b) T=0.6 and (c) T=1.0. Right column: the corresponding \lambda_{k} vs. k. The five arrows point to the eigenvalues of the leading five topological sectors.

Figs. 15(a)-(c) show the largest 24 eigenvalues \lambda_{k} of P_{l,l'} as a function of \varepsilon/\varepsilon_{0} for T=0.3, 0.6 and 1.0, respectively. The bands of eigenvalues \lambda_{k} cannot be distinguished for small values of \varepsilon/\varepsilon_{0}; the reason can be seen from the matrix P, which reduces to a diagonal (identity) matrix in that limit, so that all eigenvalues become degenerate. As \varepsilon/\varepsilon_{0} increases, the bands of \lambda_{k} separate from each other.

The choice of \varepsilon/\varepsilon_{0} is important. We choose \varepsilon/\varepsilon_{0}=4, marked by the dashed lines in the left column. The right column presents \lambda_{k} versus k (k=0,1,\cdots). Clearly, the gap between the k=4 and k=5 eigenvalues becomes smaller with increasing temperature and eventually disappears for T>T_{c}.
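
To make the role of \varepsilon/\varepsilon_{0} explicit, the sketch below builds a row-stochastic transition matrix P from a Gaussian kernel of pairwise distances between configurations and returns its leading eigenvalues. The distance measure (Euclidean distance between the spin vectors (\cos\theta,\sin\theta)) and the reference scale \varepsilon_{0} (taken here as the mean squared pairwise distance) are our assumptions for illustration and may differ from the kernel normalization of Eq. (2).

```python
import numpy as np

def diffusion_spectrum(configs, eps_ratio=4.0, n_eigs=24):
    """Leading eigenvalues of the transition matrix P built from a Gaussian kernel.

    configs: array of shape (m, N) holding the angles theta_i^(l).
    eps_ratio: the ratio epsilon/epsilon_0 discussed in the text.
    """
    x = np.concatenate([np.cos(configs), np.sin(configs)], axis=1)   # (m, 2N) spin components
    sq = np.sum(x * x, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)  # squared pairwise distances
    eps0 = d2[d2 > 0].mean()                  # reference scale (our convention)
    K = np.exp(-d2 / (eps_ratio * eps0))      # Gaussian kernel
    P = K / K.sum(axis=1, keepdims=True)      # row-normalized transition matrix P_{l,l'}
    lam = np.sort(np.linalg.eigvals(P).real)[::-1]
    return lam[:n_eigs]
```

For very small eps_ratio the kernel matrix approaches the identity, so P becomes nearly diagonal and all eigenvalues collapse toward unity, consistent with the indistinguishable bands described above.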

Appendix B The PCA and kernel PCA (kPCA)

Figure 16: (Color online) Top row: PCA and kernel PCA with Gaussian and polynomial kernels on the simple dataset. Bottom row: the same analyses applied to the generalized dataset.

Fig. 16 presents results of PCA and kPCA on the simple and generalized datasets described in the appendix of Ref. nature.

The simple dataset consists of configurations of the one-dimensional XY model with size N=256, m=1000, and \sigma_{\theta}=0.3, with topological winding numbers \nu=0 and \nu=1 generated according to Eq. (14). Figure 16 shows results of linear PCA, PCA with a Gaussian kernel, and PCA with a polynomial kernel in the top left, top middle and top right panels, respectively. Clearly, the classifications from PCA with a nonlinear kernel are much sharper for the XY model. The three-dimensional visualization is based on the three reduced components. However, the above method fails for data generated with a slight modification of Eq. (14), namely an additional term \eta^{(l)}[1-\cos(2\pi i/N)], where \eta^{(l)} is random in the range [-\eta_{0}, \eta_{0}]. The results are shown in the bottom left, bottom middle and bottom right panels of Fig. 16, respectively.
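
For reference, the three reduced components in Fig. 16 can be obtained with scikit-learn as follows. We feed the spin components (\cos\theta,\sin\theta); the kernel hyperparameters (default gamma for the Gaussian kernel, degree 3 for the polynomial kernel) are our choices for illustration and not necessarily those used in Ref. nature.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

def reduced_components(configs, n_components=3):
    """Linear PCA, Gaussian-kernel PCA and polynomial-kernel PCA projections."""
    x = np.concatenate([np.cos(configs), np.sin(configs)], axis=1)
    linear = PCA(n_components=n_components).fit_transform(x)
    gauss = KernelPCA(n_components=n_components, kernel="rbf").fit_transform(x)
    poly = KernelPCA(n_components=n_components, kernel="poly", degree=3).fit_transform(x)
    return linear, gauss, poly
```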

Appendix C Other indices

As listed in Ref. 11index, we check the 11 indices in Table 1 for validating the classifications. Taking the q=8 GXY model as an example, the parameters such as the temperature T and \Delta are fixed to the same values as in Fig. 10(b). We find that only four indices, presented in Fig. 17(a)-(d), produce signals at the critical points: ch, Ii, dn and pbm, whose full names are listed in Table 1.

The remaining indices do not give clear signals for locating the phase transition points.
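
Three of the indices in Table 1 (ch, sc and db) are available directly in scikit-learn; the remaining ones, including Ii, dn and pbm, have to be implemented from their definitions. A minimal sketch, assuming the clusters are labeled by k-means on the leading diffusion-map components \psi:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, silhouette_score,
                             davies_bouldin_score)

def builtin_indices(psi, n_clusters=2):
    """ch, sc and db indices of Table 1 for k-means labels on the components psi."""
    psi = np.asarray(psi)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(psi)
    return {"ch": calinski_harabasz_score(psi, labels),
            "sc": silhouette_score(psi, labels),
            "db": davies_bouldin_score(psi, labels)}
```

Scanning such indices over the temperature (or \Delta) grid and locating their extrema or sharp jumps gives the estimates of the critical points discussed in the main text.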

Figure 17: In the first row, (a)-(d), the indices ch, Ii, dn and pbm can distinguish the F_{2} and P phases. (e)-(k) The remaining indices fail to show clear signals.
Table 1: INTERNAL CLUSTERING VALIDATION MEASURES.
Measure Notation Definition
1 Calinski-Harabasz index ch \frac{\sum_{i}n_{i}d^{2}(c_{i},c)/(NC-1)}{\sum_{i}\sum_{x\in C_{i}}d^{2}(x,c_{i})/(n-NC)}
2 Silhouette index sc \frac{1}{NC}\sum_{i}\left[\frac{1}{n_{i}}\sum_{x\in C_{i}}\frac{b(x)-a(x)}{\max(b(x),a(x))}\right],
  a(x)=\frac{1}{n_{i}-1}\sum_{y\in C_{i},y\neq x}d(x,y),
  b(x)=\min_{j,j\neq i}\left[\frac{1}{n_{j}}\sum_{y\in C_{j}}d(x,y)\right]
3 Davies-Bouldin index db \frac{1}{NC}\sum_{i}\max_{j,j\neq i}\frac{\frac{1}{n_{i}}\sum_{x\in C_{i}}d(x,c_{i})+\frac{1}{n_{j}}\sum_{x\in C_{j}}d(x,c_{j})}{d(c_{i},c_{j})}
4 SDbw validity index SDbw Scat(NC)+Dens\_bw(NC),
  Scat(NC)=\frac{1}{NC}\sum_{i}\frac{\|\sigma(C_{i})\|}{\|\sigma(D)\|},
  Dens\_bw(NC)=\frac{1}{NC(NC-1)}\sum_{i}\left[\sum_{j,j\neq i}\frac{\sum_{x\in C_{i}\cup C_{j}}f(x,u_{ij})}{\max\{\sum_{x\in C_{i}}f(x,c_{i}),\sum_{x\in C_{j}}f(x,c_{j})\}}\right]
5 Xie-Beni index xb \frac{\sum_{i}\sum_{x\in C_{i}}d^{2}(x,c_{i})}{n\cdot\min_{i,j\neq i}d^{2}(c_{i},c_{j})}
6 Dunn's index dn \min_{i}\left[\min_{j}\left(\frac{\min_{x\in C_{i},y\in C_{j}}d(x,y)}{\max_{k}\{\max_{x,y\in C_{k}}d(x,y)\}}\right)\right]
7 pbm pbm \left[\frac{1}{NC}\cdot\frac{\sum_{x\in D}d(x,c)}{\sum_{i}\sum_{x\in C_{i}}d(x,c_{i})}\cdot\max_{i,j}d(c_{i},c_{j})\right]^{2}
8 I index Ii \left[\frac{1}{NC}\cdot\frac{\sum_{x\in D}d(x,c)}{\sum_{i}\sum_{x\in C_{i}}d(x,c_{i})}\cdot\max_{i,j}d(c_{i},c_{j})\right]^{P}
9 Root-mean-square std dev rmss \left[\frac{\sum_{i}\sum_{x\in C_{i}}\|x-c_{i}\|^{2}}{P\sum_{i}(n_{i}-1)}\right]^{\frac{1}{2}}
10 R-squared r2 1-\frac{\sum_{i}\sum_{x\in C_{i}}\|x-c_{i}\|^{2}}{\sum_{x\in D}\|x-c\|^{2}}
11 Modified Hubert \Gamma statistic mhgs \frac{2}{n(n-1)}\sum_{x\in D}\sum_{y\in D}d(x,y)\,d_{x\in C_{i},y\in C_{j}}(c_{i},c_{j})

    D: dataset; n: number of objects in D; c: center of D; P: number of attributes of D; NC: number of clusters; C_{i}: the i-th cluster; n_{i}: number of objects in C_{i}; c_{i}: center of C_{i}; \sigma(C_{i}): variance vector of C_{i}; d(x,y): distance between x and y; \|X_{i}\|=(X_{i}^{T}\cdot X_{i})^{\frac{1}{2}}.
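
For completeness, the ch index of row 1 in Table 1 can also be coded directly from its definition. The sketch below uses our own variable names and reproduces scikit-learn's calinski_harabasz_score.

```python
import numpy as np

def ch_index(x, labels):
    """Calinski-Harabasz index, row 1 of Table 1:
    [sum_i n_i d^2(c_i, c)/(NC-1)] / [sum_i sum_{x in C_i} d^2(x, c_i)/(n-NC)]."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n, clusters = len(x), np.unique(labels)
    NC = len(clusters)
    c = x.mean(axis=0)                        # center of the whole dataset D
    between = within = 0.0
    for k in clusters:
        xi = x[labels == k]
        ci = xi.mean(axis=0)                  # center of cluster C_i
        between += len(xi) * np.sum((ci - c) ** 2)   # n_i d^2(c_i, c)
        within += np.sum((xi - ci) ** 2)             # sum_{x in C_i} d^2(x, c_i)
    return (between / (NC - 1)) / (within / (n - NC))
```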