
Deep learning of topological phase transitions from entanglement aspects: An unsupervised way

Yuan-Hong Tsai1,2 [email protected]    Kuo-Feng Chiu3    Yong-Cheng Lai3    Kuan-Jung Su3    Tzu-Pei Yang3    Tsung-Pao Cheng3    Guang-Yu Huang3    Ming-Chiang Chung3,4,5 [email protected] 1 AI Foundation, Taipei, 106, Taiwan 2 Taiwan AI Academy, New Taipei, 241, Taiwan 3 Physics Department, National Chung-Hsing University, Taichung, 40227, Taiwan 4 Physics Department, National Center for Theoretical Sciences, Taipei, 10617, Taiwan 5 Physics Department, Northeastern university, 360 Huntington Ave., Boston, Massachusetts 02115, U.S.A.
Abstract

Machine learning techniques have been shown to be effective at recognizing different phases of matter and producing phase diagrams in the parameter space of interest, but they usually require labeled data to perform well. Here, we propose a machine learning procedure, operating mainly in an unsupervised manner, which first identifies topological/non-topological phases and then refines the locations of the phase boundaries. Following this procedure, we extend our previous work on the one-dimensional p-wave superconductor [Phys. Rev. B 102, 054512 (2020)] to the Su-Schrieffer-Heeger model, with an emphasis on using quantum entanglement-based quantities as input features. We find that our method not only reproduces results similar to the previous work, with sharp phase boundaries, but, importantly, it also does not rely on prior knowledge of the phase space, e.g., the number of phases present. We conclude with a few remarks about its potential, limitations, and explainability.

I Introduction

Machine learning (ML) is not only a rapidly growing field of computer science with applications ranging from machine vision to natural language processing lecun15 , but has also attracted much attention among researchers in the physics community. The technique is completely data-driven and thus a bottom-up method: given a large amount of data or features, some function (often a neural network) is trained to map them to a more accessible or condensed representation, which could simply be class labels or patterns. If such a trained function generalizes well, it can then predict (represent) unknown new data points. Among the conceptual and practical applications in condensed matter physics, using ML to identify different phases of matter or to determine phase boundaries is of particular interest Ohtsuki16 ; Nieuwenburg17 ; Carrasquilla17 ; Broecker17 ; Wetzel17 ; wang16 ; Tanaka17 ; Hu17 ; Broecker17b ; Wetzel17b ; Chng18 ; Liu18 ; Durr19 ; Kottmann20 . Moreover, it is quite remarkable that ML has also been shown to be insightful and to have great potential for topological phase transitions kim17a ; kim17b ; Zhang18 ; Sun18 ; Carvalho18 ; Ming19 ; Caio19 ; Zhang21 ; Scheurer19 ; Greplova20 ; Scheurer20 , where no obvious local order parameters are available.

There are two main approaches to classifying phases of matter with ML, whether topological or symmetry-breaking. One is based on supervised learning, in which each training sample is labeled by a well-known regime (phase) Ohtsuki16 ; Nieuwenburg17 ; Carrasquilla17 ; Broecker17 ; Wetzel17 ; kim17a ; kim17b ; Zhang18 ; Sun18 ; Carvalho18 ; Ming19 ; Caio19 ; Zhang21 . This approach often yields better phase-boundary determination, but it requires prior knowledge of the underlying phases of the system, such as the total number of phases in the parameter space of interest. It could therefore miss hidden or unknown phases. The other approach, unsupervised learning, requires no prior labeling and learns from the training data alone. As a result, unsupervised learning is the more natural choice when one wants to explore a parameter space about which little or no prior knowledge is available.

In fact, identifying phases of matter with unsupervised learning has been shown to be feasible and to suggest new perspectives wang16 ; Tanaka17 ; Hu17 ; Broecker17b ; Wetzel17b ; Chng18 ; Liu18 ; Durr19 ; Kottmann20 . For instance, algorithms such as autoencoders Lecun87 ; Kamp88 ; Hinton94 can extract a local order parameter in the two-dimensional (2D) Ising model Hu17 ; Wetzel17b , and clustering and dimensional-reduction techniques such as diffusion maps have been employed to accurately distinguish different topological sectors in the 2D XY model Scheurer19 . However, while these examples reflect the "effectiveness" of the unsupervised approach, the resulting phase boundaries are often not comparable to their supervised counterparts.

Therefore, we here propose an unsupervised learning method for identifying (symmetry-protected) topological phases of matter, optionally followed by a "supervised" learning step to determine the phase boundaries more accurately. We expand our previous work tsai20 on the one-dimensional p-wave superconductor (1D p-SC) kitaev01 and extend it to the Su-Schrieffer-Heeger (SSH) model Su79 , both of which possess non-trivial topological phases. Moreover, unlike previous studies of similar systems Nieuwenburg17 ; Zhang18 , we emphasize quantum information-based quantities as the input features for machine learning. In particular, we focus on the block correlation matrices and Majorana correlation matrices, which have proven effective according to Ref. tsai20 . We find that our proposed strategy not only reproduces similar results with sharp phase boundaries, but notably it does not rely on any prior knowledge of the phase space (e.g., the number of phases present).

II Models

In this paper, we demonstrate the effectiveness of our proposed strategy on two classic models with topological phase transitions. The first one is Kitaev's 1D p-wave superconductor of spinless fermions kitaev01 :

H = \sum_{i}\Big[-t\left(c_{i}^{\dagger}c_{i+1}+c_{i+1}^{\dagger}c_{i}\right) + \Delta\left(c_{i}c_{i+1}+c_{i+1}^{\dagger}c_{i}^{\dagger}\right) - \mu\left(c_{i}^{\dagger}c_{i}-\tfrac{1}{2}\right)\Big],   (1)

with the nearest-neighbor hopping amplitude t, superconducting pairing potential Δ, and on-site chemical potential μ. Due to the translational invariance of H, it can be transformed into momentum space as

H = -\sum_{k\in BZ}\left(c_{k}^{\dagger},\,c_{-k}\right)\left[\mathbf{R}(k)\cdot\boldsymbol{\sigma}\right]\left(c_{k},\,c_{-k}^{\dagger}\right)^{T},   (2)

where σ = (σ_x, σ_y, σ_z) are the Pauli matrices and R(k) = (0, −Δ sin k, t cos k + μ/2) is the pseudo-magnetic field. One can easily compute the one-particle energy spectrum, ε(k) = ±√[(2t cos k + μ)² + 4Δ² sin²k]. The spinless p-wave superconductor (1) breaks both time-reversal and chiral symmetries while keeping particle-hole symmetry intact, and thus belongs to class D in the ten-fold way classification of symmetry-protected topological systems Schnyder08 . The ground state of the system can be characterized by a Z₂ topological invariant.
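As a quick consistency check of the spectrum quoted above, a minimal numerical sketch (in Python; the function name is ours, not from the paper's code) evaluates ε(k) on a dense k-grid and confirms that the single-particle gap closes only at μ = ±2t, i.e., exactly at the topological phase boundaries:

```python
import numpy as np

def pwave_spectrum(k, t=1.0, delta=1.0, mu=0.0):
    """One-particle spectrum of the 1D p-wave superconductor, Eq. (2)."""
    return np.sqrt((2 * t * np.cos(k) + mu) ** 2 + 4 * delta**2 * np.sin(k) ** 2)

k = np.linspace(-np.pi, np.pi, 2001)
for mu in (-3.0, -2.0, 0.0, 2.0, 3.0):
    gap = pwave_spectrum(k, t=1.0, delta=1.0, mu=mu).min()
    print(f"mu/t = {mu:+.1f}  minimal gap = {gap:.4f}")
# The gap vanishes only at mu = ±2t, the topological phase boundaries.
```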

Figure 1: (a) Topological phase diagram of the 1D p-wave superconductor. The chain-like insets schematically represent the ground state of each phase in terms of Majorana fermions, as described in the main text. The pairing between neighboring sites indicates that phases I and II are topological, while the others are not. (b) Topological phase diagram of the SSH model as a function of v/w (lower part). Whether the trajectory of R′(k) encloses the origin (red point) as k runs over the first Brillouin zone determines whether the corresponding phase is topological (upper part).

To gain a clearer physical picture, we employ the Majorana operators d_{2j−1} = c_j + c_j^† and d_{2j} = −i(c_j − c_j^†) to rewrite Eq. (1) as

H = \frac{i}{2}\sum_{j}\Big[(-t+|\Delta|)\,d_{2j-1}d_{2j+2}+(t+|\Delta|)\,d_{2j}d_{2j+1}-\mu\,d_{2j-1}d_{2j}\Big],   (3)

When |μ| > 2t, Eq. (3) can be adiabatically transformed into the form (−iμ/2) Σ_j d_{2j−1} d_{2j}, corresponding to t = |Δ| = 0 and μ < 0. As schematically depicted in Fig. 1(a), the ground state of this simplified Hamiltonian is composed of Majorana fermions paired on the same site, leaving no Majorana edge modes; it therefore belongs to a topologically trivial phase (phases III and IV). In the opposite situation, |μ| < 2t, Eq. (3) can be adiabatically transformed into the special case it Σ_j d_{2j} d_{2j+1}, corresponding to t = |Δ| > 0 and μ = 0. The ground state in this case can be viewed as follows: Majorana fermions from neighboring sites pair together, while the Majorana modes at the edges remain unpaired, corresponding to a nontrivial phase [phases I and II in Fig. 1(a)].

The other model we have studied is the Su-Schrieffer-Heeger (SSH) model, which describes spinless fermions on a 1D lattice with two-site (α and β) unit cells at half filling Su79 :

H = \sum_{i}\Big[v\left(c_{i,\alpha}^{\dagger}c_{i,\beta}+\mathrm{h.c.}\right)+w\left(c_{i+1,\alpha}^{\dagger}c_{i,\beta}+\mathrm{h.c.}\right)\Big],   (4)

where the v terms describe fermion hopping within each unit cell i and the w terms describe hopping between nearest-neighbor unit cells. Transforming Eq. (4) into momentum space with periodic boundary conditions, it becomes

H = \sum_{k\in BZ}\left(c_{k,\alpha}^{\dagger},\,c_{k,\beta}^{\dagger}\right)\left[\mathbf{R}^{\prime}(k)\cdot\boldsymbol{\sigma}\right]\left(c_{k,\alpha},\,c_{k,\beta}\right)^{T},   (5)

where σ again denotes the Pauli matrices and R′(k) = (v + w cos k, w sin k, 0). The eigenvalues and corresponding eigenvectors can be expressed simply in terms of R′(k). This model preserves time-reversal, particle-hole, and chiral (sub-lattice) symmetries ssh_topological and thus belongs to class BDI, characterized by a Z topological invariant among the symmetry-protected topological systems Schnyder08 .

The topological nature of the SSH model can be understood intuitively by plotting the trajectory of R′(k) over the first Brillouin zone. When v/w < 1 (topological), the trajectory winds around the origin (red point), while for v/w > 1 (non-topological) it does not, as shown in the upper part of Fig. 1(b). This observation is closely related to the well-known winding number Sheet20 and results in the phase diagram shown schematically in the lower part of Fig. 1(b).
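To make the winding picture concrete, the sketch below (a simple illustration, not the code used in this work) counts how many times R′(k) encircles the origin by accumulating the phase of R′_x(k) + iR′_y(k) over the Brillouin zone; it returns 1 for v/w < 1 and 0 for v/w > 1:

```python
import numpy as np

def ssh_winding_number(v, w=1.0, nk=2001):
    """Winding of R'(k) = (v + w cos k, w sin k, 0) around the origin."""
    k = np.linspace(-np.pi, np.pi, nk)
    rx = v + w * np.cos(k)
    ry = w * np.sin(k)
    angles = np.unwrap(np.angle(rx + 1j * ry))   # continuous phase along the loop
    return int(np.rint((angles[-1] - angles[0]) / (2 * np.pi)))

for v in (0.3, 0.8, 1.2, 3.0):
    print(f"v/w = {v:.1f}  winding number = {ssh_winding_number(v)}")
```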

Figure 2: (a) The infinite system is composed of a finite subsystem A with L sites and an environment B. (b) A typical L×L MCM image for a 1D p-wave SC at Δ/t = 1, μ = 0, L = 10. (c) A typical 2L×2L eigenvector image of a BCM for the same system.

III Methods of Machine Learning

III.1 Producing data

As mentioned in the Introduction, ML is a data-driven approach: any effective ML pipeline requires suitable training data, a model architecture, and a fair evaluation procedure. Given the essential role of the data, we follow our previous successful experience in using entanglement correlations as representative data for the systems of interest tsai20 .

We consider two common entanglement-based correlators. Both compute certain correlations within a subsystem A after integrating out all other degrees of freedom outside A, i.e., its environment B [see Fig. 2(a)]. First, the "Majorana" correlation matrix (MCM) for fermions at two sites within the subsystem A of size L can be defined concretely, in the Majorana language for a 1D p-wave SC (t ≡ 1), as

i\,\operatorname{Tr}\rho_{0}d_{i}d_{j} = \operatorname{Tr}\rho_{0}(c_{i}-c_{i}^{\dagger})(c_{j}+c_{j}^{\dagger}) = \int_{0}^{\pi}\frac{dk}{\pi}\,\frac{-\Delta\sin k\,\sin[k(i-j)]+\left(\cos k+\frac{\mu}{2}\right)\cos[k(i-j)]}{\sqrt{\Delta^{2}\sin^{2}k+\left(\cos k+\frac{\mu}{2}\right)^{2}}},   (6)

where ρ₀ is the density matrix of the ground state and i, j label two sites within A. A typical MCM "image" is shown in Fig. 2(b). When turning to an insulating case such as the SSH model, the MCM has to be modified as follows (w ≡ 1),

\operatorname{Tr}\rho_{0}c_{i,\beta}c_{j,\alpha}^{\dagger} = \int_{0}^{2\pi}\frac{dk}{\pi}\,\frac{\sin k\,\sin[k(i-j)]+(v+\cos k)\cos[k(i-j)]}{\sqrt{1+v^{2}+2v\cos k}},   (7)

where i, j now label two unit cells within A. Note that the particle-hole space is replaced here by the sub-lattice space, but for simplicity we still call this matrix the MCM.
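For illustration, the L×L MCM "image" of Eq. (6) can be assembled by numerically integrating over k; the following sketch (with our own helper names, not the production code of this work) does so for the 1D p-wave SC, and the SSH case of Eq. (7) is analogous:

```python
import numpy as np
from scipy.integrate import quad

def mcm_entry(i, j, delta=1.0, mu=0.0):
    """Majorana correlator of Eq. (6) for the 1D p-wave SC (t = 1)."""
    def integrand(k):
        num = -delta * np.sin(k) * np.sin(k * (i - j)) \
              + (np.cos(k) + mu / 2) * np.cos(k * (i - j))
        den = np.sqrt(delta**2 * np.sin(k)**2 + (np.cos(k) + mu / 2)**2)
        return num / den
    val, _ = quad(integrand, 0.0, np.pi)
    return val / np.pi

def mcm_image(L=10, delta=1.0, mu=0.0):
    """L x L 'gray image' used as a type-I input sample; cf. Fig. 2(b)."""
    return np.array([[mcm_entry(i, j, delta, mu) for j in range(L)] for i in range(L)])

img = mcm_image(L=10, delta=1.0, mu=0.0)
print(img.shape)
```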

The second type of correlator is the block correlation matrix (BCM) for subsystem A, defined as BCM_{i,j} = Tr ρ₀ ĉ_i ĉ_j^†, with ĉ_i ≡ (c_i, c_i^†)^T for the 1D p-wave SC [(c_{i,α}, c_{i,β})^T for the SSH model] and i, j running over the sites (or unit cells) of the finite subsystem A. This matrix is intimately connected to a more familiar quantity, the reduced density matrix of block A, ρ_A = ⊗_m diag(λ_m, 1−λ_m), where the λ_m are simply the eigenvalues of the BCM, also known as the one-particle entanglement spectrum (OPES). Therefore, both the eigenvalues and the corresponding eigenvectors of the BCM, the latter known as one-particle entanglement eigenvectors (OPEEs), are considered as input data ("images") for our ML purposes. Fig. 2(c) provides an example of the eigenvector "image" from a given BCM.
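A short sketch of how the two BCM-derived input formats used below (type II: OPES; type III: OPEE "images") could be extracted once a 2L×2L BCM has been assembled numerically; squaring the eigenvector entries, as done later in the text, removes their overall phase ambiguity:

```python
import numpy as np

def bcm_inputs(bcm):
    """Split a (2L x 2L) Hermitian BCM into the two input formats used in the text."""
    evals, evecs = np.linalg.eigh(bcm)      # eigenvalues = OPES, eigenvectors = OPEEs
    type_ii = evals                         # 2L-dimensional vector (type-II input)
    type_iii = np.abs(evecs) ** 2           # squared entries: 2L x 2L "gray image" (type III)
    return type_ii, type_iii
```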

Figure 3: (a) The proposed working pipeline for identifying different phases and finely determining the phase boundaries without prior knowledge. The model architectures are shown schematically in (b) the autoencoder (AE) and (c) the variational autoencoder (VAE). [Conv: convolutional module; FCNN: fully connected neural network module; mean: mean values; logvar: logarithm of the variances]

III.2 ML algorithms

Once the format of the input data is settled, we can build the ML pipeline for our task: identifying topological phase transitions of a given system in an unsupervised manner, without prior knowledge. Our proposed learning procedure integrates several ML algorithms to arrive at the final predictions. There are four steps: (1) as shown in Fig. 3(a), the input data are first fed to an autoencoder Lecun87 ; Kamp88 ; Hinton94 to extract effective features; (2) the number of necessary features is then determined by principal component analysis (PCA) Pearson01 ; Jolliffe02 , keeping 99% (a prescription) of the total variance of the extracted features prescription ; (3) the total number of phases in the parameter space of interest is determined by K-means clustering MacQueen67 ; Lloyd82 of the PCA-transformed features, followed by Silhouette analysis (SA) Rousseeuw87 ; deAmorim15 ; (4) finally, relatively sharp phase boundaries can be determined with the help of supervised learning: a training dataset is constructed by expanding around the data point with the highest confidence (again by SA) in each cluster, and a neural network is then trained on it. We emphasize that step (4) is optional and not always necessary for our purposes. In the following, we provide a brief review of each algorithm and refer readers to Refs. GBC ; Mehta19 for more details.
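For concreteness, a minimal sketch of steps (2) and (3) using scikit-learn is given below; the 99% threshold follows the prescription above, while the variable names and the scanned range of cluster numbers are ours:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_latents(latents, max_clusters=6):
    """Steps (2)-(3): PCA keeping 99% variance, then K-means + Silhouette analysis."""
    reduced = PCA(n_components=0.99).fit_transform(latents)   # keep >= 99% variance
    scores = {}
    for n in range(2, max_clusters + 1):
        labels = KMeans(n_clusters=n, n_init=10).fit_predict(reduced)
        scores[n] = silhouette_score(reduced, labels)
    best_n = max(scores, key=scores.get)                      # highest mean s-score
    best_labels = KMeans(n_clusters=best_n, n_init=10).fit_predict(reduced)
    return best_n, best_labels, scores
```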

Figure 4: AE results for type-I data (MCMs). (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained AEs with type-I data are collected: the length of each color bar is proportional to the number of times that d_z occurred among the 100 models; different colors in the legend represent different d_z. (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations projected onto the subspace spanned by the first two principal components; each color indicates the corresponding cluster (phase). (d) The neuron-output "phase diagram" as a function of μ/2t at Δ/t = 1, L = 10 for the 1D p-SC, from a CNN trained by supervised learning in the last step of the ML pipeline.

III.2.1 Autoencoder and its variational version

An autoencoder (AE) is a type of neural network that compresses input data into a more efficient representation in an unsupervised manner. It consists of two parts, an encoder and a decoder. The encoder f takes d-dimensional input data x and outputs an l-dimensional latent variable z = f(x); the decoder g then maps z back to x′ = g(z) in d dimensions. The learnable parameters of the model are trained by gradient descent to minimize a reconstruction loss L(x, x′), usually chosen as the mean squared error. Since typically l < d after encoding, an AE is often viewed as a nonlinear dimension-reduction method.
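As an illustration of this idea, a stripped-down, fully connected AE in PyTorch is sketched below; it is a simplification for readability, not the exact (ResNet-like) architectures listed in Appendix A, and the layer sizes are only indicative:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, d_in=100, d_latent=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, 32), nn.ReLU(),
            nn.Linear(32, d_latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, 32), nn.ReLU(),
            nn.Linear(32, d_in), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent representation, later fed to PCA/K-means
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()               # reconstruction loss

x = torch.rand(64, 100)              # stand-in for a batch of flattened, normalized MCM images
x_rec, z = model(x)
optimizer.zero_grad()
loss = loss_fn(x_rec, x)
loss.backward()
optimizer.step()
```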

A typical model architecture used in this study is schematically shown in Fig. 3(b). The encoder is made of a convolutional module (a ResNet-like structure He16 ) followed by a linear module consisting of a few fully connected hidden layers. The decoder consists of similar layers arranged in reverse order with respect to the encoder; note, however, that the convolutional layers are replaced by transposed convolutional ones. The activation functions of the intermediate layers are always rectified linear units (ReLUs) Jarrett09 , except for the final layer, where the sigmoid function is used.

Although a traditional AE can learn to encode two input data points into distinct latent variables z₁ and z₂, one has no idea what the decoded result of the input (z₁+z₂)/2 would be. To overcome this arbitrariness, we also consider variational autoencoders (VAEs), which learn a latent-variable model g(x, z) with a joint distribution over a latent variable z and the input x Kingma13 . In sharp contrast with traditional AEs, z here is drawn from a prior probability distribution p(z), almost always chosen to be a multivariate Gaussian, which provides a degree of controllability. In addition, the weights of the VAE are trained by simultaneously optimizing two loss terms, a reconstruction loss and the Kullback-Leibler (KL) divergence between the learned latent distribution and a unit Gaussian prior. This additional KL term can be viewed as a regularizer on a traditional AE. The VAE has a model architecture similar to that in Fig. 3(b), simply without the convolutional modules at the head and tail. Furthermore, as shown in Fig. 3(c), the middlemost layer outputs as many means and (log-)variances in the latent space as the number of encoded features needed.
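A minimal sketch of the VAE objective described above, i.e., a reconstruction term plus the KL divergence to a unit Gaussian, implemented with the standard reparametrization trick (again a simplification, not the exact architectures of Appendix A):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=100, d_latent=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU())
        self.mean = nn.Linear(32, d_latent)      # outputs the latent means
        self.logvar = nn.Linear(32, d_latent)    # outputs the latent log-variances
        self.dec = nn.Sequential(nn.Linear(d_latent, 32), nn.ReLU(),
                                 nn.Linear(32, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mean(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparametrization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_rec, mu, logvar):
    rec = F.mse_loss(x_rec, x, reduction="sum")                    # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL divergence to N(0, I)
    return rec + kl
```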

III.2.2 Principal component analysis

Principal component analysis (PCA) is a standard yet simple method for dimensional reduction and data visualization Pearson01 ; Jolliffe02 . It is an orthogonal, linear transformation of the input features to a new set of variables sorted by their variance. The method is motivated by the experience that, in many cases, the most relevant information of a signal is revealed along the directions of largest variance, while directions with small variance usually represent noise and may be neglected.

Concretely, consider N p-dimensional feature vectors, X = {x₁, x₂, …, x_N}. Without loss of generality one can assume that the vectors have zero mean, Σ_i x_i = 0, so that X is a (zero-mean) centered matrix. By definition, the weight vector producing the first principal component, w₁, is found by

\boldsymbol{w}_{1} = \underset{\lVert\boldsymbol{w}\rVert=1}{\operatorname{argmax}}\,\sum_{i}\left(\boldsymbol{x}_{i}\cdot\boldsymbol{w}\right)^{2}.   (8)

The subsequent weight vectors are obtained by repeating Eq. (8) after subtracting the already-computed principal components from X. In practice, one can show that this procedure is equivalent to finding the eigenvectors of the p×p symmetric matrix XᵀX Mehta19 , i.e.,

\boldsymbol{X}^{T}\boldsymbol{X}\,\boldsymbol{w}_{i} = \lambda_{i}\boldsymbol{w}_{i},   (9)

where the eigenvalues are assumed sorted such that λ₁ ≥ λ₂ ≥ … ≥ 0, each representing the variance of the input features along the corresponding direction. It is also useful to define the relative variance, λ̃_i = λ_i / Σ_i λ_i, in order to count the accumulated variance percentage.
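The 99%-variance criterion used in step (2) of the pipeline then amounts to counting how many of the λ̃_i are needed; a minimal numpy sketch (our own helper, for illustration only):

```python
import numpy as np

def pca_dimension(X, keep=0.99):
    """Smallest number of principal components retaining `keep` of the total variance."""
    Xc = X - X.mean(axis=0)                        # center the feature vectors
    cov = Xc.T @ Xc                                # the (p x p) matrix of Eq. (9)
    eigvals = np.linalg.eigh(cov)[0][::-1]         # eigenvalues, sorted descending
    rel = eigvals / eigvals.sum()                  # relative variances
    return int(np.searchsorted(np.cumsum(rel), keep) + 1)

X = np.random.rand(200, 8)                         # stand-in for AE latent vectors
print(pca_dimension(X))
```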

III.2.3 K-means clustering

K-means clustering is a simple and easily understandable clustering algorithm that requires no supervision MacQueen67 ; Lloyd82 . Given the number of clusters K as prior knowledge, the basic idea is to find cluster means such that the variance within each cluster is minimized. More precisely, consider a set of N unlabeled p-dimensional data points, X = {x₁, x₂, …, x_N}, and let C = {μ₁, μ₂, …, μ_K} (each μ also p-dimensional) be the K cluster centers for the whole dataset. The objective of the K-means method is to assign each x_i to an appropriate cluster such that the loss function

L(\{\boldsymbol{X},\boldsymbol{C}\}) = \sum_{k=1}^{K}\sum_{i=1}^{N}a_{ik}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{k}\right)^{2},   (10)

is minimized. The assignment a_ik is 1 if x_i is assigned to cluster k and 0 otherwise, with Σ_k a_ik = 1 for every i. The implementation usually iterates until convergence within a chosen tolerance.
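A bare-bones implementation of this iteration (Lloyd's algorithm) is sketched below for illustration; in practice a library implementation would normally be used:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimize Eq. (10) by alternating assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assignment step: a_ik = 1 for the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # update step: each center becomes the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```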

III.2.4 Silhouette analysis

Although K-means clustering is quite intuitive, one still needs to supply the number of clusters n as prior knowledge. To obtain a more reasonable estimate of this number, and to partially eliminate the dependence on the distance function chosen in the K-means method, we employ Silhouette analysis (SA) Rousseeuw87 ; deAmorim15 . For a given set of clusters {C_i}, SA assigns to each data point x ∈ C_i the value

s(x) = \frac{b(x)-a(x)}{\max\{a(x),\,b(x)\}},   (11)

where b(x) = min_{j≠i} b_j(x), with b_j(x) the mean distance from x to all points in cluster j, and a(x) is the mean distance between x and all other points in its own cluster i. In other words, the Silhouette value s(x), bounded between ±1, measures how similar x is to its own cluster (cohesion) compared with the other clusters (separation). Considering the mean of the Silhouette values, the s-score, as a function of n (after K-means clustering), the best estimate of n is simply the one that maximizes the s-score.

In addition, once the best n is determined, the data point with the highest Silhouette value within each cluster may be taken as a confident seed to build a "labeled" training set, on which a neural network can be trained in a supervised manner. This network can in turn be used to sharpen the phase boundaries of a phase diagram originally obtained in an unsupervised way over the parameter space of interest, as sketched below.
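A short sketch of how the per-point Silhouette values could be used to select one confident seed per cluster (using scikit-learn; the helper name is ours):

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def pick_seeds(features, labels):
    """Return, for each cluster, the index of the point with the highest Silhouette value."""
    s = silhouette_samples(features, labels)
    seeds = {}
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        seeds[k] = idx[np.argmax(s[idx])]   # most confident member of cluster k
    return seeds
```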

Figure 5: AE results for type-II data (OPES). (a) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (b) Latent representations (of type-II data) projected along the first principal component of PCA, as a function of μ/2t. Each color indicates the corresponding cluster (phase).
Figure 6: AE results for type-III data (OPEEs). (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained AEs with type-III data are collected: the length of each color bar is proportional to the number of times that d_z occurred among the 100 models; different colors in the legend represent different d_z (1 to 10 here). (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations projected onto the subspace spanned by the first two principal components; each color indicates the corresponding cluster (phase). (d) The neuron-output "phase diagram" as a function of μ/2t at Δ/t = 1, L = 100, from a CNN trained by supervised learning.

IV Results and Analysis

IV.1 1D p-wave superconductor

We first prepare the input "image" dataset by generating 20,001 MCMs via Eq. (6) at evenly spaced values of μ/2t from -5 to 5, for a subsystem (block A) of size L in an infinite chain with periodic boundary conditions. Each MCM can be viewed as an L×L single-channel (gray) "image" whose entries represent pixel values. We call this the type-I input format. Without loss of generality, we assume Δ/t = 1, L = 10, and 2t ≡ 1 (energy units) unless mentioned otherwise.

The other input formats originate from the BCM, as mentioned in Sec. III A. For a finite subsystem A of size L, we again prepare 20,001 BCMs (now of size 2L×2L due to the Nambu notation) at evenly spaced μ/2t from -5 to 5. In our study, we either collect all eigenvalues of each BCM into an input vector (type-II input) or arrange the eigenvectors of a BCM as the columns of a new matrix M (of size 2L×2L), viewed as a "gray image" (type-III input).

IV.1.1 AE approach

Following the ML pipeline of Section III, the first step is to train a neural network to encode the type-I input data into effective representations in the latent space. In order to determine the minimal dimension d_z of the latent space, we train a series of AEs with the same model architecture appendix except for the number of hidden neurons in the middlemost layer (n_mid, from 2 to 10). For each n_mid, we record the dimension of the (converged) latent representations necessary to keep at least 99% of their variance under PCA. Due to the unconstrained nature of the latent representations in AEs, we repeat the same training procedure 100 times with the same initial weight distribution. As shown in Fig. 4(a), we find the minimal dimension d_z to be 4, since this value becomes dominant in the discrete distribution of d_z as n_mid increases.

Next, SA is used to estimate the best number of clusters n for the latent representations of all input data via the K-means method. These representations are provided by the previously trained AEs with n_mid = 4 (as suggested above) hidden neurons in the middlemost layer. The box plot in Fig. 4(b) clearly shows that the mean Silhouette value is highest for n = 3. Projecting the 4D latent representations onto the 2D space spanned by the first two principal components (features) yields Fig. 4(c). This feature plot gives insight into how the system can reasonably be divided into three clusters (phases).

Figure 7: Silhouette values as a function of μ/2t for various subsystem sizes L, using (a) the AE approach and (b) the VAE approach, given the type-III (OPEEs) training data after feature extraction. Remarkably, all the dips reasonably indicate the phase (cluster) boundaries; moreover, the location of the highest value within each cluster changes only mildly as L increases.

Again, due to the unconstrained nature of the latent representations in AEs, the phase transition points are found statistically at mean values -1.012 and 0.980, with standard deviations 0.093 and 0.081, respectively, after collecting clustering results from 100 sets of latent representations obtained from differently trained AEs. These values are quite close to the theoretical ones, -1 and 1, but carry relatively large deviations. To sharpen the phase boundaries, we further train a CNN classifier appendix in a supervised manner to predict the whole phase diagram. To prepare a "labeled" dataset, we first pick the three seed "images" that obtain the highest Silhouette value in their respective clusters and are thus believed, with strong confidence, to lie inside the corresponding phases. In our example, the three seeds are located at μ/2t = -2.988, 0.182, 3.019 (traced back from the latent representations to the original input data points), and for each cluster (phase) we expand symmetrically around the seed within a window of width 0.1t to obtain 2000 equally spaced points. These 6000 data points form our training dataset, while the original 20,001 points become our unlabeled test set. As shown in Fig. 4(d), the phase boundaries obtained by the trained CNN classifiers are clearly sharper, at mean values -1.015 and 1.038 with smaller deviations 0.018 and 0.023 (after repeating the same training procedure 100 times).
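For illustration, the construction of such a "labeled" set around the seeds might look as follows (the seed locations and per-seed counts are the numbers quoted above; reading the quoted 0.1t as the full window width is our assumption):

```python
import numpy as np

def labeled_windows(seed_mus, window=0.1, n_per_seed=2000):
    """Expand symmetrically around each seed's mu value and label by cluster index."""
    mus, labels = [], []
    for label, mu0 in enumerate(seed_mus):
        mus.append(np.linspace(mu0 - window / 2, mu0 + window / 2, n_per_seed))
        labels.append(np.full(n_per_seed, label))
    return np.concatenate(mus), np.concatenate(labels)

# seeds found from the highest Silhouette values (values quoted in the text)
mu_train, y_train = labeled_windows([-2.988, 0.182, 3.019])
# mu_train is then converted back into MCM images and fed to the CNN classifier
```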

Figure 8: VAE results for type-I data (MCMs). (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of (paired) neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained VAEs are collected: the dominant d_z is clearly one (pair). (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations (of type-I data) as a function of μ/2t; each color indicates the corresponding cluster (phase).

Alternatively, we next consider BCM-generated quantities as training inputs, for which there are two possible formats. We first train an AE to encode the type-II inputs, whose format is simpler. Here we choose the middlemost layer to have 2 hidden neurons, according to experiments similar to those of the type-I case [see Fig. 4(a)]; moreover, all convolution-related modules are removed from the model architecture appendix . As shown in Fig. 5(a), SA indicates that 2 clusters is the best choice, among the numbers of clusters tested, for the K-means method applied to the latent representations. Combined with the projected plot of the latent features in Fig. 5(b), this shows that the approach can only distinguish a topological phase (phase I) from the non-topological ones (phases III and IV). The reason phases III and IV cannot be further distinguished can be attributed to the rather limited information carried by the type-II inputs, as indicated in our previous work tsai20 .

To gain more information, we use type-III inputs to train autoencoders with n_mid hidden neurons in the middlemost layer and obtain the corresponding latent representations. As shown in Fig. 6(a), the necessary dimension of the latent space grows with n_mid, so we choose the value dominant at large n_mid, namely d_z = 8. Note that, to avoid the phase arbitrariness in computing eigenvectors, we preprocess the inputs by squaring each entry of the input matrix ("image"). After K-means clustering of the latent representations obtained from the AE, the SA results in Fig. 6(b) suggest that the input data separate into 3 clusters, and Fig. 6(c) shows how the encoded representations, projected to 2D, divide into these 3 clusters. In fact, the phase transition boundaries read off from the resulting phase diagram as a function of μ are somewhat shifted from the theoretical values, with relatively large deviations. This, however, can be improved by following the same supervised learning strategy as in the type-I case typeiii . The labeled training dataset used here is based on 3 seeds at μ/2t = -3.362, -0.026, 3.359, each obtaining the highest Silhouette value in its cluster. Even though these values are obtained for the subsystem A with L = 10, Fig. 7 shows that the locations of the most confident Silhouette values change little as L increases. We therefore take a subsystem of size L = 100 to refine the transition boundaries and, at the same time, reduce possible finite-size effects. As shown in Fig. 6(d), the phase transition points are then found statistically at mean values -0.917 and 0.978 with standard deviations 0.149 and 0.0719, respectively, using the well-trained CNN classifiers appendix .

Figure 9: VAE results for type-II data (OPES). (a) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (b) Latent representations (of type-II data) extracted by a trained VAE as a function of μ/2t; each color indicates the corresponding cluster (phase).

IV.1.2 VAE approach

In the VAE approach, we impose certain constraints (here, a multivariate Gaussian) on the encoded representations and let the VAE learn the parameters of a probability distribution modeling the input data. Following the same ML pipeline, the first step is to train a series of VAEs with n_mid = 2 to 10 pairs of hidden neurons in the middlemost layer appendix ; in each pair, one neuron outputs the mean of the encoded representation distribution and the other its variance. After repeating the same training procedure 100 times for each n_mid, the number of latent dimensions needed to keep at least 99% of the variance of the encoded data after PCA is simply one (pair). In sharp contrast with the AE, the VAE provides a more stable result, as clearly shown in Fig. 8(a).

Once the minimal dimension of the encoded representations is set, we employ SA to estimate the best number of clusters n via the K-means method. Fig. 8(b) shows that the Silhouette value is highest for n = 3. Fig. 8(c) depicts the encoded representations as a function of μ/2t, with different colors indicating distinct clusters (phases). From the resulting phase diagram, we find transition points at mean values -0.992 and 0.9597 with standard deviations 0.047 and 0.026, respectively, after collecting results from 100 trained models with the same initial weight distribution. This small deviation differs markedly from the AE approach, where the positions of the transition points are less stable. The supervised learning of the last step is not necessary in this case.

On the other hand, we also take BCMs as training input. First, we train a VAE to encode the type-II inputs. Here we enforce the middlemost layer to output a single pair of mean and variance, because similar PCA tests have been performed to determine the minimal dimension of the encoded representations. Next, as shown in Fig. 9(a), SA indicates that 2 clusters is the best result obtained with the K-means method on the encoded representations [see Fig. 9(b)]. It turns out that this approach can distinguish the topological phase (phase I) from the non-topological ones (phases III and IV), but phases III and IV cannot be further distinguished. This result is again consistent with our previous work tsai20 , indicating that the type-II inputs carry overly compressed information.

Figure 10: VAE results for type-III data (OPEEs). (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of (paired) neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained VAEs are collected: the dominant d_z is clearly one (pair). (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations (of type-III data) as a function of μ/2t; each color indicates the corresponding cluster (phase).

Thus, we next train another set of VAEs on type-III inputs, preprocessed by squaring each matrix entry. Based on the PCA test, we find the minimal latent dimension to be 1, as shown in Fig. 10(a). After K-means clustering of the latent representations, the SA results in Fig. 10(b) suggest that the input data separate into 3 clusters, and Fig. 10(c) shows how the encoded representations in the latent space divide into these 3 clusters. In fact, the phase transition points extracted from the output probability of each phase as a function of μ deviate somewhat from the theoretical values. This can be largely improved by following the same supervised learning strategy as in the MCM case typeiii : it not only shifts the transition points back to mean values -0.995 and 1.003, but also reduces the standard deviations from 0.335 and 0.244 to 0.107 and 0.067, respectively (statistics over 100 same-architecture CNN classifiers). The "labeled" training dataset used here is based on 3 seeds at μ/2t = -3.375, -0.063, 2.892, each obtaining the highest Silhouette value in its cluster. Note that, as in the AE case, we again take a subsystem of size L = 100 to refine the transition boundaries and to reduce possible finite-size effects.

IV.2 SSH model

We prepare the input "image" dataset by generating 10,001 MCMs according to Eq. (7) at evenly spaced v/w from 0 to 10, for a subsystem (block A) of size L under periodic boundary conditions of the full system. Note that the ground state of the SSH model is an insulating state; the term "MCM" is kept only for convenience and has nothing to do with Majoranas. Each MCM can be viewed as an L×L "gray image" whose entries represent pixel values. This forms our type-I input format, and we assume w ≡ 1 (energy units) and L = 10. Furthermore, one can also prepare the eigenvalues and eigenvectors of the BCMs (now of size 2L×2L due to the sublattice space), corresponding to the type-II and type-III input formats, respectively. However, they do not reveal any new physics beyond the type-I format under unsupervised learning, and we therefore omit those results for simplicity.

IV.2.1 AE approach

Similar to the 1D p-wave superconductor case, we first train a neural network to encode the type-I input data into latent representations. We determine the minimal dimension d_z of the latent space by training a series of AEs with the same model architecture appendix except for the number of hidden neurons in the middlemost layer (n_mid, from 2 to 10). Fig. 11(a) shows the discrete distribution of d_z, the dimension of the latent representations needed to keep at least 99% of the variance under PCA for each n_mid, after repeating the same training procedure 100 times. Clearly, d_z is suggested to be 4.

Figure 11: AE results for type-I data (MCMs) of the SSH model. (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained AEs with type-I data are collected: the length of each color bar is proportional to the number of times that d_z occurred among the 100 models; different colors in the legend represent different d_z. (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations projected onto the subspace spanned by the first two principal components; each color indicates the corresponding cluster (phase). (d) The neuron-output "phase diagram" as a function of v/w at L = 10 for the SSH model, from a CNN trained by supervised learning in the last step of the ML pipeline.

Once d_z is known, we take the latent representations of all input data from the previously trained AEs with n_mid = d_z and perform SA to estimate the optimal number of clusters n via the K-means method. As shown in Fig. 11(b), the mean Silhouette value reaches its maximum at n = 2. In addition, projecting the 4D latent representations onto the 2D space spanned by the first two principal components (features) gives Fig. 11(c). Noticing the density change of the feature points, the plot suggests how the system can be consistently divided into two clusters (phases).

Due to the unconstrained nature of the latent representations in AEs, the phase transition point is found statistically at mean value v = 0.987 with standard deviation 0.207, after collecting clustering results from 100 sets of latent representations obtained from differently trained AEs (n_mid = 4). Although this critical value is very close to the theoretical value 1, its deviation is not negligible. To reduce this variance, we again train a CNN classifier appendix in a supervised manner to predict the whole phase diagram. We prepare a "labeled" dataset by picking the two seeds with the highest Silhouette values in their clusters. They correspond to v/w = 0.332 and 5.799 of the original input "images", and around each point we expand symmetrically within a window of width 0.1w to obtain 2000 equally spaced points. These 4000 data points form our training dataset, whereas the original 10,001 points form our unlabeled test set. The phase boundary predicted by the trained CNN classifiers on the test set is clearly sharper, at mean value 1.024 with a much smaller deviation 0.041 (after repeating the same training procedure 100 times), as shown in Fig. 11(d).

Figure 12: VAE results for type-I data (MCMs) of the SSH model. (a) The discrete distribution of the necessary number of latent dimensions d_z for a given number n_mid of (paired) neurons in the middlemost layer (2 to 10 along the y-axis). Results from 100 independently trained VAEs are collected: the dominant d_z is clearly one (pair). (b) Box plot of the s-score as a function of the number of clusters n (via the K-means method). (c) Latent representations (of type-I data) as a function of v/w; each color indicates the corresponding cluster (phase).

IV.2.2 VAE approach

Following the insights obtained from the 1D p-wave SC case, we employ the VAE approach to impose constraints on the encoded representations and obtain a more stable solution. As a first step, we train a series of VAEs with n_mid = 2 to 10 pairs of hidden neurons in the middlemost layer appendix . As before, one neuron in each pair outputs the mean of the encoded representation distribution and the other its variance. After repeating the same training procedure 100 times for each n_mid, the number of latent dimensions needed to keep at least 99% of the variance after PCA is simply one (pair), as clearly shown in Fig. 12(a). This again demonstrates the stability of obtaining robust latent representations with the VAE method.

Since the minimal dimension of the encoded representations is one (pair), we employ SA to estimate the best number of clusters n via the K-means method. Fig. 12(b) clearly shows that the Silhouette value is highest for n = 2. Moreover, Fig. 12(c) depicts the encoded representations as a function of v/w, with different colors indicating distinct clusters (phases). Finally, from the repeated clustering results, the phase transition point is found statistically at mean value 1.051 with standard deviation 0.069. This deviation is smaller than the one obtained from the AE approach, and the supervised learning of the last step in Fig. 3(a) is therefore omitted here.

V Discussion and Conclusion

The proposed ML procedure has demonstrated its capability of recognizing phase transitions with little or no prior knowledge of the phases of matter, by taking advantage of unsupervised and (optionally) supervised learning algorithms. However, a few issues regarding this approach are worth mentioning here.

(1) As is well known, an AE performs a nonlinear dimensional reduction of the data, while PCA can only perform a linear one. One may therefore ask whether the AE or VAE is an essential component of the ML pipeline. To address this issue, we conducted further numerical experiments on the 1D p-wave SC with the type-I, II, and III data formats. For the most compressed format, type-II, we find that including the AE (or VAE) plays no essential role in the subsequent clustering. This is not the case, however, for the type-I and type-III formats: using PCA alone often leads to consequences such as requiring a higher latent dimension d_z (to keep high variance) or obtaining the wrong number of clusters (phases) via the K-means method. The latter could be related to a limitation of the K-means method, which is notoriously known to fail at clustering concentric circles. One may therefore view the AE as an important component for handling general data formats.

Figure 13: The s-score as a function of μ/2t for the 1D p-SC at L = 10. For a given μ/2t, each yellow spot corresponds to an s-score averaged over 10 trained VAEs when the encoded type-I input data, with Δ/t varying from -10 to 10, are forced into two clusters. Red lines indicate the theoretical phase boundaries.

(2) Another important aspect not yet discussed is the effect of varying Δ, an essential ingredient for completing the whole phase diagram. As seen in Fig. 1(a), there are two phases in the region -1 < μ/2t < 1 when Δ is varied, while there is only one outside of it. This is a challenge for our proposed ML procedure, because the K-means method and SA are not applicable when the number of clusters equals one. Remarkably, our method still provides meaningful results. As shown in Fig. 13, each yellow spot corresponds to an averaged s-score when the encoded input data of the 1D p-wave superconductor at a given μ/2t are grouped into a fixed number of two clusters. The encoded data are generated from 10 VAEs with n_mid = 1 for each μ, trained on the type-I data format; explicitly, they are collected from 20,001 equally spaced points in the range Δ/t = -10 to 10. One clearly sees that the averaged s-scores outside the region -1 < μ/2t < 1 are all suppressed, implying the failure of the enforced clustering with n = 2. The decaying s-score indicates that n should be 1 instead of 2.

Figure 14: The critical phase transition point as a function of the subsystem size L (up to 300) for the 1D p-SC at Δ/t = 2. Each yellow spot represents a critical μ*/2t averaged over 10 independently trained AEs for a given L. The red dashed line is a linear fit to the data points with L > 100; its intercept, extrapolated to L = ∞, is 0.9995.

(3) To exclude "boundary effects" that might degrade the validity of our proposed method, it should additionally be tested against finite-size effects. In particular, for a topological 1D p-wave SC the Majorana zero modes appear at the boundaries with a coherence length proportional to t/Δ, and are thus sensitive to the system size. For simplicity, we take type-II data for a 1D p-wave SC at Δ/t = 2 and skip the supervised fine tuning of the last step. The task here is to determine the phase boundary between the topological and non-topological phases. As the size L of subsystem A increases from 5 to 300, Fig. 14 shows that the phase boundary converges to μ/2t = 0.9995 upon extrapolation to L = ∞. This is very close to the theoretical value, μ/2t = 1, and thus further confirms our results in Sec. IV.
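A sketch of this extrapolation step (assuming, as is common for such finite-size analyses, a linear fit of the detected critical value against 1/L for the larger sizes; the actual data points of Fig. 14 are not reproduced here):

```python
import numpy as np

def extrapolate_critical_point(L_values, mu_crit):
    """Linear fit of mu*(L) vs 1/L for L > 100; the intercept is the L -> infinity value."""
    L_values = np.asarray(L_values, dtype=float)
    mu_crit = np.asarray(mu_crit, dtype=float)
    mask = L_values > 100
    slope, intercept = np.polyfit(1.0 / L_values[mask], mu_crit[mask], deg=1)
    return intercept      # extrapolated critical mu/2t at L = infinity
```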

(4) Using neural networks for ML often raises serious questions about the reasoning behind a model's predictions. This issue may reduce its value, in particular for scientific discovery, and our proposal is no exception. To take one step toward explainability, however, one may borrow ideas from feature selection in ML Phuong05 ; Ribeiro16 ; Shrikumar17 ; Lundberg21 . For instance, following the proposed ML procedure with type-III input data, where each image is formed by the eigenvectors of a BCM arranged in a fixed way, one can mask the parts of all input images corresponding to the "boundaries" of the system and observe the consequences after training. Our preliminary results show that the (phase) clustering then becomes so poor that the boundary information is evidently not negligible. This indirectly implies that the presence or absence of edge modes helps the model prediction. Although not a complete solution, it may shed some light on the opacity of using DL for phase detection.

In conclusion, we have proposed an ML procedure, operating mainly in an unsupervised manner, to study topological phase transitions in both the 1D p-SC and SSH systems. The procedure consists of a series of steps: feature extraction, dimensional reduction, clustering, Silhouette analysis, and fine tuning by supervised learning. Most importantly, three quantum entanglement-based quantities, MCMs, OPES, and OPEEs (from BCMs), are fed into neural networks for training. We find that, in the feature extraction step, VAEs provide more stable latent representations of the input data. Moreover, our results reliably reproduce the whole phase diagrams of both systems studied here, demonstrating the usefulness of our proposal in the absence of prior knowledge of the phase space.

Acknowledgements.
M.-C. Chung acknowledges MOST support under contract No. 108-2112-M-005-010-MY3 and the Asian Office of Aerospace Research and Development (AOARD) for support under award No. FA2386-20-1-4049.

Appendix A Model architectures

The basic structures for both AEs and VAEs are already schematically shown in Figs. 3(b) and (c). Here we present explicit model details for our numerical results shown in all figures.

In the case of 1D p-wave SCs, for the AE used for Fig. 4 the model architecture is given in Table 1,

Fig. 4 (AE)
Layer Params Activation Batch norm
Input: 10×10×1
Conv 3×3×8 ReLU True
Residual block output: 10×10×8
Average pooling 2×2
Linear 200×256 Tanh True
Linear 256×128 Tanh True
Linear 128×32 Tanh True
Linear 32×8 Tanh True
Linear 8×4 Tanh True
Linear 4×8 Tanh True
Linear 8×32 Tanh True
Linear 32×128 Tanh True
Linear 128×256 Tanh True
Linear 256×200 Tanh True
Up Sampling 2×2
Transposed Conv 3×3×8
Transposed Conv 3×3×8
Transposed Conv 3×3×8
Transposed Conv 3×3×8
Transposed Conv 3×3×1 Sigmoid
Table 1: Model architecture of the AE used for Fig. 4.

where the content of a residual block He15 is separately shown in Table 2.

Residual Block
Layer Params Activation Batch norm
Left:
Conv 3×3×outchannel ReLU True
Conv 3×3×outchannel True
Shortcut:
Conv 1×1×outchannel True
Left+Shortcut
ReLU
Table 2: The residual block details in the AE.

Moreover, as to fine tuning the phase boundaries, a CNN model is employed and shown in Table 3.

Fig. 4 (CNN)
Layer Params Activation Batch norm
Input: 10×10×1
Conv 3×3×16 ReLU True
Residual block output: 10×10×32
Average pooling 4×4
Linear 128×16 ReLU
Linear 16×3 ReLU
Linear 3 Softmax
Table 3: The CNN model used for Fig. 4(d).

As to the AE used in Fig. 5, the model architecture is given in Table 4.

Fig. 5 (AE)
Input: 20×1
Layer Params Activation
Linear 20×16 ReLU
Linear 16×2 ReLU
Linear 2×16 ReLU
Linear 16×20 Sigmoid
Table 4: Model architecture of the AE used for Fig. 5.

For the AE used in Fig. 6, the model architecture is given in Table 5.

Fig. 6 (AE)
Layer Params Activation Batch norm
Input: 20×20×1
Conv 20×20×4 ReLU True
Residual block output: 20×20×64
Residual block output: 20×20×128
Global average pooling 128
Linear 128×32 ReLU
Linear 32×8 ReLU
Linear 8×32 ReLU
Linear 32×128 ReLU
Linear 128×12800
Up Sampling 2×2
Transposed Conv 3×3×128
Transposed Conv 3×3×64
Transposed Conv 3×3×64
Transposed Conv 5×5×4
Transposed Conv 5×5×1 Sigmoid
Table 5: Model architecture of the AE used for Fig. 6.

Similarly, a CNN model is employed to fine tune the phase boundaries and shown in Table 6.

Fig. 6 (CNN)
Layer Params Activation Batch norm
Input: 200×200×1
Residual block output: 200×200×32
Residual block output: 200×200×64
Residual block output: 200×200×128
Global average pooling 128
Linear 128×64 ReLU
Linear 64×3 ReLU
Linear 3 Softmax
Table 6: The CNN model used for Fig. 6(d).

On the other hand, for the VAE used for Fig. 8, the model architecture is given in Table 7.

Fig. 8, Fig. 12, Fig. 13 (VAE)
Layer Params Activation
Input: 10×10×1
Linear 100×128 ReLU
Linear 128×64 ReLU
Linear 64×32 ReLU
Linear 32×16 ReLU
Linear 16×1 ReLU
Linear 1×16 ReLU
Linear 16×32 ReLU
Linear 32×64 ReLU
Linear 64×128 ReLU
Linear 128×100 Sigmoid
Table 7: The VAE model used for Figs. 8, 12, and 13.

As to the VAE used in Fig. 9, the model architecture is given in Table 8.

Fig. 9 (VAE)
Input: 20×1
Layer Params Activation
Linear 20×128 ReLU
Linear 128×64 ReLU
Linear 64×32 ReLU
Linear 32×16 ReLU
Linear 16×1 ReLU
Linear 1×16 ReLU
Linear 16×32 ReLU
Linear 32×64 ReLU
Linear 64×128 ReLU
Linear 128×20 Sigmoid
Table 8: The VAE model used for Fig. 9.

Finally, for the VAE used in Fig. 10, the model architecture is given in Table 9.

Fig. 10 (VAE)
Layer Params Activation
Input: 20×20×1
Linear 400×256 ReLU
Linear 256×196 ReLU
Linear 196×128 ReLU
Linear 128×96 ReLU
Linear 96×32 ReLU
Linear 32×1 ReLU
Linear 1×32 ReLU
Linear 32×96 ReLU
Linear 96×128 ReLU
Linear 128×196 ReLU
Linear 196×256 ReLU
Linear 256×400 Sigmoid
Table 9: The VAE model used for Fig. 10.

In the case of SSH models, for the AE used for Fig. 11 the model architecture is given in Table 10,

Fig. 11 (AE)
Layer Params Activation Batch norm
Input: 10×10×1
Conv 10×10×4 ReLU True
Residual block output: 10×10×4
Average pooling 2×2
Linear 100×64 ReLU True
Linear 64×16 ReLU True
Linear 16×4 ReLU True
Linear 4×16 ReLU True
Linear 16×64 ReLU True
Linear 64×100 True
Up Sampling 2×2
Transposed Conv 3×3×4
Transposed Conv 3×3×4
Transposed Conv 3×3×4
Transposed Conv 3×3×4
Transposed Conv 3×3×1 Sigmoid
Table 10: The AE model used for Fig. 11.

where the residual block is again shown in Table 2. A CNN model is then employed to fine-tune the phase boundaries, as shown in Table 11.

Fig. 11 (CNN)
Layer Params Activation Batch norm
Input: 10×10×1
Conv 3×3×16 ReLU True
Residual block output: 10×10×32
Average pooling 4×4
Linear 128×16 ReLU
Linear 16×3 Softmax
Table 11: The CNN model used for Fig. 11(d).

Furthermore, for the VAE used for Fig. 12, the model architecture is given in Table 7.

The VAE model used for examining the effect of varying Δ and the AE model used for the finite-size study on 1D p-wave SCs are given in Table 7 and Table 12, respectively.

Fig. 14 (AE)
Layer Params Activation
Input: 2×L×1
Linear 2×L×128 ReLU
Linear 128×64 ReLU
Linear 64×32 ReLU
Linear 32×16 ReLU
Linear 16×1 ReLU
Linear 1×16 ReLU
Linear 16×32 ReLU
Linear 32×64 ReLU
Linear 64×128 ReLU
Linear 128×2×L Sigmoid
Table 12: The AE model used for Fig. 14.

References

  • (1) Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
  • (2) T. Ohtsuki and T. Ohtsuki, J. Phys. Soc. Jpn. 85, 123706 (2016).
  • (3) Evert P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017).
  • (4) J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
  • (5) P. Broecker, J. Carrasquilla, R. G. Melko, and S. Trebst, Scientific Reports 7, 8823 (2017).
  • (6) S. J. Wetzel and M. Scherzer, Phys. Rev. B 96, 184410 (2017).
  • (7) L. Wang, Phys. Rev. B 94, 195105 (2016).
  • (8) A. Tanaka and A. Tomiya, J. Phys. Soc. Jpn. 86, 063001 (2017).
  • (9) W. Hu, R. R. P. Singh, and R. T. Scalettar, Phys. Rev. E 95, 062122 (2017).
  • (10) P. Broecker, F. F. Assaad, and S. Trebst, arXiv:1707.00663 (2017).
  • (11) S. J. Wetzel, Phys. Rev. E 96, 022140 (2017).
  • (12) K. Ch’ng, N. Vazquez, and E. Khatami, Phys. Rev. E 97, 013306 (2018).
  • (13) Y.-H. Liu and E. P. L. van Nieuwenburg, Phys. Rev. Lett. 120, 176401 (2018).
  • (14) S. Durr and S. Chakravarty, Phys. Rev. B 100, 075102 (2019).
  • (15) K. Kottmann, P. Huembeli, M. Lewenstein, and A. Acin, Phys. Rev. Lett. 125, 170603 (2020).
  • (16) Y. Zhang and E.-A. Kim, Phys. Rev. Lett. 118, 216401 (2017).
  • (17) Y. Zhang, R. G. Melko, and E.-A. Kim, Phys. Rev. B 96, 245119 (2017).
  • (18) P. Zhang, H. Shen, and H. Zhai, Phys. Rev. Lett. 120, 066401 (2018).
  • (19) N. Sun, J. Yi, P. Zhang, H. Shen, and H. Zhai, Phys. Rev. B 98, 085402 (2018).
  • (20) D. Carvalho, N. A. García-Martínez, J. L. Lado, and J. Fernández-Rossier, Phys. Rev. B 97, 115453 (2018).
  • (21) Y. Ming, C.-T. Lin, S. D. Bartlett, and W.-W. Zhang, NPJ Computational Materials 5, 88 (2019).
  • (22) M. D. Caio, M. Caccin, P. Baireuther, T. Hyart, and M. Fruchart, arXiv:1901.03346 (2019)
  • (23) L.-F. Zhang, L.-Z. Tang, Z.-H. Huang, G.-Q. Zhang, W. Huang, D.-W. Zhang, Phys. Rev. A 103, 012419 (2021).
  • (24) J. F. Rodriguez-Nieva and M. S. Scheurer, Nature Physics 15, 790 (2019).
  • (25) E. Greplova, A. Valenti, G. Boschung, F. Schafer, N. Larch, and S. D. Huber, New Journal of Physics 22, 045003 (2020).
  • (26) M. S. Scheurer and R.-J. Slager, Phys. Rev. Lett. 124, 226401 (2020).
  • (27) Y. LeCun, Modeles connexionnistes de l’apprentissage, Ph.D. thesis, Université Pierre et Marie Curie (1987).
  • (28) H. Bourlard and Y. Kamp, Biological Cybernetics 59, 291 (1988).
  • (29) G. E. Hinton and R. S. Zemel, NIPS’1993.
  • (30) Y.-H. Tsai, M.-Z. Yu, Y.-H. Hsu, and M.-C. Chung, Phys. Rev. B 102 054512 (2020).
  • (31) A. Kitaev, Phys.-Usp. 44, 131 (2001).
  • (32) W. P. Su, J. R. Schrieffer, and A. J. Heeger, Phys. Rev. Lett. 42, 1698 (1979).
  • (33) In fact, if one slightly breaks the time-reversal and particle-hole symmetries of an SSH model, it becomes a topological system characterized by a Z₂ number within class AIII.
  • (34) Andreas P. Schnyder, Shinsei Ryu, Akira Furusaki, and Andreas W. W. Ludwig, Phys. Rev. B 78, 195125 (2008).
  • (35) See, for instance, N. Batra and G. Sheet, Resonance 25 765 (2020).
  • (36) K. Pearson, LIII, Dublin Philos. Mag. J. Sci. 2, 559 (1901).
  • (37) I. Jolliffe, Principal Component Analysis (Wiley, Hoboken, New Jersey, 2002).
  • (38) 99%99\% is simply an empirical value to help select out an optimal dimension for a feature space without keeping too much noisy information.
  • (39) J. B. MacQueen, Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. 1, University of California Press, 281 (1967).
  • (40) S. P. Lloyd, IEEE Transactions on Information Theory 28 (2), 129 (1982).
  • (41) P. J. Rousseeuw, Computational and Applied Mathematics 20, 53 (1987).
  • (42) R.C. de Amorim and C. Hennig, Information Sciences 324, 126 (2015).
  • (43) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press (2016).
  • (44) P. Mehta, M. Bukov, C.-H. Wang, A. G. R. Day, C. Richardson, C. K. Fisher, and D. J. Schwab, Physics Reports 810, 1 (2019).
  • (45) K. He, X. Zhang, S. Ren, and J. Sun, Computer Vision and Pattern Recognition (CVPR) (2016).
  • (46) K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, ICCV’09 (2009).
  • (47) D. P. Kingma and M. Welling, ICML'13 (2013).
  • (48) See all model architectures in the Appendix A.
  • (49) The input training data (made of OPEEs) taken here are not squared in order to follow the same supervised learning procedure adopted in the type-I (MCM) case for refining the phase boundaries.
  • (50) A. Paszke et al., Automatic differentiation in PyTorch, NIPS’17 (2017).
  • (51) T. M. Phuong, Z. Lin, and R. B. Altman, IEEE Computational Systems Bioinformatics Conference, 301 (2005).
  • (52) M. T. Ribeiro, S. Singh, and C. Guestrin, “Why Should I Trust You?”, New York, NY, USA (2016).
  • (53) A. Shrikumar, P. Greenside, and A. Kundaje, “Learning Important Features Through Propagating Activation Differences”, PMLR. 3145 (2017).
  • (54) S. M. Lundberg and S.-I. Lee, Advances in Neural Information Processing Systems 30, 4765 (2017).
  • (55) K. He, X. Zhang, S. Ren, J. Sun, arXiv:1512.03385 (2015).