
Machine learning topological invariants of non-Hermitian systems

Ling-Feng Zhang Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China    Ling-Zhi Tang Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China    Zhi-Hao Huang Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China    Guo-Qing Zhang [email protected] Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China Guangdong-Hong Kong Joint Laboratory of Quantum Matter, Frontier Research Institute for Physics, South China Normal University, Guangzhou 510006, China    Wei Huang Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China    Dan-Wei Zhang [email protected] Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China Guangdong-Hong Kong Joint Laboratory of Quantum Matter, Frontier Research Institute for Physics, South China Normal University, Guangzhou 510006, China
Abstract

The study of topological properties by machine learning approaches has attracted considerable interest recently. Here we propose machine learning of the topological invariants that are unique to non-Hermitian systems. Specifically, we train neural networks to predict the winding of eigenvalues of four prototypical non-Hermitian Hamiltonians on the complex energy plane with nearly $100\%$ accuracy. Our demonstrations in the non-Hermitian Hatano-Nelson model, Su-Schrieffer-Heeger model, and generalized Aubry-André-Harper model in one dimension, and in the two-dimensional Dirac fermion model with non-Hermitian terms, show the capability of the neural networks in exploring topological invariants and the associated topological phase transitions and topological phase diagrams in non-Hermitian systems. Moreover, the neural networks trained on a small data set in the phase diagram can successfully predict topological invariants in untouched phase regions. Thus, our work paves the way to revealing non-Hermitian topology with the machine learning toolbox.

I Introduction

Machine learning, which lies at the core of artificial intelligence and data science, has recently achieved huge success in applications ranging from industry (especially computer vision and natural language processing) to fundamental research in physics, cheminformatics, and biology Jordan and Mitchell (2015); LeCun et al. (2015); Goodfellow et al. (2016); Carleo et al. (2019). In physics, machine learning has proven useful in experimental data analysis Biswas et al. (2013); Rem et al. (2019); Kasieczka et al. (2019) and in the classification of phases of matter Wang (2016); Carrasquilla and Melko (2017); Zhang and Kim (2017); Deng et al. (2017); Huembeli et al. (2019); Dong et al. (2019); Van Nieuwenburg et al. (2017); Carvalho et al. (2018); Zhang et al. (2018a); Sun et al. (2018); Huembeli et al. (2018); Tsai et al. (2020); Ming et al. (2019); Rodriguez-Nieva and Scheurer (2019); Holanda and Griffith (2020); Ohtsuki and Mano (2020). Among these applications, one of the most interesting problems is to extract the global properties of topological phases of matter from local inputs, such as topological invariants, which are intrinsically nonlocal. Recent works have shown that artificial neural networks can be trained to predict the topological invariants of band insulators with high accuracy Zhang et al. (2018a); Sun et al. (2018). The advantage of this approach is that the neural network can capture global topology directly from local raw data inputs. Other theoretical proposals for identifying topological phases by supervised or unsupervised learning have also been suggested Carvalho et al. (2018); Huembeli et al. (2018); Rodriguez-Nieva and Scheurer (2019); Holanda and Griffith (2020); Ohtsuki and Mano (2020); Zhang et al. (2020a); Long et al. (2020); Scheurer and Slager (2020); Balabanov and Granath (2020, 2021). Notably, a convolutional neural network (CNN) trained on raw experimental data has been demonstrated to identify topological phases Rem et al. (2019); Lian et al. (2019).

On the other hand, growing efforts have been invested in uncovering exotic topological states and phenomena in non-Hermitian systems in recent years Diehl et al. (2008); Malzard et al. (2015); Lee (2016); Yao and Wang (2018); Yao et al. (2018); Song et al. (2019); Kunst et al. (2018); Takata and Notomi (2018); Wang et al. (2019); Zeng et al. (2017); Lang et al. (2018); Hamazaki et al. (2019); Jin and Song (2019); Kawabata et al. (2019a); Liu et al. (2019); Lee et al. (2019); Yamamoto et al. (2019); Hatano and Nelson (1996, 1997); Gong et al. (2018); Ghatak and Das (2019); Leykam et al. (2017); Shen et al. (2018); Zhang et al. (2018b, 2020b, 2020c); Luo and Zhang ; Jiang et al. (2019); Longhi (2019); Liu et al. (2020a); Wu and An (2020); Zeng et al. (2020); Zeng and Xu (2020); Tang et al. (2020); Liu et al. (2020b); Zhang et al. (2020d); Xu and Chen (2020); Liu et al. (2020c); Xi et al. ; Lee et al. (2020); Yoshida et al. (2019). The non-Hermiticity may come from gain and loss effects Zeng et al. (2017); Lang et al. (2018); Takata and Notomi (2018); Wang et al. (2019); Hamazaki et al. (2019), non-reciprocal hoppings Hatano and Nelson (1996, 1997), or dissipation in open systems Diehl et al. (2008); Malzard et al. (2015). Non-Hermiticity-induced topological phases have also been investigated in disordered Zhang et al. (2020c); Luo and Zhang ; Jiang et al. (2019); Longhi (2019); Liu et al. (2020a); Wu and An (2020); Zeng et al. (2020); Zeng and Xu (2020); Tang et al. (2020); Liu et al. (2020b) and interacting systems Zhang et al. (2020d); Xu and Chen (2020); Liu et al. (2020c); Xi et al. ; Lee et al. (2020); Yoshida et al. (2019). In non-Hermitian topological systems, there are not only topological properties defined by the eigenstates (such as topological Bloch bands), but also topological invariants defined solely by the eigenenergies. For instance, complex energy landscapes and exceptional points give rise to different topological invariants, including the winding number (vorticity) defined solely in the complex energy plane Gong et al. (2018); Ghatak and Das (2019); Leykam et al. (2017); Shen et al. (2018). This winding number and several closely related winding numbers in the presence of symmetries can lead to a richer topological classification than that of their Hermitian counterparts. In addition, it was revealed Borgnia et al. (2020); Okuma et al. (2020); Zhang et al. (2020e) that a nonzero winding number in the complex energy plane is the topological origin of the so-called non-Hermitian skin effect Lee (2016); Yao and Wang (2018); Yao et al. (2018); Song et al. (2019); Kunst et al. (2018). Considering that topological invariants in Hermitian systems have recently been studied with machine learning approaches Carvalho et al. (2018); Zhang et al. (2018a); Sun et al. (2018); Huembeli et al. (2018); Rodriguez-Nieva and Scheurer (2019); Holanda and Griffith (2020); Ohtsuki and Mano (2020); Zhang et al. (2020a); Long et al. (2020); Scheurer and Slager (2020), whether machine learning can capture this different kind of winding number in non-Hermitian systems is an urgent and meaningful question.

In this work, we apply machine learning with neural networks to predict non-Hermitian topological invariants and classify the topological phases of several prototypical non-Hermitian models in one and two dimensions. We first take the Hatano-Nelson model Hatano and Nelson (1996, 1997) as a feasibility test of machine learning in identifying non-Hermitian topological phases. We show that the trained CNN can predict the winding numbers of eigenenergies with high accuracy even for phases not included in the training, whereas the fully connected neural network (FCNN) can only predict those within the trained phases. We interpret the intermediate values of the CNN and find a strong relationship with the winding angle of the eigenenergies in the complex plane. We then use the CNN to study topological phase transitions in a non-Hermitian Su-Schrieffer-Heeger (SSH) model Su et al. (1979) with non-reciprocal hopping. We find that the CNN can precisely detect the transition points near the phase boundaries even though it is trained only with data from the deep phase regions. Using the CNN, we further obtain the topological phase diagram of a non-Hermitian generalized Aubry-André-Harper (AAH) model Harper (1955); Aubry and Andre (1980); Liu et al. (2015) with non-reciprocal hopping and a complex quasiperiodic potential. The winding numbers evaluated by the CNN agree with the theoretical values with an accuracy of more than 99% over the whole parameter space, even though the complex on-site potential is absent in the training process. Finally, we extend our scenario to a two-dimensional non-Hermitian Dirac fermion model Shen et al. (2018) and show the feasibility of neural networks in revealing the winding numbers associated with exceptional points. Our work may provide an efficient and general approach to revealing non-Hermitian topology based on the machine learning toolbox.

The rest of this paper is organized as follows. We first study the winding number of the Hatano-Nelson model as a feasibility verification of our machine learning method in Sec. II. Different performances of the CNN and the FCNN are also discussed. Section III is devoted to revealing the topological phase transition in the non-Hermitian SSH model by the CNN. In Sec. IV, we show that the CNN can precisely predict the topological phase diagram of the non-Hermitian generalized AAH model. In Sec. V, we extend our scenario to reveal the winding numbers associated with exceptional points in a two-dimensional non-Hermitian Dirac fermion model. A further discussion and short summary are finally presented in Sec. VI.

II Learning topological invariants in Hatano-Nelson model

Let us begin with the Hatano-Nelson model, which is a prototypical single-band non-Hermitian model and takes the following Hamiltonian in a one-dimensional lattice of length $L$ Hatano and Nelson (1996, 1997):

H_{1}=\sum_{j}^{L}(t_{r}\hat{c}^{\dagger}_{j+\mu}\hat{c}_{j}+t_{l}\hat{c}^{\dagger}_{j}\hat{c}_{j+\mu}+V_{j}\hat{c}^{\dagger}_{j}\hat{c}_{j}). (1)

Here $t_{l}$ and $t_{r}$ (with $t_{l}\neq t_{r}^{*}$) are the amplitudes of the non-reciprocal hopping, $\hat{c}^{\dagger}_{j}$ ($\hat{c}_{j}$) is the creation (annihilation) operator at the $j$-th lattice site, $\mu$ denotes the hopping length between two sites, and $V_{j}$ is the on-site energy in the lattice. The original Hatano-Nelson model takes a disorder potential with random $V_{j}$ and nearest-neighbor hopping with $\mu=1$, as shown in Fig. 1(a). Here we consider the clean case by setting $V_{j}=0$ and take $\mu$ as a parameter in learning the topological phase transition with neural networks. Under the periodic boundary condition, the corresponding eigenenergies in this case are given by

E_{1}(k)=\mathcal{H}_{1}(k)=t_{r}e^{-i\mu k}+t_{l}e^{i\mu k}, (2)

where $\mathcal{H}_{1}(k)$ is the Hamiltonian in momentum space, with the quasimomentum $k=0,2\pi/L,4\pi/L,\cdots,2\pi$.

Figure 1: (Color online) (a) The Hatano-Nelson model with non-reciprocal hopping between two nearest-neighbor sites ($\mu=1$). (b) The complex eigenenergy draws a closed loop around the base energy $E_{B}=0$ as the quasimomentum $k$ varies from 0 to $2\pi$, giving rise to the winding number $w=\pm 1$ for counterclockwise and clockwise windings, respectively.

Following Ref. Gong et al. (2018), we can define the winding number in the complex energy plane as a topological invariant of the Hatano-Nelson model,

w=\int_{0}^{2\pi}\frac{dk}{2\pi i}\partial_{k}\ln\det\mathcal{H}_{1}(k)=\int_{0}^{2\pi}\frac{dk}{2\pi}\partial_{k}\arg E_{1}(k)=\left\{\begin{array}{ll}\mu,&|t_{r}|<|t_{l}|,\\ -\mu,&|t_{r}|>|t_{l}|,\end{array}\right. (5)

where $\arg$ denotes the principal value of the argument, belonging to $[0,2\pi)$. For a discretized $E_{1}(k)$ with finite lattice size $L$, the complex-energy winding number reduces to

w=\frac{1}{2\pi}\sum_{n=1}^{L}\Delta\theta(n)=\frac{1}{2\pi}\sum_{n=1}^{L}[\theta(n)-\theta(n-1)], (6)

where $\theta(n)=\arg E_{1}(2\pi n/L)$. Note that for Hermitian systems ($t_{r}=t_{l}^{*}$), one has $w=0$ due to the real energy spectrum with $\arg E_{1}(k)=0,\pi$. According to this definition, a nontrivial winding number gives the number of times the complex eigenenergy encircles the base point $E_{B}=0$, which is unique to non-Hermitian systems. The complex eigenenergy windings for two typical cases with $w=\pm 1$ are shown in Fig. 1(b). To examine whether the neural networks can learn the winding number in a general formalism, we use the parameter $\mu$ to control the number of times the complex eigenenergy encircles the origin of the complex plane. When the loop winds around the origin $\mu$ times as $k$ varies from 0 to $2\pi$, the winding number is $\pm\mu$, where $\pm$ indicates counterclockwise and clockwise windings, respectively.
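As a concrete illustration of Eq. (6), the winding number of a discretized complex spectrum can be evaluated numerically. The following Python sketch is not the code used in this work, and the parameter values in the usage example are illustrative.

```python
import numpy as np

def winding_number(E):
    """Winding number of a discretized complex spectrum E(2*pi*n/L),
    n = 0, ..., L with E[0] = E[L], following Eq. (6)."""
    dtheta = np.diff(np.angle(E))                  # theta(n) - theta(n-1)
    # fold each increment into [-pi, pi) so the sum counts net loops
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi
    return dtheta.sum() / (2 * np.pi)

# Hatano-Nelson spectrum of Eq. (2) with illustrative parameters
L, mu, t_r, t_l = 32, 2, 0.5, 1.0
k = 2 * np.pi * np.arange(L + 1) / L
E1 = t_r * np.exp(-1j * mu * k) + t_l * np.exp(1j * mu * k)
print(winding_number(E1))                          # ~ +2 since |t_l| > |t_r|
```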

Figure 2: (Color online) Schematic of the machine learning workflow and the structure of the neural networks for the Hatano-Nelson (HN) model, non-Hermitian SSH (NHSSH) model, and non-Hermitian generalized AAH (NHGAAH) model. The input data are represented by an $(L+1)\times 2$-dimensional matrix for the CNN and a $2\times(L+1)$-dimensional vector for the FCNN, respectively. Here $\mathbf{d}_{R}$ and $\mathbf{d}_{I}$ denote the real and imaginary parts of the input data (complex eigenenergies), respectively.

We now build a supervised task for learning the winding number given by Eq. (6) based on neural networks. First, we need labeled data sets for training and evaluation. Since the winding number is intrinsically nonlocal and characterized by the complex energy spectrum, we feed the neural networks with normalized spectrum-dependent configurations $\mathbf{d}(n)=[\mathbf{d}_{R}(n),\mathbf{d}_{I}(n)]$ at $L$ points discretized uniformly from 0 to $2\pi$, where $\mathbf{d}_{R}(n)=\mathrm{Re}[E_{1}(2\pi n/L)]$ and $\mathbf{d}_{I}(n)=\mathrm{Im}[E_{1}(2\pi n/L)]$. Therefore, the input data are an $(L+1)\times 2$-dimensional matrix of the form

\left[\begin{array}{ccccc}\mathbf{d}_{R}(0)&\mathbf{d}_{R}(2\pi/L)&\mathbf{d}_{R}(4\pi/L)&\cdots&\mathbf{d}_{R}(2\pi)\\ \mathbf{d}_{I}(0)&\mathbf{d}_{I}(2\pi/L)&\mathbf{d}_{I}(4\pi/L)&\cdots&\mathbf{d}_{I}(2\pi)\end{array}\right]^{T},

with a period of $2\pi$: $\mathbf{d}(k)=\mathbf{d}(k+2\pi)$. In the following, we set $L=32$, which is large enough for the discrete energy spectra to serve as input data for the neural networks. Labels are computed according to Eq. (6) for the corresponding configurations.
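A configuration of this form can be assembled as in the following sketch; the normalization scheme (dividing by the maximum modulus) is our assumption, since any scheme that keeps the data in a fixed range on the complex plane serves the purpose.

```python
import numpy as np

def hn_configuration(t_r, t_l, mu, L=32):
    """Build the (L+1) x 2 input matrix [d_R(n), d_I(n)] from the
    Hatano-Nelson spectrum of Eq. (2); max-modulus normalization assumed."""
    k = 2 * np.pi * np.arange(L + 1) / L
    E1 = t_r * np.exp(-1j * mu * k) + t_l * np.exp(1j * mu * k)
    E1 = E1 / np.abs(E1).max()                     # normalize the spectrum
    return np.stack([E1.real, E1.imag], axis=1)    # shape (L+1, 2)
```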

The machine learning workflow is shown schematically in Fig. 2. For the Hatano-Nelson model with different $\mu$, the output of the neural network is a real number $\tilde{w}$, and the predicted winding number is interpreted as the integer closest to $\tilde{w}$. We first train the neural networks with both complex spectrum configurations and their corresponding true winding numbers. After the training, we feed only the complex-spectrum-dependent configurations to the neural networks and compare their predictions with the true winding numbers, from which we determine the percentage of correct predictions as the accuracy. In this case, we consider two typical classes of neural networks: the CNN and the FCNN. The neural networks are similar to those in Ref. Zhang et al. (2018a) for calculating the winding number of the Bloch vectors in Hermitian topological bands.

The CNN in our training has two convolution layers, with 32 kernels of size $1\times 2\times 2$ and 1 kernel of size $32\times 1\times 1$, followed by a fully connected layer of two neurons before the output layer. The total number of trainable parameters is 262. The FCNN has two hidden layers with 32 and 2 neurons, respectively, and a total of 2213 trainable parameters. The architecture of the two classes of neural networks is shown in Fig. 2. All hidden layers have rectified linear units $f(x)=\max(0,x)$ as activation functions, and the output layer has the linear activation function $f(x)=x$. The objective function to be optimized is defined by

J_{1}=\frac{1}{N}\sum_{i=1}^{N}(\tilde{w}_{i}-w_{i})^{2}, (7)

where $\tilde{w}_{i}$ and $w_{i}$ are, respectively, the winding number of the $i$-th complex spectrum configuration predicted by the neural networks and the true winding number, and $N$ is the size of the training data set. We take $6\times 10^{4}$ training configurations, consisting of winding numbers $\{\pm 1,\pm 2,\pm 3\}$ in the ratio $1:1:1$. The test set consists of configurations with winding numbers $w\in\{\pm 1,\pm 2,\pm 3\}$ that are not included in the training set and $w\in\{\pm 4,\pm 5\}$ that are never seen by the neural networks during the training. The number of configurations for each winding number is $4\times 10^{3}$. The training details are given in Appendix A.
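For concreteness, the CNN described above can be sketched in PyTorch as follows. The layer sizes reproduce the stated parameter count (262 for $L=32$); details such as the exact ordering inside `forward` are our reading of the text rather than the authors' code.

```python
import torch
import torch.nn as nn

class WindingCNN(nn.Module):
    """Sketch of the CNN of Sec. II: two convolutions (32 kernels of size
    2x2, then 1 kernel of size 32x1x1), a 2-neuron hidden layer, and a
    linear output; 160 + 33 + 66 + 3 = 262 trainable parameters for L = 32."""
    def __init__(self, L=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=(2, 2))  # (L+1, 2) -> (L, 1)
        self.conv2 = nn.Conv2d(32, 1, kernel_size=(1, 1))  # mix the 32 channels
        self.fc = nn.Linear(L, 2)                          # hidden layer of 2 neurons
        self.out = nn.Linear(2, 1)                         # linear output f(x) = x

    def forward(self, x):                                  # x: (batch, 1, L+1, 2)
        a = torch.relu(self.conv1(x))
        a = torch.relu(self.conv2(a))                      # intermediate values a_n
        a = a.flatten(1)                                   # (batch, L)
        return self.out(torch.relu(self.fc(a))).squeeze(-1)
```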

Figure 3: (Color online) (a) Winding numbers predicted by the CNN on the test data sets. Different colors represent different true winding numbers, and each test set contains $4\times 10^{3}$ complex spectrum configurations. (b) Probability distribution of the predicted winding number from the CNN on the test data sets. The probability distribution for each test set (bins of the same color) sums to 1, with narrow peaks at each integer. (c) The intermediate output $a_{n}$, i.e., the activation value after the two convolutional layers, versus the corresponding exact winding angle $\Delta\theta(n)$. $10L$ points corresponding to 10 different test configurations are plotted.

After training, we test with other configurations; the predicted winding numbers $\tilde{w}$ are shown in Fig. 3(a). Note that the networks tend to produce $\tilde{w}$ close to integers, and thus we take the final winding number as the integer closest to $\tilde{w}$. In Fig. 3(b), we plot the probability distribution of $\tilde{w}$ predicted by the CNN on different test data sets. The test results of the two neural networks are presented in Table 1, which shows a very high accuracy (more than $98\%$) of the CNN and FCNN on the test data sets with winding numbers $w=\{\pm 1,\pm 2,\pm 3\}$. The CNN generally performs better than the FCNN. Surprisingly, the CNN works well even in the cases of $w=\{\pm 4,\pm 5\}$, which consist of configurations with larger winding numbers not seen during the training. In contrast, the FCNN cannot predict the true winding numbers even though it has more trainable parameters. These results indicate that the convolutional layers respect the translation symmetry of the complex spectrum in momentum space and can extract the local winding angle $\Delta\theta$ through the $2\times 2$ kernels.

To further see the advantage of the CNN, we open the black box of the neural networks and relate intermediate activation values to physical quantities, i.e., the winding angle $\Delta\theta$. Given the convolutional structure, we expect the activation value after the two convolutions to depend approximately linearly on $\Delta\theta$, with the subsequent fully connected layers performing a simple linear regression. We plot $a_{n}$ versus $\Delta\theta(n)$, with $n=1,...,L$ and $a_{n}$ being the $n$-th component of the intermediate values after the two convolution layers. As shown in Fig. 3(c), the intermediate output is approximately linear in $\Delta\theta$ within certain regions. A linear combination of these intermediate values with the correct coefficients in the following fully connected layers then easily leads to the true winding number. In this way, the CNN realizes a calculation workflow equivalent to summing the winding angles $\Delta\theta$ in Eq. (6).
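The intermediate values $a_{n}$ can be read out with a forward hook, as in this sketch built on the hypothetical `WindingCNN` above; `x` is a placeholder configuration.

```python
import torch

# Capture the activations a_n after the second convolution for comparison
# with the winding angles, as in Fig. 3(c); WindingCNN is the sketch above.
model = WindingCNN()
acts = {}
model.conv2.register_forward_hook(lambda mod, inp, out: acts.update(a=out.detach()))
x = torch.randn(1, 1, 33, 2)                 # one (L+1) x 2 configuration, L = 32
model(x)
a_n = torch.relu(acts['a']).flatten()        # the L intermediate values a_n
```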

$w$              $\pm 1$   $\pm 2$   $\pm 3$   $\pm 4$   $\pm 5$
CNN accuracy     99.8%     99.4%     98.0%     96.7%     96.0%
FCNN accuracy    99.2%     99.0%     98.5%     0.0%      0.0%
Table 1: Accuracy of the CNN and FCNN on the test data sets with winding numbers $w=\{\pm 1,\pm 2,\pm 3,\pm 4,\pm 5\}$ in the Hatano-Nelson model with $\mu=1,2,3,4,5$. The winding numbers $w=\{\pm 4,\pm 5\}$ are not seen by the neural networks during the training.

III Learning topological transition in non-Hermitian SSH model

Based on the accurate winding numbers calculated by the CNN, we further use a similar CNN to study topological phase transitions in the non-Hermitian SSH model, shown in Fig. 4(a). This model, with nonreciprocal intra-cell hopping in a one-dimensional dimerized lattice of $L$ unit cells, is described by the following Hamiltonian:

H_{2}=\sum_{n=1}^{L}[(t-\delta)\hat{a}^{\dagger}_{n}\hat{b}_{n}+(t+\delta)\hat{b}^{\dagger}_{n}\hat{a}_{n}+t^{\prime}\hat{a}^{\dagger}_{n+1}\hat{b}_{n}+t^{\prime}\hat{b}^{\dagger}_{n}\hat{a}_{n+1}]. (8)

Here $\hat{a}^{\dagger}_{n}$ and $\hat{b}^{\dagger}_{n}$ ($\hat{a}_{n}$, $\hat{b}_{n}$) denote the creation (annihilation) operators on the $n$-th $A$ and $B$ sublattices, $t$ is the uniform intra-cell hopping amplitude, $\delta$ is the non-Hermitian parameter, and $t^{\prime}$ is the inter-cell hopping amplitude. When $\delta=0$, this model reduces to the Hermitian SSH model. Under the periodic boundary condition, the corresponding Hamiltonian in momentum space is given by

\mathcal{H}_{2}(k)=\left(\begin{array}{cc}0&t^{\prime}e^{-ik}+t-\delta\\ t^{\prime}e^{ik}+t+\delta&0\end{array}\right). (9)

The two energy bands (here with $t^{\prime}=1$, as used throughout this section) are then given by

E_{\pm}(k)=\pm\sqrt{1+t^{2}-\delta^{2}+2t\cos k-2i\delta\sin k}. (10)

Following Refs. Leykam et al. (2017); Shen et al. (2018); Gong et al. (2018); Ghatak and Das (2019); Kawabata et al. (2019b) and considering the sublattice symmetry, one can define an inter-band winding number

w_{\pm}=\int_{0}^{2\pi}\frac{dk}{2\pi}\partial_{k}\arg(E_{+}-E_{-})=\int_{0}^{2\pi}\frac{dk}{4\pi}\partial_{k}\arg E_{+}^{2}. (11)

For discretized $E_{\pm}(k)$ with finite $L$, it reduces to

w_{\pm}=\frac{1}{4\pi}\sum_{n=1}^{L}[\theta^{\prime}(n)-\theta^{\prime}(n-1)], (12)

with $\theta^{\prime}(n)=\arg E_{+}^{2}(2\pi n/L)$ in this model. Notably, $w_{\pm}$ is half the total winding of $t^{\prime}e^{-ik}+t-\delta$ and $t^{\prime}e^{ik}+t+\delta$ around the origin of the complex plane as $k$ increases from 0 to $2\pi$. The inter-band winding number $w_{\pm}$ is quantized as $\mathbb{Z}/2$ because the windings of $t^{\prime}e^{-ik}+t-\delta$ and $t^{\prime}e^{ik}+t+\delta$ are always integers due to periodicity Shen et al. (2018). We consider $t^{\prime}=1$, $t\in(-6,6)$, and $\delta\in(-6,6)$ in our study.
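As a numerical cross-check of Eq. (12), the inter-band winding number can be evaluated along the same lines as in Sec. II; a minimal sketch with illustrative parameter values:

```python
import numpy as np

def interband_winding(t, delta, tp=1.0, L=32):
    """w_pm of Eq. (12): half the winding of E_+^2(k) around the origin,
    with E_+^2(k) = (t' e^{-ik} + t - delta)(t' e^{ik} + t + delta)."""
    k = 2 * np.pi * np.arange(L + 1) / L
    E2 = (tp * np.exp(-1j * k) + t - delta) * (tp * np.exp(1j * k) + t + delta)
    dtheta = np.diff(np.angle(E2))
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi   # fold into [-pi, pi)
    return dtheta.sum() / (4 * np.pi)

print(interband_winding(t=0.5, delta=1.0))   # -0.5 in this nontrivial region
```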

Figure 4: (Color online) (a) The non-Hermitian SSH model with non-reciprocal hopping modulated by the parameter $\delta$. (b) The accuracy of the two test sets versus the distance threshold $T$. For each $T$, the data sets are regenerated and the CNN is retrained and retested. (c) Classification probabilities output by the CNN on test set II with $T=0.2$, where the true phase transition points are located at $\delta=\{-1.5,-0.5,0.5,1.5\}$. The predicted phase transition points are located at the crossing points of the prediction probabilities. Different colors represent different winding numbers.

For this model, we set the input configuration as $\mathbf{d}(n)=\{\mathrm{Re}[E_{+}^{2}(2\pi n/L)],\mathrm{Im}[E_{+}^{2}(2\pi n/L)]\}$. To learn the topological phase transition, we treat it as a classification task assisted by neural networks. The output of the neural network is the probabilities of the different winding numbers. We define $\{P_{1},P_{2},P_{3}\}$ as the output probabilities of the winding numbers $\tilde{w}_{\pm}=\{0,0.5,-0.5\}$, respectively. The predicted winding number is the $\tilde{w}_{\pm}$ with the highest probability. The architecture of the CNN is shown in Fig. 2, with training details given in Appendix A. For this task, the objective function to be optimized is defined by

J_{2}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n_{w}=3}1(w_{\pm}^{(i)}=\tilde{w}_{\pm,j})\log_{2}(P_{j}), (13)

where $w_{\pm}^{(i)}$ is the label of the $i$-th configuration, and the set $\{\tilde{w}_{\pm,1},\tilde{w}_{\pm,2},...,\tilde{w}_{\pm,n_{w}}\}$ represents the winding numbers predicted by the neural networks. The indicator $1(w_{\pm}^{(i)}=\tilde{w}_{\pm,j})$ takes the value 1 when the condition $w_{\pm}^{(i)}=\tilde{w}_{\pm,j}$ is satisfied and 0 otherwise. In this model, $n_{w}=3$ and $\{\tilde{w}_{\pm,1},\tilde{w}_{\pm,2},\tilde{w}_{\pm,3}\}$ represent the winding numbers $w_{\pm}=\{0,0.5,-0.5\}$, respectively.
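Up to the base of the logarithm, Eq. (13) is the standard cross entropy between the one-hot label and the softmax output; in PyTorch (used in Appendix A) it corresponds to `nn.CrossEntropyLoss`, as in this sketch with placeholder data:

```python
import math
import torch
import torch.nn as nn

# Eq. (13) as code: nn.CrossEntropyLoss uses the natural log, so we divide
# by ln 2 to match the log2 convention; logits and labels are placeholders.
logits = torch.randn(8, 3)             # raw network outputs, n_w = 3 classes
labels = torch.randint(0, 3, (8,))     # class index for w_pm in {0, 0.5, -0.5}
J2 = nn.CrossEntropyLoss()(logits, labels) / math.log(2)
```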

To see whether the CNN is a good tool for studying topological phase transitions in this model, we define the Euclidean distance $s$ between a configuration and the phase boundaries in the parameter space of the Hamiltonian:

s=\frac{|A\delta+Bt+C|}{\sqrt{A^{2}+B^{2}}}, (14)

where $A\delta+Bt+C=0$ (straight lines in the parameter space of $\delta$ and $t$) is the equation of the phase boundaries, with $A,B,C$ the parameters of the equation. In addition, we define a distance threshold $T$. In the following, we choose $T=0.2$ as a demonstration; the situation $0.2<T\leq 0.6$ is discussed later. The training data set consists of $2.4\times 10^{4}$ configurations satisfying $s\geq T$, sampled from different phases with different winding numbers.

We test the CNN with two test data sets: (I) $6\times 10^{3}$ configurations satisfying $s<T$, and (II) 300 configurations distributed uniformly along $t=0.5$, $\delta\in[-3,3]$. The data set distribution and some training details are given in Appendix A. After the training, both test data sets I and II are evaluated by the CNN. We use the same training and test workflow for $T=0.3,0.4,0.5$, and $0.6$. Figure 4(b) shows the accuracy on the test data sets versus the distance threshold $T$. We find that the CNN achieves a high accuracy for different $T$, meaning that it can detect the phase transitions precisely in these regions. Moreover, we locate the phase transition points from the crossing points of the prediction probabilities; the phase transitions determined by this method are rather accurate, as shown in Fig. 4(c). Deep in a phase, the probability for the true winding number $w_{\pm}$ stays at nearly $100\%$, while it changes approximately linearly across the phase transitions. In short, the CNN is a useful supplementary tool for studying phase transitions when only phase properties in certain confident regions (e.g., the deep phase) are provided.

Figure 5: (Color online) (a) The non-Hermitian generalized AAH model with non-reciprocal hopping and a complex quasiperiodic potential. (b) Test accuracy table with respect to the two non-Hermiticity parameters $\alpha$ and $h$. (c) The upper (lower) panel is the topological phase diagram predicted by the CNN for $h=1.2$ and $\alpha=0.55$ ($h=1.6$ and $\alpha=1.95$). Misclassified samples are distributed on the topological transition boundary.

IV Learning topological phase diagram in non-Hermitian AAH model

To show that our results can be generalized to other non-Hermitian topological models, we consider a generalized AAH model in a one-dimensional quasicrystal, as shown in Fig. 5(a), with two kinds of non-Hermiticity arising from the nonreciprocal hopping Jiang et al. (2019) and a complex on-site potential phase Longhi (2019). The Hamiltonian of this non-Hermitian AAH model is given by Tang et al. (2021)

H_{3}=\sum_{j}(t^{(r)}_{j}\hat{c}^{\dagger}_{j+1}\hat{c}_{j}+t^{(l)}_{j}\hat{c}^{\dagger}_{j}\hat{c}_{j+1}+\Delta_{j}\hat{n}_{j}), (15)

where the non-reciprocal hopping terms and the on-site potential are parameterized as

t^{(r)}_{j}=\{t+V_{2}\cos[2\pi(j+1/2)\beta]\}e^{-\alpha},
t^{(l)}_{j}=\{t+V_{2}\cos[2\pi(j+1/2)\beta]\}e^{\alpha}, (16)
\Delta_{j}=V_{1}\cos(2\pi j\beta+ih).

Here $t^{(r)}_{j}$ ($t^{(l)}_{j}$) denotes the right-hopping (left-hopping) amplitude between the $j$-th and $(j+1)$-th sites, with $t>0$ and $V_{2}$ real; $\Delta_{j}$ denotes the complex quasiperiodic potential with $V_{1}>0$ and $\beta$ an irrational number; and the parameters $\alpha$ and $h$ tune the non-reciprocity and the complex phase, respectively. For finite quasiperiodic systems, one can take the lattice site number $L=F_{j+1}$ and approximate $\beta$ by the rational number $F_{j}/F_{j+1}$, with $F_{j}$ the $j$-th Fibonacci number, since $\lim_{j\rightarrow\infty}F_{j}/F_{j+1}=(\sqrt{5}-1)/2$. In the following, we set $t=1$ and $L=89$.

The winding numbers discussed previously cannot be directly used here because the quasiperiodic potential breaks the lattice periodicity. In this case, one can consider a ring chain with an effective magnetic flux $\Phi$ penetrating through the center, such that the Hamiltonian matrix can be rewritten as

H_{3}(\Phi)=\left(\begin{array}{ccccc}\Delta_{1}&t^{(l)}_{1}&&&t^{(r)}_{L}e^{-i\Phi}\\ t^{(r)}_{1}&\Delta_{2}&t^{(l)}_{2}&&\\ &\ddots&\ddots&\ddots&\\ &&t^{(r)}_{L-2}&\Delta_{L-1}&t^{(l)}_{L-1}\\ t^{(l)}_{L}e^{i\Phi}&&&t^{(r)}_{L-1}&\Delta_{L}\end{array}\right). (17)

One can define the winding number with respect to $\Phi$ and the energy base $E_{B}$ Gong et al. (2018); Jiang et al. (2019):

w_{\Phi}=\int_{0}^{2\pi}\frac{\mathrm{d}\Phi}{2\pi i}\partial_{\Phi}\ln\det[H_{3}(\Phi)-E_{B}]. (18)

Here $w_{\Phi}$ counts the number of times the complex spectral trajectory encircles the energy base $E_{B}$ ($E_{B}\in\mathbb{C}$ does not belong to the energy spectrum) when the flux $\Phi$ varies from 0 to $2\pi$. For discretized $H_{3}(\Phi)$ with $\Phi=0,2\pi/L_{\Phi},4\pi/L_{\Phi},\cdots,2\pi$, the winding number can be rewritten as

w_{\Phi}=\frac{1}{2\pi}\sum_{n=1}^{L_{\Phi}}[\theta^{\prime\prime}(n)-\theta^{\prime\prime}(n-1)], (19)

where $\theta^{\prime\prime}(n)=\arg\det[H_{3}(2\pi n/L_{\Phi})-E_{B}]$.
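A direct evaluation of Eq. (19) from the flux-threaded Hamiltonian of Eq. (17) can be sketched as follows; the choices $E_{B}=0$ and $L_{\Phi}=64$ are illustrative assumptions ($E_{B}$ must not belong to the spectrum).

```python
import numpy as np

def aah_winding(V1, V2, alpha, h, t=1.0, L=89, L_phi=64, E_B=0.0):
    """w_Phi of Eq. (19): winding of det[H_3(Phi) - E_B] as Phi runs from
    0 to 2*pi, for Eqs. (15)-(17) with beta = F_10/F_11 = 55/89."""
    beta = 55 / 89                     # ratio of consecutive Fibonacci numbers
    j = np.arange(L)
    hop = t + V2 * np.cos(2 * np.pi * (j + 0.5) * beta)
    t_r, t_l = hop * np.exp(-alpha), hop * np.exp(alpha)
    Delta = V1 * np.cos(2 * np.pi * j * beta + 1j * h)    # complex potential
    dets = np.empty(L_phi + 1, dtype=complex)
    for n in range(L_phi + 1):
        phi = 2 * np.pi * n / L_phi
        H = np.diag(Delta) + np.diag(t_l[:-1], 1) + np.diag(t_r[:-1], -1)
        H[0, -1] = t_r[-1] * np.exp(-1j * phi)    # boundary terms of Eq. (17)
        H[-1, 0] = t_l[-1] * np.exp(1j * phi)
        dets[n] = np.linalg.det(H - E_B * np.eye(L))
    dtheta = np.diff(np.angle(dets))
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi
    return dtheta.sum() / (2 * np.pi)
```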

Below we show that its generalization ability enables the CNN to accurately obtain topological phase diagrams of this non-Hermitian generalized AAH model, even though only nonreciprocal-hopping configurations are used in the training. To do this, we treat the problem as a classification task and set the configuration as $\mathbf{d}(n)=\{\mathrm{Re}\det[\tilde{H}_{3}(n)],\mathrm{Im}\det[\tilde{H}_{3}(n)]\}$ with $\tilde{H}_{3}(n)\equiv H_{3}(2\pi n/L_{\Phi})-E_{B}$. The architecture of the CNN is similar to that for the non-Hermitian SSH model, but the output layer now has two neurons for the two kinds of winding numbers. We define $\{P_{1},P_{2}\}$ as the output probabilities of the winding numbers $\tilde{w}_{\Phi}=\{0,-1\}$, respectively. The objective function in this case is given by [similar to Eq. (13)]

J_{3}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n_{w}=2}1(w_{\Phi}^{(i)}=\tilde{w}_{\Phi,j})\log_{2}(P_{j}), (20)

where $\{\tilde{w}_{\Phi,1},\tilde{w}_{\Phi,2}\}$ (with $n_{w}=2$) represent $\tilde{w}_{\Phi}=\{0,-1\}$, respectively.

To test the generalization ability of the neural network, we train it with configurations corresponding to Hamiltonians with $h=0$, and test it with configurations corresponding to Hamiltonians with both nonreciprocal hopping amplitudes ($\alpha\neq 0$) and complex potentials ($h\neq 0$). The training data set includes configurations with $\alpha\in[0.1,1.0]$ at the interval $\Delta\alpha=0.1$; for each $\alpha$, $3.2\times 10^{3}$ configurations are sampled from the two-dimensional parameter space $V_{1}\in[0,4]\times V_{2}\in[0,2]$. The test data set includes 110 pairs of parameters, with $\alpha$ from 0.15 to 1.95 at the interval $\Delta\alpha=0.2$ and $h$ from 0.0 to 2.0 at the interval $\Delta h=0.2$. We sample $3.2\times 10^{3}$ configurations from the region $V_{1}\in[0,4]\times V_{2}\in[0,2]$ for each pair of parameters.

After the training, we find that the CNN performs well even without knowledge of the complex on-site potential ($h=0$) during the training process. Figure 5(b) shows the test accuracy with respect to the two non-Hermiticity parameters $\alpha$ and $h$; the accuracy exceeds $99\%$ in the whole parameter region. Moreover, we present the topological phase diagrams with respect to $V_{1}$ and $V_{2}$ predicted by the CNN in Fig. 5(c). The CNN performs excellently in the deep phases, struggling only slightly near the topological phase transitions. We attribute the high accuracy in this learning task to two factors. First, normalizing the data confines both the training and test data distributions to the same region of the complex plane, which is important for the generalization of the neural network. Second, the topological transitions in this model coincide with the real-complex transitions of the energy spectrum Tang et al. (2021), which reduces the complexity of the problem when the input data depend on the complex spectrum.

V Generalization to two-dimensional model

Previously, we have used neural networks to investigate the topological properties of several non-Hermitian models in one dimension. In this section, we extend our scenario to reveal the winding numbers associated with exceptional points in the two-dimensional non-Hermitian Dirac fermion model proposed in Ref. Shen et al. (2018). The Dirac Hamiltonian with non-Hermitian terms in the two-dimensional momentum space $\mathbf{k}=(k_{x},k_{y})$ is given by Shen et al. (2018)

\mathcal{H}_{4}(\mathbf{k})=(k_{x}+i\kappa_{x})\sigma_{x}+(k_{y}+i\kappa_{y})\sigma_{y}+(m+i\delta_{m})\sigma_{z}, (21)

where $\sigma_{x,y,z}$ are the Pauli matrices, $\kappa_{x,y}$ denote the non-Hermitian modulation parameters, and $m$ and $\delta_{m}$ denote the real and imaginary parts of the Dirac mass, respectively. The corresponding energy dispersion is obtained as

E_{\pm}(\mathbf{k})=\pm\sqrt{k^{2}-\kappa^{2}+m^{2}-\delta_{m}^{2}+2i(\mathbf{k}\cdot\bm{\kappa}+m\delta_{m})}, (22)

where $k\equiv|\mathbf{k}|$, $\bm{\kappa}\equiv(\kappa_{x},\kappa_{y})$, and $\kappa\equiv|\bm{\kappa}|$. The inter-band winding number $w_{\pm}(\Gamma)$ is defined for the energies $E_{+}(\mathbf{k})$ and $E_{-}(\mathbf{k})$ in the complex energy plane Shen et al. (2018):

w_{\pm}(\Gamma)=\oint_{\Gamma}\frac{dk}{2\pi}\partial_{\mathbf{k}}\arg[E_{+}(\mathbf{k})-E_{-}(\mathbf{k})], (23)

where $\Gamma$ is a closed loop in the two-dimensional momentum space. A nonzero winding number $w_{\pm}(\Gamma)$ implies a band degeneracy in the region enclosed by $\Gamma$. For a pair of separable bands, the winding number can be nonzero only for non-contractible loops in momentum space. Here we choose $\Gamma$ as a unit circle that encircles an exceptional point (a band degeneracy in non-Hermitian band structures) when the Hamiltonian has exceptional points; otherwise we randomly choose a closed loop. The exact topological phase diagram Shen et al. (2018) in the parameter space spanned by $(m,\kappa)$ is shown in Fig. 6(a). The winding number is 0 in the regime $\kappa<|m|$, where the Hamiltonian has a pair of separable bands without band degeneracies. In the regime $\kappa>|m|$, the two bands $E_{\pm}(\mathbf{k})$ cross at two isolated exceptional points $\mathbf{k}_{\pm}$ in the two-dimensional momentum space Shen et al. (2018)

\mathbf{k}_{\pm}=-\frac{m\delta_{m}}{\kappa}\hat{\mathbf{n}}\pm\frac{\sqrt{(\kappa^{2}-m^{2})(\kappa^{2}+\delta^{2}_{m})}}{\kappa}\hat{\mathbf{z}}\times\hat{\mathbf{n}}, (24)

where $\hat{\mathbf{n}}=\bm{\kappa}/\kappa$. In the regime $\kappa>|m|$, the inter-band winding numbers $w_{\pm}(\Gamma)$ circling an exceptional point are half-integers with opposite signs for $\mathbf{k}_{\pm}$. Thus, the winding number $w_{\pm}(\Gamma)$ associated with the exceptional points characterizes topological phase transitions in this model. Note that here we consider the loop $\Gamma$ clockwise circling the exceptional point $\mathbf{k}_{+}$ for the two energy bands $E_{\pm}(\mathbf{k})$ in the complex plane, as displayed in Fig. 6(b).
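Numerically, the winding around $\mathbf{k}_{+}$ can be accumulated from the phase of $(E_{+}-E_{-})^{2}=4E_{+}^{2}$, which is single-valued and avoids the square-root branch cut. A sketch for $\bm{\kappa}=(\kappa,0)$ as in Fig. 6, with the unit loop radius of the text and an assumed discretization:

```python
import numpy as np

def dirac_winding(m, delta_m=1.0, kappa=3.0, L=128):
    """w_pm(Gamma) around the exceptional point k_+ of Eq. (24) for
    kappa = (kappa, 0), from the winding of E_+^2 along a unit circle."""
    if kappa <= abs(m):
        return 0.0                                   # separable bands, no EPs
    kx0 = -m * delta_m / kappa
    ky0 = np.sqrt((kappa**2 - m**2) * (kappa**2 + delta_m**2)) / kappa
    s = 2 * np.pi * np.arange(L + 1) / L             # parameterize the loop
    kx, ky = kx0 + np.cos(s), ky0 + np.sin(s)
    E2 = (kx**2 + ky**2 - kappa**2 + m**2 - delta_m**2
          + 2j * (kx * kappa + m * delta_m))         # E_+^2 from Eq. (22)
    dtheta = np.diff(np.angle(E2))
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi
    # arg(E_+ - E_-) winds half as fast as arg(E_+^2)
    return dtheta.sum() / (4 * np.pi)

print(dirac_winding(m=1.0))    # +/- 0.5 (sign set by the loop orientation)
```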

Figure 6: (Color online) (a) Phase diagram of the two-dimensional non-Hermitian Dirac fermion model with $\delta_{m}=1$ and $\bm{\kappa}=(\kappa_{x},0)$. (b) The pair switching of the eigenvalues $E_{\pm}(\mathbf{k})$ (solid red and dashed blue lines) around an exceptional point ($\mathbf{k}_{+}$; denoted EP) gives rise to the winding number $w_{\pm}(\Gamma)=0.5$. (c) The accuracy of the two test sets versus the distance threshold $T$. For each $T$, the data sets are regenerated and the CNN is retrained and retested. (d) Classification probabilities output by the CNN on test set II with $\kappa=3$ and $T=0.5$. The predicted topological phase transition points ($m=\pm 3$) are located at the crossing points of the prediction probabilities. Different colors represent different winding numbers.

In the training, we discretize the loop $\Gamma$ into $L$ equally spaced points and set the input configuration as $\mathbf{d}(n)=\{\mathrm{Re}\,\Delta E(n),\mathrm{Im}\,\Delta E(n)\}$ with $\Delta E(n)=E_{+}(\mathbf{k}_{n})-E_{-}(\mathbf{k}_{n})$. The corresponding winding numbers serve as the data labels. We use a workflow similar to that of Sec. III and a CNN with the same structure as in Sec. IV to study the topological phase transitions characterized by $w_{\pm}$ in this two-dimensional non-Hermitian model. The training data set consists of $3\times 10^{4}$ configurations satisfying $s\geq T$, sampled from different phases with different winding numbers. We test the CNN with two test data sets: (I) $6\times 10^{3}$ configurations satisfying $s<T$, and (II) 600 configurations distributed uniformly along $\kappa=3$, $m\in[-6,6]$. The CNN evaluates both test data sets I and II after the training. In Fig. 6(c), we plot the accuracy versus the distance threshold $T$; the CNN detects the winding number precisely for different thresholds $T$. Furthermore, the topological phase transitions can be revealed by the crossing points of the prediction probabilities, as shown in Fig. 6(d). These results demonstrate the feasibility of neural networks in learning the topological invariants of two-dimensional non-Hermitian models.

VI Conclusions

In summary, we have demonstrated that artificial neural networks can be used to predict, with high accuracy, the topological invariants and the associated topological phase transitions and topological phase diagrams in four different non-Hermitian models. The eigenenergy winding numbers in the Hatano-Nelson model were presented as a demonstration of our machine learning method. The CNN trained with data from deep within the phases has been shown to correctly detect the phase transitions near the boundaries of the non-Hermitian SSH model. We have also investigated the non-Hermitian generalized AAH model with non-reciprocal hopping and a complex quasiperiodic potential, finding that the topological phase diagram predicted by the CNN in the non-Hermiticity parameter space agrees with its theoretical counterpart to high accuracy. Furthermore, we have generalized our scenario to reveal the winding numbers associated with exceptional points in the two-dimensional non-Hermitian Dirac fermion model. Our results show the generality of the machine learning method in classifying topological phases in prototypical non-Hermitian models.

Finally, we make some remarks on future studies on machine learning non-Hermitian topology. Some exotic features of non-Hermitian topological systems are sensitive to the boundary condition, such as the non-Hermitian skin effect under open boundary conditions Lee (2016); Yao and Wang (2018); Yao et al. (2018); Song et al. (2019); Kunst et al. (2018), which is closely related to the winding number of complex eigenenergies Zhang et al. (2020e); Yang et al. (2020); Okuma et al. (2020). The energy spectrum under periodic boundary conditions may deviate drastically from that under open boundary conditions. Further studies on the non-Hermitian skin effects and the classification of non-Hermitian topological phases under open boundary conditions based on machine learning algorithms will be conducted. In addition, machine learning non-Hermitian topological invariants defined by the eigenstates would be an interesting further study.

Note added. Recently, we noticed two related works on machine learning non-Hermitian topological states Narayan and Narayan (2021); Yu and Deng (2020), which focused on the winding number of the Hamiltonian vectors and the cluster of non-Hermitian topological phases in an unsupervised fashion, respectively.

Figure 7: (Color online) (a) CNN and FCNN training loss history for the Hatano-Nelson model. CNN training loss history for (b) the non-Hermitian SSH model and (c) the non-Hermitian generalized AAH model.
Figure 8: (Color online) (a) Phase diagram of the non-Hermitian SSH model for $t\in[-6,6]$, $\delta\in[-6,6]$, and $t^{\prime}=1$. (b) Data set distribution for $T=0.2$; the training, validation, and test data sets contain about $2.4\times 10^{4}$, $6\times 10^{3}$, and $6\times 10^{3}$ configurations, respectively.

Appendix A Training details

We first describe some training details for the Hatano-Nelson model. We use the deep learning framework PyTorch Paszke et al. (2019) to construct and train the neural networks. Weights are randomly initialized from a normal distribution with the Xavier algorithm Glorot and Bengio (2010), and the biases are initialized to 0. We use the Adam optimizer Kingma and Ba (2014) to minimize the deviation of the network output $\tilde{w}$ from the true value $w$. We set the initial learning rate to 0.001 and use the ReduceLROnPlateau schedule Paszke et al. (2019) to reduce it by a factor of 10 when the validation loss stops improving for 20 epochs. All hyperparameters are set to their defaults unless mentioned otherwise. To prevent overfitting, $L_{2}$ regularization with strength $10^{-4}$ and early stopping Yao et al. (2007) are used during the training. We use mini-batch training with batch size 64 and a validation set to confirm that there is no overfitting during training. The validation set contains $4\times 10^{3}$ configurations, consisting of samples with winding numbers $w\in\{\pm 1,\pm 2,\pm 3\}$ in the ratio $1:1:1$. The typical loss during a training instance of the CNN and FCNN is shown in Fig. 7(a), from which one can see that there is no sign of overfitting.
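A minimal training-loop sketch matching these hyperparameters is given below; the tensors are random placeholders standing in for the labeled configurations, `WindingCNN` is the hypothetical sketch from Sec. II, and we approximate the $L_{2}$ regularization by Adam's weight decay.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = WindingCNN()                               # hypothetical sketch, Sec. II
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1, patience=20)
loss_fn = torch.nn.MSELoss()                       # the objective of Eq. (7)

# random placeholders for the labeled (L+1) x 2 configurations, L = 32
train_x, train_y = torch.randn(512, 1, 33, 2), torch.randn(512)
val_x, val_y = torch.randn(128, 1, 33, 2), torch.randn(128)
loader = DataLoader(TensorDataset(train_x, train_y), batch_size=64, shuffle=True)

for epoch in range(100):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    with torch.no_grad():
        sched.step(loss_fn(model(val_x), val_y))   # reduce lr on plateaued validation loss
```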

We now provide some training details for the non-Hermitian SSH model. In this case, the CNN has two convolution layers, with 32 kernels of size $1\times 2\times 2$ and 1 kernel of size $32\times 1\times 1$, followed by a fully connected layer of 16 neurons before the output layer. In this model, the output layer consists of three neurons for the three different inter-band winding numbers. All hidden layers have ReLU activation functions, and the output layer has the softmax function $f(\mathbf{x})_{i}=\exp(\mathbf{x}_{i})/\sum_{j=1}^{n}\exp(\mathbf{x}_{j})$. The exact topological phase diagram in the parameter space spanned by $t$ and $\delta$ is shown in Fig. 8(a). The training data set, satisfying $s\geq T$ with $T=0.2$ here, and the test data set, satisfying $s<T$, are randomly sampled from the parameter space. The data set distribution is shown in Fig. 8(b). The training, validation, and test data sets contain about $2.4\times 10^{4}$, $6\times 10^{3}$, and $6\times 10^{3}$ configurations, respectively. Typical loss during training instances of the CNN for different training data sets is plotted in Fig. 7(b), which clearly shows that the neural networks converge quickly without overfitting.

Finally, we briefly present some details for the non-Hermitian generalized AAH model. In this case, the validation set consists of $8\times 10^{3}$ configurations corresponding to non-reciprocal-hopping Hamiltonians (with $h=0$) that are not included in the training data set. The typical loss is shown in Fig. 7(c); the networks converge quickly without overfitting.

Acknowledgements.
We thank Dan-Bo Zhang for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grants No. U1830111, No. U1801661, and No. 12047522), the Key-Area Research and Development Program of Guangdong Province (Grant No. 2019B030330001), and the Science and Technology Program of Guangzhou (Grants No. 201804020055 and No. 2019050001).

References