
Quaternion-based machine learning on topological quantum systems

Min-Ruei Lin [email protected] Department of Physics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan    Wan-Ju Li Department of Physics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan    Shin-Ming Huang [email protected] Department of Physics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan Center of Crystal Research, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
Abstract

Topological phase classifications have been intensively studied via machine-learning techniques, where different forms of the training data are proposed in order to maximize the information extracted from the systems of interest. Owing to the complexity of quantum physics, advanced mathematical architectures should be considered in designing machines. In this work, we incorporate quaternion algebras into data analysis, in both the supervised- and unsupervised-learning frameworks, to classify two-dimensional Chern insulators. For the unsupervised-learning aspect, we apply principal component analysis (PCA) to the quaternion-transformed eigenstates to distinguish topological phases. For the supervised-learning aspect, we construct our machine by adding one quaternion convolutional layer on top of a conventional convolutional neural network. The machine takes quaternion-transformed configurations as inputs and successfully classifies all distinct topological phases, even for states whose distributions differ from those seen by the machine during the training process. Our work demonstrates the power of quaternion algebras in extracting crucial features from the targeted data and the advantages of quaternion-based neural networks over conventional ones in the task of topological phase classification.

I Introduction

The phase classification using machine-learning (ML) based techniques has been attracting intense attention since the pioneering work in 2017 [1]. In addition to the detection of classical phases [2, 3], where each phase is well defined by a corresponding order parameter, detecting topological phase transitions [2] is interesting and challenging [4] due to the lack of local order parameters. Recently, phase detections and classifications have been performed via different ML techniques for various topological invariants [5, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 6, 18, 19, 10, 20, 21, 22, 23, 24, 16, 25, 21, 18, 26, 18, 17, 27, 28, 29, 30, 6, 10, 4, 31, 32, 33, 34, 35, 26, 36], including the Chern number [14, 15, 16, 17, 6, 18, 19, 10, 20, 21, 22, 23], the winding number [24, 16, 25, 21, 18, 26, 18], and the $\mathbb{Z}_{2}$ index [17, 27, 28, 29, 30, 6, 10, 4, 31, 32, 33, 34, 35, 26, 36], to name a few. In addition to the applied ML architectures, the forms of the inputs used to train the machine also play a crucial role in determining the resulting performance of topological phase detection [4].

For topological systems with the Chern number or the winding number as the topological invariant, various types of inputs have been used to perform phase classifications. For instance, the quantum loop topography (QLT) was introduced to construct multi-dimensional images from raw Hamiltonians or wave functions as inputs [14, 17]. The Bloch Hamiltonians have been arranged into arrays to feed the neural networks [24, 16]. In addition, the real-space particle densities and local density of states [15] and the local projections of the density matrix [6] have also been used as inputs. From cold-atom experiments, momentum-space density images were generated as inputs for classification [20]. The time-of-flight images [10, 19], the spatial correlation function [10], the density-density correlation function [10], and the density profiles formed in quantum walks have also been proposed as appropriate inputs [23]. Furthermore, the spin configurations [18] and the Bloch Hamiltonians over the Brillouin zone (BZ) have been treated as inputs for the neural networks [21, 18]. For all these forms of inputs, various ML techniques with distinct real-valued neural networks have been applied to discriminate different topological phases.

As the development of artificial neural networks matures, a rise in the representation capability of machines is anticipated by generalizing real-valued neural networks to complex-valued ones [37, 38]. Specifically, a quaternion number, containing one real part and three imaginary parts, and the corresponding quaternion-based neural networks [39, 40, 41, 42] are expected to enhance the performance in processing data with more degrees of freedom than the conventional real-number and complex-number systems. There have been various proposals for quaternion-based neural networks in ML techniques and applications in computer science, such as the quaternion convolutional neural network (qCNN) [43, 44, 38], the quaternion recurrent neural network [45], quaternion generative adversarial networks [46], the quaternion-valued variational autoencoder [47], quaternion graph neural networks [48], quaternion capsule networks [49], and quaternion neural networks for speech recognition [50]. However, the ML-related applications of quaternion-based neural networks to problems in physics are still limited, especially in topological phase detection, even though quaternion-related concepts have been applied in some fields of physics [51, 52, 53].

Figure 1: Examples of spin textures in the Brillouin zone, $k_{x},k_{y}\in(-\pi,\pi]$, with the Chern number $C=1$ (a), $C=2$ (b), $C=3$ (c), and $C=4$ (d).

In this work, we perform Chern-insulator classifications from both the supervised- and unsupervised-learning aspects, based on inputs transformed via the quaternion algebra. For the unsupervised learning, we encode the quaternion-transformed eigenstates of Chern insulators via a convolution function as inputs and study them using principal component analysis (PCA). We find that using only the first two principal components is not enough to fully classify the Chern insulators, consistent with Ming's work [23]. Further studies show that the performance can be improved by including more principal components. For the supervised learning, we construct a quaternion-based neural network in which the first layer is a quaternion convolutional layer. We then show that this quaternion-based machine outperforms a conventional CNN machine. Our machine performs well not only on testing datasets but also on data points whose distributions differ from those seen by the machine during the training process. The good performance can be attributed to the similarity between the formula for the Berry curvature and our quaternion-based setup. Therefore, our work demonstrates the power of the quaternion algebra in extracting relevant information from data, paving the way for applications of quaternion-based ML techniques in topological phase classifications.

The remainder of this work is organized as follows. In Sec. II, we introduce the model Hamiltonian, which generates the data for our classification tasks, and the quaternion convolutional layer used in this work. The PCA of the quaternion-transformed eigenstates is discussed in Sec. III. The data preparation, the network structures, and the performance of the quaternion-based supervised learning task are given in Sec. IV. Discussions and conclusions are presented in Sec. V and Sec. VI, respectively. We have three appendixes. Appendix A shows the details of data preparation. Appendix B provides a brief introduction to the quaternion algebra. Some properties of the functions in Sec. III are included in Appendix C.

II Model and quaternion convolutional layer

II.1 Model

A generic two-band Bloch Hamiltonian, written with the aid of the identity matrix $\sigma_{0}$ and the Pauli matrices $\boldsymbol{\sigma}=(\sigma_{1},\sigma_{2},\sigma_{3})$, is

\mathcal{H}(\vec{k})=h_{0}(\vec{k})\sigma_{0}+\mathbf{h}(\vec{k})\cdot\boldsymbol{\sigma}, \qquad (1)

where $\vec{k}=(k_{x},k_{y})$ is the crystal momentum in the 2D BZ ($k_{x},k_{y}\in(-\pi,\pi]$). The term $h_{0}(\vec{k})$ can change the energy of the system but has nothing to do with the topology, so it will be ignored in the remainder of this paper. The vector $\mathbf{h}=(h_{1},h_{2},h_{3})$ acts as a $\vec{k}$-dependent external magnetic field on the spin $\boldsymbol{\sigma}$, so that the eigenstate of the upper (lower) band at each $\vec{k}$ is the spin pointing parallel (antiparallel) to $\mathbf{h}(\vec{k})$. It is therefore the unit vector $\mathbf{n}=\mathbf{h}/|\mathbf{h}|\in S^{2}$ that embeds the topology of this system. Indeed, the topological invariant is the Chern number $C$,

C=\frac{1}{4\pi}\int_{\text{BZ}}\mathbf{n}\cdot(\partial_{k_{x}}\mathbf{n}\times\partial_{k_{y}}\mathbf{n})\,d\vec{k}, \qquad (2)

where the integrand is the Berry curvature and the integration is over the first BZ. For brevity, we will sometimes omit the argument $\vec{k}$ in functions. The Chern number is analogous to the skyrmion number in real space [54]. The integral is the total solid angle subtended by $\mathbf{n}(\vec{k})$ over the BZ, so the Chern number counts how many times $\mathbf{n}(\vec{k})$ wraps the sphere.

Figure 2: The Chern number with various $m$ and $c$.

We construct the normalized spin configurations $\mathbf{n}(\vec{k})$ based on the following models. For topological systems, we choose the Hamiltonian with $\mathbf{h}=\mathbf{h}^{(c)}$, where

\mathbf{h}^{(c)}(\vec{k},m)=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix} \qquad (3)

with a positive integer $c$ and a real parameter $m$ controlling the Chern number. Here $c$ is the vorticity, i.e., the number of times the in-plane components ($n_{x}$ and $n_{y}$) swirl around the origin, and the sign of $c$ distinguishes a counterclockwise from a clockwise swirl. For a nontrivial topology, $n_{z}$ has to change sign somewhere in the BZ for $\mathbf{n}(\vec{k})$ to wrap a complete sphere; therefore, $|m|<2$ is required. Some examples of the spin texture $\mathbf{n}(\vec{k})$ based on Eq. (3) are shown in Fig. 1. For $c=1$, the model is the Qi-Wu-Zhang (QWZ) model [55]. For a given $c$, the Chern number $C$ can be either $0$, $c$, or $-c$, depending on the value of $m$:

C=\begin{cases}\mathrm{sgn}(m)\,c,&0<|m|<2,\\ 0,&|m|>2.\end{cases} \qquad (4)

The topological phase diagram is shown in Fig. 2. Here $C=0$ denotes a topologically trivial phase and $C\neq 0$ a nontrivial phase.

In this work, the unsupervised learning involves seven topological phases ($C=0,\pm 1,\pm 2,\pm 3$) in Sec. III, and the supervised learning involves nine topological phases ($C=0,\pm 1,\pm 2,\pm 3,\pm 4$) in Sec. IV.
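A minimal numerical sketch, assuming a NumPy environment, of how Eq. (2) can be evaluated on the discretized spin textures of Eq. (3): each plaquette of the $k$-grid is split into two spherical triangles whose signed solid angles are summed. The overall sign of the result depends on the orientation convention of the triangulation; the outcome can be compared with Eq. (4).

```python
import numpy as np

def spin_texture(c, m, N=40):
    """Normalized spin configuration n(k) of Eq. (3) on an N x N grid, k in (-pi, pi]."""
    k = -np.pi + 2 * np.pi * (np.arange(N) + 1) / N
    kx, ky = np.meshgrid(k, k, indexing="ij")
    w = (np.sin(kx) - 1j * np.sin(ky)) ** c
    h = np.stack([w.real, -w.imag, np.cos(kx) + np.cos(ky) + m], axis=-1)
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def solid_angle(a, b, c):
    """Signed solid angle of the spherical triangle (a, b, c); a signed variant of Eq. (17)."""
    num = np.einsum("...i,...i->...", a, np.cross(b, c))
    den = 1 + np.einsum("...i,...i->...", a, b) \
            + np.einsum("...i,...i->...", b, c) \
            + np.einsum("...i,...i->...", c, a)
    return 2 * np.arctan2(num, den)

def chern_number(n):
    """Discretized Eq. (2): sum the solid angles of two triangles per k-plaquette."""
    n1 = np.roll(n, -1, axis=0)            # neighbor along kx (periodic BZ)
    n2 = np.roll(n, -1, axis=1)            # neighbor along ky
    n12 = np.roll(n1, -1, axis=1)          # diagonal neighbor
    omega = solid_angle(n, n1, n12) + solid_angle(n, n12, n2)
    return omega.sum() / (4 * np.pi)

for c in (1, 2, 3):
    for m in (-3, -1, 1, 3):
        print(c, m, round(chern_number(spin_texture(c, m))))  # compare with Eq. (4)
```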

II.2 Quaternion convolutional layer

A quaternion number has four components, the first of which is the real part and the other three of which are the imaginary parts. Given two quaternions $q_{1}=(r_{1},a_{1},b_{1},c_{1})$ and $q_{2}=(r_{2},a_{2},b_{2},c_{2})$, their product $Q=q_{1}q_{2}=(R,A,B,C)$ is given by

\begin{pmatrix}R\\ A\\ B\\ C\end{pmatrix}=\begin{pmatrix}r_{1}r_{2}-a_{1}a_{2}-b_{1}b_{2}-c_{1}c_{2}\\ a_{1}r_{2}+r_{1}a_{2}-c_{1}b_{2}+b_{1}c_{2}\\ b_{1}r_{2}+c_{1}a_{2}+r_{1}b_{2}-a_{1}c_{2}\\ c_{1}r_{2}-b_{1}a_{2}+a_{1}b_{2}+r_{1}c_{2}\end{pmatrix}, \qquad (5)

which can be written in the matrix-product form

\begin{pmatrix}R\\ A\\ B\\ C\end{pmatrix}=\begin{pmatrix}r_{1}&-a_{1}&-b_{1}&-c_{1}\\ a_{1}&r_{1}&-c_{1}&b_{1}\\ b_{1}&c_{1}&r_{1}&-a_{1}\\ c_{1}&-b_{1}&a_{1}&r_{1}\end{pmatrix}\begin{pmatrix}r_{2}\\ a_{2}\\ b_{2}\\ c_{2}\end{pmatrix}. \qquad (6)

To implement a quaternion convolutional (q-Conv) layer in numerical programming, we regard the two quaternions as a $4\times 4$ matrix and a $4\times 1$ column matrix, respectively:

q_{1}\doteq\begin{pmatrix}r_{1}&-a_{1}&-b_{1}&-c_{1}\\ a_{1}&r_{1}&-c_{1}&b_{1}\\ b_{1}&c_{1}&r_{1}&-a_{1}\\ c_{1}&-b_{1}&a_{1}&r_{1}\end{pmatrix}\quad\mathrm{and}\quad q_{2}\doteq\begin{pmatrix}r_{2}\\ a_{2}\\ b_{2}\\ c_{2}\end{pmatrix}. \qquad (7)

More details of quaternion algebra are described in Appendix B.
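A minimal sketch of this matrix implementation, assuming a NumPy environment: the left factor is stored as the $4\times 4$ matrix of Eq. (7), so the Hamilton product of Eqs. (5)-(6) reduces to an ordinary matrix-vector multiplication.

```python
import numpy as np

def quat_matrix(q):
    """4x4 left-multiplication matrix of a quaternion q = (r, a, b, c), as in Eq. (7)."""
    r, a, b, c = q
    return np.array([[r, -a, -b, -c],
                     [a,  r, -c,  b],
                     [b,  c,  r, -a],
                     [c, -b,  a,  r]])

def quat_mul(q1, q2):
    """Hamilton product q1*q2 of Eqs. (5)-(6)."""
    return quat_matrix(q1) @ np.asarray(q2, dtype=float)

# Example: i * j = k, i.e. (0,1,0,0)*(0,0,1,0) = (0,0,0,1), consistent with Eq. (24).
print(quat_mul([0, 1, 0, 0], [0, 0, 1, 0]))
```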

A conventional CNN contains a real-valued convolutional layer that executes the convolution of the input with the kernel. Let the input $F$ have the shape $H_{i}\times W_{i}\times C_{i}$ (Height $\times$ Width $\times$ Channel) and the kernel $K$ have the shape $H_{k}\times W_{k}\times C_{i}\times C_{f}$. The convolution produces an output $O=F\ast K$, whose elements are

O_{i^{\prime},j^{\prime},t^{\prime}}=\sum_{i}^{H_{k}}\sum_{j}^{W_{k}}\sum_{t}^{C_{i}}F_{i^{\prime}+i-1,\,j^{\prime}+j-1,\,t}\cdot K_{i,j,t,t^{\prime}}. \qquad (8)

Here the stride is assumed to be $1$ in both the width and the height directions. The indices $i$ and $j$ are spatial indicators, $t$ is the channel index of the input feature map, and $t^{\prime}$ is the kernel index. The shape of the output is $(H_{i}-H_{k}+1)\times(W_{i}-W_{k}+1)\times C_{f}$.

Assume that the input has four components. To uncover the entanglement among the components through a CNN, we utilize the quaternion product. We introduce another dimension, the depth, which is four, as for a quaternion number with four components. Both the input $F$ and the kernel $K$ have a depth of four, as two quaternion numbers, and their product also has a depth of four, as the quaternion in Eq. (5). Referring to Eq. (7), where we show a matrix representation that implements the quaternion algebra, and thinking of $F$ as $q_{1}$ and $K$ as $q_{2}$, we transform the depth-four input $F$ into a $4\times 4$ matrix, $F^{(s,l)}$, and keep the kernel $K$ of depth 4, $K^{(l)}$, where $l,s=1,\ldots,4$. The product of $F$ and $K$, say $O$, then has a depth of four, as shown in Eq. (9). Further considering the shapes of $F$ and $K$, the convolution is given by

O_{i^{\prime},j^{\prime},t^{\prime}}^{(s)}=\sum_{l}^{4}\sum_{i,j,t}F^{(s,l)}_{i^{\prime}+i-1,\,j^{\prime}+j-1,\,t}\cdot K^{(l)}_{i,j,t,t^{\prime}}, \qquad (9)

where the summations over $i$, $j$, and $t$ are equivalent to those in Eq. (8) and the summation over $l$ implements the quaternion product.

Figure 3: Illustration of a quaternion convolutional layer. On the left, we start with the input $q_{1}$ having four quaternion components [(yellow, red, green, blue) stands for ($r_{1}$, $a_{1}$, $b_{1}$, $c_{1}$)]. In the middle, $q_{1}$ is permuted to construct $\{F^{(\cdot,l)}\}_{l=1}^{4}$, on which the convolution with four kernels $\{K^{(l)}\}_{l=1}^{4}$ is performed. A summation is taken for each depth to obtain the output feature map $O$ on the right.

More specifically, we consider the input data as $q_{1}$ (the four colored squares on the left of Fig. 3) and four kernels encoded in $q_{2}$, given in the following:

\left\{\begin{aligned}q_{1}&\doteq(r_{1}~a_{1}~b_{1}~c_{1})^{T},\\ q_{2}&\doteq(r_{2}~a_{2}~b_{2}~c_{2})^{T}=:K^{(\cdot)}.\end{aligned}\right. \qquad (10)

The output feature map $O\doteq(R~A~B~C)^{T}$ is then calculated based on Eq. (5). As the first step, we permute the order of $q_{1}$ to obtain

F^{(\cdot,1)}=:\begin{pmatrix}r_{1}\\ a_{1}\\ b_{1}\\ c_{1}\end{pmatrix},\quad F^{(\cdot,2)}=:\begin{pmatrix}-a_{1}\\ r_{1}\\ c_{1}\\ -b_{1}\end{pmatrix},\quad F^{(\cdot,3)}=:\begin{pmatrix}-b_{1}\\ -c_{1}\\ r_{1}\\ a_{1}\end{pmatrix},\quad F^{(\cdot,4)}=:\begin{pmatrix}-c_{1}\\ b_{1}\\ -a_{1}\\ r_{1}\end{pmatrix} \qquad (11)

(see the four sets of squares in the middle of Fig. 3). We then convolute these four quaternions ($F^{(\cdot,l)}$ with $l=1,2,3$, and 4) with four kernels ($K^{(l)}$ with $l=1,2,3$, and 4) in the following way:

\left\{\begin{aligned}F^{(\cdot,1)}K^{(1)}&\doteq\begin{pmatrix}r_{1}r_{2}&a_{1}r_{2}&b_{1}r_{2}&c_{1}r_{2}\end{pmatrix}^{T}\\ F^{(\cdot,2)}K^{(2)}&\doteq\begin{pmatrix}-a_{1}a_{2}&r_{1}a_{2}&c_{1}a_{2}&-b_{1}a_{2}\end{pmatrix}^{T}\\ F^{(\cdot,3)}K^{(3)}&\doteq\begin{pmatrix}-b_{1}b_{2}&-c_{1}b_{2}&r_{1}b_{2}&a_{1}b_{2}\end{pmatrix}^{T}\\ F^{(\cdot,4)}K^{(4)}&\doteq\begin{pmatrix}-c_{1}c_{2}&b_{1}c_{2}&-a_{1}c_{2}&r_{1}c_{2}\end{pmatrix}^{T}\end{aligned}\right.

as shown in the middle of Fig. 3. Finally, we sum the above four quaternions to obtain the output feature map $O$, as shown on the right of Fig. 3:

O:=\begin{pmatrix}R\\ A\\ B\\ C\end{pmatrix}=\begin{pmatrix}r_{1}r_{2}-a_{1}a_{2}-b_{1}b_{2}-c_{1}c_{2}\\ a_{1}r_{2}+r_{1}a_{2}-c_{1}b_{2}+b_{1}c_{2}\\ b_{1}r_{2}+c_{1}a_{2}+r_{1}b_{2}-a_{1}c_{2}\\ c_{1}r_{2}-b_{1}a_{2}+a_{1}b_{2}+r_{1}c_{2}\end{pmatrix}.
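The permutation-and-sum construction of Eqs. (9)-(11) can be sketched directly in NumPy. The following is a minimal, unoptimized illustration (stride 1, no padding, no bias or activation); the sign/permutation table encodes the columns of the matrix in Eq. (7).

```python
import numpy as np

# Columns of the 4x4 matrix in Eq. (7), i.e. the four permuted copies F^{(.,l)} of Eq. (11).
PERM = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]
SIGN = [(+1, +1, +1, +1), (-1, +1, +1, -1), (-1, -1, +1, +1), (-1, +1, -1, +1)]

def quaternion_conv(F, K):
    """Quaternion convolution of Eq. (9).
    F: input of shape (4, Hi, Wi, Ci); K: kernels of shape (4, Hk, Wk, Ci, Cf)."""
    _, Hi, Wi, Ci = F.shape
    _, Hk, Wk, _, Cf = K.shape
    Ho, Wo = Hi - Hk + 1, Wi - Wk + 1
    O = np.zeros((4, Ho, Wo, Cf))
    for l in range(4):                                                 # sum over l in Eq. (9)
        Fl = np.stack([SIGN[l][s] * F[PERM[l][s]] for s in range(4)])  # F^{(s,l)}
        for i in range(Ho):
            for j in range(Wo):
                patch = Fl[:, i:i + Hk, j:j + Wk, :]                   # (4, Hk, Wk, Ci)
                O[:, i, j, :] += np.tensordot(patch, K[l], axes=([1, 2, 3], [0, 1, 2]))
    return O

F = np.random.randn(4, 6, 6, 1)       # a toy quaternion-valued feature map
K = np.random.randn(4, 2, 2, 1, 3)    # three quaternion kernels of size 2x2
print(quaternion_conv(F, K).shape)    # (4, 5, 5, 3)
```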

III principal component analysis

Principal component analysis (PCA) is a linear manifold-learning method that finds the most relevant basis set for a given dataset [56, 57].

We prepare the eigenstates $\ket{u_{\pm}}$ of Eq. (1), where $+$ ($-$) stands for the upper (lower) band. For a topologically nontrivial state, the phase cannot be continuous over the whole BZ. Therefore, we divide the whole BZ into two parts, in each of which the topological wave function has a continuous, well-defined phase. We then pick a gauge by choosing the two regions according to the sign of $h_{3}$ in Eq. (3):

\ket{u_{+}}\doteq\frac{1}{\sqrt{2h_{+}(h_{+}+h_{3})}}\begin{pmatrix}h_{+}+h_{3}\\ h_{1}+ih_{2}\end{pmatrix},\quad \ket{u_{-}}\doteq\frac{1}{\sqrt{2h_{-}(h_{-}+h_{3})}}\begin{pmatrix}-h_{1}+ih_{2}\\ h_{-}+h_{3}\end{pmatrix},\qquad h_{3}\geq 0, \qquad (12)

and

\ket{u_{+}}\doteq\frac{1}{\sqrt{2h_{+}(h_{+}-h_{3})}}\begin{pmatrix}h_{1}-ih_{2}\\ h_{+}-h_{3}\end{pmatrix},\quad \ket{u_{-}}\doteq\frac{1}{\sqrt{2h_{-}(h_{-}-h_{3})}}\begin{pmatrix}h_{-}-h_{3}\\ -h_{1}-ih_{2}\end{pmatrix},\qquad h_{3}<0, \qquad (13)

where $h_{\pm}=\pm\sqrt{h_{1}^{2}+h_{2}^{2}+h_{3}^{2}}$. In this choice of gauge, the first (second) component of $\ket{u_{+}}$ ($\ket{u_{-}}$) is real-valued when $h_{3}\geq 0$, and the second (first) component of $\ket{u_{+}}$ ($\ket{u_{-}}$) is real-valued when $h_{3}<0$.

By translating $\ket{u_{\pm}}\doteq(\alpha_{\pm},\beta_{\pm})^{T}$, with $\alpha_{\pm},\beta_{\pm}\in\mathbb{C}$, into a quaternion number with four components, we have

q_{\pm}:=\mathrm{Re}(\alpha_{\pm})+\mathrm{Im}(\alpha_{\pm})\hat{\mathbf{i}}+\mathrm{Re}(\beta_{\pm})\hat{\mathbf{j}}+\mathrm{Im}(\beta_{\pm})\hat{\mathbf{k}}. \qquad (14)

To see the correlation of states over $\vec{k}$, we define the quantity $F$ through the quaternion-based convolution

F(\vec{p}):=(q^{*}_{+}\circledast q_{+})[\vec{p}]-(q^{*}_{-}\circledast q_{-})[\vec{p}]\quad\mathrm{with}\quad (q_{\pm}^{*}\circledast q_{\pm})[\vec{p}]:=\sum_{\vec{k}\in\mathrm{BZ}}q_{\pm}^{*}(\vec{k})\,q_{\pm}(\vec{p}-\vec{k}), \qquad (15)

where $q^{*}$ is the conjugate of $q$. It can be proved that $F$ is real-valued. Therefore, $F(\vec{p})$ over the whole BZ, computed from a given Hamiltonian, can be analysed using PCA.
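A compact numerical sketch of this construction, assuming a NumPy environment: the gauge-fixed states of Eqs. (12)-(13) are encoded as quaternions following Eq. (14), the convolution of Eq. (15) is evaluated as a circular convolution on the periodic $k$-grid via FFTs (only the real part needs to be accumulated, by Properties C.1 and C.3), and the resulting $F$ maps are fed to a plain SVD-based PCA. A tiny random $\delta\mathbf{h}$ is added to regularize the isolated $k$ points where $h_{1}=h_{2}=0$ and this gauge choice is singular.

```python
import numpy as np

def h_field(c, m, N=40):
    """h(k) of Eq. (3) on an N x N grid. Using k in [0, 2*pi), equivalent to the BZ by
    periodicity, makes index arithmetic match momentum addition in the convolution."""
    k = 2 * np.pi * np.arange(N) / N
    kx, ky = np.meshgrid(k, k, indexing="ij")
    w = (np.sin(kx) - 1j * np.sin(ky)) ** c
    return np.stack([w.real, -w.imag, np.cos(kx) + np.cos(ky) + m], axis=-1)

def spinors(h):
    """Gauge-fixed eigenstates |u_+>, |u_-> of Eqs. (12)-(13); h has shape (N, N, 3)."""
    h1, h2, h3 = h[..., 0], h[..., 1], h[..., 2]
    hp = np.linalg.norm(h, axis=-1)                    # h_+ = |h|, h_- = -|h|
    Na, Nb = np.sqrt(2 * hp * (hp + h3)), np.sqrt(2 * hp * (hp - h3))
    up, um = (np.empty(h.shape[:-1] + (2,), complex) for _ in range(2))
    pos, neg = h3 >= 0, h3 < 0
    up[pos, 0], up[pos, 1] = ((hp + h3) / Na)[pos], ((h1 + 1j * h2) / Na)[pos]   # Eq. (12)
    um[pos, 0], um[pos, 1] = ((-h1 + 1j * h2) / Nb)[pos], ((h3 - hp) / Nb)[pos]
    up[neg, 0], up[neg, 1] = ((h1 - 1j * h2) / Nb)[neg], ((hp - h3) / Nb)[neg]   # Eq. (13)
    um[neg, 0], um[neg, 1] = ((-hp - h3) / Na)[neg], ((-h1 - 1j * h2) / Na)[neg]
    return up, um

def F_map(h):
    """F(p) of Eq. (15). Since F is real (Property C.1), only the componentwise products
    of the Eq. (14) components are summed; the sum over k is a circular convolution."""
    F = np.zeros(h.shape[:-1])
    for u, sign in zip(spinors(h), (+1.0, -1.0)):
        for f in (u[..., 0].real, u[..., 0].imag, u[..., 1].real, u[..., 1].imag):
            F += sign * np.fft.ifft2(np.fft.fft2(f) ** 2).real
    return F

rng = np.random.default_rng(0)
samples = []
for c in (1, 2, 3):
    for m in (-3, -1, 1, 3):                           # the (c, m) values of Fig. 4
        h = h_field(c, m) + 1e-6 * rng.standard_normal((40, 40, 3))   # tiny delta-h
        samples.append(F_map(h).ravel())
X = np.asarray(samples)
X -= X.mean(axis=0)                                    # center, then PCA via SVD
pcs = (X @ np.linalg.svd(X, full_matrices=False)[2].T)[:, :2]
print(pcs.round(2))                                    # first two principal components
```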

Figure 4: Maps of the function $F$ without noise in the BZ. The three rows correspond to $c=1,2$, and $3$ in Eq. (3), from top to bottom; the four columns, from left to right, correspond to $m=-3,-1,1$, and 3. The corresponding Chern number $C$ is tagged on each panel.

We collected various $F$ over the whole BZ within the seven topological phases as the dataset for PCA. For each topological phase, 30 $F$'s were prepared, so the total amount of data was 210. The data for the six non-trivial phases were generated based on Eq. (3) with $m=\pm 1$ (the sign of $m$ determines the sign of $C$). For the trivial phase, we prepared five data points from each of the six combinations of $\{c,m\}$, where $c\in\{1,2,3\}$ and $m\in\{3,-3\}$, giving 30 data points in total. To augment the number of data, we add Gaussian noise $\delta\mathbf{h}$ at every $\vec{k}$ of the model [Eq. (3)], such that $\mathbf{h}\to\mathbf{h}+\delta\mathbf{h}$, without closing the band gap.

In Fig. 4, we present various noiseless $F$ generated from Eq. (3) with different $c$ and $m$.

Figure 5: PCA of the seven topological phases with various noise levels. The symbols with the corresponding Chern numbers are given in the legend.

It is notable that $F$ for $C=0$ is featureless, $F$ for $C=\pm 1$ has a dipole moment, $F$ for $C=\pm 2$ has a quadrupole moment, and $F$ for $C=\pm 3$ seemingly has a primary dipole and a secondary quadrupole moment. These remarkable features imply that the convolution function $F$ is a good choice for topological classifications.

We examine data with the standard deviation (SD) of the noise equal to 0, 0.1, 0.2, and 0.3, respectively, and show the first two PCs of the 210 pieces of data for each SD in Fig. 5. In Fig. 5, the data are clustered into four groups, and their variances increase with the SD. PCA thus successfully separates different topological phases into different clusters. However, some clusters contain two topological phases, with Chern numbers $\{+1,-3\}$, $\{-1,+3\}$, and $\{+2,-2\}$. This $C$ modulo 4 resemblance has also been observed in a previous study [23].

Figure 6: Magnitude of the projection (logarithmic scale) of the non-trivial data onto the first six principal components. Inset: the first 16 principal values of the PCA (normalized by the maximal value $\lambda_{1}$).

We find that including more PCs helps separate the different classes within each cluster. Figure 6 shows the first six PCs of the data in the topologically non-trivial phases, where PC$x$ denotes the $x$-th principal component. One can see that PC1 and PC2 within each pair $\{+1,-3\}$, $\{-1,+3\}$, and $\{+2,-2\}$ are nearly identical, as also shown in Fig. 5. By incorporating more PCs, up to PC6, all topological classes are completely separated. Via the proposed convolution, topological states can thus be successfully classified using PCA, a linear classification machine.

Compared to the eigenstates, the spin configurations $\mathbf{n}(\vec{k})$ are gauge-invariant. Therefore, it is desirable to classify the topology of the spin configurations via PCA. Unfortunately, the performance was not good, which will be discussed later. In order to classify the spin configurations directly, in the following we train a qCNN machine via a supervised-learning algorithm to discriminate spin configurations in different topological phases.

Figure 7: Framework of (a) the CNN and (b) the qCNN classifier. In the CNN, the shapes of the data are labelled as (Width, Height, Channel) and the spin components are arranged in different channels in the first layer, while in the qCNN the shapes of the data are labelled as (Depth, Width, Height, Channel) and the spin components are put along the depth. The number appended after the symbol “@” stands for the number of kernels, whose size is indicated above it. The number of tuneable network parameters of the CNN (qCNN) is 24,252 (19,350).

IV Supervised learning of CNN and the qCNN

IV.1 Datasets

The input data are normalized spin configurations $\mathbf{n}$, lying on a $40\times 40$ square lattice with periodic boundary conditions, and the corresponding topological phases are the labels, with one-hot encoding. We prepared four datasets: the training, validation, testing, and prediction datasets (more details are described in Appendix A).

The first three datasets are well known in the conventional deep-learning procedure [58]. The data in the training, validation, and testing datasets are constructed from the same models, so they have the same data distributions even though they are all different data points. Therefore, we denote these three datasets as in-distribution datasets. The data in the prediction dataset, however, are constructed from models that are similar to, but different from, those for the in-distribution datasets. Therefore, the data in the prediction dataset are not only unseen by the machine during the training process but also drawn from different distributions. We denote the prediction dataset as an out-of-distribution dataset, which is used to test whether our machine can also classify spin configurations constructed from other similar but different topological models.

The data pool containing the training and validation datasets is constructed as follows. Based on Eq. (3), we first prepared 5760 data points of $\mathbf{n}$ in nine topological phases with Chern numbers ranging from $-4$ to $4$, each phase containing 640 data points. Besides the 5760 spin configurations, the dataset contains 360 two-dimensional spin vortices. A spin vortex has an in-plane spin texture that winds around a center, which is generated by setting one of the three components in Eq. (3) to zero. By including spin vortices, the machine can tell the difference between 3D-winding (non-trivial) and 2D-winding (trivial) spin configurations. After the training process, the trained machine is scored by a testing dataset with the same composition of nine phases as in the training (and validation) dataset. Importantly, the Gaussian-distributed random translations and random rotations imposed on these three datasets, which do not change the topologies, increase the diversity of the dataset and enhance the generalization ability of the trained machine.

The prediction dataset contains six categories of spin configurations. The first category is generated with $m$ uniformly distributed from $+3$ to $-3$. In the second and the third categories, we change the sign of $n_{z}$ (second category) and swap $n_{y}$ and $n_{z}$ of $\mathbf{n}$ (third category). Finally, we consider three categories of trivial states, namely ferromagnetic (FM), conical, and helical states. The FM state can be viewed as a 1D incomplete-winding configuration, while the conical and helical states can be viewed as 2D incomplete-winding ones. In total, we prepared six categories for the prediction dataset. More details about the data preparation are described in Appendix A.

For the conventional CNN, we use $\mathbf{n}$ as the input data. For the qCNN, in order to feed the input data into the qCNN classifier, we transform the 3D spin vector into a unit pure quaternion,

(n_{x},n_{y},n_{z})\in\mathbb{R}^{3}\mapsto(0,n_{x},n_{y},n_{z})\in\mathbb{H}, \qquad (16)

where the scalar part (the first component) is zero and the vector part is $\mathbf{n}$. Therefore, the inputs of the qCNN are effectively equivalent to those of the CNN.

IV.2 network structure and performance

The schematic architectures of the two classifiers are shown in Fig. 7, where the last black arrows point to nine neurons for the nine topological phases. In the qCNN classifier, we implement a quaternion convolution (q-Conv) layer as the first layer [red dotted cuboid in Fig. 7(b)], and the operations in the q-Conv layer are based on the quaternion algebra, which hybridizes the spin components. The next three layers are typical 3D convolutions (Conv3Ds). Our Conv3Ds do not mix depths, thanks to properly chosen kernel sizes. Following the Conv3D layers is a 2D convolution (Conv2D) layer that mixes the data along the depth: nine kernels of size $4\times 1$ transform the data from $4\times 9$ to $1\times 9$. In contrast, the CNN classifier has only Conv2D layers. Although the qCNN is more complex than the CNN, the total number of network parameters of the qCNN is nevertheless smaller than that of the CNN. This is one advantage of the qCNN over the conventional CNN.

In order for the classifiers to satisfy some physically reasonable conditions, two special designs are implemented. First, we extend the $k$ points beyond the BZ by padding the input data according to the periodic boundary conditions [59].
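A minimal sketch of such periodic padding, assuming NumPy (the paper does not specify the padding width; one site on each side is assumed here purely for illustration):

```python
import numpy as np

n = np.zeros((40, 40, 3))                                     # placeholder 40x40 spin configuration
n_padded = np.pad(n, ((1, 1), (1, 1), (0, 0)), mode="wrap")   # repeat the data periodically
print(n_padded.shape)                                         # (42, 42, 3)
```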

Figure 8: Schematic of the “overlapping” convolution (red solid) and the “non-overlapping” convolution (blue solid) of a $3\times 3$ kernel (black dotted) over the data. The blue solid square is a single movement of the kernel; the stride is equal to the kernel length, so each movement of this kernel is “non-overlapping.”

Second, the first layer takes “overlapping” strides with an arctan activation function, while the later layers take “non-overlapping” strides with the tanh activation function, for both the qCNN and CNN machines. Figure 8 illustrates how the “overlapping” and “non-overlapping” feature mappings can be realized by varying the size of the stride.

Figure 9: Learning curves of the qCNN and CNN classifiers. Because Dropout is applied, the validation accuracy is greater than the training accuracy.

Then both the qCNN and CNN machines are trained. The learning curves of both machines are shown in Fig. 9. The CNN machine (orange and light orange lines) jumps over a big learning barrier at around the 700th epoch. After that, the training and validation accuracies (orange and light orange lines, respectively) separate and do not converge up to the end of the training process. Even though the same training (and validation) dataset is used in the training process, the learning curves of the qCNN machine (blue and light blue lines) are qualitatively different. The training and validation accuracies separate around the 90th epoch, but the difference between the two accuracies decreases with increasing epochs. After the training procedure finishes, the qCNN (CNN) machine reaches 99.67% (94.12%) testing accuracy. This difference in accuracy results from the spin-vortex dataset, on which the qCNN works well but the CNN does not.

The trained machines are ready to do prediction, and the result is shown in Fig. 10.

Figure 10: The performance of the qCNN (blue line) and the CNN (red dashed line) on the prediction datasets. The tagged numbers are the values of the accuracies, and the standard deviations are indicated by error bars. The qCNN outperforms the conventional CNN on all prediction datasets, especially on the three spin-vortex ones.

In Fig. 10, since the first category contains $\mathbf{n}$ with uniformly distributed $m$, where a few data points are very close to the phase boundaries $m\approx\{0,\pm 2\}$, the accuracy of the qCNN is slightly lower, at $96\%$. For the second and third categories, we choose $m=\pm 1$, away from the phase-transition points, and the performance is nearly perfect. For the incomplete-winding configurations, the qCNN, unlike the conventional CNN, accurately classifies the FM, helical, and conical states after learning the spin-vortex states. This is the main advantage of the qCNN over the conventional CNN, which is expected to result from the quaternion algebra.

The processing times of the two classifiers are summarized in Table 1. Since the q-Conv layer involves massive matrix multiplication, the time per epoch of the qCNN is longer than that of the conventional CNN in our task, especially when run on a CPU.

Processing time (seconds per epoch)
Architecture   CPU     GPU
CNN            6.115   1.011
qCNN           72.2    3.108
Table 1: Comparison between the CPU and GPU processing times (seconds per epoch). The hardware facilities are as follows: NVIDIA GeForce RTX 2080Ti GPU, Intel Xeon E5-2650v4 CPU (2.20 GHz, 12 cores), and 32 GB DDR4 SDRAM.

V discussions

In this work, we apply the quaternion multiplication laws to both PCA (unsupervised learning) and the qCNN (supervised learning). The two methods take different inputs: the former takes the scalar function $F(\vec{p})$, which is related to a convolution of the wave function, and the latter takes the pure quaternion function $(0,\mathbf{n}(\vec{k}))$, whose real part is zero and whose imaginary part is the spin vector. We explain the physical intuition and comment on the mechanisms in this section.

For PCA, we did not simply take $\mathbf{n}$ as the input, because the representation of the vector $\mathbf{n}$ depends on the coordinates but the topology does not. We believed that the topology, as a global geometry, should be embedded in the correlations. The correlation of dot products of $\mathbf{n}$ turned out to fail, since the relative angles between two spins are not informative enough to capture the swirling of $\mathbf{n}$ on $S^{2}$. If one instead uses the quaternion $q=(0,n_{x},n_{y},n_{z})$ in the convolution of Eq. (15), the result is still inappropriate, because the convolution is then independent of the sign of $m$ and cannot discriminate the topological states (see Appendix C). Eventually, we found that the $F(\vec{p})$ defined in Eq. (15) is a proper quantity for characterizing the topology via PCA. $F$ has the property that it is featured (featureless) when the wave function cannot (can) be made globally continuous, which happens in the nontrivial (trivial) phases. Unfortunately, $F(\vec{p})$ is not gauge invariant. The results were based on the choice of gauge in Eqs. (12) and (13), which makes the wave function continuous locally and discontinuous at the $\vec{k}$ where $n_{z}(\vec{k})=0$. We examined other choices of gauge and found that the present gauge exhibits the PCA features most clearly (results not shown). We remark that our PCA results look good because the inputs were ingeniously designed, and the PCA method might not be more practical than the qCNN method.

Figure 11: (a) Three neighboring spin vectors subtend a solid angle, and (b) four nearest neighbors are enclosed by the kernels in the first convolutional layer.

For the qCNN, it is interesting to understand the mechanism behind its performance. There are several possible factors promoting the performance of our supervised-learning machine. The first one is that the size of the kernel in the first convolutional layer is $2\times 2$ with stride $=1$, which means the machine can collect spin information among four nearest neighbors [see Fig. 11(b)]. We know that the Chern number is the integral of the Berry curvature over the BZ, and the Berry curvature is proportional to the solid-angle density. A solid angle $\Omega$ subtended by three unit vectors $\vec{a}$, $\vec{b}$, and $\vec{c}$ is obtained from

\tan\frac{\Omega}{2}=\frac{\left|\vec{a}\cdot(\vec{b}\times\vec{c})\right|}{1+\vec{a}\cdot\vec{b}+\vec{b}\cdot\vec{c}+\vec{c}\cdot\vec{a}}. \qquad (17)

Our choice of the kernel size in the first hidden layer is the minimal one, $2\times 2$, which mixes only the nearest-neighboring spins. In this way, it is very likely that the machine is forced to notice the solid angle subtended in this plaquette. The second factor is the quaternion product. Recall that the conventional CNN may correlate spins $\mathbf{n}$ at neighboring $\vec{k}$ points through the feature map of the kernel. However, that map does not mix the components of the spins. In comparison, the qCNN is more efficient, because it directly entangles the spins via the quaternion product. It is this entanglement of spin components by the quaternion product that makes it possible for the machine to realize the scalar and vector products needed to calculate the solid angle [see Eq. (17)]. As a solid angle involves at least three spins and the feature map by the kernel is only linear, a nonlinear transformation is crucial to create high-order (three-spin) terms in the expansion. This is possible, and it is proved in Ref. [60] that multiplication of variables can be accurately realized by simple neural nets with a smooth nonlinear activation function. The third factor is therefore the nonlinear activation function, arctan in this work. We expect that using arctan as the activation function further helps the machine to learn the correct representations, because the calculation of a solid angle involves the arctan operation in Eq. (17). This belief is indeed supported by the results shown in Fig. 12, where the arctan activation function outperforms the ReLU and tanh activation functions over nine different datasets. In summary, several factors combine to enhance the performance of our machine as follows. The quaternion-based operations in the q-Conv layer mix not only spins with their neighbors but also the components of the spins. When these linear combinations are fed into the nonlinear activation functions in our qCNN, the output can be viewed as an expansion of a nonlinear function, which may contain a term having both the scalar and vector products of neighboring spins, similar to that in Eq. (17). Therefore, after the optimization process, the machine may keep increasing the weight of a solid-angle-related term and eventually learn to classify the topological phases.
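To make the second point concrete, a small sketch (assuming NumPy) of the Hamilton product of two pure quaternions, written as in the last line of Eq. (28): a single product already delivers both the scalar product ($-\vec{a}\cdot\vec{b}$ in the real part) and the vector product ($\vec{a}\times\vec{b}$ in the imaginary part), i.e., exactly the ingredients entering Eq. (17).

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product for q = (r, a, b, c), following the last line of Eq. (28)."""
    r1, v1 = q1[0], np.asarray(q1[1:], dtype=float)
    r2, v2 = q2[0], np.asarray(q2[1:], dtype=float)
    return np.concatenate(([r1 * r2 - v1 @ v2], r1 * v2 + r2 * v1 + np.cross(v1, v2)))

a, b = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
qa, qb = np.concatenate(([0.0], a)), np.concatenate(([0.0], b))
print(quat_mul(qa, qb))          # (-a.b, a x b) = (0, 0, 1, 0)
print(-a @ b, np.cross(a, b))    # the same dot and cross products entering Eq. (17)
```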

Also, adding some noise to the training dataset helped our supervised-learning machine to learn the generic features of our data. We found that when the training data were generated directly from Eq. (3) without adding any noise, the machine worked well on the training and testing datasets but performed poorly on all the prediction datasets. This can be understood by noting that the topological invariant is determined by the sign of $m$, which appears in the $z$ component in Eq. (3). With a noiseless dataset, the machine might naively regard the $z$ component as the indicator of the topology when the training data do not have a wide distribution. We note that the topology is invariant when the spin texture is uniformly translated or rotated. We therefore trained our machine with randomly translated and rotated data to avoid such incorrect learning (see the data preparation in Appendix A). From our observations, the performance on the prediction dataset was remarkably enhanced when the noise was included, which supports our idea.

Refer to caption
Figure 12: Comparison between three activation functions applied in the first layer of the qCNN classifier.

VI Conclusions

In summary, we classify topological phases with distinct Chern numbers via two types of machine-learning techniques. For the unsupervised part, we propose a quaternion-based convolution to transform the topological states into the input data. With this convolution, distinct topological states are successfully classified by PCA, a linear machine for classification.

We then go to the supervised-learning part, where, in contrast to the conventional CNN, we successfully use the qCNN to classify different topological phases. This work demonstrates the power of quaternion-based algorithms, especially for topological systems with the Chern number as the topological invariant.

Acknowledgements.
This study is supported by the Ministry of Science and Technology (MoST) in Taiwan under grant No. 108-2112-M-110-013-MY3. M.R.L. and W.J.L. contributed equally to this work.

Appendix A Data preparation

Training dataset— The normalized spin configurations $\mathbf{n}^{(c,m)}(\vec{k})$, $\forall\vec{k}\in\mathrm{BZ}$, are based on the formula [refer to Eq. (3)]

\mathbf{n}^{(c,m)}(\vec{k}):=\mathbf{h}^{(c,m)}(\vec{k})/\|\mathbf{h}^{(c,m)}(\vec{k})\|,\quad\textrm{where}\quad \mathbf{h}^{(c,m)}(\vec{k})=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix},

in a $40\times 40$ square lattice with periodic boundary conditions. For each $c=1,2,3$, and $4$, we generated four sets $S_{1}^{(c)}$, $S_{2}^{(c)}$, $S_{3}^{(c)}$, and $S_{4}^{(c)}$. The former two sets are topologically nontrivial, and each has 640 configurations for different values of $m$:

S_{1}^{(c)}=\big\{\mathbf{n}^{(c,m)}(\vec{k}):m\in[-1.9,-0.1],~\vec{k}\in\mathrm{BZ}\big\},\quad S_{2}^{(c)}=\big\{\mathbf{n}^{(c,m)}(\vec{k}):m\in[0.1,1.9],~\vec{k}\in\mathrm{BZ}\big\},

where the $m$ are random numbers in the corresponding ranges. The latter two sets are topologically trivial, and each has 80 (identical) configurations:

S_{3}^{(c)}=\big\{\mathbf{n}^{(c,m)}(\vec{k}):m=-3,~\vec{k}\in\mathrm{BZ}\big\},\quad S_{4}^{(c)}=\big\{\mathbf{n}^{(c,m)}(\vec{k}):m=3,~\vec{k}\in\mathrm{BZ}\big\}.

So, for each $c$ there were 1280 nontrivial spin configurations and 160 trivial ones. The primitive data then passed through some manipulations, serving as data augmentation, without changing the topologies. Each spin configuration $\mathbf{n}(\vec{k})$ was translated ($\mathcal{T}$), rotated ($\mathcal{R}$), and then polluted with noise ($\mathcal{G}$):

\mathbf{n}(\vec{k})\xrightarrow{\mathcal{T}}\mathbf{n}(\vec{k}+\vec{p}_{0})\xrightarrow{\mathcal{R}}\mathbf{n}^{\prime}(\vec{k}+\vec{p}_{0})\xrightarrow{\mathcal{G}}\mathbf{n}^{\prime}(\vec{k}+\vec{p}_{0})+\Delta\mathbf{n}^{\prime}(\vec{k}), \qquad (18)

where $\vec{p}_{0}$ is a random displacement in $\vec{k}$, $\mathcal{R}$ stands for a random 3D rotation of the spin, and $\Delta\mathbf{n}^{\prime}(\vec{k})$ is Gaussian noise ($\mathcal{G}$) with standard deviation $0.1\pi$ in each component. (The spin is normalized at the end.) $\mathcal{T}$ and $\mathcal{R}$ are homogeneous transformations in $\vec{k}$, whereas $\mathcal{G}$ is inhomogeneous and picks only 30 out of the 1600 $\vec{k}$ sites.
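A minimal sketch of the augmentation of Eq. (18), assuming NumPy (the random-rotation construction via a QR decomposition is an illustrative choice; the paper does not specify how $\mathcal{R}$ is drawn):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(n):
    """Topology-preserving augmentation of Eq. (18); n has shape (N, N, 3), |n| = 1."""
    N = n.shape[0]
    # T: random translation in k, i.e. a periodic roll of the grid
    n = np.roll(n, tuple(rng.integers(0, N, size=2)), axis=(0, 1))
    # R: random global 3D rotation of the spins (orthogonal matrix with det = +1)
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.linalg.det(Q)
    n = n @ Q.T
    # G: Gaussian noise (std 0.1*pi per component) on 30 randomly chosen k sites,
    #    followed by renormalization of the spins
    flat = n.reshape(-1, 3)
    idx = rng.choice(N * N, size=30, replace=False)
    flat[idx] += rng.normal(scale=0.1 * np.pi, size=(30, 3))
    flat /= np.linalg.norm(flat, axis=1, keepdims=True)
    return flat.reshape(N, N, 3)
```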

In addition to the 5760 sets of data in the nine topological phases ($C=-4$ to $C=+4$), we also include 360 spin-vortex states, which are $C=0$ states, based on the formulas

{}^{yz}\mathbf{h}^{(c,m)}(\vec{k})=\begin{pmatrix}0\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix}, \qquad (19)

{}^{xz}\mathbf{h}^{(c,m)}(\vec{k})=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ 0\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix}, \qquad (20)

{}^{xy}\mathbf{h}^{(c)}(\vec{k})=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ 0\end{pmatrix}, \qquad (21)

with their normalized configurations. For each $c$, 30 spin configurations were generated from each formula, with random $m$ ranging from $-3$ to $3$. These data also went through the translation $\mathcal{T}$ and the rotation $\mathcal{R}$, but not the noise $\mathcal{G}$.

Therefore, we generated 6120 spin configurations in total as the training dataset. Within this dataset, 25% of the data are assigned as the validation dataset (light-colored lines in Fig. 9).

Testing dataset— In addition to the training and validation datasets, we prepare an extra 1224 spin configurations as the testing dataset, with the same composition as the training and validation datasets. This dataset is used for scoring the trained classifiers.

Figure 13: Some examples in the prediction dataset for the states: (a) $n_{y}\leftrightarrow n_{z}$, (b) helical, (c) conical, and (d) FM.

Prediction dataset— The prediction dataset is an extra dataset, different from the three aforementioned datasets. It consists of six categories, none of which was seen by the machine during the training process. This dataset was processed by $\mathcal{T}$ and $\mathcal{R}$ but not by $\mathcal{G}$. The six categories were constructed as follows. The first category, the “Chern” category, is a set $S$ generated from Eq. (3) with 30 values of $m$ uniformly ranging from $-3$ to $3$ for each $c$:

S=\big\{\mathbf{n}^{(c)}:c=1,2,3,4,~m=-3+\tfrac{6i}{29},~i=0,\ldots,29\big\}.

As a reminder, this category is different from the training dataset: the training data include the specific $m=\pm 3$ in the trivial phase and the two intervals $[-1.9,-0.1]$ and $[0.1,1.9]$ in the nontrivial phases. Therefore, 20% of this category is close to the phase transitions $m\approx\{0,+2,-2\}$.

The next two categories were generated based on Eq. (3) with $m=\pm 1$. The first one was constructed by changing the sign of the $z$ component:

\mathbf{h}(\vec{k})=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix}\to\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\cos k_{x}-\cos k_{y}-m\end{pmatrix}.

The second one was constructed by swapping the $y$ and $z$ components:

\mathbf{h}(\vec{k})=\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\end{pmatrix}\to\begin{pmatrix}\mathrm{Re}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\\ \cos k_{x}+\cos k_{y}+m\\ -\mathrm{Im}\big[(\sin k_{x}-i\sin k_{y})^{c}\big]\end{pmatrix}.

The next two categories, called the helical and conical spin configurations, were generated based on the equation

\mathbf{n}_{\mathrm{helical/conical}}(\vec{k})=\begin{pmatrix}\sqrt{1-\epsilon^{2}}\cos(k_{x}+k_{y})\\ \sqrt{1-\epsilon^{2}}\sin(k_{x}+k_{y})\\ \epsilon\end{pmatrix}.

Here $\epsilon=0$ gives the helical state and $0<|\epsilon|<1$ gives a conical state. The last category contains the ferromagnetic (FM) spin configurations, whose $z$ component is a constant and whose $x$ and $y$ components are zero. Some spin configurations in the prediction dataset are illustrated in Fig. 13.

Appendix B Quaternion

The quaternion number system was introduced by the Irish mathematician William Rowan Hamilton in 1843 as an extension of the complex numbers. A quaternion number $q$ is composed of four real numbers $r$, $a$, $b$, and $c$ as

q=r+a\hat{\mathbf{i}}+b\hat{\mathbf{j}}+c\hat{\mathbf{k}}, \qquad (22)

where $\{\boldsymbol{1},\hat{\mathbf{i}},\hat{\mathbf{j}},\hat{\mathbf{k}}\}$ is the basis. Sometimes it is written as $q=(r,\vec{v})$ or $q=(r,a,b,c)$ for short. Here $r$ is called the scalar (or real) part of the quaternion and $\vec{v}=(a,b,c)$ the vector (or imaginary) part. A quaternion without a scalar part, $q=(0,a,b,c)$, is called a pure quaternion. Similarly to the imaginary unit,

\hat{\mathbf{i}}^{2}=\hat{\mathbf{j}}^{2}=\hat{\mathbf{k}}^{2}=\hat{\mathbf{i}}\hat{\mathbf{j}}\hat{\mathbf{k}}=-\boldsymbol{1}. \qquad (23)

Importantly, the algebra of quaternions is noncommutative, based on

\boldsymbol{1}\hat{\mathbf{i}}=\hat{\mathbf{i}}\boldsymbol{1}=\hat{\mathbf{i}},\quad\boldsymbol{1}\hat{\mathbf{j}}=\hat{\mathbf{j}}\boldsymbol{1}=\hat{\mathbf{j}},\quad\boldsymbol{1}\hat{\mathbf{k}}=\hat{\mathbf{k}}\boldsymbol{1}=\hat{\mathbf{k}}, \qquad (24)
\hat{\mathbf{i}}\hat{\mathbf{j}}=-\hat{\mathbf{j}}\hat{\mathbf{i}}=\hat{\mathbf{k}},\quad\hat{\mathbf{j}}\hat{\mathbf{k}}=-\hat{\mathbf{k}}\hat{\mathbf{j}}=\hat{\mathbf{i}},\quad\mathrm{and}\quad\hat{\mathbf{k}}\hat{\mathbf{i}}=-\hat{\mathbf{i}}\hat{\mathbf{k}}=\hat{\mathbf{j}}.

The conjugate of the quaternion is defined to be

q^{*}=r-a\hat{\mathbf{i}}-b\hat{\mathbf{j}}-c\hat{\mathbf{k}}, \qquad (25)

and the norm is given by

\|q\|=\sqrt{qq^{*}}=\sqrt{r^{2}+a^{2}+b^{2}+c^{2}}. \qquad (26)

Therefore, the inverse of $q$ is defined as

q^{-1}:=\frac{q^{*}}{\|q\|^{2}}. \qquad (27)

If $q$ is a unit quaternion, its inverse is exactly its conjugate. The multiplication (the so-called quaternion or Hamilton product) of two quaternions $q_{1}=(r_{1},a_{1},b_{1},c_{1})$ and $q_{2}=(r_{2},a_{2},b_{2},c_{2})$ is given by

\begin{split}q_{1}q_{2}&=(r_{1}r_{2}-a_{1}a_{2}-b_{1}b_{2}-c_{1}c_{2})\\ &\quad+(a_{1}r_{2}+r_{1}a_{2}-c_{1}b_{2}+b_{1}c_{2})\hat{\mathbf{i}}\\ &\quad+(b_{1}r_{2}+c_{1}a_{2}+r_{1}b_{2}-a_{1}c_{2})\hat{\mathbf{j}}\\ &\quad+(c_{1}r_{2}-b_{1}a_{2}+a_{1}b_{2}+r_{1}c_{2})\hat{\mathbf{k}}\\ &=(r_{1}r_{2}-\vec{v}_{1}\cdot\vec{v}_{2},~r_{1}\vec{v}_{2}+r_{2}\vec{v}_{1}+\vec{v}_{1}\times\vec{v}_{2}).\end{split} \qquad (28)

To realize the algebra in Eqs. (23) and (24), one can choose the $\mathcal{M}(4,\mathbb{R})$ representation for the quaternion numbers, with

\boldsymbol{1}\doteq\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{pmatrix},\quad \hat{\mathbf{i}}\doteq\begin{pmatrix}0&-1&0&0\\ 1&0&0&0\\ 0&0&0&-1\\ 0&0&1&0\end{pmatrix},\quad \hat{\mathbf{j}}\doteq\begin{pmatrix}0&0&-1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&-1&0&0\end{pmatrix},\quad \hat{\mathbf{k}}\doteq\begin{pmatrix}0&0&0&-1\\ 0&0&-1&0\\ 0&1&0&0\\ 1&0&0&0\end{pmatrix},

so that

q\doteq\begin{pmatrix}r&-a&-b&-c\\ a&r&-c&b\\ b&c&r&-a\\ c&-b&a&r\end{pmatrix}.

Conversely,

r=\frac{1}{4}\mathrm{tr}(q),\quad a=-\frac{1}{4}\mathrm{tr}(\hat{\mathbf{i}}q),\quad b=-\frac{1}{4}\mathrm{tr}(\hat{\mathbf{j}}q),\quad c=-\frac{1}{4}\mathrm{tr}(\hat{\mathbf{k}}q).

It is evident in terms of the matrices that the multiplication of quaternions is not commutative. Furthermore, in the matrix representation, $q^{*}=q^{T}$, i.e., the conjugation of a quaternion equals the transposition of its matrix. More specifically, a unit quaternion satisfies $q^{-1}=q^{*}=q^{T}$ in the $\mathcal{M}(4,\mathbb{R})$ representation.

Appendix C Details of definition in Section III

In this section, we provide some properties of the $F(\vec{p})$ function defined in the PCA section and of the convolution of the normalized spin vector $\mathbf{n}$. Recall that the convolution defined in the PCA section reads

F(\vec{p}):=(\overline{q}^{*}\circledast\overline{q})[\vec{p}]-(\underline{q}^{*}\circledast\underline{q})[\vec{p}]\quad\mathrm{with}\quad (q^{*}\circledast q)[\vec{p}]:=\sum_{\vec{k}\in\mathrm{BZ}}q^{*}\big|_{\vec{k}}\,q\big|_{\vec{p}-\vec{k}}. \qquad (29)

From now on, an overline (e.g., $\overline{q},\ket{\overline{u}},\bra{\overline{u}},\overline{h}$) denotes the conduction band and an underline (e.g., $\underline{q},\ket{\underline{u}},\bra{\underline{u}},\underline{h}$) the valence band. A vertical bar with a subscript variable stands for evaluation at the corresponding point of the BZ.

Property C.1.

$F(\vec{p})$ is a purely real-valued function.

Proof.

Since $\vec{p}-\vec{k}$ and $\vec{k}$ are in one-to-one correspondence within the BZ, summing over $\vec{k}\in$ BZ or over $\vec{p}-\vec{k}\in$ BZ is equivalent. Taking the conjugate of $F$, we have

\begin{split}F^{*}(\vec{p})&=\sum_{\vec{k}\in\mathrm{BZ}}\overline{q}^{*}\big|_{\vec{p}-\vec{k}}\,\overline{q}\big|_{\vec{k}}-\underline{q}^{*}\big|_{\vec{p}-\vec{k}}\,\underline{q}\big|_{\vec{k}}\\ &=\sum_{\vec{p}-\vec{k}\in\mathrm{BZ}}\overline{q}^{*}\big|_{\vec{p}-\vec{k}}\,\overline{q}\big|_{\vec{k}}-\underline{q}^{*}\big|_{\vec{p}-\vec{k}}\,\underline{q}\big|_{\vec{k}}\\ &=\sum_{\vec{k}^{\prime}\in\mathrm{BZ}}\overline{q}^{*}\big|_{\vec{k}^{\prime}}\,\overline{q}\big|_{\vec{p}-\vec{k}^{\prime}}-\underline{q}^{*}\big|_{\vec{k}^{\prime}}\,\underline{q}\big|_{\vec{p}-\vec{k}^{\prime}}\\ &=F(\vec{p}).\end{split}

The first line comes from the property of conjugation of quaternions, the second line from the equivalence of summing over the whole BZ, and the third line from the relabeling $\vec{k}^{\prime}\leftrightarrow\vec{p}-\vec{k}$. We see that the conjugate of $F$ is $F$ itself. Therefore, $F(\vec{p})$ is a purely real-valued function. ∎

Recall that in our model, Eq. (3), $h_{1}$ and $h_{2}$ are both even and $(h_{3}-m)$ is odd under $\vec{k}\to\vec{k}^{\prime}$. That is, given $\vec{k}=(k_{x},k_{y})\in\mathrm{BZ}$, there is $\vec{k}^{\prime}=(\pi-k_{x},\pi-k_{y})\in\mathrm{BZ}$ such that

\left\{\begin{aligned}h_{1}\big|_{\vec{k}}&=h_{1}\big|_{\vec{k}^{\prime}},\\ h_{2}\big|_{\vec{k}}&=h_{2}\big|_{\vec{k}^{\prime}},\\ h_{3}(m)\big|_{\vec{k}}-m&=-\Big(h_{3}(m^{\prime})\big|_{\vec{k}^{\prime}}-m^{\prime}\Big).\end{aligned}\right. \qquad (30)

In addition, the two points $\vec{k}$ and $\vec{k}^{\prime}$ are in one-to-one correspondence in the BZ, and they coincide at $(k_{x},k_{y})=(\pm\pi/2,\pm\pi/2)$. Notice that once $\mathbf{h}(m)=(h_{1},h_{2},h_{3}(m))$ is normalized by $\|\mathbf{h}(m)\|$, each component of $\mathbf{n}(m)$ becomes a function of $m$.

Property C.2.

Encoding $\mathbf{n}(m)$ into a quaternion as $q=(0,n_{x}(m),n_{y}(m),n_{z}(m))$, the convolution $q^{*}\circledast q$ is independent of the sign of $m$.

Proof.

We consider two convolutions with $q=(0,\mathbf{n}(m))$, evaluated over $\vec{k}$ and over $\vec{k}^{\prime}=(\pi-k_{x},\pi-k_{y})$ with the opposite sign $m^{\prime}=-m$, respectively:

\begin{split}(q^{*}\circledast q)[\vec{p},m]&=\sum_{\vec{k}\in\mathrm{BZ}}\big(0,-\mathbf{n}(m)\big)\big|_{\vec{k}}\,\big(0,\mathbf{n}(m)\big)\big|_{\vec{p}-\vec{k}}\\ &=\sum_{\vec{k}\in\mathrm{BZ}}\Big(n_{x}(m)\big|_{\vec{k}}\,n_{x}(m)\big|_{\vec{p}-\vec{k}}+n_{y}(m)\big|_{\vec{k}}\,n_{y}(m)\big|_{\vec{p}-\vec{k}}+n_{z}(m)\big|_{\vec{k}}\,n_{z}(m)\big|_{\vec{p}-\vec{k}},~\vec{V}_{L}\big|_{\vec{k}}\Big)\end{split} \qquad (31)

and

\begin{split}(q^{*}\circledast q)[\vec{p},-m]&=\sum_{\vec{k}^{\prime}\in\mathrm{BZ}}\big(0,-\mathbf{n}(-m)\big)\big|_{\vec{k}^{\prime}}\,\big(0,\mathbf{n}(-m)\big)\big|_{\vec{p}-\vec{k}^{\prime}}\\ &=\sum_{\vec{k}^{\prime}\in\mathrm{BZ}}\Big(n_{x}(-m)\big|_{\vec{k}^{\prime}}\,n_{x}(-m)\big|_{\vec{p}-\vec{k}^{\prime}}+n_{y}(-m)\big|_{\vec{k}^{\prime}}\,n_{y}(-m)\big|_{\vec{p}-\vec{k}^{\prime}}+n_{z}(-m)\big|_{\vec{k}^{\prime}}\,n_{z}(-m)\big|_{\vec{p}-\vec{k}^{\prime}},~\vec{V}_{R}\big|_{\vec{k}^{\prime}}\Big), \qquad (32)\end{split}

where $\vec{V}_{L}$ and $\vec{V}_{R}$ are the vector parts of the above quaternion products at $\vec{k}$ and $\vec{k}^{\prime}$, respectively. In Property (C.1), we have shown that the convolution over the entire BZ is a purely real-valued function. That is, when $q_{1}$ and $q_{2}$ both have no real part, we only need to consider the dot product of their vector parts in the quaternion product. Now, Eq. (30) and the assumption $m^{\prime}=-m$ give

h_{3}(m)\big|_{\vec{k}}-m=-\Big(h_{3}(m^{\prime})\big|_{\vec{k}^{\prime}}-m^{\prime}\Big)=-h_{3}(-m)\big|_{\vec{k}^{\prime}}-m \;\Rightarrow\; h_{3}(m)\big|_{\vec{k}}=-h_{3}(-m)\big|_{\vec{k}^{\prime}}\quad\forall\vec{k},\vec{k}^{\prime}\in\mathrm{BZ}.

Since $\norm{\mathbf{h}(\vec{k},m)}=\norm{\mathbf{h}(\vec{k}^{\prime},-m)}$, we can conclude that

\left\{\begin{aligned}
n_{x}(m)\underset{\vec{k}}{\big\rvert} &= n_{x}(-m)\underset{\vec{k}^{\prime}}{\big\rvert}\\
n_{y}(m)\underset{\vec{k}}{\big\rvert} &= n_{y}(-m)\underset{\vec{k}^{\prime}}{\big\rvert}\\
n_{z}(m)\underset{\vec{k}}{\big\rvert} &= -n_{z}(-m)\underset{\vec{k}^{\prime}}{\big\rvert}
\end{aligned}\right.\qquad\forall\vec{k},\vec{k}^{\prime}\in\mathrm{BZ}. \qquad (33)

Substituting Eq. (33) into Eq. (32), we find that Eqs. (31) and (32) coincide. Therefore, for opposite signs of $m$ the convolutions over the BZ take exactly the same value; that is, the convolution $q^{*}\circledast q$ is independent of the sign of $m$ once the quaternion is encoded as $(0,\mathbf{n}(m))$. ∎
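A short numerical illustration of Property (C.2) is sketched below: with $q=(0,\mathbf{n}(m))$, the real part of the BZ convolution $q^{*}\circledast q$ (which, by Property (C.1), is all that survives the sum over the BZ) is unchanged under $m\to-m$. The QWZ-type form of Eq. (3) is again an assumption made only for this illustration.

import numpy as np

def unit_vector_field(m, N=24):
    """n(m) = h(m)/||h(m)|| on an N x N k-grid (assumed QWZ-type form of Eq. (3))."""
    ks = 2 * np.pi * np.arange(N) / N
    kx, ky = np.meshgrid(ks, ks, indexing="ij")
    h = np.stack([np.sin(kx), np.sin(ky), m + np.cos(kx) + np.cos(ky)])
    return h / np.linalg.norm(h, axis=0)

def real_part_convolution(n):
    """Re[(q* conv q)](p) = sum_k n(k) . n(p - k), evaluated as a circular FFT convolution."""
    F = np.fft.fft2(n, axes=(1, 2))
    return np.real(np.fft.ifft2(np.sum(F * F, axis=0)))

m = 0.7
c_plus = real_part_convolution(unit_vector_field(+m))
c_minus = real_part_convolution(unit_vector_field(-m))
print(np.max(np.abs(c_plus - c_minus)))   # numerically zero, as Property (C.2) states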

Property (C.2) relies on how the quaternions are encoded. The following two properties are based on the quaternion transformation used in the main text.

Property C.3.

If we encode the spinor $\ket{u}=(\alpha,\beta)^{T}$, $\alpha,\beta\in\mathbb{C}$, into a quaternion as

q:=\mathrm{Re}(\alpha)+\mathrm{Im}(\alpha)\hat{\mathbf{i}}+\mathrm{Re}(\beta)\hat{\mathbf{j}}+\mathrm{Im}(\beta)\hat{\mathbf{k}}.

Then,

\mathrm{Re}\Big(q^{*}\underset{\vec{k}}{\Big\rvert}\,q\underset{\vec{p}-\vec{k}}{\Big\rvert}\Big)=\mathrm{Re}\Big(\bra{u(\vec{k})}\ket{u(\vec{p}-\vec{k})}\Big),\quad\forall\vec{k}\in\mathrm{BZ}.
Proof.

According to Property (C.1), it suffices to consider the real part. By assumption, given $\vec{k}\in\mathrm{BZ}$ and writing $\alpha=a+bi$ and $\beta=c+di$ with $a,b,c,d\in\mathbb{R}$, we have

\begin{aligned}
q^{*}\underset{\vec{k}}{\Big\rvert}\,q\underset{\vec{p}-\vec{k}}{\Big\rvert} &= (a,b,c,d)^{*}\underset{\vec{k}}{\Big\rvert}\,(a,b,c,d)\underset{\vec{p}-\vec{k}}{\Big\rvert}\\
&= (a,-b,-c,-d)\underset{\vec{k}}{\Big\rvert}\,(a,b,c,d)\underset{\vec{p}-\vec{k}}{\Big\rvert}\\
&= \Big(a\underset{\vec{k}}{\Big\rvert}\,a\underset{\vec{p}-\vec{k}}{\Big\rvert}-(-b,-c,-d)\underset{\vec{k}}{\Big\rvert}\cdot(b,c,d)\underset{\vec{p}-\vec{k}}{\Big\rvert},~\vec{V}\underset{\vec{k}}{\Big\rvert}\Big)\\
&= \Big(a\underset{\vec{k}}{\big\rvert}\,a\underset{\vec{p}-\vec{k}}{\big\rvert}+b\underset{\vec{k}}{\big\rvert}\,b\underset{\vec{p}-\vec{k}}{\big\rvert}+c\underset{\vec{k}}{\big\rvert}\,c\underset{\vec{p}-\vec{k}}{\big\rvert}+d\underset{\vec{k}}{\big\rvert}\,d\underset{\vec{p}-\vec{k}}{\big\rvert},~\vec{V}\underset{\vec{k}}{\Big\rvert}\Big),
\end{aligned} \qquad (34)

where $\vec{V}$ is the vector part of the quaternion product. Notice that the second line above is a quaternion product, whereas the third line contains the dot product of two vectors. On the other hand,

\begin{aligned}
\bra{u(\vec{k})}\ket{u(\vec{p}-\vec{k})} &= \matrixquantity(a-bi&c-di)\underset{\vec{k}}{\Big\rvert}\,\matrixquantity(a+bi\\ c+di)\underset{\vec{p}-\vec{k}}{\Big\rvert}\\
&= a\underset{\vec{k}}{\big\rvert}\,a\underset{\vec{p}-\vec{k}}{\big\rvert}+b\underset{\vec{k}}{\big\rvert}\,b\underset{\vec{p}-\vec{k}}{\big\rvert}+c\underset{\vec{k}}{\big\rvert}\,c\underset{\vec{p}-\vec{k}}{\big\rvert}+d\underset{\vec{k}}{\big\rvert}\,d\underset{\vec{p}-\vec{k}}{\big\rvert}+V,
\end{aligned} \qquad (35)

where $V$ denotes the imaginary part. It is clear that Eq. (34) and Eq. (35) have exactly the same real part. ∎
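The identity just proved can also be checked numerically for arbitrary spinors. The following minimal sketch (the helper names are ours, not the code used in this work) compares the real part of the Hamilton product $q_{1}^{*}q_{2}$ with $\mathrm{Re}\bra{u_{1}}\ket{u_{2}}$ for two random spinors playing the roles of $\ket{u(\vec{k})}$ and $\ket{u(\vec{p}-\vec{k})}$.

import numpy as np

def to_quaternion(u):
    """Encode a two-component complex spinor (alpha, beta) as (Re a, Im a, Re b, Im b)."""
    alpha, beta = u
    return np.array([alpha.real, alpha.imag, beta.real, beta.imag])

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def conjugate(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

rng = np.random.default_rng(0)
u1 = rng.normal(size=2) + 1j * rng.normal(size=2)   # plays the role of u(k)
u2 = rng.normal(size=2) + 1j * rng.normal(size=2)   # plays the role of u(p - k)

lhs = quat_mul(conjugate(to_quaternion(u1)), to_quaternion(u2))[0]   # real part of q1* q2
rhs = np.vdot(u1, u2).real                                           # Re <u1|u2>
assert np.isclose(lhs, rhs)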

Property C.4.

Given the quaternion

q:=\mathrm{Re}(\alpha)+\mathrm{Im}(\alpha)\hat{\mathbf{i}}+\mathrm{Re}(\beta)\hat{\mathbf{j}}+\mathrm{Im}(\beta)\hat{\mathbf{k}},

constructed from the spinor $\ket{u}=(\alpha,\beta)^{T}$, $\alpha,\beta\in\mathbb{C}$, under the gauge in Eqs. (12) and (13). If $h_{3}$ has the same sign over the entire BZ, then $F(\vec{p})=0$.

Proof.

By inspecting the eigenvalues, one concludes that

\overline{h}=-\underline{h}=\sqrt{h_{1}^{2}+h_{2}^{2}+h_{3}^{2}},\quad\forall\vec{k}\in\mathrm{BZ}. \qquad (36)

From now on, we consider $h_{3}(\vec{k})>0$ for all $\vec{k}\in\mathrm{BZ}$, for which the spinors take the form

\begin{split}
\ket{\overline{u}} &= \frac{1}{\sqrt{2\overline{h}(\overline{h}+h_{3})}}\matrixquantity(\overline{h}+h_{3}\\ h_{1}+ih_{2})\\
&= \frac{1}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\matrixquantity(-\underline{h}+h_{3}\\ h_{1}+ih_{2}),\\
\ket{\underline{u}} &= \frac{1}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\matrixquantity(-h_{1}+ih_{2}\\ \underline{h}+h_{3}).
\end{split}

After transforming the above two eigenstates into quaternions, we calculate $\overline{q}^{*}\circledast\overline{q}-\underline{q}^{*}\circledast\underline{q}$:

\begin{aligned}
F(\vec{p}) &= \sum_{\vec{k}\in\mathrm{BZ}}\Bigg(\frac{1}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}(-\underline{h}+h_{3},0,h_{1},h_{2})^{*}\underset{\vec{k}}{\bigg\rvert}\;\frac{1}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}(-\underline{h}+h_{3},0,h_{1},h_{2})\underset{\vec{p}-\vec{k}}{\bigg\rvert}\\
&\qquad\qquad-\frac{1}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}(-h_{1},h_{2},\underline{h}+h_{3},0)^{*}\underset{\vec{k}}{\bigg\rvert}\;\frac{1}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}(-h_{1},h_{2},\underline{h}+h_{3},0)\underset{\vec{p}-\vec{k}}{\bigg\rvert}\Bigg)\\
&= \sum_{\vec{k}\in\mathrm{BZ}}\Bigg(\frac{-\underline{h}+h_{3}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{-\underline{h}+h_{3}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}+\frac{h_{1}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{1}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}+\frac{h_{2}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{2}}{\sqrt{2\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}\\
&\qquad\qquad-\frac{-h_{1}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{-h_{1}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}-\frac{-h_{2}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{-h_{2}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}-\frac{\underline{h}+h_{3}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{-\underline{h}-h_{3}}{\sqrt{2\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert},~\vec{V}\underset{\vec{k}}{\bigg\rvert}\Bigg)\\
&= \frac{1}{2}\sum_{\vec{k}\in\mathrm{BZ}}\Bigg(\frac{h_{1}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{1}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}-\frac{h_{1}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{1}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}\\
&\qquad\qquad+\frac{h_{2}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{2}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}-\frac{h_{2}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{2}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert},~2\vec{V}\underset{\vec{k}}{\bigg\rvert}\Bigg),
\end{aligned} \qquad (37)

where $\vec{V}$ is the vector part of the quaternion product at a fixed $\vec{k}$ point. Recalling Property (C.1), which shows that the vector part does not contribute to the real-valued function $F(\vec{p})$, it suffices to evaluate the real part of the quaternion product at each fixed $\vec{k}$ point.

From Eq. (30), for each pair of points $\vec{k}=(k_{x},k_{y})$ and $\vec{k}^{\prime}=(\pi-k_{x},\pi-k_{y})$ we have

\frac{h_{i}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{k}}{\bigg\rvert}\;\frac{h_{i}}{\sqrt{\underline{h}(\underline{h}-h_{3})}}\underset{\vec{p}-\vec{k}}{\bigg\rvert}=\frac{h_{i}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{k}^{\prime}}{\bigg\rvert}\;\frac{h_{i}}{\sqrt{\underline{h}(\underline{h}+h_{3})}}\underset{\vec{p}-\vec{k}^{\prime}}{\bigg\rvert},\quad\text{for }i=1,2\text{ and }\vec{k},\vec{k}^{\prime}\in\mathrm{BZ}. \qquad (38)

Therefore, the terms at $\vec{k}$ and at $\vec{k}^{\prime}$ in Eq. (37) cancel each other. Note that the contributions at $\Gamma$ and at $(\pi,\pi)$ vanish in Eq. (37) since $h_{1}=h_{2}=0$ at these points in our model Eq. (3). Thus, $F(\vec{p})=0$ if $h_{3}(\vec{k})>0$ for all $\vec{k}\in\mathrm{BZ}$.

Similarly, if we assume $h_{3}(\vec{k})<0$ for all $\vec{k}\in\mathrm{BZ}$, the calculation yields the same value of $F(\vec{p})$ as in Eq. (37). Therefore, we conclude that if $h_{3}$ has the same sign over the entire BZ, then $F(\vec{p})=0$. ∎
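For completeness, we sketch below how the BZ convolutions entering $F(\vec{p})$ can be evaluated on a discrete $k$-grid. The quaternion fields qbar and qunder (arrays of shape (4, N, N) with components ordered as (w, x, y, z)) are taken as given inputs, since the gauge of Eqs. (12) and (13) is not reproduced here; the helper names are ours. By Property (C.1), the vector components of the summed result are expected to vanish, which provides a sanity check of an implementation.

import numpy as np

def quat_mul(a, b):
    """Element-wise Hamilton product of two (4, ...) arrays of quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def conjugate(q):
    return np.stack([q[0], -q[1], -q[2], -q[3]])

def bz_convolution(q):
    """(q* conv q)[p] = sum_k q*(k) q(p - k), with periodic (BZ) wrapping of p - k."""
    N = q.shape[-1]
    qc = conjugate(q)
    ar = np.arange(N)
    out = np.zeros_like(q)
    for i in range(N):                 # p_x grid index
        for j in range(N):             # p_y grid index
            ia, jb = (i - ar) % N, (j - ar) % N
            shifted = q[:, ia[:, None], jb[None, :]]        # q evaluated at p - k
            out[:, i, j] = quat_mul(qc, shifted).sum(axis=(1, 2))
    return out

def F(qbar, qunder):
    """F(p) = (qbar* conv qbar)[p] - (qunder* conv qunder)[p], cf. Eq. (37)."""
    return bz_convolution(qbar) - bz_convolution(qunder)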
