The principle of learning sign rules by neural networks in qubit lattice models

Jin Cao Department of Physics and Center for Advanced Quantum Studies, Beijing Normal University, Beijing 100875, China    Shijie Hu [email protected] Beijing Computational Science Research Center, Beijing 100193, China Department of Physics, Beijing Normal University, Beijing 100875, China    Zhiping Yin Department of Physics and Center for Advanced Quantum Studies, Beijing Normal University, Beijing 100875, China    Ke Xia Department of Physics, Southeast University, Nanjing 211189, China
Abstract

A neural network is a powerful tool that can uncover hidden laws beyond human intuition. However, it often appears as a black box due to its complicated nonlinear structures. By drawing upon the Gutzwiller mean-field theory, we showcase a principle of sign rules for ordered states in qubit lattice models. We introduce a shallow feed-forward neural network with a single hidden neuron to present these sign rules. We conduct systematic benchmarks in various models, including the generalized Ising, spin-$1/2$ XY, (frustrated) Heisenberg rings, triangular XY antiferromagnet on a torus, and the Fermi-Hubbard ring at an arbitrary filling. These benchmarks show that all the leading-order sign rule characteristics can be visualized in classical forms, such as pitch angles. Quantum fluctuations, in turn, can result in an imperfect accuracy rate.

Decoding hidden information from the ground-state wave function is essential for understanding the properties of quantum closed systems at zero temperature, including orders, correlations, and even intricate entanglement features, etc Eisert et al. (2010); Chertkov and Clark (2018); Wang et al. (2019); Irkhin and Skryabin (2019). For a real Hamiltonian, the sign structure of elements in the real wave function can be summarized as a sign rule within a selected representation Grover and Fisher (2015). For example, the Perron-Frobenius theorem is applied to a class of Hamiltonians with non-positive off-diagonal elements Perron (1907); Frobenius (1909). The Marshall-Peierls rule (MPR) is another example applicable to antiferromagnetic spin models on bipartite lattices Marshall (1955); Lieb et al. (1961); Lieb and Mattis (1962). These sign rules have been believed to be connected to various physical phenomena, such as the volume law for the Rényi entanglement entropies Grover and Fisher (2015), spatial periodicity of states Zeng and Parkinson (1995); Bursill et al. (1995), phase transitions Retzlaff et al. (1993); Richter et al. (1994); Cai and Liu (2018); Westerhout et al. (2020), and so on.

Similar to the matrix product state (MPS) successfully applied to (quasi) one-dimensional (1D) lattice models White (1992); Peschel et al. (1999); Schollwöck (2011); Orús (2014), the neural network quantum state (NNQS) and fast-developing machine learning (ML) techniques provide a new approach for multi-scale compression of the wave function, which has been widely used in one and higher dimensional quantum many-body systems Carleo and Troyer (2017); Carleo et al. (2019); Jia et al. (2019); Vivas et al. (2022). By using the empirical activation function cosine in the hidden layer of NNQS, the complicated sign rules in qubit lattice models can be learned from the wave functions Cai and Liu (2018), and subsequent studies have drawn significant attention in recent years Choo et al. (2019); Westerhout et al. (2020); Szabó and Castelnovo (2020); Bukov et al. (2021). These studies have shown that it is a practical advantage to enhance the representation precision for complex sign rules Cai and Liu (2018); Choo et al. (2019); Westerhout et al. (2020); Szabó and Castelnovo (2020); Bukov et al. (2021). This can be achieved by adding more hidden layers/neurons in NNQS or designing new architectures. Meanwhile, there is a growing concern about interpreting the meaning of highly nonlinear structures in neural networks Roscher et al. (2020); He et al. (2020); Fan et al. (2021) and finding links to existing physical insights Raissi et al. (2019); Yuan and Weng (2021); Cai et al. (2021), which strongly motivates our work.

In this work, we establish a Gutzwiller mean-field (GWMF) principle of the sign rules for ordered ground states in qubit lattice models. The leading-order term can be well understood using a single-hidden-neuron feed-forward neural network (shn-FNN). Our findings, tested on various spin and fermion models, suggest that the leading-order sign rules have clear physical interpretations tightly related to orders in spins or charges. The paper is organized as follows. In Sec. I, we present the GWMF picture of the sign rules for ordered states in qubit lattice models. In Sec. II, we introduce shn-FNN in detail, which matches the GWMF picture and can be easily interpreted. In Sec. III, we describe the technical details of data set preparation and shn-FNN training. In Sec. IV, we apply shn-FNN to extract the leading-order sign rules in spin and Fermi-Hubbard models. We also discuss the influence of frustration and global symmetries. Finally, we summarize our conclusions and give a brief discussion in Sec. V.

I Gutzwiller mean-field theory

A qubit is commonly used to represent a quantum state $|n\rangle$ in various fields of condensed matter physics. Examples include a spin-$1/2$ in quantum magnets Vasiliev et al. (2018), a single fermion state in ultra-cold atomic systems Gardiner and Zoller (2017), a two-level atom in quantum cavities Meher and Sivakumar (2022), and so on Makhlin et al. (2001); Luo (2008); Kjaergaard et al. (2020). The binary value $n$ corresponds to an empty or occupied fermion level, or a spin-$1/2$ polarized $\uparrow$ or $\downarrow$ along the $z$-axis. For a lattice with $L$ qubits, the basis can be expressed as $|\mathbf{n}\rangle=\otimes^{L}_{l=1}|n_{l}\rangle$, where $|n_{l}\rangle$ represents the local basis at site-$l$, and the quantum indices $n_{l}=0$, $1$ form a vector $\mathbf{n}=(n_{1},\ \dots,\ n_{L})$.

Without loss of generality, let us consider a spin-$1/2$ as an example. A spin operator $\hat{\mathbf{S}}_{l}=(\hat{S}^{x}_{l},\ \hat{S}^{y}_{l},\ \hat{S}^{z}_{l})$ defined at site-$l$ has three components along the $x$, $y$ and $z$-axes, respectively. For any state, there are only two free real variables out of the two complex coefficients in front of the basis $|\sigma_{l}\rangle$ of the $\hat{S}^{z}_{l}$-representation. The index $\sigma_{l}=\uparrow$, $\downarrow$ corresponds to the values $\pm 1/2$. These variables are governed by a pair of site-dependent angles $\theta_{l}$ and $\phi_{l}$ appearing in a spin-coherent state Penc and Läuchli (2010)

$$|\Omega_{l}\rangle=c^{\uparrow}_{l}|\uparrow\rangle+c^{\downarrow}_{l}|\downarrow\rangle\ , \tag{1}$$

where the coefficients are given by

$$c^{\uparrow}_{l}=\cos(\theta_{l}/2)\quad\text{and}\quad c^{\downarrow}_{l}=\sin(\theta_{l}/2)e^{i\phi_{l}}\ . \tag{2}$$

By convention, $\theta_{l}\in[0,\ \pi]$ and $\phi_{l}\in[0,\ 2\pi)$. In such a state, $\mathbf{S}_{l}=\langle\hat{\mathbf{S}}_{l}\rangle=\mathbf{\Omega}_{l}/2$ behaves as half of a unit vector in three-dimensional space, that is,

$$\mathbf{\Omega}_{l}=(\sin\theta_{l}\cos\phi_{l},\ \sin\theta_{l}\sin\phi_{l},\ \cos\theta_{l})\ . \tag{3}$$

The phase factor for the basis $|\sigma_{l}\rangle$ with a non-vanishing amplitude only depends on $\phi_{l}$, since both $\cos(\theta_{l}/2)$ and $\sin(\theta_{l}/2)$ are non-negative for $\theta_{l}\in[0,\ \pi]$. Besides, a phase angle $h_{l}$ can modulate the phase factor in front of the spin-coherent state (1), i.e., $e^{ih_{l}}|\Omega_{l}\rangle$.
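The relation between Eqs. (1)-(3) can be checked numerically. The following is a minimal sketch, assuming the basis ordering $(\uparrow,\ \downarrow)$ and the standard spin-$1/2$ matrices; the function names and the test angles are illustrative choices, not taken from the paper.

```python
import numpy as np

# Spin-1/2 operators in the S^z basis, ordered (up, down).
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

def coherent_state(theta, phi):
    """Coefficients (c_up, c_down) of Eq. (2)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2) * np.exp(1j * phi)])

def spin_expectation(state):
    """Return (<Sx>, <Sy>, <Sz>) for a normalized two-component state."""
    return np.real([np.vdot(state, op @ state) for op in (SX, SY, SZ)])

theta, phi = 0.7, 2.1  # arbitrary illustrative angles
omega = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])  # unit vector of Eq. (3)
s = spin_expectation(coherent_state(theta, phi))
```

Under these assumptions, `s` should agree with `omega / 2` to machine precision.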

In the GWMF theory Gutzwiller (1963, 1965), the wave function of the ground state $|\psi^{\text{gw}}\rangle$ is a product of single-site states for $L$ spin-$1/2$s, i.e.,

$$|\psi^{\text{gw}}\rangle=\bigotimes^{L}_{l=1}|\Omega_{l}\rangle=\sum_{\{\bm{\sigma}\}}\bar{a}^{\text{gw}}_{\bm{\sigma}}p^{\text{gw}}_{\bm{\sigma}}|\bm{\sigma}\rangle\ , \tag{4}$$

where

$$\bar{a}^{\text{gw}}_{\bm{\sigma}}=\prod_{l}\lvert c^{\sigma_{l}}_{l}\rvert\quad\text{and}\quad p^{\text{gw}}_{\bm{\sigma}}=\exp\left[i\left(\bm{\phi}\cdot\mathbf{n}+\sum_{l=1}^{L}h_{l}\right)\right] \tag{5}$$

represent the positive amplitude and the phase factor, respectively. Here, we define the angle vector as $\bm{\phi}=(\phi_{1},\ \cdots,\ \phi_{L})$, and the index vector $\bm{\sigma}=(\sigma_{1},\ \cdots,\ \sigma_{L})$ satisfies the relation $\mathbf{n}=1/2-\bm{\sigma}$. For a quantum model featuring a real Hamiltonian, as discussed in this work, the complex conjugate of the GWMF wave function $|\psi^{\text{gw}}\rangle^{*}$ is also a state sharing the ground-state energy. Thus, the modified GWMF wave function $e^{ih_{0}}|\psi^{\text{gw}}\rangle=|\psi^{\text{gw}}_{r}\rangle+i|\psi^{\text{gw}}_{i}\rangle$ for the ground state can be decomposed into the real part $|\psi^{\text{gw}}_{r}\rangle$ and the imaginary part $|\psi^{\text{gw}}_{i}\rangle$, given a specific global phase angle $h_{0}$. Both parts are real-valued and satisfy an extra orthogonality relation $\langle\psi^{\text{gw}}_{r}|\psi^{\text{gw}}_{i}\rangle=\langle\psi^{\text{gw}}_{i}|\psi^{\text{gw}}_{r}\rangle=0$. Concretely, the two parts are expressed as

$$\begin{split}|\psi^{\text{gw}}_{r}\rangle&=\sum_{\{\bm{\sigma}\}}\bar{a}^{\text{gw}}_{\bm{\sigma}}\cos(\bm{\phi}\cdot\mathbf{n}+\tilde{h})|\bm{\sigma}\rangle\ ,\\ |\psi^{\text{gw}}_{i}\rangle&=\sum_{\{\bm{\sigma}\}}\bar{a}^{\text{gw}}_{\bm{\sigma}}\sin(\bm{\phi}\cdot\mathbf{n}+\tilde{h})|\bm{\sigma}\rangle\ ,\end{split} \tag{6}$$

where the phase angle $\tilde{h}=\sum^{L}_{l=1}h_{l}+h_{0}$ is given. Regardless of whether the imaginary part $|\psi^{\text{gw}}_{i}\rangle$ is null or the two parts share the same energy, we can always obtain a real ground-state wave function. It is worth noticing that the mean fields in the GWMF theory prefer selecting one of the degenerate manifolds if they exist, which artificially breaks the corresponding symmetry. So, the above-mentioned rotation, adjusted by the phase angle $h_{0}$, would introduce an energy split between the real part $|\psi^{\text{gw}}_{r}\rangle$ and the imaginary part $|\psi^{\text{gw}}_{i}\rangle$. In our theory, we ignore this effect and suppose their degeneracy still survives.

Assuming that the real part $|\psi^{\text{gw}}_{r}\rangle=\sum_{\{\bm{\sigma}\}}c^{\text{gw}}_{\bm{\sigma}}|\bm{\sigma}\rangle$ is expanded in the representation of bases $|\bm{\sigma}\rangle$, the real expansion coefficient $c^{\text{gw}}_{\bm{\sigma}}=\bar{a}^{\text{gw}}_{\bm{\sigma}}\cos(\bm{\phi}\cdot\mathbf{n}+\tilde{h})=s^{\text{gw}}_{\bm{\sigma}}a^{\text{gw}}_{\bm{\sigma}}$ comprises an amplitude $a^{\text{gw}}_{\bm{\sigma}}=\lvert c^{\text{gw}}_{\bm{\sigma}}\rvert$ and a sign $s^{\text{gw}}_{\bm{\sigma}}=\text{Sgn}(c^{\text{gw}}_{\bm{\sigma}})$ following the rule:

$$s^{\text{gw}}_{\bm{\sigma}}=\text{Sgn}[\cos(\bm{\phi}\cdot\mathbf{n}+\tilde{h})]\ . \tag{7}$$

This rule is called the leading-order version, since it removes short-range fluctuations completely. The phase angle $\tilde{h}$ is determined by other necessarily preserved global symmetries for a specified eigenstate, e.g., translational and inversion symmetries. Here, “Sgn” denotes the standard sign function. When examining the sign rule for the nonzero imaginary part $|\psi^{\text{gw}}_{i}\rangle$, an extra $\pi/2$ needs to be added to the phase angle $\tilde{h}$, which equivalently replaces the cosine function with the sine function in Eq. (7). In the alternative notation-$\mathbf{n}$ utilizing bases $|\mathbf{n}\rangle$, the sign rule remains the same, i.e., $s^{\text{gw}}_{\mathbf{n}}\equiv s^{\text{gw}}_{\bm{\sigma}}$. For the convenience of the following discussions, we only use the notation-$\mathbf{n}$.
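The step from the product state (4) to the sign rule (7) can be illustrated on a small lattice. The sketch below assumes $n_{l}=0$ for $\uparrow$ and $n_{l}=1$ for $\downarrow$ (consistent with $\mathbf{n}=1/2-\bm{\sigma}$), uses randomly drawn angles, and lumps all phase angles into a single hypothetical value `h_tilde`; none of these numbers come from the paper.

```python
import itertools
import numpy as np

def product_state(thetas, phis, h_tilde=0.0):
    """Kronecker product of single-site coherent states times exp(i*h_tilde)."""
    psi = np.array([np.exp(1j * h_tilde)])
    for theta, phi in zip(thetas, phis):
        site = np.array([np.cos(theta / 2), np.sin(theta / 2) * np.exp(1j * phi)])
        psi = np.kron(psi, site)
    return psi

def sign_rule(phis, h_tilde, n):
    """Leading-order sign rule of Eq. (7)."""
    return np.sign(np.cos(np.dot(phis, n) + h_tilde))

L = 4
rng = np.random.default_rng(0)
thetas = rng.uniform(0.2, np.pi - 0.2, L)   # keep both amplitudes nonzero
phis = rng.uniform(0.0, 2 * np.pi, L)
h_tilde = 0.3                               # illustrative phase angle

psi = product_state(thetas, phis, h_tilde)
# itertools.product enumerates configurations in the same order as np.kron.
configs = list(itertools.product([0, 1], repeat=L))
agree = all(np.sign(psi[i].real) == sign_rule(phis, h_tilde, np.array(n))
            for i, n in enumerate(configs))
```

For generic angles, the signs of the real part of the product state coincide with Eq. (7) on every basis state, so `agree` is expected to be true.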

In the GWMF scenario, spins in the ordered state are typically visualized as classical vectors that follow a regular profile $\{\mathbf{\Omega}_{l}\}$ in space. Our derivation shows that the leading-order sign rule, which depends on the angles $\phi_{l}$, is closely related to the spin-order profile $\{\mathbf{\Omega}_{l}\}$. The above conclusion is still valid for general qubit lattice models. In Sec. II, we will demonstrate that the leading-order sign rule can, in principle, be learned by shn-FNN.

II Single-hidden-neuron feed-forward neural network

The feed-forward neural network (FNN) is a powerful tool for approximating continuous functions Cybenko (1989); Hornik et al. (1989) and classifying samples by discrete labels Goodfellow et al. (2016). For instance, it can be applied to the classification of the double-valued sign $s_{\mathbf{n}}$ for an arbitrary basis $|\mathbf{n}\rangle$, when the expansion coefficient $c_{\mathbf{n}}$ in the ground-state wave function $|\psi\rangle=\sum_{\mathbf{n}}c_{\mathbf{n}}|\mathbf{n}\rangle$ for a real Hamiltonian consists of an amplitude $a_{\mathbf{n}}$ and a sign $s_{\mathbf{n}}$. However, as FNN becomes deeper, its complexity grows, making it difficult to understand the sign rule and its connection to meaningful physical insights. To address this issue, we introduce shn-FNN, similar to previous shallow FNNs He et al. (2020); Yuan and Weng (2021) but distinct from recently developed operations in a compact latent space Iten et al. (2020); Wang et al. (2021).

Figure 1: Shn-FNN is utilized to acquire the leading-order sign rule of a quantum state for $L$ qubits. The network consists of three layers: the input layer (black squares) with $L$ neurons, the hidden layer (blue hexagons) with $1$ neuron, and the output layer (red circles) with $2$ neurons. These layers are connected by two weight vectors $\bm{w}$ (blue lines) and $\bm{w}_{O}$ (red lines). The hidden and output layers are activated using a cosine function and a softmax function, respectively. The sign determined by shn-FNN is positive if $y_{O,1}>y_{O,2}$ and negative otherwise.

The shn-FNN consists of an input, hidden, and output layer, as illustrated in Fig. 1. The configuration $\mathbf{n}$ is assigned to the input layer of shn-FNN by simply setting an $L$-dimensional vector $\bm{y}_{I}=\mathbf{n}$. The hidden layer contains one neuron that produces a one-dimensional vector $y_{H}$. The output layer consists of two neurons that form a one-hot vector $\bm{y}_{O}=(y_{O,1},\ y_{O,2})$. These three layers are connected by two weight vectors $\bm{w}$ and $\bm{w}_{O}$.

The activation function cosine is empirically chosen for the hidden layer so that

$$y_{H}=\cos(\bm{w}\cdot\mathbf{n})\ . \tag{8}$$

The vector $\bm{y}_{O}$ in the output layer is determined by applying the softmax function, i.e.,

$$\bm{y}_{O}=\text{softmax}(y_{H}\bm{w}_{O})\ , \tag{9}$$

where the weight vector $\bm{w}_{O}=(w_{O,1},\ w_{O,2})$ is fixed. The softmax function normalizes exponentials to obtain probabilities and is commonly used in classification tasks Goodfellow et al. (2016). Specifically, it gives the two neurons $y_{O,1}$ and $y_{O,2}$ in the output layer as

$$\begin{split}y_{O,1}&=\frac{e^{w_{O,1}y_{H}}}{e^{w_{O,1}y_{H}}+e^{w_{O,2}y_{H}}}\ ,\\ y_{O,2}&=\frac{e^{w_{O,2}y_{H}}}{e^{w_{O,1}y_{H}}+e^{w_{O,2}y_{H}}}\ .\end{split} \tag{10}$$

In this work, we choose $\bm{w}_{O}=(1,-1)$, so that $e^{w_{O,1}y_{H}}$ and $e^{w_{O,2}y_{H}}$ are distinguishable. Thus, the desired sign $s^{\text{fnn}}_{\mathbf{n}}$ is determined as follows:

$$s^{\text{fnn}}_{\mathbf{n}}=\begin{cases}+1&\quad\text{for}\quad y_{O,1}>y_{O,2}\\ -1&\quad\text{for}\quad y_{O,1}\leq y_{O,2}\end{cases}\ , \tag{14}$$

which equivalently indicates the sign of $y_{H}$, i.e., $s^{\text{fnn}}_{\mathbf{n}}\equiv\text{Sgn}(y_{H})$ in principle. Without the one-hot representation, it has been shown that FNN performs worse, since the categorization task is forced into a regression task Goodfellow et al. (2016).

In summary, the shn-FNN representation for the sign rule corresponds to a function

$$s^{\text{fnn}}_{\mathbf{n}}=\text{Sgn}\left[\cos(\bm{w}\cdot\mathbf{n})\right]\ . \tag{15}$$

Comparing with the leading-order sign rule (7) derived from the GWMF theory, we usually get

$$w_{l}=\phi_{l}+\tilde{h}/N\quad\text{and}\quad N=\sum^{L}_{l=1}n_{l} \tag{16}$$

for models with constant $N$. Thus, Eq. (15) is also called the leading-order sign rule.
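The forward pass of Eqs. (8)-(14) is small enough to write out directly. The following is a sketch in plain NumPy rather than the TensorFlow implementation used in this work; the function names and the random test weights are illustrative assumptions.

```python
import itertools
import numpy as np

W_O = np.array([1.0, -1.0])  # fixed output weights, as chosen in the text

def shn_fnn_forward(w, n):
    """Return (y_H, y_O) following Eqs. (8)-(10)."""
    y_h = np.cos(np.dot(w, n))
    logits = y_h * W_O
    expo = np.exp(logits - logits.max())  # shift for numerical stability
    return y_h, expo / expo.sum()

def shn_fnn_sign(w, n):
    """Sign assignment of Eq. (14)."""
    _, y_o = shn_fnn_forward(w, n)
    return 1 if y_o[0] > y_o[1] else -1

# Compare with the closed form Sgn[cos(w.n)] of Eq. (15) on all 2^6 inputs.
rng = np.random.default_rng(0)
w = rng.uniform(-np.pi, np.pi, 6)  # hypothetical weights for L = 6
configs = [np.array(c) for c in itertools.product([0, 1], repeat=6)]
matches = all(shn_fnn_sign(w, n) == np.sign(np.cos(w @ n)) for n in configs)
```

Because $\bm{w}_{O}=(1,-1)$, the condition $y_{O,1}>y_{O,2}$ reduces to $y_{H}>0$, which is why the network output and Eq. (15) agree whenever $y_{H}\neq 0$.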

III Data sets and training

Data sets. We use the exact diagonalization (ED) method to obtain the ground-state wave function. In the wave function, the sign $s_{\mathbf{n}}$ and the configuration $\mathbf{n}$ for a specified basis $|\mathbf{n}\rangle$ constitute a sample in the data set $\mathbf{T}$. Each sign $s_{\mathbf{n}}$ is encoded as a one-hot vector $\bm{y}^{(\mathbf{n})}=(y^{(\mathbf{n})}_{1},\ y^{(\mathbf{n})}_{2})$, which can only take two valid combinations:

$$\bm{y}^{(\mathbf{n})}=\begin{cases}(1,\ 0)&\quad\text{for}\quad s_{\mathbf{n}}>0\\ (0,\ 1)&\quad\text{for}\quad s_{\mathbf{n}}<0\end{cases}\ . \tag{20}$$

After arranging the samples in descending order of amplitude $a_{\mathbf{n}}$, we discard those with $a_{\mathbf{n}}<10^{-15}$ to avoid any artificial effects caused by the limited numeric precision. The remaining $\mathcal{N}_{s}$ samples in the data set $\mathbf{T}$ are divided into a training set $\mathbf{T}_{\text{train}}$ and a testing set $\mathbf{T}_{\text{test}}$ in a $4:1$ ratio Goodfellow et al. (2016). Thus, the number of samples in the testing set $\mathbf{T}_{\text{test}}$ is given by $\mathcal{N}_{\text{test}}=\mathcal{N}_{s}/5$.
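The data-set preparation described above can be sketched as follows. The text does not specify how samples are assigned to the two sets, so the random split used here is an assumption, as are the function name `build_dataset`, its parameters, and the toy coefficients.

```python
import numpy as np

def build_dataset(configs, coeffs, cutoff=1e-15, train_fraction=0.8, seed=0):
    """Sort by amplitude, drop tiny amplitudes, one-hot encode, split 4:1."""
    configs = np.asarray(configs)
    coeffs = np.asarray(coeffs, dtype=float)
    order = np.argsort(-np.abs(coeffs), kind="stable")  # descending amplitude
    configs, coeffs = configs[order], coeffs[order]
    keep = np.abs(coeffs) >= cutoff                     # discard a_n < cutoff
    configs, coeffs = configs[keep], coeffs[keep]
    # One-hot labels of Eq. (20): (1, 0) for s_n > 0 and (0, 1) for s_n < 0.
    labels = np.stack([coeffs > 0, coeffs < 0], axis=1).astype(float)
    # Assumption: samples are assigned to the two sets at random.
    perm = np.random.default_rng(seed).permutation(len(coeffs))
    n_train = int(round(train_fraction * len(coeffs)))
    train, test = perm[:n_train], perm[n_train:]
    return (configs[train], labels[train]), (configs[test], labels[test])

# Toy input: 12 made-up coefficients, two of them below the cutoff.
toy_coeffs = np.array([0.5, -0.4, 0.3, -0.2, 0.1, -0.05, 0.02, -0.01,
                       0.005, -0.001, 1e-17, -1e-18])
toy_configs = np.arange(36).reshape(12, 3) % 2
(train_x, train_y), (test_x, test_y) = build_dataset(toy_configs, toy_coeffs)
```

With ten surviving samples, this toy call yields eight training samples and two testing samples.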

Training. During the training scheme-$\rm{I}$, we employ the back-propagation (BP) algorithm Rumelhart et al. (1986) to optimize the variables in the weight vector $\bm{w}$ while adaptively adjusting the learning rate using the Adam algorithm Kingma and Ba (2014). The process aims to minimize the cross entropy, defined as

$$\mathcal{S}_{\times}=-\sum_{\{\mathbf{n}\}}\left(y^{(\mathbf{n})}_{1}\ln y^{(\mathbf{n})}_{O,1}+y^{(\mathbf{n})}_{2}\ln y^{(\mathbf{n})}_{O,2}\right)\ , \tag{21}$$

which sums over samples in the entire training set. Here, the vector $\bm{y}^{(\mathbf{n})}_{O}=(y^{(\mathbf{n})}_{O,1},\ y^{(\mathbf{n})}_{O,2})$ is the output of shn-FNN when the vector $\mathbf{n}$ is the input.

We employ the mini-batch method based on the stochastic gradient descent (SGD) Goodfellow et al. (2016); Wilson and Martinez (2003) to reduce the huge computational costs. Instead of using the entire training set $\mathbf{T}_{\text{train}}$ directly, we randomly select $\mathcal{N}_{\text{step}}=100$ samples to calculate the gradients of the weights $w_{l}$ at each training step. In such a case, Eq. (21) only sums over the selected $\mathcal{N}_{\text{step}}$ samples. This method performs well in accuracy and speed, and the random selection helps prevent the “over-fitting” problem. In our program, we implement shn-FNN and the Adam optimization using the ML library “TensorFlow” Abadi et al. (2016).
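To make the mini-batch update concrete, the sketch below evaluates Eq. (21) and its gradient with respect to $\bm{w}$ analytically, and applies one plain gradient-descent step. This is a simplification: it replaces the Adam optimizer and the TensorFlow back-propagation used in this work with a hand-written gradient and a fixed, hypothetical learning rate `lr`.

```python
import numpy as np

def cross_entropy(w, configs, labels):
    """Cross entropy of Eq. (21) for w_O = (1, -1)."""
    y_h = np.cos(configs @ w)
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * y_h))  # y_{O,1}; y_{O,2} = 1 - p_plus
    return -np.sum(labels[:, 0] * np.log(p_plus)
                   + labels[:, 1] * np.log(1.0 - p_plus))

def cross_entropy_grad(w, configs, labels):
    """Analytic gradient of the cross entropy with respect to w."""
    arg = configs @ w
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * np.cos(arg)))
    d_yh = 2.0 * (p_plus - labels[:, 0])       # derivative w.r.t. y_H
    return (d_yh * -np.sin(arg)) @ configs     # chain rule through cos(w.n)

def sgd_step(w, configs, labels, rng, batch_size=100, lr=0.05):
    """One mini-batch step with plain gradient descent (not Adam)."""
    size = min(batch_size, len(configs))
    idx = rng.choice(len(configs), size=size, replace=False)
    return w - lr * cross_entropy_grad(w, configs[idx], labels[idx])
```

A finite-difference comparison against `cross_entropy` is a convenient way to validate the gradient before training.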

To evaluate the performance of shn-FNN, we introduce the accuracy rate (AR), which is calculated as the ratio of the number of successfully classified samples $\mathcal{N}^{c}_{s}$ to the total number of samples $\mathcal{N}_{s}$ in the data set $\mathbf{T}$, i.e., $\text{AR}=\mathcal{N}^{c}_{s}/\mathcal{N}_{s}$. To monitor the optimization process, we define two additional accuracy rates. At each training step, we calculate the number of successfully classified samples $\mathcal{N}^{c}_{\text{step}}$. Then, we evaluate the optimization by computing the accuracy rate $\text{AR}_{\text{step}}=\mathcal{N}^{c}_{\text{step}}/\mathcal{N}_{\text{step}}$. To provide a comprehensive evaluation, we utilize the testing set $\mathbf{T}_{\text{test}}$ and define the accuracy rate $\text{AR}_{\text{test}}=\mathcal{N}^{c}_{\text{test}}/\mathcal{N}_{\text{test}}$, where $\mathcal{N}^{c}_{\text{test}}$ represents the number of correctly classified samples in the testing set $\mathbf{T}_{\text{test}}$.

After each training step, we assess the convergence criterion $\epsilon$, which measures the absolute value of the difference between the accuracy rates $\text{AR}_{\text{test}}$ obtained in the current step and the previous step. We halt the training process once the convergence criterion $\epsilon$ falls below a threshold $\epsilon_{0}=10^{-3}$.
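The accuracy rates and the stopping rule can be expressed in a few lines. This is a minimal sketch with hypothetical helper names; signs are predicted with the closed form of Eq. (15).

```python
import numpy as np

def accuracy_rate(w, configs, signs):
    """Fraction of samples whose sign matches Sgn[cos(w.n)]."""
    predicted = np.where(np.cos(configs @ w) > 0, 1, -1)
    return np.mean(predicted == signs)

def has_converged(ar_test_history, eps0=1e-3):
    """True once |AR_test(current) - AR_test(previous)| < eps0."""
    if len(ar_test_history) < 2:
        return False
    return abs(ar_test_history[-1] - ar_test_history[-2]) < eps0
```

The same `accuracy_rate` function gives AR, $\text{AR}_{\text{step}}$, or $\text{AR}_{\text{test}}$ depending on whether it is called on the full data set, the current mini-batch, or the testing set.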

Figure 2: Following scheme-$\rm{I}$, we train shn-FNN to acquire the leading-order sign rule (15) of the ground-state wave function for a generalized ferromagnetic ($J<0$) Ising model (22) of $L=16$ spins in a ring. The spin polarization is along the $x$-axis, i.e., $\mathbf{\Omega}=\hat{x}$. Throughout the training process, we monitor the cross entropy $\mathcal{S}_{\times}$ (green) as well as three different accuracy rates: $\text{AR}_{\text{step}}$ (black), $\text{AR}_{\text{test}}$ (red) and AR (blue).

To exemplify how shn-FNN learns the sign rule, we study the ground state of a generalized Ising ring described by the Hamiltonian

$$\hat{H}_{\text{Ising}}(J,\ \mathbf{\Omega})=J\sum^{L}_{l=1}(\hat{\mathbf{S}}_{l}\cdot\mathbf{\Omega})(\hat{\mathbf{S}}_{l+1}\cdot\mathbf{\Omega})\ , \tag{22}$$

where $J$ represents the strength of the coupling, and $\mathbf{\Omega}$ determines the orientation of the spin polarization.

For the case of ferromagnetic coupling ($J<0$) and spin polarization along the $x$-axis ($\mathbf{\Omega}=\hat{x}$), we obtain the ground-state wave function for $L=16$ spins, and then prepare the data sets $\mathbf{T}$, $\mathbf{T}_{\text{train}}$ and $\mathbf{T}_{\text{test}}$, as stated above. We initialize the weight vectors $\bm{w}$ and $\bm{w}_{O}$ in shn-FNN and start the training process. As illustrated in Fig. 2, the cross entropy $\mathcal{S}_{\times}$ rapidly decreases around the $170$-th training step. After approximately $200$ training steps, the cross entropy $\mathcal{S}_{\times}$ tends towards stability, and the three accuracy rates $\text{AR}_{\text{step}}$, $\text{AR}_{\text{test}}$ and AR consistently reach the maximum value $1$, i.e., $100\%$. We terminate the training process when the convergence criterion $\epsilon<\epsilon_{0}$ is met. In addition to scheme-$\rm{I}$, we will introduce scheme-$\rm{II}$ in Sec. IV.2.1, tailored for frustrated spin models and the Fermi-Hubbard ring.

IV Qubit lattice models

Using shn-FNN, we analyze the leading-order sign rules for various ordered ground states in qubit lattice models, including non-frustrated spin models in Sec. IV.1, frustrated spin models in Sec. IV.2, and interacting fermions in Sec. IV.3.

IV.1 Non-frustrated spin models

Figure 3: (a.1-d.1) The accuracy rate AR of the sign rule (15) as a function of the angle $\varphi$ and phase angle $\tilde{h}$. In (a.1) and (b.1), the angle $\phi_{l}=\varphi$, while in (c.1) and (d.1) $\phi_{l}=\varphi l$. (a.2-d.2) Following scheme-$\rm{I}$, the accuracy rate AR changes with the training step number during the training process. After full training, the optimized shn-FNN with $\text{AR}=1$ gives weights $w_{l}$. These weights are then converted to $\phi_{l}$ and $\tilde{h}$ using Eq. (16), which follow simple rules: $\phi_{l}=\varphi$ in (a.2) and (b.2), while $\phi_{l}=\varphi l$ in (c.2) and (d.2). Consequently, we mark the coordinates $\varphi$ and $\tilde{h}$ in (a.1-d.1) by black open triangles to represent the optimized weights. Here, we investigate ground states in (a.1, a.2) a generalized ferromagnetic Ising ring with $\mathbf{\Omega}=\hat{x}$, (b.1, b.2) a ferromagnetic spin-$1/2$ XY ring, (c.1, c.2) a twisted ferromagnetic spin-$1/2$ XY ring (even parity), and (d.1, d.2) an antiferromagnetic spin-$1/2$ XY ring. The system size is $L=16$.

IV.1.1 A generalized ferromagnetic Ising ring

For the case of $J<0$ and $\mathbf{\Omega}=\hat{x}$ in the model (22), the optimized shn-FNN with $\text{AR}=1$, as illustrated in Fig. 3(a.2), suggests that the weights $w_{l}=0$ in the sign rule (15). The physical interpretation of $w_{l}$ can be understood from the connection between Eq. (15) and the leading-order sign rule (7). Using Eq. (16), $w_{l}=0$ can be converted to a combination of the angles $\phi_{l}=0$ and the phase angle $\tilde{h}=0$ in the sign rule (7). This demonstrates the presence of ferromagnetic order along the $x$-axis, according to the spin-coherent state representation (2). Assuming that $\phi_{l}=\varphi$ in Eq. (16), we can visualize AR for the sign rule (15) in the ($\varphi$, $\tilde{h}$) plane, as shown in Fig. 3(a.1). The optimized weights $w_{l}=0$ are positioned at the coordinates of a maximum, i.e., $\varphi=0$ and $\tilde{h}=0$ (black open triangle).

IV.1.2 A ferromagnetic spin-$1/2$ XY ring

For a spin-$1/2$ XY ring with $L=16$ sites, the Hamiltonian is given by

$$\hat{H}^{P}_{\text{xy}}(J)=\frac{1}{2}\left(\sum^{L}_{l=1}J\hat{S}^{+}_{l}\hat{S}^{-}_{l+1}+\textrm{h.c.}\right)\ , \tag{23}$$

where $\hat{S}^{\pm}_{l}$ represent the flipping-up and flipping-down operators for a spin-$1/2$. The coupling strengths $J$ along the $x$ and $y$-axes are equal in this ring.

For the case of $J<0$ shown in Fig. 3(b.2), the accuracy rate AR for shn-FNN reaches the perfect-classification limit of $1$ after approximately $150$ training steps. In the optimized shn-FNN, the weights are given by $w_{l}=\pi/8$, which are equivalent to $\phi_{l}=\pi/8$ and $\tilde{h}=0$ according to Eq. (16). The result remains consistent with an in-plane ferromagnetic order, where all spins are confined to the $xy$-plane and aligned in the same polarization direction. Therefore, we still set $\phi_{l}=\varphi$ in Eq. (16), and plot the AR distribution in the ($\varphi$, $\tilde{h}$) plane in Fig. 3(b.1). We can easily find that the weights $w_{l}=\pi/8$ correspond to the coordinates $\varphi=\pi/8$ and $\tilde{h}=0$ (black open triangle). Since $w_{l}=\pi/8$ is uniform in space, the resulting sign rule (15) can be summarized as the Perron-Frobenius theorem Perron (1907); Frobenius (1909) by removing a global phase angle $\pi$, when $L=16$.
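The uniform sign structure of the ferromagnetic XY ring can be reproduced by a small exact diagonalization. The sketch below uses $L=8$ instead of $L=16$ to keep the dense matrix small, and encodes a basis state as an integer whose bit $l$ stores $n_{l}$; both choices are assumptions made for illustration.

```python
import numpy as np

def xy_ring_hamiltonian(L, J):
    """Dense matrix of Eq. (23) in the S^z basis with periodic boundaries."""
    dim = 1 << L
    H = np.zeros((dim, dim))
    for b in range(dim):
        for l in range(L):
            m = (l + 1) % L
            # S^+ S^- + h.c. exchanges two neighboring opposite spins.
            if ((b >> l) & 1) != ((b >> m) & 1):
                H[b ^ (1 << l) ^ (1 << m), b] += J / 2
    return H

L, J = 8, -1.0
energies, vectors = np.linalg.eigh(xy_ring_hamiltonian(L, J))
psi = vectors[:, 0]                                   # ground state
support = np.flatnonzero(np.abs(psi) > 1e-10)         # non-negligible amplitudes
signs = np.sign(psi[support])
uniform_sign = bool(np.all(signs == signs[0]))
in_half_filled_sector = all(bin(int(b)).count("1") == L // 2 for b in support)
```

For this small ring the ground state lives in the sector with $N=L/2$ and all of its non-negligible coefficients carry the same sign, as expected from the Perron-Frobenius theorem. The overall sign returned by the eigensolver is arbitrary.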

IV.1.3 A twisted ferromagnetic spin-$1/2$ XY ring

For a spin-$1/2$ XY ring with the ferromagnetic coupling strength $J<0$ under the twisted boundary condition (TBC), an antiferromagnetic bond connects site-$1$ and site-$L$ in the Hamiltonian

$$\hat{H}^{T}_{\text{xy}}(J)=\frac{1}{2}\left(\sum^{L-1}_{l=1}J\hat{S}^{+}_{l}\hat{S}^{-}_{l+1}-J\hat{S}^{+}_{L}\hat{S}^{-}_{1}+\textrm{h.c.}\right)\ . \tag{24}$$

To achieve convergence of the accuracy rate $\text{AR}=1$, we train the shn-FNN for $L=16$, as shown in Fig. 3(c.2). The optimized weights obtained from this training are given by

$$w_{l}=\frac{\pi}{L}l-\frac{(L+1)\pi}{2L} \tag{25}$$

in the sign rule (15) for the ground-state wave function. We find that Eq. (25) corresponds to a combination of the angles $\phi_{l}=\pi l/16$ and the phase angle $\tilde{h}=-17\pi/4$ according to Eq. (16). This suggests a gradually varying spin profile in space with a pitch angle of $\pi/16$. In a similar manner, by setting $\phi_{l}=\varphi l$ in Eq. (16), we parameterize the sign rule (15) with two parameters $\varphi$ and $\tilde{h}$. The AR distribution in the ($\varphi$, $\tilde{h}$) plane is plotted in Fig. 3(c.1), where the coordinates $\varphi=\pi/16$ and $\tilde{h}=-17\pi/4$ are marked by a black open triangle.
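The conversion from the weights (25) to the pair ($\varphi$, $\tilde{h}$) via Eq. (16) amounts to a linear fit in $l$. The sketch below assumes $N=L/2=8$, which is the value consistent with the quoted $\tilde{h}=-17\pi/4$; the helper name `fit_pitch_and_phase` is hypothetical.

```python
import numpy as np

def fit_pitch_and_phase(w, N):
    """Fit w_l = varphi * l + h_tilde / N and return (varphi, h_tilde)."""
    sites = np.arange(1, len(w) + 1)
    slope, intercept = np.polyfit(sites, w, 1)
    return slope, intercept * N

L, N = 16, 8                                        # N = L/2 is an assumption
sites = np.arange(1, L + 1)
w = np.pi * sites / L - (L + 1) * np.pi / (2 * L)   # weights of Eq. (25)
varphi, h_tilde = fit_pitch_and_phase(w, N)
```

The constant term of Eq. (25) is $-17\pi/32$ for $L=16$, and multiplying it by $N=8$ gives $\tilde{h}=-17\pi/4$, while the slope gives the pitch angle $\varphi=\pi/16$.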

The above result can be better understood through the following analysis. Under a rotation defined by the operators

$$\hat{\mathcal{U}}_{\varphi}=\prod^{L}_{l=1}\hat{\mathcal{R}}_{l}(l\varphi)\quad\text{and}\quad\hat{\mathcal{R}}_{l}(q)=e^{iq\hat{S}^{z}_{l}}\ , \tag{26}$$

the twisting effect from the antiferromagnetic bond is absorbed into a gauge field $\tilde{J}=\exp(i\varphi)$ in a new Hamiltonian $\hat{H}^{P}_{\text{xy}}(J\tilde{J})$, where $\varphi=\pi/L$. Meanwhile,

$$\hat{\mathcal{U}}^{\dagger}_{\varphi}\hat{S}^{\pm}_{l}\hat{\mathcal{U}}_{\varphi}=\hat{S}^{\pm}_{l}\exp(\mp il\varphi)\ . \tag{27}$$

Based on the argument in App. A, it is found that the even-parity ground-state wave function $|\psi^{P}_{\text{xy}}(J\tilde{J})\rangle$ for the Hamiltonian $\hat{H}^{P}_{\text{xy}}(J\tilde{J})$ has positive signs, i.e., $s_{\mathbf{n}}>0$. As a result, the ground-state wave function $|\psi^{T}_{\text{xy}}(J)\rangle$ for the Hamiltonian $\hat{H}^{T}_{\text{xy}}(J)$ carries a nonzero complex phase factor due to the rotation $\hat{\mathcal{U}}_{\varphi}$. Specifically, it can be written as

$$|\psi^{T}_{\text{xy}}(J)\rangle=\hat{\mathcal{U}}_{\varphi}|\psi^{P}_{\text{xy}}(J\tilde{J})\rangle\ . \tag{28}$$

Thus, the real part of the wave function is given by

$$|\psi^{T}_{\text{xy},r}(J)\rangle=\cos(\bm{w}\cdot\mathbf{n})|\psi^{P}_{\text{xy}}(J\tilde{J})\rangle\ , \tag{29}$$

which is inversion-symmetric with respect to the chain center. It is worth noting that although Eq. (29) implies the same sign rule (15) obtained from the GWMF theory, this analysis is rigorous for the twisted spin-$1/2$ XY ring.
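The transformation rule (27) used in the analysis above acts site by site, so it can be verified on a single spin-$1/2$. The following sketch assumes the basis ordering $(\uparrow,\ \downarrow)$; the site index and ring size are arbitrary illustrative values.

```python
import numpy as np

# Single-site ladder operators in the basis (up, down).
S_PLUS = np.array([[0, 1], [0, 0]], dtype=complex)   # maps down to up
S_MINUS = S_PLUS.conj().T

def rotation(q):
    """R(q) = exp(i q S^z) for a single spin-1/2, as in Eq. (26)."""
    return np.diag([np.exp(0.5j * q), np.exp(-0.5j * q)])

def rotate(op, q):
    """Return R(q)^dagger op R(q)."""
    r = rotation(q)
    return r.conj().T @ op @ r

L, l = 16, 3                     # illustrative ring size and site index
q = l * np.pi / L                # rotation angle l * varphi with varphi = pi / L
plus_ok = np.allclose(rotate(S_PLUS, q), S_PLUS * np.exp(-1j * q))
minus_ok = np.allclose(rotate(S_MINUS, q), S_MINUS * np.exp(+1j * q))
```

Both checks are expected to pass, reflecting that the rotation attaches the phase $\exp(\mp il\varphi)$ to $\hat{S}^{\pm}_{l}$.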

IV.1.4 An antiferromagnetic spin-$1/2$ XY ring

For the case of antiferromagnetic coupling $J>0$, the optimized shn-FNN suggests $w_{l}=\pi l-\pi/8$ in the sign rule (15), as shown in Fig. 3(d.1). We find that $\bm{w}$ corresponds to a combination of the angles $\phi_{l}=\pi l$ and the phase angle $\tilde{h}=-\pi$, indicating the presence of Néel order. Remarkably, the sign rule defined by $\bm{w}$ is equivalent to MPR, where $s_{\mathbf{n}}=(-1)^{N_{O}}$, and the quantity

$$N_{O}=\sum_{l\in\text{odd}}\langle\hat{S}^{z}_{l}+1/2\rangle \tag{30}$$

sums over all odd sites. Here, we parameterize the sign rule (15) with two parameters $\varphi$ and $\tilde{h}$, by setting $\phi_{l}=\varphi l$ in Eq. (16). Fig. 3(d.1) illustrates the AR distribution in the ($\varphi$, $\tilde{h}$) plane, with the corresponding coordinates of the optimized $\bm{w}$ represented by a black open triangle, specifically $\varphi=\pi$ and $\tilde{h}=-\pi$. This sign rule is well understood because the ground-state wave function for $J>0$ is connected to the one for $J<0$ by a $\pi$-rotation operation

$$\hat{\mathcal{U}}_{\pi}=\prod^{L}_{l=1}\hat{\mathcal{R}}_{l}(l\pi)=\prod_{l\in\text{odd}}\hat{\mathcal{R}}_{l}(\pi)\ . \tag{31}$$

Hence, it is evident that the sign rules in the XY ring with perfect AR, shown in Sec. IV.1.2, IV.1.3, and IV.1.4, adhere to a standardized format of the weights in Eq. (16) by setting $\phi_{l}=\varphi l$ with specific pitch angles $\varphi=0$, $\pi/L$ and $\pi$. This pitch angle $\varphi$ is related to the profile of spins rotating in space and can be acquired by training shn-FNN.
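The equivalence between the learned weights $w_{l}=\pi l-\pi/8$ and MPR can be checked by brute force over all configurations with $N=8$ on $L=16$ sites. The sketch assumes $n_{l}=1$ for $\downarrow$, so that $N_{O}$ counts the $\uparrow$ spins on odd sites, and it tests equivalence only up to a configuration-independent overall sign, which does not affect the sign structure.

```python
import itertools
import numpy as np

L, N = 16, 8
sites = np.arange(1, L + 1)
w = np.pi * sites - np.pi / N              # w_l = pi * l - pi / 8

products = set()
for occupied in itertools.combinations(range(L), N):
    n = np.zeros(L)
    n[list(occupied)] = 1                  # n_l = 1 marks a down spin
    s_fnn = np.sign(np.cos(w @ n))         # sign rule of Eq. (15)
    # N_O of Eq. (30): number of up spins (n_l = 0) on odd sites l = 1, 3, ...
    n_up_odd = sum(1 for l in range(1, L + 1, 2) if n[l - 1] == 0)
    products.add(int(s_fnn * (-1) ** n_up_odd))

same_rule_up_to_global_sign = len(products) == 1
```

If the two rules agree up to a global sign, the product of the two signs takes a single value over all 12870 configurations.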

IV.1.5 An antiferromagnetic Heisenberg ring

In a pure antiferromagnetic Heisenberg ring (AFHR) with equal nearest-neighbor antiferromagnetic couplings $J_{1}$ along the $x$, $y$, and $z$ axes, the spins at odd sites align antiparallel to the spins at even sites according to GWMF. Even though the exact ground state behaves as a Tomonaga-Luttinger liquid (TLL) Tomonaga (1950); Luttinger (1963); Haldane (1981), the optimized shn-FNN suggests MPR, which is consistent with previous studies Marshall (1955); Richter et al. (1994). We discuss the sign rule for the $J_{1}$-$J_{2}$ AFHR in a unified way in Sec. IV.2.1.

IV.2 Frustrated spin models

Figure 4: Training shn-FNN to learn the leading-order sign rule (15) of the ground-state wave function for a $J_{1}$-$J_{2}$ antiferromagnetic Heisenberg model (32) of $L=16$ spins in a ring. The chosen ratio is $\alpha=0.9$. (a) In scheme-I, all $\mathcal{N}_{s}$ samples in the data set $\mathbf{T}$ are utilized for training shn-FNN. The training process is monitored by the cross entropy $\mathcal{S}_{\times}$ (green) as well as three distinct accuracy rates: $\text{AR}_{\text{step}}$ (black), $\text{AR}_{\text{test}}$ (red), and AR (blue). (b) The samples are sorted in descending order of magnitude $a_{\mathbf{n}}$. They are then regrouped into two data sets: $\mathbf{T}_{\text{obey}}$, which follows the sign rule proposed by the optimized shn-FNN obtained from training scheme-I (red), and $\mathbf{T}_{\text{disobey}}$, which violates the sign rule (blue). (c) In scheme-II, the first $\widetilde{\mathcal{N}}_{s}\leq\mathcal{N}_{s}$ samples in the data set $\mathbf{T}$ are used for training shn-FNN. The accuracy rate $\widetilde{\text{AR}}$ is plotted as a function of the selection rate $\beta=\widetilde{\mathcal{N}}_{s}/\mathcal{N}_{s}$. (d) The spatial distributions of the weight vector elements $w_{l}$ in the sign rule (15) from training scheme-I (blue) and scheme-II (red) are shown.

IV.2.1 A $J_{1}$-$J_{2}$ antiferromagnetic Heisenberg ring

When the antiferromagnetic next-nearest-neighbor (NNN) Heisenberg coupling $J_{2}>0$ is introduced, we investigate the behavior of the frustrated spin-$1/2$ $J_{1}$-$J_{2}$ AFHR. The Hamiltonian for this system is given by

\begin{equation}
\hat{H}_{J_{1}\text{-}J_{2}}=\sum_{l=1}^{L}(J_{1}\hat{\mathbf{S}}_{l}\cdot\hat{\mathbf{S}}_{l+1}+J_{2}\hat{\mathbf{S}}_{l}\cdot\hat{\mathbf{S}}_{l+2})\ , \tag{32}
\end{equation}

where $\alpha=J_{2}/J_{1}$ is a dimensionless ratio.

Using the techniques suggested in Sec. III and following training scheme-I exemplified in Fig. 2, we initially train shn-FNN with all $\mathcal{N}_{s}$ samples in the data set $\mathbf{T}$. For example, when the ratio $\alpha=0.9$, the optimization of shn-FNN searches for the minima of the cross entropy $\mathcal{S}_{\times}$ with extremely low efficiency. As illustrated in Fig. 4(a), the three accuracy rates oscillate irregularly near the worst-performance limit of $0.5$. After approximately $3000$ training steps, all accuracy rates suddenly increase and reach another plateau. During this phase, the cross entropy $\mathcal{S}_{\times}$ exhibits random oscillation and fails to offer a meaningful gradient direction for updating the weight vector $\bm{w}$ in shn-FNN. Even once the accuracy rate AR reaches approximately $0.63$, the weight vector $\bm{w}$ in the optimized shn-FNN (blue circles), shown in Fig. 4(d), remains difficult to interpret.

To address this issue, we sort the samples in descending order of amplitude $a_{\mathbf{n}}$, as shown in Fig. 4(b), and observe that the correctly classified samples in the data set $\mathbf{T}_{\text{obey}}$ and the wrongly classified samples in the data set $\mathbf{T}_{\text{disobey}}$ are irregularly mixed. We therefore adopt training scheme-II, where we use only the first $\widetilde{\mathcal{N}}_{s}\leq\mathcal{N}_{s}$ samples in the data set $\mathbf{T}$ to train shn-FNN. In Fig. 4(c), as we reduce the selection rate $\beta=\widetilde{\mathcal{N}}_{s}/\mathcal{N}_{s}$, the accuracy rate $\widetilde{\text{AR}}=\widetilde{\mathcal{N}}^{c}_{s}/\widetilde{\mathcal{N}}_{s}$ approaches the perfect-classification limit of $1$, where $\widetilde{\mathcal{N}}^{c}_{s}$ is the number of the $\widetilde{\mathcal{N}}_{s}$ samples that the optimized shn-FNN classifies correctly. The resulting weight vector $\bm{w}$, shown in Fig. 4(d), falls on a straight line in the sign rule (15), whose physical meaning is discussed later.
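The sample selection of scheme-II can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `select_top_samples` and `accuracy_rate` are hypothetical, and rounding $\beta\mathcal{N}_{s}$ up to the next integer is an assumption.

```python
import numpy as np


def select_top_samples(configs, amplitudes, beta):
    """Keep the first beta-fraction of samples, sorted by descending |a_n|.

    Rounding beta * N_s up (and keeping at least one sample) is an
    assumption made for this sketch.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    order = np.argsort(-np.abs(amplitudes), kind="stable")
    n_keep = max(1, int(np.ceil(beta * len(amplitudes))))
    keep = order[:n_keep]
    return np.asarray(configs)[keep], amplitudes[keep], keep


def accuracy_rate(predicted_signs, true_signs):
    """Fraction of samples whose predicted sign equals the true sign."""
    predicted_signs = np.asarray(predicted_signs)
    true_signs = np.asarray(true_signs)
    return float(np.mean(predicted_signs == true_signs))


if __name__ == "__main__":
    # Toy data only: four configurations on two sites.
    configs = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
    amplitudes = np.array([0.1, 0.7, 0.5, 0.2])
    _, kept_amplitudes, kept_index = select_top_samples(configs, amplitudes, beta=0.5)
    print(kept_index, kept_amplitudes)
```

The accuracy rate $\widetilde{\text{AR}}$ then follows by calling `accuracy_rate` on the retained samples only.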

Figure 5: (a.1-a.5) The structure factor $\mathcal{S}_{k}$ as a function of the momentum $k$ for a $J_{1}$-$J_{2}$ antiferromagnetic Heisenberg ring (32) with $L=16$. Five different values of the ratio $\alpha=J_{2}/J_{1}$ are considered: (a.1) $0.35$, (a.2) $0.52$, (a.3) $0.58$, (a.4) $0.73$, and (a.5) $1.2$. (b) The optimized shn-FNN is trained using scheme-II, and the resulting weights of Eq. (16) with $\phi_{l}=\varphi l$ give the pitch angle $\varphi$. (c) The corresponding accuracy rate AR (black) and the correct weight $\rho$ (red). The plots demonstrate a series of level crossings at $\alpha_{\text{MG}}=0.5$, $\alpha_{c1}\approx 0.53$, $\alpha_{c2}\approx 0.64$, and $\alpha_{\text{DC}}\approx 0.99$.

After a systematic analysis of the ground states for $L=16$ sites, we find that the optimized shn-FNN proposes sign rules with the weights of Eq. (16) and $\phi_{l}=\varphi l$, where the pitch angle $\varphi=2p\pi/L$, as shown in Fig. 5(b). The integer $p$ ranges from $L/2$ to $L/4$, which allows us to divide the broad regime $\alpha\in[0,\ 1.2]$ into five intervals. Within each interval, the value of $\varphi$ plays the role of the commensurate/incommensurate pitch angle White and Affleck (1996); Soos et al. (2016). In Fig. 5(a.1-a.5), we observe that the double peaks in the structure factor

\begin{equation}
\mathcal{S}_{k}=\frac{1}{L^{2}}\sum_{l,l^{\prime}}e^{ik(l-l^{\prime})}\langle\hat{\mathbf{S}}_{l}\cdot\hat{\mathbf{S}}_{l^{\prime}}\rangle \tag{33}
\end{equation}

are located at the momenta $k=\pm\varphi$. Due to the interplay of interactions, the accuracy rate $\text{AR}<1$ and the correct weight

\begin{equation}
\rho=\sum_{\mathbf{n}\in\mathbf{T}_{\text{obey}}}|a_{\mathbf{n}}|^{2}<1 \tag{34}
\end{equation}

are shown in Fig. 5(c), where the data set $\mathbf{T}_{\text{obey}}$ includes the samples obeying the leading-order sign rule (15). Moreover, the correct weight $\rho$ is close to $1$ over the whole parameter regime, which means that most of the samples with the largest amplitudes obey the leading-order sign rule, so the proposed scheme-II works well.
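The link between the pitch angle and the peak positions of the structure factor (33) can be illustrated numerically. The following sketch evaluates Eq. (33) for a given correlation matrix on the momentum grid $k=2\pi m/L$. As input it uses a classical spiral with pitch $\varphi=2p\pi/L$, which is a toy assumption standing in for exact-diagonalization data; the function names are hypothetical.

```python
import numpy as np


def structure_factor(corr):
    """Evaluate S_k = (1/L^2) sum_{l,l'} exp[i k (l - l')] C[l, l'] for k = 2*pi*m/L."""
    L = corr.shape[0]
    sites = np.arange(L)
    ks = 2.0 * np.pi * np.arange(L) / L
    phase = np.exp(1j * np.outer(ks, sites))      # phase[m, l] = exp(i k_m l)
    sk = np.einsum("ml,lj,mj->m", phase, corr, phase.conj()) / L**2
    return ks, sk.real


def spiral_correlations(L, p, spin=0.5):
    """Toy correlations of a classical spiral: C[l, l'] = S^2 cos(phi (l - l'))."""
    phi = 2.0 * np.pi * p / L
    distance = np.subtract.outer(np.arange(L), np.arange(L))
    return spin**2 * np.cos(phi * distance)


if __name__ == "__main__":
    L, p = 16, 6
    ks, sk = structure_factor(spiral_correlations(L, p))
    # The two largest entries sit at k = +phi and k = -phi (mod 2*pi).
    print(np.sort(np.argsort(sk)[-2:]))
```

For this toy input the weight is concentrated at the grid points $m=p$ and $m=L-p$, i.e., at $k=\pm\varphi$ modulo $2\pi$.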

The investigation of the sign rule provides consistent physical insights. Within the range $\alpha\in[0,\ \alpha_{\text{MG}}]$, up to the Majumdar-Ghosh (MG) point $\alpha_{\text{MG}}=0.5$, the pitch angle is $\varphi=\pi$ and the leading-order sign rule is in accordance with MPR. As the ratio $\alpha$ approaches infinity, each of the decoupled chains, composed of odd or even sites, individually follows MPR. Away from that limit, a relatively tiny positive $J_{1}$ promotes a stable commensurate spin order with a pitch angle $\varphi=\pi/2$. When the ratio $\alpha<\alpha_{\text{DC}}\approx 0.99$, commensurability is disrupted due to the emergence of triplet defects White and Affleck (1996); Soos et al. (2016). Between $\alpha_{\text{MG}}$ and $\alpha_{\text{DC}}$, the ground state undergoes an incommensurate crossover White and Affleck (1996); Soos et al. (2016), which is indicated by the varying pitch angle $\varphi$ in the weights of the leading-order sign rule (15), as shown in Fig. 5(b).

Besides, the ground state maintains the translation symmetry with a conserved momentum of either $0$ or $\pi$, depending on the integer $p$, so that $\phi_{l}=\varphi l$ in the leading-order sign rule (7). Moreover, the center inversion symmetry of the chain imposes the constraint $\tilde{h}=p\pi/2$. Consequently, in the equivalent sign rule (15), we choose the sine/cosine activation function for odd/even values of $p$.

Figure 6: The sign-fidelity susceptibility density $\chi_{f}$ for the ground state is examined in the region $\alpha\in[0,\ \alpha_{\text{MG}})$ for different system sizes $L=8$, $12$, $16$, $20$, $22$, and $24$ in the $J_{1}$-$J_{2}$ antiferromagnetic Heisenberg ring (32).

To quantitatively assess the violation of MPR when $\alpha\in[0,\ \alpha_{\text{MG}})$, we introduce a sign-fidelity

\begin{equation}
f=\langle\psi^{\text{MPR}}|\psi\rangle\ . \tag{35}
\end{equation}

Here, we define the MPR state $|\psi^{\text{MPR}}\rangle$ as

\begin{equation}
|\psi^{\text{MPR}}\rangle=\sum_{\{\mathbf{n}\}}s^{\text{MPR}}_{\mathbf{n}}a_{\mathbf{n}}|\mathbf{n}\rangle\ , \tag{36}
\end{equation}

where the sign $s^{\text{MPR}}_{\mathbf{n}}$ fully satisfies MPR. Thus, we get $f=2\rho-1$ Retzlaff et al. (1993); Richter et al. (1994); Zeng and Parkinson (1995). In the vicinity of a continuous transition point, the sign-fidelity $f$ or the correct weight $\rho$ is expected to reach its minimum, indicating the most complicated sign rule Cai and Liu (2018); Westerhout et al. (2020). Like the orthogonality catastrophe for free fermions Anderson (1967), the fidelity follows a power-law function of the system size $L$. In principle, the relevant sign-fidelity susceptibility density, given by

\begin{equation}
\chi_{f}=-(\ln f)/L\ , \tag{37}
\end{equation}

is capable of locating continuous transition points Gu (2010). However, the maximum of $\chi_{f}$ is located at $\alpha_{\text{peak}}\approx 0.43>\alpha_{\text{BKT}}\approx 0.241$ in the dimerized (DM) region Wang et al. (2009), where $\chi_{f}$ approaches an $L$-independent function, as shown in Fig. 6. This is possibly caused by the anomalous behavior of the exponential closure of gaps at the famous Berezinskii-Kosterlitz-Thouless (BKT) transition point $\alpha_{\text{BKT}}$ Cincio et al. (2019).
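The quantities in Eqs. (34)-(37) are simple to evaluate once a real, normalized wave function and a candidate sign rule are available. The sketch below is illustrative only; the function names are hypothetical, and the input vectors are toy data rather than ground states of the model.

```python
import numpy as np


def sign_fidelity(psi, rule_signs):
    """Return (rho, f) for a real normalized psi and a sign rule with entries +1 or -1.

    rho is the correct weight of Eq. (34) and f the overlap of Eq. (35),
    with the reference state built as in Eq. (36).
    """
    psi = np.asarray(psi, dtype=float)
    rule_signs = np.asarray(rule_signs, dtype=float)
    a = np.abs(psi)                                # amplitudes a_n
    obey = np.sign(psi) == rule_signs              # samples following the rule
    rho = float(np.sum(a[obey] ** 2))
    f = float(np.sum(rule_signs * a * psi))        # <psi_rule | psi>
    return rho, f


def susceptibility_density(f, L):
    """Sign-fidelity susceptibility density of Eq. (37); requires f > 0."""
    return -np.log(f) / L


if __name__ == "__main__":
    psi = np.array([0.8, -0.6])                    # toy normalized vector
    rho, f = sign_fidelity(psi, [1, 1])
    print(rho, f, 2 * rho - 1, susceptibility_density(f, L=2))
```

For a normalized state, the samples that obey the rule contribute $+a^{2}_{\mathbf{n}}$ to $f$ and the others $-a^{2}_{\mathbf{n}}$, which reproduces $f=2\rho-1$. A rule that is off by a global sign would give a negative $f$ and should be flipped before taking the logarithm.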

IV.2.2 A spin-$1/2$ triangular XY antiferromagnet on a torus

Shn-FNN can also learn the leading-order sign rules for the ground-state wave function of 2D quantum models, such as the XY model on triangular lattices with $L_{x}\times L_{y}$ sites, as shown in Fig. 7(a). The corresponding Hamiltonian for the model reads

\begin{equation}
\hat{H}_{\triangle}=\frac{1}{2}\sum_{\langle l,l^{\prime}\rangle}(\hat{S}^{+}_{l}\hat{S}^{-}_{l^{\prime}}+\textrm{h.c.})\ , \tag{38}
\end{equation}

where $\langle l,l^{\prime}\rangle$ runs over all pairs of nearest-neighbor sites $l$ and $l^{\prime}$. In the XC geometry, the lattice site labeled $l=l_{y}L_{x}+l_{x}+1$ is identified by a pair of indices $(l_{x},l_{y})$ with $l_{x}=0$, $\cdots$, $L_{x}-1$ and $l_{y}=0$, $\cdots$, $L_{y}-1$. The position of the site is given by $\mathbf{r}_{l}$.

Figure 7: (a) The spin-$1/2$ XY antiferromagnet (38) with the XC geometry on a torus. $a_{1}$ and $a_{2}$ denote the two primitive vectors. All sites are labeled with a single index $l$. (b) For the geometry $3\times 4$, we employ scheme-II to train shn-FNN for the ground-state wave function. The spatial distribution of the weights $w_{l}$ in the optimized shn-FNN is plotted.
Figure 8: For the ground state of the spin-$1/2$ XY antiferromagnet on a torus (38), we show (a) the accuracy rate AR (black) and (b) the correct weight $\rho$ (red) for the sign rule (15) with the weights defined in Eq. (39). Both are shown as a function of the geometry $L_{x}\times L_{y}$. It is worth noting that the optimized shn-FNN indicates $\tilde{h}=0$ for even $(L_{y}/2)$, but $\tilde{h}=\pi/2$ for odd $(L_{y}/2)$.

To ensure an exact hit at the relevant high-symmetry momentum points $\mathbf{K}^{\pm}$ in the first Brillouin zone, the length $L_{x}$ is chosen as a multiple of $3$. Following training scheme-II, the weights $w_{(l_{x},l_{y})}$ in the leading-order sign rule (15) are determined by the optimized shn-FNN. Specifically, we get

\begin{equation}
w_{(l_{x},l_{y})}=\frac{2\pi}{3}(l_{x}+[l_{y}])\ , \tag{39}
\end{equation}

where $[l_{y}]=1$ if $l_{y}$ is even and $0$ otherwise, as illustrated in Fig. 7(b). This result matches the physical scenario of the coplanar $120^{\circ}$ order Bach et al. (2021) observed in previous studies, where the angle between spin polarization orientations at neighboring sites is always $2\pi/3$.
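The weights of Eq. (39) can be tabulated directly from the site indices. The sketch below is a minimal illustration with hypothetical function names. It assumes that sites adjacent in $l_{x}$ at fixed $l_{y}$ (including the periodic wrap-around) are nearest neighbors, and it checks the $2\pi/3$ angle only along that direction.

```python
import numpy as np


def triangular_weights(Lx, Ly):
    """Weights of Eq. (39) as an array of shape (Ly, Lx), indexed as w[ly, lx]."""
    lx = np.arange(Lx)[None, :]
    ly = np.arange(Ly)[:, None]
    bracket = (ly % 2 == 0).astype(float)      # [l_y] = 1 for even l_y, 0 otherwise
    return 2.0 * np.pi / 3.0 * (lx + bracket)


def site_label(lx, ly, Lx):
    """Single-index label l = l_y * L_x + l_x + 1 used in the text."""
    return ly * Lx + lx + 1


if __name__ == "__main__":
    Lx, Ly = 3, 4
    w = triangular_weights(Lx, Ly)
    # Angle difference between x-neighbors, modulo 2*pi (assumed nearest neighbors).
    diff = (np.roll(w, -1, axis=1) - w) % (2.0 * np.pi)
    print(np.allclose(diff, 2.0 * np.pi / 3.0))
    print(site_label(2, 3, Lx))
```

Because $L_{x}$ is a multiple of $3$, the wrap-around bond also closes with an angle of $2\pi/3$ modulo $2\pi$.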

Moreover, the ground state possesses the point-group symmetries of the torus. The symmetry operations listed in Table 1 carry eigenvalues of $+1$ or $-1$, corresponding to the symmetric/even or antisymmetric/odd sector of the group representation.

Table 1: Eigenvalues of the symmetry operations measured on the ground-state wave function for the spin-$1/2$ triangular XY antiferromagnet on a torus. The operations include the translations by one site along the $x$-axis, $\mathcal{T}_{x}$, and along the $y$-axis, $\mathcal{T}_{y}$; the mirror inversions about the $x$-axis, $\mathcal{M}_{x}$, and about the $y$-axis, $\mathcal{M}_{y}$; and the center inversion $\mathcal{I}_{c}$.
$L_{x}\times L_{y}$    $\langle\mathcal{T}_{x}\rangle$    $\langle\mathcal{T}_{y}\rangle$    $\langle\mathcal{M}_{x}\rangle$    $\langle\mathcal{M}_{y}\rangle$    $\langle\mathcal{I}_{c}\rangle$
$3\times 4$            $+1$    $+1$    $+1$    $+1$    $+1$
$3\times 6$            $+1$    $+1$    $+1$    $-1$    $-1$
$3\times 8$            $+1$    $+1$    $+1$    $+1$    $+1$
$6\times 4$            $+1$    $+1$    $+1$    $+1$    $+1$
$3\times 10$           $+1$    $+1$    $+1$    $-1$    $-1$

Let us discuss the mirror inversion $\mathcal{M}_{y}$ about the $y$-axis. Under $\mathcal{M}_{y}$, the basis state $|\mathbf{n}\rangle$ becomes $|\mathbf{n}^{\prime}\rangle$ but the sign is unchanged, so we have

\begin{equation}
\bm{w}\cdot\mathbf{n}^{\prime}=L_{y}\pi-\bm{w}\cdot\mathbf{n}\quad\text{modulo}\quad 2\pi\ . \tag{40}
\end{equation}

For even $(L_{y}/2)$, such as the $3\times 4$ lattice, the $\mathcal{M}_{y}$-symmetric ground-state wave function leads to $\tilde{h}=0$ in the leading-order sign rule (7), or equivalently the cosine activation function in Eq. (15), which obeys

\begin{equation}
\text{Sgn}[\cos(\bm{w}\cdot\mathbf{n})]=\text{Sgn}[\cos(\bm{w}\cdot\mathbf{n}^{\prime})]\ . \tag{41}
\end{equation}

In contrast, for odd $(L_{y}/2)$, e.g., the $3\times 6$ lattice, the $\mathcal{M}_{y}$-antisymmetric ground-state wave function prefers $\tilde{h}=\pi/2$ in Eq. (7) and the sine function in Eq. (15). This difference is captured by shn-FNN in Fig. 8.
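The parity of the two activation functions under $\mathcal{M}_{y}$ follows from Eq. (40) and elementary trigonometric identities. As a short worked step, writing $x=\bm{w}\cdot\mathbf{n}$ and assuming even $L_{y}$ (as in all geometries of Table 1), one finds

\begin{align*}
\cos(L_{y}\pi-x)&=(-1)^{L_{y}}\cos x=\cos x\ ,\\
\sin(L_{y}\pi-x)&=-(-1)^{L_{y}}\sin x=-\sin x\ .
\end{align*}

The cosine rule therefore assigns the same sign to $|\mathbf{n}\rangle$ and $|\mathbf{n}^{\prime}\rangle$, as required for an $\mathcal{M}_{y}$-symmetric state, whereas the sine rule assigns opposite signs, as required for an $\mathcal{M}_{y}$-antisymmetric state.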

Furthermore, based on the mean-field picture of spinless Dirac fermions coupled to Chern-Simons gauge fields Wang et al. (2018); Sedrakyan et al. (2020), it has been shown that, for different lattice geometries with finite $L_{x}$ and $L_{y}$, non-condensed BCS pairs of spinons from the high-symmetry points $\mathbf{K}^{\pm}$ would violate the leading-order sign rule, so that both AR and $\rho$ deviate from $1$. However, a more nuanced understanding of the subtle relationship between lattice geometry and the deviation from GWMF is still lacking.

IV.3 A Fermi-Hubbard ring

The Fermi-Hubbard model is a simple model that describes the physics of strongly correlated electron systems and is closely connected to quantum magnetism, the metal-insulator transition, and the promising theory of high-temperature superconductivity Henderson et al. (1992); Essler et al. (2005); Moriya (2012). In a ring, the Hamiltonian for two-species fermions can be written as

\begin{equation}
\hat{H}_{\text{F}}=\sum^{L}_{l=1}\left[-t\sum_{\sigma}(\hat{c}^{\dagger}_{l,\sigma}\hat{c}_{l+1,\sigma}+{\rm h.c.})+U\hat{n}_{l,\uparrow}\hat{n}_{l,\downarrow}\right]\ , \tag{42}
\end{equation}

where $\hat{c}^{\dagger}_{l,\sigma}$, $\hat{c}_{l,\sigma}$, and $\hat{n}_{l,\sigma}=\hat{c}^{\dagger}_{l,\sigma}\hat{c}_{l,\sigma}$ represent the creation, annihilation, and particle number operators of a fermion at site $l$, respectively, $\sigma=\uparrow$, $\downarrow$ denotes the spin polarization, $t>0$ is the hopping amplitude between two nearest-neighbor sites, and $U$ is the on-site Coulomb repulsion.

Figure 9: (a) In the noninteracting limit $U/t=0$, we examine the double-even-parity ground state for the Fermi-Hubbard ring (42). We train shn-FNN by scheme-II. In the optimized shn-FNN, we plot the weights $w_{l}$, which match Eq. (48). (b) When $U/t=0.1$ and $N_{\downarrow}=2$, we plot the accuracy rate AR of the sign rule $s_{\mathbf{n}_{\uparrow},\mathbf{n}_{\downarrow}}=\text{Sgn}[\cos(\bm{w}\cdot\mathbf{n}+\vartheta)]$ as a function of the variational angle $\vartheta$ for the ground state. The maximum AR value occurs at $\vartheta=0$. The system size is $L=16$.

In the Fock space, each basis state is a product of the basis states for the two species, that is,

\begin{equation}
|\mathbf{n}_{\uparrow},\mathbf{n}_{\downarrow}\rangle=\left[\bigotimes^{L}_{l=1}|n_{l,\uparrow}\rangle\right]\left[\bigotimes^{L}_{l=1}|n_{l,\downarrow}\rangle\right]\ . \tag{43}
\end{equation}

Here, we define the vectors $\mathbf{n}_{\sigma}=(n_{1,\sigma},\ \cdots,\ n_{L,\sigma})$ for species $\sigma$. Under the conventional Jordan-Wigner transformation Derzhko (2001), the two-channel spin-flip operators $\hat{S}^{\pm}_{l,\sigma}$ can be represented by fermion operators as follows:

\begin{equation}
\begin{split}
\hat{S}^{+}_{l,\sigma}&=\hat{c}^{\dagger}_{l,\sigma}e^{\phantom{-}i\pi\sum_{k<l}\hat{n}_{k,\sigma}}\ ,\\
\hat{S}^{-}_{l,\sigma}&=\hat{c}_{l,\sigma}e^{-i\pi\sum_{k<l}\hat{n}_{k,\sigma}}\ .
\end{split} \tag{44}
\end{equation}

Thus, we get a two-leg spin-$1/2$ ladder

\begin{equation}
\hat{H}_{\text{Ladder}}=\sum_{\sigma}\hat{H}_{\parallel,\sigma}+\hat{H}_{\perp}\ , \tag{45}
\end{equation}

where

\begin{equation}
\begin{split}
\hat{H}_{\parallel,\sigma}&=-t\left[\sum^{L-1}_{l=1}\hat{S}^{+}_{l,\sigma}\hat{S}^{-}_{l+1,\sigma}+(-1)^{\hat{N}_{\sigma}-1}\hat{S}^{+}_{L,\sigma}\hat{S}^{-}_{1,\sigma}+{\rm h.c.}\right]\ ,\\
\hat{H}_{\perp}&=U\sum^{L}_{l=1}(\hat{S}^{z}_{l,\uparrow}+1/2)(\hat{S}^{z}_{l,\downarrow}+1/2)
\end{split} \tag{46}
\end{equation}

denote the transverse and longitudinal parts, respectively. The total particle number operator for species $\sigma$ is given by $\hat{N}_{\sigma}=\sum^{L}_{l=1}\hat{n}_{l,\sigma}$. We are interested in the ground state for the case of $N_{\uparrow}+N_{\downarrow}=L$ and even $N_{\sigma}=\langle\hat{N}_{\sigma}\rangle$.

Figure 10: (a) The accuracy rate AR of the sign rule (50) as a function of $N_{\downarrow}$ for $L=16$. (b) The same accuracy rate as a function of $L$ for $N_{\downarrow}=2$. Here, we consider the ground state of the Fermi-Hubbard ring (42) with $U/t=0.1$.

When $U=0$, TBC is effectively applied to the two decoupled chains in the spin-ladder model, as both $N_{\uparrow}$ and $N_{\downarrow}$ are even. For each species, when the parity of the ground state is even, the optimized shn-FNN with perfect AR can identify the leading-order sign rule given by

\begin{equation}
s^{(e)}_{\mathbf{n}_{\sigma}}=\text{Sgn}[\cos(\bm{w}\cdot\mathbf{n}_{\sigma})] \tag{47}
\end{equation}

with the weights

\begin{equation}
w_{l}=-\frac{(l-1)\pi}{L}+\frac{\pi}{2}-\frac{\pi}{2L}\ , \tag{48}
\end{equation}

depicted in Fig. 9(a). For the degenerate ground state with odd parity, the cosine function is replaced by the sine function, i.e.,

\begin{equation}
s^{(o)}_{\mathbf{n}_{\sigma}}=\text{Sgn}[\sin(\bm{w}\cdot\mathbf{n}_{\sigma})]\ . \tag{49}
\end{equation}

For small $U/t>0$ and any $N_{\sigma}$, the parity of the unique ground state is always even. In the case of $U/t=0.1$, $N_{\downarrow}=2$, and $L=16$, as illustrated in Fig. 9(b), the optimized shn-FNN suggests a sign rule

\begin{equation}
s_{\mathbf{n}_{\uparrow},\mathbf{n}_{\downarrow}}=\text{Sgn}[\cos(\bm{w}\cdot\mathbf{n})]\ , \tag{50}
\end{equation}

with the accuracy rate $\text{AR}\approx 0.97$, where the vector is defined as

\begin{equation}
\mathbf{n}=\mathbf{n}_{\uparrow}+\mathbf{n}_{\downarrow}=(n_{1,\uparrow}+n_{1,\downarrow},\ \cdots,\ n_{L,\uparrow}+n_{L,\downarrow})\ . \tag{51}
\end{equation}

So, the resulting leading-order sign rule for the Fermi-Hubbard model remains consistent with the sign rule (15).
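The weights (48) and the sign rule (50) with the combined occupation vector (51) can be written down in a few lines. The sketch below is illustrative, with hypothetical function names; it evaluates the rule for given occupation vectors and does not reproduce the training of shn-FNN.

```python
import numpy as np


def ring_weights(L):
    """Weights w_l of Eq. (48) for sites l = 1, ..., L."""
    l = np.arange(1, L + 1)
    return -(l - 1) * np.pi / L + np.pi / 2 - np.pi / (2 * L)


def leading_sign(n_up, n_dn):
    """Sign rule (50) evaluated on n = n_up + n_dn, Eq. (51)."""
    n = np.asarray(n_up) + np.asarray(n_dn)
    w = ring_weights(len(n))
    return int(np.sign(np.cos(w @ n)))


if __name__ == "__main__":
    L = 16
    w = ring_weights(L)
    # The weights are odd under the reflection l -> L + 1 - l.
    print(np.allclose(w + w[::-1], 0.0))
    n_up = np.zeros(L, dtype=int)
    n_up[:2] = 1                      # two spin-up fermions on sites 1 and 2
    n_dn = np.zeros(L, dtype=int)
    print(leading_sign(n_up, n_dn))
```

The weights satisfy $w_{l}+w_{L+1-l}=0$, so $\bm{w}\cdot\mathbf{n}$ changes sign under reflection about the chain center and the cosine rule is even under that reflection.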

The leading-order sign rule (50) is robust and depends only weakly on the filling fraction and the system size $L$. In the case of $U/t=0.1$ and $L=16$ [Fig. 10(a)], the accuracy rate AR is greater than $0.968$ for different $N_{\downarrow}$. Additionally, as $L$ grows, the accuracy rate AR for $N_{\downarrow}=2$ gets closer to $0.99$ [Fig. 10(b)].

Figure 11: The correct weight $\rho$ of the sign rule (50) as a function of $U/t$ in the Fermi-Hubbard ring (42). We consider the ground state with $N_{\downarrow}=2$ for four different system sizes: $L=8$ (green), $12$ (blue), $16$ (black), and $20$ (red).

In the limit of large $U$, only single occupations can exist in the ground state because of a considerable charge gap. As a result, spin fluctuations in the reduced Hilbert space of either spin-up $\hat{c}^{\dagger}_{l,\uparrow}|0\rangle$ or spin-down $-\hat{c}^{\dagger}_{l,\downarrow}|0\rangle$ are described by the effective antiferromagnetic Heisenberg ring. The ground state of the effective model follows the MPR sign rule exactly. Returning to the fermion bases, it is easy to prove that the weights $w_{l}$ are the same as those in the leading-order sign rule at $U=0$. We can observe that the corresponding correct weight $\rho$ approaches $1$ when $U/t\geq 8$, as shown in Fig. 11.

According to the Bethe ansatz solution Lieb and Wu (1968), the Fermi liquid only survives at $U=0$ in the thermodynamic limit (TDL). However, because of a tiny charge gap close to $U=0$, fermions behave like a Fermi liquid in the ground state for system sizes limited to $L\leq 20$, much smaller than the correlation length. Consequently, a quasi-critical point is indicated by the minimum of the correct weight $\rho$, where strong quantum fluctuations severely violate the leading-order sign rules. As $L$ grows in Fig. 11, the quasi-critical point gradually approaches $U=0$.

In an alternative definition of bases, that is,

\begin{equation}
|\mathbf{n}\rangle=\bigotimes^{L}_{l=1}\bigg[|n_{l,\uparrow}\rangle|n_{l,\downarrow}\rangle\bigg]\ , \tag{52}
\end{equation}

the Jordan-Wigner transformation changes accordingly, and an additional nonlinear factor

\begin{equation}
(-1)^{\sum^{L}_{l=2}n_{l,\uparrow}\sum^{l-1}_{k=1}n_{k,\downarrow}} \tag{53}
\end{equation}

appears in front of the predicted sign rules. However, this factor cannot be expressed in shn-FNN.
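The factor (53) is easy to evaluate for a given pair of occupation vectors. The following minimal sketch uses a hypothetical function name and toy inputs. Its exponent is a sum of products of occupation numbers on different sites, i.e., it is quadratic in the occupations rather than a linear combination $\bm{w}\cdot\mathbf{n}$.

```python
import numpy as np


def interleaving_sign(n_up, n_dn):
    """Evaluate (-1)^{sum_{l=2}^{L} n_{l,up} * sum_{k<l} n_{k,dn}} from Eq. (53)."""
    n_up = np.asarray(n_up, dtype=int)
    n_dn = np.asarray(n_dn, dtype=int)
    # dn_before[i] = number of spin-down fermions on sites strictly left of site i + 1
    dn_before = np.concatenate(([0], np.cumsum(n_dn)[:-1]))
    exponent = int(np.sum(n_up * dn_before))
    return -1 if exponent % 2 else 1


if __name__ == "__main__":
    # One spin-down fermion on site 1 and one spin-up fermion on site 2.
    print(interleaving_sign([0, 1], [1, 0]))
    # The same fermions with the sites exchanged.
    print(interleaving_sign([1, 0], [0, 1]))
```

The first site contributes nothing to the exponent because no site lies to its left, which matches the lower limit $l=2$ in Eq. (53).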

V Summary and discussions

We have successfully developed a Gutzwiller mean-field theory of sign rules for the ordered ground states in qubit lattice models, which perfectly matches the sign predicted by a shallow FNN with a single hidden neuron, called shn-FNN. By utilizing this principle, we provide a consistent explanation for the excellent performance of activation functions in the neural network and offer a vivid interpretation of the sign rule represented by FNN.

We systematically test our theory on various spin models and the Fermi-Hubbard ring. For non-frustrated spin-$1/2$ models, such as a generalized Ising ring, (twisted) XY rings, and an antiferromagnetic Heisenberg ring, the sign rules for ground states with magnetic orders can be fully captured by shn-FNN, where the accuracy rate of the prediction reaches $1$ exactly. However, in the case of frustrated models where interactions compete, the complexity of sign rules for ground states is significantly enhanced, reducing prediction accuracy. Nonetheless, the leading-order sign rules obtained by optimizing shn-FNN still provide a visual scenario of orders in spins, with the characteristic weight vector closely related to pitch angles. In the Fermi-Hubbard ring, we can obtain a unified sign rule by selecting suitable bases.

GWMF may not be suitable for 1D models since quantum fluctuations tend to destroy long-range orders. However, our current work presents a fresh perspective by demonstrating that GWMF can effectively capture the leading-order sign rule in the wave function, where fluctuations in amplitudes are erased. Our theory is a simple starting point by removing short-range details in ordered states. It would be intriguing to explore the information encoded in high-order microscopic processes instead of focusing solely on the leading-order ones. Of course, the theory for general lattice models also deserves profound studies in the future.

VI Acknowledgement

We thank Tao Li, Rui Wang, Ji-Lu He, Wei Su, and Wei Pan for helpful discussions. S. H. acknowledges funding from the Ministry of Science and Technology of China (Grant No. 2022YFA1402700) and the National Science Foundation of China (Grant No. 12174020). Z. P. Y. acknowledges funding from the National Science Foundation of China (Grant No. 12074041). S. H. and K. X. further acknowledge support from Grant NSAF-U2230402. The computations were performed on the Tianhe-2JK at the Beijing Computational Science Research Center (CSRC) and the high-performance computing cluster of Beijing Normal University in Zhuhai.

Appendix A Sign rule for the twisted ferromagnetic spin-$1/2$ XY rings

Here we prove that the even-parity ground-state wave function for the Hamiltonian $\hat{H}^{P}_{\text{xy}}(J\tilde{J})$ with $J<0$, mentioned in Sec. IV.1.3 of the main text, has positive signs $s_{\mathbf{n}}>0$.

Under the Jordan-Wigner transformation Derzhko (2001), the spin operators can be represented by fermion operators as follows:

\begin{equation}
\hat{S}^{+}_{l}=\hat{c}^{\dagger}_{l}e^{i\pi\sum_{k<l}\hat{n}_{k}}\ \text{and}\ \hat{S}^{-}_{l}=\hat{c}_{l}e^{-i\pi\sum_{k<l}\hat{n}_{k}}\ , \tag{54}
\end{equation}

where $\hat{c}_{l}$, $\hat{c}^{\dagger}_{l}$, and $\hat{n}_{l}=\hat{c}^{\dagger}_{l}\hat{c}_{l}$ represent the annihilation, creation, and particle number operators of a fermion, respectively. As a result, the Hamiltonian $\hat{H}^{P}_{\text{xy}}(J\tilde{J})$ can be transformed into the one for spinless free fermions defined as

\begin{equation}
\hat{H}^{P/T}_{F}(J)=J\left(\sum^{L-1}_{l=1}\hat{c}^{\dagger}_{l}\hat{c}_{l+1}\pm\hat{c}^{\dagger}_{L}\hat{c}_{1}\right)+{\rm h.c.}\ . \tag{55}
\end{equation}

The selection of periodic or twisted boundary conditions in the fermion model depends on whether the particle number $N$ is odd or even.

For odd $N$, the single-particle levels are described by plane waves $\exp(ik_{m}l)$ with discrete momenta $k_{m}=2m\pi/L$, where the integer $m$ ranges from $0$ to $L-1$. The energy $\epsilon_{m}$ of the $m$-th single-particle level follows the formula $\epsilon_{m}=2J\cos(k_{m}+\varphi)$ with the phase angle $\varphi=\pi/L$. At half-filling $N=L/2$, the ground state selects single-particle levels with the integer $m\in(-L/4,\ L/4]$. Therefore, the ground-state wave function for the Hamiltonian $\hat{H}^{P}_{F}(J\tilde{J})$, seen as a determinant of selected plane waves, is the same as the one for the other Hamiltonian $\hat{H}^{P}_{F}(J)$ at half-filling. Similarly, when $N$ is even, the ground-state wave function for the Hamiltonian $\hat{H}^{P}_{F}(J\tilde{J})$ is the same as the one for the Hamiltonian $\hat{H}^{T}_{F}(J)$ as well. In conclusion, both Hamiltonians $\hat{H}^{P/T}_{F}(J)$ for odd/even $N$ can be transformed back to the unique Hamiltonian $\hat{H}^{P}_{\text{xy}}(J)$ through the inverse Jordan-Wigner transformation, where the even-parity ground-state wave function always has positive signs, according to the Perron-Frobenius theorem Perron (1907); Frobenius (1909).
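The Perron-Frobenius statement used in the last step can be illustrated by exact diagonalization of a small ferromagnetic XY ring. The sketch below is a toy check under stated assumptions: it takes the normalization $\hat{H}=(J/2)\sum_{l}(\hat{S}^{+}_{l}\hat{S}^{-}_{l+1}+\text{h.c.})$ with periodic boundaries, $J=-1$, $L=6$, and $N=3$ up spins. These choices and the function name are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def xy_ring_ground_state(L=6, N=3, J=-1.0):
    """Ground state of H = (J/2) sum_l (S+_l S-_{l+1} + h.c.) on a periodic ring.

    The basis consists of all configurations with N up spins; the
    normalization of H is an assumption made for this sketch.
    """
    basis = [frozenset(c) for c in combinations(range(L), N)]
    index = {b: i for i, b in enumerate(basis)}
    H = np.zeros((len(basis), len(basis)))
    for b, i in index.items():
        for l in range(L):
            r = (l + 1) % L
            # A bond with antiparallel spins can be flipped by S+ S- or its conjugate.
            if (l in b) != (r in b):
                flipped = b ^ {l, r}
                H[index[flipped], i] += J / 2.0
    energies, vectors = np.linalg.eigh(H)
    return energies[0], vectors[:, 0]


if __name__ == "__main__":
    energy, gs = xy_ring_ground_state()
    # Up to the arbitrary global sign returned by the solver, all amplitudes share one sign.
    print(bool(np.all(gs > 0) or np.all(gs < 0)))
```

For $J<0$ all off-diagonal matrix elements in this basis are non-positive, so the ground-state amplitudes share a common sign, in line with the argument above.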

References