
Machine-learning the spectral function of a hole in a quantum antiferromagnet

Jackson Lee, Physics and Computer Science Department, Rutgers University, New Brunswick, New Jersey 08854, USA and Condensed Matter Physics and Materials Science Division, Brookhaven National Laboratory, Upton, New York 11973, USA
Matthew R. Carbone, Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, USA
Weiguo Yin, [email protected], Condensed Matter Physics and Materials Science Division, Brookhaven National Laboratory, Upton, New York 11973, USA
Abstract

Understanding charge motion in a background of interacting quantum spins is a fundamental problem in quantum many-body physics. The most extensively studied model for this problem is the so-called $t$-$t'$-$t''$-$J$ model, where the determination of the parameter $t'$ in the context of cuprate superconductors is challenging. Here we present a theoretical study of the spectral functions of a mobile hole in the $t$-$t'$-$t''$-$J$ model using two machine learning techniques: K-nearest neighbors regression (KNN) and a feed-forward neural network (FFNN). We employ the self-consistent Born approximation to generate a dataset of about $1.3\times 10^{5}$ spectral functions. We show that for the forward problem, both methods allow for the accurate and efficient prediction of spectral functions, enabling, e.g., rapid searches through parameter space. Furthermore, we find that for the inverse problem (inferring Hamiltonian parameters from spectra), the FFNN can, but the KNN cannot, accurately predict the model parameters using merely the density of states. Our results suggest that it may be possible to use deep learning methods to predict materials parameters from experimentally measured spectral functions.

I Introduction

Understanding charge motion in a background of interacting quantum spins has been considered an essential first step in the search for the mechanisms of superconductivity in many unconventional materials, ranging from cuprates Dagotto (1994); Lee et al. (2006); Schmitt-Rink et al. (1988) to iron-based superconductors Johnson et al. (2015); Yin et al. (2010) and twisted bilayer graphene Cao et al. (2018). This topic has received renewed interest thanks to novel experiments using ultracold atoms in optical lattices, which provide an essentially perfect realization of the Fermi-Hubbard model with site-resolved imaging ability Ji et al. (2021); Koepsell et al. (2021, 2019); Bohrdt et al. (2019); Chiu et al. (2019); Brown et al. (2019); Mazurenko et al. (2017); Nyhegn et al. (2022). The most extensively studied model for this problem is the so-called $t$-$J$-type model Zhang and Rice (1988). Its applicability to cuprates was established by comparing model results with various experiments, most notably angle-resolved photoemission spectroscopy (ARPES), which yields the spectral function of holes introduced by photoemission of electrons Damascelli et al. (2003); Marshall et al. (1996); Eder et al. (1997); Kim et al. (1998); Yin et al. (1998). The single-hole problem corresponds to photoemission from an undoped Mott insulator, such as Sr2CuO2Cl2 (a parent compound of cuprates), where besides the nearest-neighbor hole hopping parameter ($t$), the second- and third-nearest-neighbor hopping parameters ($t'$ and $t''$) are found to be necessary to reproduce the correct quasiparticle dispersion relation $E(\mathbf{k})$ Wells et al. (1995); Ronning et al. (2003); Nazarenko et al. (1995); Kyung and Ferrell (1996); Xiang and Wheatley (1996); Yin and Ku (2009a); Belinicher et al. (1996); Leung et al. (1997); Lee and Shih (1997). While the need for longer-range hopping parameters is justified by first-principles analysis of the in-crystal overlap of electronic wave functions Yin and Ku (2009a); Pavarini et al. (2001), it was uncovered Yin and Ku (2009a); Belinicher et al. (1996) that the determination of $t'$ by fitting $E(\mathbf{k})$ is inconclusive because $E(\mathbf{k})$ can be insensitive to $t'$ varying from 0 to $-0.3t$ [see Fig. 1(a)]. This is a relevant problem since the value of $t'$ was shown to correlate with the superconducting transition temperature at optimal doping Pavarini et al. (2001) and to affect phase competition Yu et al. (2017) in the cuprates.

Another drawback of this traditional approach to predicting model parameters is that $E(\mathbf{k})$ is generally derived from low-energy spectral peaks. However, it is difficult to resolve $E(\mathbf{k})$ when the quasiparticle spectral weight is small Valla et al. (1999); Laughlin (1997), which is a common phenomenon in systems close to a non-Fermi-liquid state, specifically near the high-energy edge of the quasiparticle band in cuprates, resulting in large error bars [see Fig. 1(a)]. It is thus highly desirable to predict the model parameters from the full energy range of the spectral functions directly [see Fig. 1(b)]. Extending from fitting $E(\mathbf{k})$ to treating the whole spectral function $A(\mathbf{k},\omega)$ means a dramatic increase in the total amount of data that needs to be processed. Here we use machine learning (ML) methods to address this outstanding problem in the field of strongly correlated electron systems and high-temperature superconductivity.

Figure 1: The $t'$ dependence of (a) the quasiparticle band dispersion $E(\mathbf{k})$ compared with experimental data (open circles), reproducing Fig. 4 in Ref. Yin and Ku, 2009a with permission, and (b) the density of states $A(\omega)=\sum_{\mathbf{k}}A(\mathbf{k},\omega)$, where the first broad peaks around the Fermi level (zero energy) are almost identical for a wide range of $t'$ but the other peaks could be used to resolve $t'$.

Machine learning plays a major role in cataloguing and processing large amounts of data. It is uniquely able to identify important patterns and correlations that might otherwise be missed, especially in large datasets. Recently, it has emerged as an important computational tool across disciplines in the physical sciences Krenn et al. (2022). For example, in particle physics, ML played an instrumental role in the discovery of the Higgs boson Radovic et al. (2018). In astrophysics, ML techniques have been used to study photometric redshifts Sadeh et al. (2016), cluster membership of galaxies Hashimoto and Liu (2022), and exoplanet transit detection Schanche et al. (2018). In materials and molecular science, ML is ushering in a “second computational revolution” Schmidt et al. (2019), helping predict crystal structures Ryan et al. (2018), calculate material properties Zheng et al. (2018); Carbone et al. (2019); Torrisi et al. (2020), and accelerate first-principles calculations Jalem et al. (2018); Carbone et al. (2020); Ghose et al. (2022); Rankine and Penfold (2022); Penfold and Rankine (2022). In condensed matter physics, ML has been used to find phase transition temperatures Carrasquilla and Melko (2017), catalogue snapshots of strongly correlated electronic states Bohrdt et al. (2019), infer fundamental physical information from model systems Miles et al. (2021), efficiently sample configurations in many-body systems Liu et al. (2017), and predict impurity spectral functions Sturm et al. (2021).

In this paper, we show how ML can be used to predict and understand spectral functions in the $t$-$t'$-$t''$-$J$ model. The paper is organized as follows: Section II describes the $t$-$t'$-$t''$-$J$ model, the ML methods used, and how we obtain the dataset needed for training, validation, and testing. Section III.1 presents a preliminary examination of the data using principal component analysis (PCA), primarily to help determine what linear correlations may be present in the data. Section III.2 addresses the forward problem of predicting spectral functions from a given set of model parameters Sturm et al. (2021); Arsenault et al. (2014); Walker et al. (2020). Section III.3 addresses the inverse problem of predicting the model parameters $t'/t$, $t''/t$, and $J/t$ from spectral functions. Section III.4 introduces an algorithm to find the value of $t$. Finally, the results and discussions are summarized in Section IV.

II Methods

II.1 Hamiltonian and spectral functions

The $t$-$t'$-$t''$-$J$ model is described by the following Hamiltonian Yin et al. (1998):

H = -\left(t\sum_{\langle i,j\rangle_{1},\sigma} + t'\sum_{\langle i,j\rangle_{2},\sigma} + t''\sum_{\langle i,j\rangle_{3},\sigma}\right)\left(\tilde{c}^{\dagger}_{i\sigma}\tilde{c}_{j\sigma} + \mathrm{h.c.}\right) + J\sum_{\langle i,j\rangle_{1}}\mathbf{S}_{i}\cdot\mathbf{S}_{j}   (1)

in the standard notation of the constrained fermionic operators: $\tilde{c}^{\dagger}_{i\sigma}$ creates an electron with spin index $\sigma$ (either $\uparrow$ or $\downarrow$) at site $i$, with the constraint of no double occupancy at any site, and $\tilde{c}_{i\sigma}$ annihilates it. The spin operators are given by $\mathbf{S}_{i}=\frac{1}{2}\sum_{\sigma\sigma'}\tilde{c}^{\dagger}_{i\sigma}\hat{\boldsymbol{\tau}}_{\sigma\sigma'}\tilde{c}_{i\sigma'}$, where $\hat{\boldsymbol{\tau}}=(\hat{\tau}^{x},\hat{\tau}^{y},\hat{\tau}^{z})$ are the $2\times 2$ Pauli matrices. The angle brackets denote the first ($\langle i,j\rangle_{1}$), second ($\langle i,j\rangle_{2}$), and third ($\langle i,j\rangle_{3}$) neighbor sites, respectively. Thus, the $J$ term describes the Heisenberg interaction between nearest-neighboring quantum spins, and the $t$, $t'$, and $t''$ terms describe electron hopping to first-, second-, and third-nearest-neighbor sites, respectively.

The angle-resolved spectral function of a doped hole with momentum $\mathbf{k}$ and energy $\omega$ is given by

A(\mathbf{k},\omega) = -\frac{1}{\pi}\,\mathrm{Im}\,G(\mathbf{k},\omega),   (2)

with the retarded Green’s function of the single hole being

G(\mathbf{k},\omega) = \lim_{\eta\to 0^{+}}\langle\Psi_{0}|\,\tilde{c}^{\dagger}_{\mathbf{k}\sigma}\,\frac{1}{\omega+i\eta-H+E_{0}}\,\tilde{c}_{\mathbf{k}\sigma}\,|\Psi_{0}\rangle,   (3)

where $E_{0}$ and $|\Psi_{0}\rangle$ are the ground-state energy and wave function of the undoped system, respectively, so that $H|\Psi_{0}\rangle=E_{0}|\Psi_{0}\rangle$. Equivalently,

A(\mathbf{k},\omega) = \sum_{\nu}|\langle\nu|\tilde{c}_{\mathbf{k}\sigma}|\Psi_{0}\rangle|^{2}\,\delta(\omega-E_{\nu}+E_{0}),   (4)

where $|\nu\rangle$ is an eigenstate of $H$ with one less electron and $E_{\nu}$ is the corresponding eigenenergy satisfying $H|\nu\rangle=E_{\nu}|\nu\rangle$, as the Dirac delta function $\delta(\omega)$ is related to a Lorentzian by $\delta(\omega)=\lim_{\eta\to 0^{+}}\frac{1}{\pi}\frac{\eta}{\omega^{2}+\eta^{2}}=-\lim_{\eta\to 0^{+}}\frac{1}{\pi}\,\mathrm{Im}\,\frac{1}{\omega+i\eta}$.

The angle-unresolved spectral function is given by

A(\omega) = \sum_{\mathbf{k}} A(\mathbf{k},\omega),   (5)

which is also called the density of states (DOS). To obtain the DOS in the normal procedure of theoretical calculations, one must first calculate $A(\mathbf{k},\omega)$ using Eq. (4) for a dense mesh of $\mathbf{k}$ points and then sum the results over $\mathbf{k}$ using Eq. (5). This implies that if the DOS can be accurately predicted by a model trained on known DOS data, a significant speedup in evaluating the DOS can be achieved, e.g., a four-orders-of-magnitude speedup compared with the normal procedure using a $100\times 100$ $\mathbf{k}$-mesh. More interestingly, we will explore whether the model Hamiltonian parameters can be accurately predicted by machine learning the DOS $A(\omega)$, which is relevant to x-ray photoemission spectroscopy (XPS), or $A(\mathbf{k},\omega)$ at a fixed $\mathbf{k}$, which is relevant to laser-based ARPES where the $\mathbf{k}$ points are most accessible near the zone center $\mathbf{k}=0$.
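As an illustration of Eq. (5), the following minimal Python sketch accumulates the DOS from a spectral-function routine on a uniform $\mathbf{k}$-mesh; the name spectral_function and the grid sizes are placeholders, not part of our actual SCBA code.

```python
import numpy as np

# Sketch: accumulate the DOS A(w) = sum_k A(k, w) on a uniform k-mesh.
# `spectral_function(kx, ky, omega)` is a placeholder for any routine
# (e.g., an SCBA solver) that returns A(k, w) on the energy grid `omega`.
def density_of_states(spectral_function, n_k=100, omega=np.linspace(-6, 6, 1201)):
    dos = np.zeros_like(omega)
    ks = 2 * np.pi * np.arange(n_k) / n_k  # k-points in [0, 2*pi)
    for kx in ks:
        for ky in ks:
            dos += spectral_function(kx, ky, omega)
    return dos / n_k**2  # normalize by the number of k-points (a convention choice)
```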

II.2 Dataset generation

To obtain the dataset for use in our ML approach, we use the self-consistent Born approximation (SCBA) to calculate the Green's function of a hole in the $t$-$t'$-$t''$-$J$ model Schmitt-Rink et al. (1988); Marsiglio et al. (1991); Martinez and Horsch (1991); Liu and Manousakis (1991, 1992); Yin and Gong (1997, 1998); Manousakis (2007a, b); Valla et al. (2007) (see Appendix A for details). This approximation produces quantitatively accurate results for the hole Green's function compared with exact diagonalization on small systems Leung and Gooding (1995) and Monte Carlo simulations Diamantis and Manousakis (2021).

We note that although the Hamiltonian has four parameters $(t,t',t'',J)$, all the data can be scaled with respect to $t$, e.g.,

A(\mathbf{k},\omega) \to A(\mathbf{k},a\,\omega)/a \quad \mathrm{for} \quad t \to a\,t,   (6)

where $a$ is an arbitrary positive real number. Setting $t$ as the energy unit ($t=1$) reduces the ML complexity by one dimension, which is a significant advantage in high-throughput computation and big-data management. Thus, the Green's functions are generated on a grid of $t'\in[-0.5,0.5]$, $t''\in[-0.5,0.5]$, and $J\in[0.2,1.0]$, with each parameter sampled on a 51-point uniform grid.

For each combination of $t'$, $t''$, and $J$, the calculation of the Green's function $G(\mathbf{k},\omega)$ is performed using a $128\times 128$ mesh of $\mathbf{k}$ points, $\omega\in[-6,6]$ with a step (i.e., energy resolution) of 0.01, and $\eta=0.01$. Then, $\eta=0.1$ is used to broaden the resulting spiky DOS, and a uniform grid of 301 $\omega$ points is used to sample the DOS. Therefore, our dataset for the DOS consists of $51^{3}=132{,}651$ pairs $(\mathbf{x}^{(i)},\mathbf{y}^{(i)})$, with $\mathbf{x}^{(i)}=(t',t'',J)^{(i)}$ being the 3-dimensional vector representation of the $i$th model parameter set and $\mathbf{y}^{(i)}=(A(\omega_{1}),A(\omega_{2}),\dots,A(\omega_{301}))^{(i)}$ the corresponding 301-dimensional vector representation of the DOS. For the forward problem, $\mathbf{x}^{(i)}$ are the inputs and $\mathbf{y}^{(i)}$ are the outputs. For the inverse problem, the definitions of $\mathbf{x}^{(i)}$ and $\mathbf{y}^{(i)}$ are switched, i.e., $\mathbf{x}^{(i)}=(A(\omega_{1}),A(\omega_{2}),\dots,A(\omega_{301}))^{(i)}$ and $\mathbf{y}^{(i)}=(t',t'',J)^{(i)}$. We then randomly partitioned the dataset into an 80/10/10 training ($\mathbb{T}$), validation ($\mathbb{V}$), and testing split. Here we use the computationally generated testing sets to demonstrate without any ambiguity that the ML methods work well for the present baseline problems.
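A minimal sketch of how the dataset and the random 80/10/10 split could be assembled is given below; compute_dos is a hypothetical stand-in for the SCBA pipeline described above.

```python
import numpy as np

# Sketch of how the (x, y) pairs and the 80/10/10 split could be assembled;
# `compute_dos(tp, tpp, J)` is a hypothetical stand-in for the SCBA pipeline
# returning a 301-point DOS with t = 1.
def build_dataset(compute_dos, n=51, seed=0):
    tp_grid  = np.linspace(-0.5, 0.5, n)   # t'
    tpp_grid = np.linspace(-0.5, 0.5, n)   # t''
    J_grid   = np.linspace(0.2, 1.0, n)    # J
    X = np.array([(a, b, c) for a in tp_grid for b in tpp_grid for c in J_grid])
    Y = np.array([compute_dos(*params) for params in X])  # shape (n**3, 301)
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train, n_val = int(0.8 * len(X)), int(0.1 * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X, Y), (train, val, test)
```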

II.3 Machine Learning Methods

Training an ML model consists of an optimization procedure in which a loss function encoding the difference between predicted and ground-truth outputs is minimized on a training set. In addition, a set of hyperparameters of the ML model is tuned during cross-validation to achieve high accuracy on the validation set. Hyperparameters are untrained parameters that include, but are not limited to, training time, network architecture, and activation functions. Ultimately, final results are presented on the testing set in order to provide an unbiased estimate of model performance. Here we use the total mean squared error (MSE) as the loss function. Given the training set $\mathbb{T}=\{(\mathbf{x}^{(i)},\mathbf{y}^{(i)})\}$ of size $|\mathbb{T}|$, i.e., $i=1,2,3,\dots,|\mathbb{T}|$, for an $n$-dimensional input vector $\mathbf{x}^{(i)}$, the corresponding ground-truth output is an $m$-dimensional vector $\mathbf{y}^{(i)}$; if the ML model predicts $\hat{\mathbf{y}}^{(i)}$, then the individual MSE for that training example is given by

L^{(i)} = \frac{1}{m}\sum_{j=1}^{m}\left|\hat{y}_{j}^{(i)}-y_{j}^{(i)}\right|^{2},   (7)

and the total MSE score of the ML method is given by

L = \frac{1}{|\mathbb{T}|}\sum_{i=1}^{|\mathbb{T}|} L^{(i)}.   (8)
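For reference, Eqs. (7) and (8) together amount to the following one-line NumPy computation over a batch of predictions:

```python
import numpy as np

# Eqs. (7)-(8): mean over output components and over training examples.
def total_mse(y_pred, y_true):
    # y_pred, y_true: arrays of shape (|T|, m)
    return np.mean((y_pred - y_true) ** 2)
```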

We now introduce the two ML methods used in this work: K-nearest neighbors (KNN) and feed-forward neural network (FFNN).

II.3.1 K-nearest neighbors

The KNN algorithm predicts an output via a nearest-neighbor search Biau and Devroye (2015). Given a training set $\mathbb{T}=\{(\mathbf{x}^{(i)},\mathbf{y}^{(i)})\}$, the KNN algorithm finds the $k$ closest points in the input parameter space. Then, a weighted average is taken over the outputs of the $k$ neighbors to predict the output $\hat{\mathbf{y}}$ of a new input vector $\mathbf{x}$ by

\hat{\mathbf{y}} = \sum_{i\in\mathrm{NN}(\mathbf{x})} w^{(i)}(\mathbf{x})\,\mathbf{y}^{(i)},   (9)

where $\mathrm{NN}(\mathbf{x})$ denotes the $k$ nearest neighbors of $\mathbf{x}$. Here the weights $w^{(i)}$ are given by the inverse Euclidean distance Shepard (1968)

w^{(i)}(\mathbf{x}) = \frac{\left|\mathbf{x}-\mathbf{x}^{(i)}\right|^{-\alpha}}{\sum_{j\in\mathrm{NN}(\mathbf{x})}\left|\mathbf{x}-\mathbf{x}^{(j)}\right|^{-\alpha}}, \quad i\in\mathrm{NN}(\mathbf{x}).   (10)

The optimized hyperparameters obtained from cross-validation are $k=9$, $\alpha=5$ for the forward problem and $k=9$, $\alpha=3$ for the inverse problem.
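A sketch of the weighted KNN regressor of Eqs. (9) and (10), written with scikit-learn and a custom inverse-distance-power weight (the variable names X_train and Y_train are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Sketch of the weighted KNN regressor of Eqs. (9)-(10) using scikit-learn.
# The exponent alpha generalizes inverse-distance weighting; X_train, Y_train
# are assumed to hold the (t', t'', J) inputs and DOS outputs.
def inverse_power_weights(distances, alpha=5):
    w = 1.0 / np.maximum(distances, 1e-12) ** alpha   # avoid division by zero
    return w  # sklearn normalizes the weighted average internally

knn = KNeighborsRegressor(n_neighbors=9,
                          weights=lambda d: inverse_power_weights(d, alpha=5))
# knn.fit(X_train, Y_train); Y_pred = knn.predict(X_test)
```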

II.3.2 Feed-Forward Neural Network

Figure 2: A fully connected neural network applied to the forward problem of predicting a DOS given $\{t',t'',J\}$. The neural network takes in a 3-dimensional vector representing $\{t',t'',J\}$ and outputs a 301-dimensional vector representing the predicted DOS.

A neural network is an ML algorithm that implements multiple repeated blocks of linear predictions followed by the application of non-linear functions. In this work, we use feed-forward neural networks (FFNN), which consist of several layers of artificial neurons and are defined by how the layers are implemented and connected. Here we use a fully connected FFNN in which neurons between adjacent layers are fully connected Gardner and Dorling (1998). For example, a 3-layer FFNN is illustrated in Fig. 2. The architecture of a fully connected FFNN is primarily defined by the size of each layer, and the layer-by-layer one-way activation is given by $\mathbf{a}_{l}=f_{l}(W_{l}\mathbf{a}_{l-1}+\mathbf{b}_{l})$, where $\mathbf{a}_{l}$ is the $n_{l}$-dimensional vector output of the $l$th layer, $f_{l}$ is the activation function, $W_{l}$ is an $n_{l}\times n_{l-1}$ matrix of weights ($n_{l}$ is the number of neurons in layer $l$), and $\mathbf{b}_{l}$ is a vector of biases. Among them, $W_{l}$ and $\mathbf{b}_{l}$ are learned during training. For both the forward and inverse problems, we train the neural networks for 30 minutes, using the rectified linear unit (ReLU) activation function $f_{l}(x)=\max(0,x)$ and the Adam optimizer Kingma and Ba (2014).
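A minimal PyTorch sketch of such a fully connected FFNN for the forward problem, using the layer sizes, ReLU activation, and Adam optimizer quoted in this work (the training loop itself is only indicated in comments):

```python
import torch
import torch.nn as nn

# Sketch of the fully connected FFNN for the forward problem, with the
# layer sizes n = (3, 170, 340, 510, 680, 850, 1020, 301) quoted in the text.
sizes = [3, 170, 340, 510, 680, 850, 1020, 301]
layers = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    layers += [nn.Linear(n_in, n_out), nn.ReLU()]
model = nn.Sequential(*layers[:-1])  # no activation after the output layer

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
# Typical training step (x, y are mini-batches of inputs and target DOS):
#   optimizer.zero_grad(); loss_fn(model(x), y).backward(); optimizer.step()
```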

III Results and Discussion

III.1 Principal component analysis

To analyze the quality of our dataset and visualize the potential of applying ML algorithms to the data, we performed a principal component analysis (PCA) Wold et al. (1987) (see Appendix B for details).

III.1.1 Full DOS data

We proceed with PCA of the dataset in which $\mathbf{y}^{(i)}=(A(\omega_{1}),A(\omega_{2}),\dots,A(\omega_{301}))^{(i)}$ and $\mathbf{x}^{(i)}=(t',t'',J)^{(i)}$, where $i=1,2,3,\dots,51^{3}$. Following Eq. (15), we show the projected (reduced-dimensional) data vectors in Fig. 3 and color each point $(z_{1},z_{2})^{(i)}$ by the value of $t'^{(i)}$, $t''^{(i)}$, and $J^{(i)}$, respectively, producing three subplots. All results look quite structured (ear-like) and the color gradients are smooth. This suggests that the input parameters can be continuously mapped to spectral functions, making ML algorithms well suited for the forward problem. Furthermore, it suggests that the inverse problem of mapping spectral functions to input parameters is feasible.

Figure 3: 2D visualization of the DOS spectra projected onto the first two principal components $(z_{1},z_{2})^{(i)}$. The color maps of the three subplots are determined by the values of $t'$, $t''$, and $J$, respectively. The horizontal and vertical axes represent the first and second principal components, respectively.
Figure 4: 2D visualization of the Lorentzian parameters (obtained from fitting the first peaks of the DOS spectra) projected onto the first two principal components $(z_{1},z_{2})^{(i)}$. The color maps of the three subplots are determined by the values of $t'$, $t''$, and $J$, respectively.

III.1.2 Using the first peak of DOS

In comparison, the traditional method for predicting the Hamiltonian parameters is to fit the quasiparticle band $E(\mathbf{k})$ derived from the low-energy peak of the spectral function $A(\mathbf{k},\omega)$, resulting in difficulty determining $t'$ Yin and Ku (2009a); Belinicher et al. (1996). To visualize this problem with PCA, we use the Lorentzian

f(x) = \frac{A\gamma^{2}}{(x-x_{0})^{2}+\gamma^{2}}   (11)

to fit the first peak of every DOS spectrum considered in the inverse problem, resulting in a feature dataset in which now $\mathbf{y}^{(i)}=(A,x_{0},\gamma)^{(i)}$. Then, we redo the PCA and show the color maps of the first two principal components in Fig. 4. We see that while the $t''$ and $J$ plots have smooth gradients, the $t'$ plot contains much more scattered data, demonstrating the difficulty in resolving $t'$ by using only the first-peak information.
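A sketch of the first-peak Lorentzian fit of Eq. (11), assuming SciPy is used for peak detection and least-squares fitting; the window width and initial guess are illustrative choices:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

# Sketch of the first-peak Lorentzian fit of Eq. (11); `omega` and `dos`
# are assumed to hold the energy grid and one DOS spectrum.
def lorentzian(x, A, x0, gamma):
    return A * gamma**2 / ((x - x0)**2 + gamma**2)

def fit_first_peak(omega, dos):
    i0 = find_peaks(dos)[0][0]                   # index of the first peak
    window = slice(max(i0 - 15, 0), i0 + 15)     # fit a window around it
    p0 = [dos[i0], omega[i0], 0.1]               # initial guess (A, x0, gamma)
    popt, _ = curve_fit(lorentzian, omega[window], dos[window], p0=p0)
    return popt                                  # fitted (A, x0, gamma)
```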

Figure 5: Comparison of the KNN-predicted DOS and the ground truth (the SCBA-generated DOS) for the worst-performing data points in the testing set.

III.2 The forward problem

KNN.—

We first trained a KNN for the forward problem (see Appendix C.1 for details) and found that with the optimized hyperparameters $k=9$ and $\alpha=5$, the KNN was able to produce DOS that are almost visually identical to the SCBA results. The worst percentiles of predictions on the testing set, in terms of mean squared error, are shown in Fig. 5. We note that even for the examples in the testing set where the KNN performs worst, the KNN prediction reproduces the peak positions and widths almost perfectly, while also performing quite well in reproducing the peak heights.

FFNN.—

We then applied a neural network to the same task. We found that with the optimized hyperparameters $\mathbf{n}=(3,170,340,510,680,850,1020,301)$, a batch size of 1024, and an initial learning rate of $10^{-3}$ (see Appendix C.1 for details), the neural network with six hidden layers was able to outperform the KNN by roughly a factor of 6 in terms of MSE loss. As shown in Fig. 6, even for the worst examples in the testing set, the neural network reproduces peak positions, widths, and heights almost perfectly. For these low-percentile testing examples, the neural network qualitatively appears to reproduce the peak heights better than the KNN, which is also manifested quantitatively in its improvement over the KNN in terms of MSE loss.

Figure 6: Comparison of the neural-network-predicted DOS and the ground truth (the SCBA-generated DOS) for the worst-performing data points in the testing set.

In addition to excellently reproducing the DOS, the ML algorithms offer a great speedup in computation time over SCBA. While generating the DOS for 130k input combinations using SCBA took over 30 hours, both the KNN and the neural network were able to generate the DOS for those same input combinations in seconds: 7.4 seconds for the KNN and 1.2 seconds for the FFNN, corresponding to speedups over SCBA of $1.5\times 10^{4}$ and $9\times 10^{4}$, respectively.

Table 1: Examples of the worst percentiles when predicting $t'$, $t''$, and $J$ with the KNN and the FFNN, given the DOS. The numbers in parentheses are the ground-truth values.

                KNN-predicted                              FFNN-predicted
Percentile   $-t'$         $t''$         $J$              $-t'$         $t''$         $J$
0            0.072(0.02)   0.130(0.14)   0.235(0.232)     0.387(0.38)   0.497(0.50)   0.201(0.200)
1            0.050(0.02)   0.093(0.10)   0.565(0.552)     0.177(0.18)   0.402(0.40)   0.889(0.888)
2            0.235(0.26)   0.140(0.14)   0.657(0.664)     0.022(0.02)   0.201(0.20)   0.266(0.264)
3            0.045(0.02)   0.440(0.44)   0.520(0.520)     0.498(0.50)   0.481(0.48)   0.362(0.360)

III.3 The inverse problem

The inverse problem, which involves predicting the model parameters from observable quantities, has important experimental implications. Since ARPES experiments produce the spectral function and the DOS, the final goal of inverse modeling would be to predict the Hamiltonian parameters from this available experimental data. In order to make our DOS dataset more experimentally relevant, we shifted every DOS with respect to the top of the quasiparticle valence band [see Fig. 1(b)]. This better mimics experimental data, where absolute energies are not measured but are instead referenced to the Fermi level [see Fig. 1(a)]. We also limited the dataset to include only examples with $t'<0$ and $t''>0$, i.e., the hole-doped case (the case of $t'>0$ and $t''>0$ corresponds to electron doping). As different DOS in our dataset require shifts of different amounts, this demands an expansion of the energy window. As a result, we now use a 354-point linear grid to sample the shifted DOS (see the sketch below). The input is $\mathbf{x}^{(i)}=(A(\omega_{1}),A(\omega_{2}),\dots,A(\omega_{354}))^{(i)}$ and the output is $\mathbf{y}^{(i)}=(t',t'',J)^{(i)}$, where $i=1,2,3,\dots,\sim 51^{3}/4$.
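A sketch of this preprocessing step is shown below; here the position of the first quasiparticle peak is used as a proxy for the valence-band top, and the energy window of the 354-point grid is an illustrative assumption:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import find_peaks

# Sketch of the preprocessing for the inverse problem: each DOS is shifted so
# that a reference energy (here, the first-peak position, used as a proxy for
# the valence-band top) sits at omega = 0, then resampled on a common
# 354-point grid. The grid window is an illustrative choice.
def shift_and_resample(omega, dos, new_grid=np.linspace(-1.0, 10.0, 354)):
    omega_ref = omega[find_peaks(dos)[0][0]]      # first-peak position
    spline = CubicSpline(omega - omega_ref, dos, extrapolate=False)
    resampled = spline(new_grid)
    return np.nan_to_num(resampled)               # zero outside the original window
```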

KNN.—

We first use a KNN to predict the corresponding $t'$, $t''$, and $J$, given a DOS. With the trained hyperparameters $k=9$ and $\alpha=3$, the results for the worst-percentile predictions are displayed in Table 1. We see that for the worst percentiles, the KNN is able to predict $t''$ and $J$ quite accurately but has trouble with $t'$. This means that the outstanding problem of predicting $t'$ likely cannot be resolved by this classic modeling method.

FFNN.—

We then trained an FFNN for the same task. We found that with the hyperparameters $\mathbf{n}=(354,256,128,64,32,3)$, a batch size of 128, and a learning rate of $10^{-3}$ (see Appendix C.2 for details), the neural network is able to significantly outperform the KNN, with an MSE $67\times$ better than that of the KNN. As shown in Table 1, even for the worst examples in the testing set, the neural network can predict at least the first two significant figures. Thus, the neural network offers an accurate approach to the prediction of material parameters.

III.4 Inverse problem: Finding tt

Figure 7: The mean squared error when rescaling a mock “experimental” DOS in units of $t_{\mathrm{guess}}$ and running the FFNN to predict the ground-truth $t'$, $t''$, and $J$.

In the analysis above, we produced and analyzed DOS with $t$ as the energy unit, i.e., $t=1$. However, ARPES experiments produce DOS measured in terms of absolute energy. We thus proceed to analyze the feasibility of obtaining the ground-truth $t$ (referred to as $t_{\mathrm{truth}}$) from more experimentally realistic DOS. To this end, we examine the following simulation: We start with an SCBA-generated DOS with $t=1$ from our dataset, shifted with respect to the top of the valence band. We then scale the DOS using $A(\omega)\to A(\omega\,t_{\mathrm{truth}})/t_{\mathrm{truth}}$, thus producing an “experimental” DOS in units of absolute energy. The task is to find $t_{\mathrm{truth}}$ from this “experimental” DOS while pretending that we do not know its value.

We propose the following algorithm for this task: We add to the ML methods presented in Section III.3 an outermost loop over various guesses of $t_{\mathrm{truth}}$. Specifically, for each $t_{\mathrm{guess}}$, we rescale the experimental DOS using $A(\omega)\to A(\omega/t_{\mathrm{guess}})\,t_{\mathrm{guess}}$, producing the DOS with $t_{\mathrm{guess}}$ as the energy unit; thus, we arrive at the same inverse problem of predicting $t'$, $t''$, and $J$ with $t=1$ studied in Section III.3. Then, we resample the rescaled DOS on the same 354-point energy grid as for the previous inverse problem, using cubic spline interpolation. After that, we run the trained neural network to produce $t'$, $t''$, and $J$ from the rescaled DOS and calculate the MSE.
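A sketch of this outer loop is given below; predict_params stands in for the trained inverse-problem network of Section III.3, params_truth is the known ground truth of the mock spectrum, and the range of $t_{\mathrm{guess}}$ values is illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sketch of the outer loop over t_guess for the mock "experimental" DOS:
# rescale the DOS back to units of t_guess, resample it on the 354-point grid,
# run the trained inverse network `predict_params`, and record the MSE against
# the known ground-truth (t', t'', J) of the mock spectrum.
def scan_t_guess(omega_abs, dos_abs, params_truth, predict_params, grid,
                 t_guesses=np.linspace(0.1, 1.0, 91)):
    errors = []
    for tg in t_guesses:
        # A(w) -> A(w / t_guess) * t_guess, i.e., back to dimensionless energies
        spline = CubicSpline(omega_abs / tg, dos_abs * tg, extrapolate=False)
        dos_rescaled = np.nan_to_num(spline(grid))
        params_pred = predict_params(dos_rescaled)   # (t', t'', J) with t = 1
        errors.append(np.mean((np.asarray(params_pred) - params_truth) ** 2))
    t_best = t_guesses[int(np.argmin(errors))]
    return t_best, np.array(errors)
```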

The results of this procedure are shown in Fig. 7. We find that this algorithm is able to predict $t_{\mathrm{truth}}$ very accurately, as seen by the steep drop in the mean squared error at $t_{\mathrm{guess}}=t_{\mathrm{truth}}$. Adding this single outermost loop takes advantage of the scalability of the spectral functions [Eq. (6)] and reduces the dimensionality of the model parameter space from 4 to 3, a significant improvement in coping with the curse of dimensionality in big-data ML research.

IV Summary

We have investigated the potential of ML algorithms for understanding the spectral functions of a hole in the $t$-$t'$-$t''$-$J$ model and found that ML algorithms are well suited for the task. The analysis of the dataset of SCBA-generated spectral functions demonstrates the presence of a continuous mapping between the model parameters and the resulting DOS. Given a set of model parameters, we found that both the KNN and the neural networks can produce DOS almost visually identical to the SCBA results, with a speedup of as much as $9\times 10^{4}$. We also found that the ML algorithms, especially deep-learning neural networks, can predict $t'$, $t''$, and $J$ very accurately given a DOS. With such a speedup in the calculation of the DOS, as well as the ability to solve the inverse problem, ML offers a potential tool to search for the model parameters that produce desirable spectral functions. The present method can be directly applied to other cases of energy distribution curves (EDC), such as $A(\mathbf{k},\omega)$ at constant momentum, or to momentum distribution curves (MDC), which are the intensities as a function of momentum at constant energy Valla et al. (1999). Future work will focus on working with experimental data, which are further complicated by instrument resolution and irreducible noise.

Data Availability

The data generated and used in this study are openly available from the Zenodo database Lee et al. (2023).

Acknowledgements.
This work was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-SC0012704. This project was supported in part by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Science Undergraduate Laboratory Internships (SULI) program. This project was also supported in part by the Brookhaven National Laboratory (BNL), Condensed Matter Physics and Materials Science Division, under the BNL Supplemental Undergraduate Research Program (SURP).

Appendix A The self-consistent Born approximation

The SCBA uses non-crossing Feynman diagrams to calculate the Green's function $G(\mathbf{k},\omega)$ used in Eqs. (2) and (3), which describes the propagation of a particle in the lattice. The self-consistent system of equations to be solved is

G(\mathbf{k},\omega) = \left[G^{0}(\mathbf{k},\omega)^{-1} - \Sigma(\mathbf{k},\omega)\right]^{-1},   (12)
\Sigma(\mathbf{k},\omega) = \sum_{\mathbf{q}} |M(\mathbf{k},\mathbf{q})|^{2}\, G(\mathbf{k}-\mathbf{q},\omega),   (13)

where $\Sigma(\mathbf{k},\omega)$ is the self-energy, $G^{0}(\mathbf{k},\omega)=\lim_{\eta\to 0^{+}}\left[\omega+i\eta-\epsilon_{\mathbf{k}}\right]^{-1}$ is the bare Green's function, and $\epsilon_{\mathbf{k}}=4t'\cos k_{x}\cos k_{y}+2t''[\cos(2k_{x})+\cos(2k_{y})]$ is the bare dispersion relation of the hole quasiparticle.

In order to generate a high-quality dataset of spectral functions, the SCBA samples over a dense mesh for both the hole momentum $\mathbf{k}$ and the magnon momentum $\mathbf{q}$. The mesh sizes for $\mathbf{k}$ and $\mathbf{q}$ can be different while remaining commensurate, corresponding to the application of twisted boundary conditions Yin and Ku (2009b). While denser $\mathbf{k}$ and $\mathbf{q}$ sampling leads to higher-quality spectral functions, it is also more computationally expensive. We tested various combinations of $\mathbf{k}$ and $\mathbf{q}$ sampling densities and found that above sampling densities of a $128\times 128$ lattice for $\mathbf{k}$ and a $32\times 32$ lattice for $\mathbf{q}$, the results converge.
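A schematic sketch of the SCBA iteration of Eqs. (12) and (13) is given below; the coupling $|M(\mathbf{k},\mathbf{q})|^{2}$ is model specific and is represented by a placeholder array (with its $\mathbf{k}$ dependence suppressed for brevity), the $\mathbf{q}$-mesh is taken equal to the $\mathbf{k}$-mesh, and the magnon-energy shift of the internal frequency used in standard SCBA treatments is omitted so as to mirror the equations as displayed:

```python
import numpy as np

# Schematic SCBA iteration for Eqs. (12)-(13). `coupling2[qx, qy]` is a
# hypothetical placeholder for |M(k, q)|^2 with the k-dependence suppressed;
# the q-mesh is taken equal to the k-mesh for simplicity.
def scba(eps_k, coupling2, omega, eta=0.01, n_iter=200):
    # eps_k: bare dispersion on an (nk, nk) k-mesh; omega: energy grid
    nk = eps_k.shape[0]
    g0 = 1.0 / (omega[None, None, :] + 1j * eta - eps_k[..., None])
    g = g0.copy()
    for _ in range(n_iter):
        sigma = np.zeros_like(g)
        for qx in range(nk):
            for qy in range(nk):
                # G(k - q, w): shift the k-mesh cyclically by q
                g_shift = np.roll(np.roll(g, -qx, axis=0), -qy, axis=1)
                sigma += coupling2[qx, qy] * g_shift
        sigma /= nk * nk                 # normalized lattice sum over q (assumption)
        g = 1.0 / (1.0 / g0 - sigma)     # Dyson equation, Eq. (12)
    return -g.imag / np.pi               # spectral function A(k, w), Eq. (2)
```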

Appendix B Principal component analysis

Given the training set $\mathbb{T}=\{(\mathbf{x}^{(i)},\mathbf{y}^{(i)})\}$ of size $N$, where $\mathbf{x}^{(i)}$ is an $n$-dimensional vector and $\mathbf{y}^{(i)}$ is an $m$-dimensional vector, we begin with the $m\times N$ matrix $Z=(\mathbf{y}^{(1)},\mathbf{y}^{(2)},\dots,\mathbf{y}^{(N)})$, with each column representing a $\mathbf{y}^{(i)}$. The $m$ rows of $Z$ are then each shifted so that the mean of every row is zero; that is, the center of the data is translated to the origin of the $m$-dimensional space, which does not change how the data points are positioned relative to each other. The $m\times m$ covariance matrix is given by

\mathbf{C_{Z}} = \frac{1}{N}\mathbf{Z}\mathbf{Z}^{\mathrm{T}}.   (14)

The principal components of $\mathbf{C_{Z}}$ are its normalized eigenvectors $\mathbf{e}_{j}$ with $j=1,2,3,\dots,m$, arranged according to their corresponding eigenvalues (variances) in descending order. $\mathbf{e}_{1}$ is the direction in the $m$-dimensional space with the largest variance in $Z$, $\mathbf{e}_{2}$ is the direction with the second largest variance, and so on. One can use the eigenvalues to determine the proportion of the variation that each principal component accounts for. If $\mathbf{e}_{1}$ and $\mathbf{e}_{2}$ account for the vast majority of the variation in the data, a 2D graph using only $\mathbf{e}_{1}$ and $\mathbf{e}_{2}$ as the axes is a good approximation of the full $m$-dimensional representation, which cannot be visualized directly. The coordinates of $\mathbf{y}^{(i)}$ projected into the 2D subspace are given by

\mathbf{z}^{(i)} \equiv (z_{1},z_{2})^{(i)} = (\mathbf{e}_{1}\cdot\mathbf{y}^{(i)},\ \mathbf{e}_{2}\cdot\mathbf{y}^{(i)}).   (15)

Then, a 2D color map can be produced by coloring all the $\mathbf{z}^{(i)}$ points according to the value of an element of the vector $\mathbf{x}^{(i)}$, so we can obtain $n$ such 2D color maps. Among them, those that appear structured, with smooth color gradients, suggest that the corresponding input parameters can be continuously mapped to spectral functions, making ML algorithms well suited for the forward problem. It also suggests that the inverse problem of mapping spectral functions to input parameters is feasible.
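A compact sketch of the PCA projection of Eqs. (14) and (15), assuming the DOS vectors are collected in an array Y of shape $(N, m)$:

```python
import numpy as np

# Sketch of the PCA projection of Eqs. (14)-(15): center the data,
# diagonalize the covariance matrix, and project onto the two leading
# eigenvectors. Y is assumed to be the (N, m) array of DOS vectors.
def pca_2d(Y):
    Z = (Y - Y.mean(axis=0)).T                # m x N, each row zero-mean
    C = Z @ Z.T / Z.shape[1]                  # m x m covariance matrix, Eq. (14)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    e1, e2 = eigvecs[:, -1], eigvecs[:, -2]   # two largest-variance directions
    return Y @ np.column_stack([e1, e2])      # (N, 2) projected coordinates (z1, z2)
```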

Appendix C Total mean squared error

C.1 MSE for the forward problem

Figure 8: Validation loss for the forward problem for different values of $k$ and $\alpha$.

Hyperparameters were tuned to minimize the error on the validation set.

For the KNN, we tested various values of $\alpha$ and $k$ via grid search. Figure 8 shows the validation mean squared error for various values of $k$ and $\alpha$, including the optimal values $\alpha=5$ and $k=9$.

For the FFNN, we tuned hyperparameters with a combination of hand tuning and grid search. The architecture of the neural network, $\mathbf{n}$, was of particular interest in hyperparameter tuning. We tested a variety of architectures with different numbers of layers, which “ramped” up to a different number of neurons in the final hidden layer. Figure 9 shows the validation error over time for different architectures over a 30-minute training period.

Figure 9: Validation loss over time for various FFNN architectures, with bs $=1024$ and lr $=10^{-3}$, for the forward problem.

For the architecture design, we tested various “linear ramps,” which simply ramp linearly from the 3 input neurons to the 301 output neurons. For example, a linear-ramp architecture with 3 hidden layers would have $\mathbf{n}=(3,77,151,225,301)$. We found that while these linear ramps intuitively made more sense, they trained considerably less efficiently than architectures with more than 301 neurons in the hidden layers. We see in Fig. 10 that after training for 30 minutes, even the best linear ramps perform worse on the validation set than architectures that include larger hidden layers.

Other hyperparameters include the batch size and the learning rate. The batch size (bs) is the size of the subset of $\mathbb{T}$ fed to the neural network to perform a single gradient update. One epoch of training completes after all the training data have been fed through the network (in a randomized order each time). The learning rate (lr) is the base step size for updating the weights along the direction of gradient descent and is scheduled to decrease by a factor of 2 when no improvement is realized after 10 epochs. For the forward problem, we find the optimized hyperparameters $\mathbf{n}=(3,170,340,510,680,850,1020,301)$, bs $=1024$, and lr $=10^{-3}$.
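A sketch of this optimization schedule in PyTorch; using ReduceLROnPlateau with factor 0.5 and patience 10 is our reading of the schedule described above:

```python
import torch

# Sketch of the optimization schedule described above: Adam with lr = 1e-3,
# halved (factor 0.5) when the validation loss has not improved for 10 epochs.
# `model` is any torch.nn.Module, e.g. the FFNN sketched in Sec. II.3.2.
def make_optimizer(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.5, patience=10)
    return optimizer, scheduler

# After each training epoch, call scheduler.step(validation_loss).
```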

After optimizing hyperparameters using the validation set for both the KNN and the FFNN, the following MSE results are obtained on the testing set: $4.71\times 10^{-5}$ for the KNN and $7.24\times 10^{-6}$ for the FFNN.

Figure 10: Validation loss for the forward problem for linear ramps with different numbers of hidden layers, compared to architectures that include more than 301 neurons in the hidden layers.

C.2 MSE for the inverse problem

For KNN, we again used grid search to tune hyperparameters.

For the FFNN, hyperparameter tuning yielded an architecture of $\mathbf{n}=(354,256,128,64,32,3)$ together with bs $=128$ and lr $=10^{-3}$, although we note that several architectures that ramped down from 354 to 3 neurons performed similarly on the validation set.

After optimizing hyperparameters using the validation set for both the KNN and the FFNN, the following MSE results are obtained on the testing set: $4.19\times 10^{-5}$ for the KNN and $6.29\times 10^{-7}$ for the FFNN.

References