Maximum-Likelihood-Estimate Hamiltonian learning via efficient and robust quantum likelihood gradient

Tian-Lun Zhao International Center for Quantum Materials, School of Physics, Peking University, Beijing, 100871, China    Shi-Xin Hu International Center for Quantum Materials, School of Physics, Peking University, Beijing, 100871, China    Yi Zhang [email protected] International Center for Quantum Materials, School of Physics, Peking University, Beijing, 100871, China
Abstract

Given the recent developments in quantum techniques, modeling the physical Hamiltonian of a target quantum many-body system is becoming an increasingly practical and vital research direction. Here, we propose an efficient strategy combining maximum likelihood estimation, gradient descent, and quantum many-body algorithms. Given the measurement outcomes, we optimize the target model Hamiltonian and density operator via a series of descents along the quantum likelihood gradient, which we prove is negative semi-definite with respect to the negative-log-likelihood function. In addition to such optimization efficiency, our maximum-likelihood-estimate Hamiltonian learning respects the locality of a given quantum system and therefore extends readily to larger systems with available quantum many-body algorithms. Compared with previous approaches, it also exhibits better accuracy and overall stability against noise, fluctuations, and temperature ranges, which we demonstrate with various examples.

I Introduction

Understanding the quantum states and the corresponding properties of a given quantum Hamiltonian is a crucial problem in quantum physics. Many powerful numerical and theoretical tools have been developed for such purposes and have made compelling progress [1, 2, 3, 4, 5]. On the other hand, with the rapid experimental developments of quantum technology, e.g., near-term quantum computation [6, 7] and simulation [8, 9, 10, 11, 12, 13, 14], it is also vital to explore the inverse problem, i.e., Hamiltonian learning: optimizing a model Hamiltonian that characterizes a quantum system with respect to the measurement results. Given the knowledge and assumptions about a target system, researchers have achieved many resounding successes in modeling quantum Hamiltonians with physical pictures and phenomenological approaches [15, 16]. However, such subjective perspectives may risk biases and are commonly insufficient for detailed quantum devices. Therefore, the exploration of objective Hamiltonian learning strategies has attracted much recent attention [17, 18, 19, 20, 21, 22, 23, 24, 25].

There are mainly two categories of Hamiltonian-learning strategies, based upon either quantum measurements on a large number of (identical copies of) quantum states, e.g., Gibbs states or eigenstates [17, 18, 19, 20, 21, 22], or initial states’ time evolution dynamics [23, 24, 25, 26], corresponding to the target quantum system. For example, given the measurements of the correlations of a set of local operators, the kernel of the resulting correlation matrix offers a candidate model Hamiltonian [17, 18, 19]. On the other hand, while established theoretically, most approaches suffer from elevated costs and are limited to small systems in experiments or numerical simulations [19, 20, 27, 28]. Besides, there remains much room for improvements in stability towards noises and temperature ranges.

Maximum likelihood estimation (MLE) is a powerful tool that parameterizes and then optimizes the probability distribution of a statistical model so that the given observed data is most probable. MLE's intuitive and flexible logic makes it a prevailing method for statistical inference. In addition to its wide range of applications, MLE has been applied successfully to quantum state tomography [29, 30, 31, 32, 33], providing the most probable quantum states given the measurement outputs.

Inspired by MLE's successes in quantum problems, we propose a general MLE Hamiltonian learning protocol: given finite-temperature measurements of the target quantum system in thermal equilibrium, we optimize the model Hamiltonian towards the MLE step-by-step via a "quantum likelihood gradient". We show that such a quantum likelihood gradient, acting collectively on all operators present, is negative semi-definite with respect to the negative-log-likelihood function and thus provides efficient optimization. In addition, our strategy may take advantage of the locality of the quantum system, therefore allowing us to extend studies to larger quantum systems with tailored quantum many-body ansatzes such as Lanczos, quantum Monte Carlo (QMC), density matrix renormalization group (DMRG), and finite temperature tensor network (FTTN) [34, 35] algorithms in suitable scenarios. We also demonstrate that MLE Hamiltonian learning is more accurate, less restrictive, and more robust against noise and across broader temperature ranges. Further, we generalize our protocol to measurements on pure states, such as the target quantum systems' ground states or quantum chaotic eigenstates. Therefore, MLE Hamiltonian learning enriches our arsenal for cutting-edge research and applications of quantum devices and experiments, such as quantum computation, quantum simulation, and quantum Boltzmann machines [36].

We organize the rest of the paper as follows: In Sec. II, we review the MLE context and introduce the MLE Hamiltonian learning protocol; especially, we show explicitly that the corresponding quantum likelihood gradient leads to a negative semi-definite change to the negative-log-likelihood function. Via various examples in Sec. III, we demonstrate our protocol’s capability, especially its robustness against noises and temperature ranges. We generalize the protocol to quantum measurements of pure states in Sec. IV and Appendix D, with consistent results for exotic quantum systems such as quantum critical and topological models. We summarize our studies in Sec. V with a conclusion on our protocol’s advantages (and limitations), potential applications, and future outlooks.

II Maximum-likelihood-estimate Hamiltonian learning

To start, we consider an unknown target quantum system $\hat{H}_{s}=\sum_{j}\mu_{j}\hat{O}_{j}$ in thermal equilibrium, and measurements of a set of observables $\{\hat{O}_{i}\}$ on its Gibbs state $\hat{\rho}_{s}=\exp(-\beta\hat{H}_{s})/\mathrm{tr}[\exp(-\beta\hat{H}_{s})]$, where $\beta$ is the inverse temperature. Given a sufficient number $N_{i}$ of measurements of the operator $\hat{O}_{i}$, the number of occurrences $f_{\lambda_{i}}$ of the $\lambda_{i}$-th eigenvalue $o_{\lambda_{i}}$ approaches:

$$f_{\lambda_{i}}=p_{\lambda_{i}}N_{i}\approx\mathrm{tr}[\hat{\rho}_{s}\hat{P}_{\lambda_{i}}]N_{i}, \tag{1}$$

where $p_{\lambda_{i}}=f_{\lambda_{i}}/N_{i}$ denotes the statistics (relative frequency) of the outcome $o_{\lambda_{i}}$, and $\hat{P}_{\lambda_{i}}$ is the corresponding projection operator onto the $o_{\lambda_{i}}$ sector. Our goal is to locate the model Hamiltonian $\hat{H}_{s}$ for the quantum system, which commonly requires the presence of all of $\hat{H}_{s}$'s terms in the measurement set $\{\hat{O}_{i}\}$.
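As an illustration of Eq. 1, the following minimal sketch computes the ideal outcome statistics for a toy target. The single-spin Hamiltonian, its coefficients, and the helper names `gibbs_state` and `projectors` are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def gibbs_state(H, beta):
    """Gibbs state exp(-beta*H) / tr[exp(-beta*H)] of a Hermitian matrix H."""
    E, U = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))      # shift energies for numerical stability
    return (U * (w / w.sum())) @ U.conj().T

def projectors(O, decimals=10):
    """Pairs (o, P): each distinct eigenvalue o of O with its projector P."""
    vals, vecs = np.linalg.eigh(O)
    vals = np.round(vals, decimals)        # group numerically degenerate eigenvalues
    return [(o, vecs[:, vals == o] @ vecs[:, vals == o].conj().T)
            for o in np.unique(vals)]

# Hypothetical toy target: one spin-1/2 with H_s = 0.3 S^x + 0.8 S^z
S = {"x": 0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
     "y": 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
     "z": 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)}
beta, N_i = 1.0, 10_000
rho_s = gibbs_state(0.3 * S["x"] + 0.8 * S["z"], beta)

# Ideal statistics p = tr[rho_s P] and expected counts f = p * N_i (Eq. 1)
stats = {a: [(o, np.trace(rho_s @ P).real) for o, P in projectors(S[a])]
         for a in "xyz"}
counts = {a: [(o, p * N_i) for o, p in stats[a]] for a in "xyz"}
```

In an experiment the counts would come from repeated measurements; here they are the infinite-sampling limit, so the frequencies match the Gibbs-state probabilities exactly.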

Following previous MLE analysis [29, 30, 31, 32, 33], the statistical weight of any given state $\hat{\rho}$ is:

$$\mathcal{L}(\hat{\rho})\propto\prod_{i,\lambda_{i}}\left\{\mathrm{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]^{\frac{f_{\lambda_{i}}}{N_{tot}}}\right\}^{N_{tot}}, \tag{2}$$

up to a trivial factor, where $N_{tot}=\sum_{i}N_{i}$ is the total number of measurements. For Hamiltonian learning, we search for (the set of parameters $\{\mu_{j}\}$ of) the MLE Hamiltonian $\hat{H}$, whose Gibbs state $\hat{\rho}$ maximizes the likelihood function in Eq. 2. The maximum condition for Eq. 2 can be re-expressed as:

$$\begin{aligned}
\hat{R}(\hat{\rho})\hat{\rho}&=\hat{\rho},\\
\hat{R}(\hat{\rho})&=\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}\frac{\hat{P}_{\lambda_{i}}}{\mathrm{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]},
\end{aligned} \tag{3}$$

see Appendix A for a detailed review. Solving Eq. 3 is a nonlinear and nontrivial problem, for which many algorithms have been proposed [31, 30, 32, 33]. For example, we can employ iterative updates $\hat{\rho}_{k+1}\propto\hat{R}(\hat{\rho}_{k})\hat{\rho}_{k}\hat{R}(\hat{\rho}_{k})$ until Eq. 3 is fulfilled [31]. These algorithms mostly center around the parameterization and optimization of a quantum state $\hat{\rho}$, whose cost is exponential in the system size. Besides, such iterative updates do not guarantee that the quantum state $\hat{\rho}$ remains in Gibbs form, especially when the measurements are insufficient to uniquely determine the state (e.g., with large noise or a small number of measurements, many quantum states satisfy Eq. 3). Consequently, extracting $\hat{H}\propto-\frac{1}{\beta}\ln\hat{\rho}$ from $\hat{\rho}$ further adds to the inconvenience.
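The operator $\hat{R}(\hat{\rho})$ of Eq. 3 can be evaluated directly on a small example. The sketch below is a hedged illustration on a hypothetical single-spin target with noiseless frequencies; the function name `R_operator` and the flat `(f, P)` data layout are choices made here, not the paper's. It checks two properties: $\mathrm{tr}[\hat{\rho}\hat{R}(\hat{\rho})]=1$ for any state, and $\hat{R}=\hat{I}$ when $\hat{\rho}$ reproduces the data.

```python
import numpy as np

def gibbs_state(H, beta):
    E, U = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))
    return (U * (w / w.sum())) @ U.conj().T

def projectors(O, decimals=10):
    """Projectors onto the eigenspaces of O, ordered by eigenvalue."""
    vals, vecs = np.linalg.eigh(O)
    vals = np.round(vals, decimals)
    return [vecs[:, vals == o] @ vecs[:, vals == o].conj().T
            for o in np.unique(vals)]

def R_operator(rho, data):
    """Eq. 3; `data` is a flat list of (f, P) over all observables and outcomes."""
    N_tot = sum(f for f, _ in data)
    return sum((f / N_tot) * P / np.trace(rho @ P).real for f, P in data)

# Hypothetical toy target and noiseless counts f = N_i * tr[rho_s P]
S = [0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
     0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
     0.5 * np.array([[1, 0], [0, -1]], dtype=complex)]
N_i = 1000
rho_s = gibbs_state(0.3 * S[0] + 0.8 * S[2], beta=1.0)
data = [(N_i * np.trace(rho_s @ P).real, P) for O in S for P in projectors(O)]

R_at_target = R_operator(rho_s, data)      # equals the identity
rho_mixed = np.eye(2) / 2                  # a state that does not fit the data
R_at_mixed = R_operator(rho_mixed, data)   # differs from the identity
```

The deviation of `R_at_mixed` from the identity is what drives the updates discussed next.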

Considering that the operator $\hat{R}(\hat{\rho})$ has the same operator structure as the Hamiltonian, we take an alternative stance for the Hamiltonian learning task and update the candidate Hamiltonian $\hat{H}_{k}$, i.e., the model parameters, collectively and iteratively. In particular, we absorb the corrections to the Hamiltonian coefficients into the operator $\hat{R}(\hat{\rho})$, which offers the following quantum likelihood gradient (Fig. 1):

$$\begin{aligned}
\hat{H}_{k+1}&=\hat{H}_{k}-\gamma\hat{R}_{k},\\
\hat{\rho}_{k+1}&=\frac{e^{-\beta\hat{H}_{k+1}}}{\mathrm{tr}[e^{-\beta\hat{H}_{k+1}}]}=\frac{e^{-\beta(\hat{H}_{k}-\gamma\hat{R}_{k})}}{\mathrm{tr}[e^{-\beta(\hat{H}_{k}-\gamma\hat{R}_{k})}]},
\end{aligned} \tag{4}$$

where $\gamma>0$ is the learning rate, a small parameter controlling the step size. We denote $\hat{R}_{k}\equiv\hat{R}(\hat{\rho}_{k})$ for short hereafter. Compared with previous Hamiltonian extractions from MLE quantum state tomography, the update in Eq. 4 possesses several advantages in Hamiltonian learning. First, we can utilize the Hamiltonian structure (e.g., locality) to choose suitable numerical tools (e.g., QMC and FTTN) and even calculate within subregions, thereby circumventing the costly parametrization of the quantum state $\hat{\rho}$. Also, the update guarantees a state in its Gibbs form. Last but not least, we will show that for $\gamma\ll 1$, the quantum likelihood gradient in Eq. 4 yields a negative semi-definite contribution to the negative-log-likelihood function, guaranteeing the MLE Hamiltonian (up to a trivial constant) at its convergence and an efficient optimization toward it.

Figure 1: An illustration of the MLE Hamiltonian learning algorithm: given the quantum measurements on the Gibbs state $\hat{\rho}_{s}$ of the target quantum system, we update the candidate Hamiltonian iteratively until the negative-log-likelihood function (or relative entropy) converges below a given threshold $\epsilon$, after which the output yields the MLE Hamiltonian. Within each iterative step, we evaluate the operator expectation values with respect to the Gibbs state $\hat{\rho}_{k}=\exp(-\beta\hat{H}_{k})/\mathrm{tr}[\exp(-\beta\hat{H}_{k})]$, which directs the model update $\Delta\hat{H}_{k}=-\gamma\hat{R}_{k}$ for the next iterative step.

Theorem: For $\gamma\ll 1$, $\gamma>0$, the quantum likelihood gradient in Eq. 4 yields a negative semi-definite contribution to the negative-log-likelihood function $M(\hat{\rho}_{k+1})=-\frac{1}{N_{tot}}\log\mathcal{L}(\hat{\rho}_{k+1})$.

Proof: We note that up to linear order in $\gamma\ll 1$:

$$\begin{aligned}
e^{-\beta\hat{H}_{k+1}} &= e^{-\beta\hat{H}_{k}}\prod_{n=0}^{\infty}\exp\left[\frac{(-1)^{n}}{(n+1)!}\mathrm{ad}_{-\beta\hat{H}_{k}}^{n}(\beta\gamma\hat{R}_{k})+o(\gamma^{2})\right]\\
&\approx e^{-\beta\hat{H}_{k}}\left[1+\sum_{n=0}^{\infty}\frac{(-1)^{n}}{(n+1)!}\mathrm{ad}_{-\beta\hat{H}_{k}}^{n}(\beta\gamma\hat{R}_{k})\right]\\
&= e^{-\beta\hat{H}_{k}}\left(1+\beta\gamma\int_{0}^{1}e^{\beta s\hat{H}_{k}}\hat{R}_{k}e^{-\beta s\hat{H}_{k}}\,\mathrm{d}s\right),
\end{aligned} \tag{5}$$

where $\mathrm{ad}_{\hat{A}}^{j}\hat{B}=[\hat{A},\mathrm{ad}_{\hat{A}}^{j-1}\hat{B}]$ and $\mathrm{ad}_{\hat{A}}^{0}\hat{B}=\hat{B}$ are the adjoint action of the Lie algebra. The first and third lines are based on the Zassenhaus formula [37] and the Baker-Hausdorff formula [37], respectively, while the second line neglects terms above the linear order of $\gamma$.
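The first-order expansion in Eq. 5 can be checked numerically. The sketch below is a hedged illustration: it uses random Hermitian matrices as stand-ins for $\hat{H}_{k}$ and $\hat{R}_{k}$, and the helper names `expm_h` and `kubo_integral` are introduced here. The $s$-integral is evaluated in closed form in the eigenbasis of $\hat{H}_{k}$, where each matrix element picks up a factor $(e^{x}-1)/x$ with $x=\beta(E_{a}-E_{b})$.

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    E, U = np.linalg.eigh(A)
    return (U * np.exp(E)) @ U.conj().T

def kubo_integral(H, R, beta):
    """int_0^1 exp(beta*s*H) R exp(-beta*s*H) ds, evaluated in H's eigenbasis."""
    E, U = np.linalg.eigh(H)
    Rt = U.conj().T @ R @ U
    x = beta * (E[:, None] - E[None, :])
    small = np.abs(x) < 1e-12
    phi = np.where(small, 1.0, np.expm1(x) / np.where(small, 1.0, x))
    return U @ (Rt * phi) @ U.conj().T

rng = np.random.default_rng(0)

def random_hermitian(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (A + A.conj().T) / 2

H, R, beta = random_hermitian(4), random_hermitian(4), 1.0

def first_order_error(gamma):
    """Norm of exp(-beta(H - gamma R)) minus the last line of Eq. 5."""
    exact = expm_h(-beta * (H - gamma * R))
    approx = expm_h(-beta * H) @ (np.eye(4) + beta * gamma * kubo_integral(H, R, beta))
    return np.linalg.norm(exact - approx)

errors = [first_order_error(g) for g in (1e-2, 1e-3)]
```

Reducing $\gamma$ by a factor of 10 should reduce the residual by roughly a factor of 100, consistent with the neglected terms being of second order in $\gamma$.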

Following this, we can re-express the quantum state in Eq. 4 as:

$$\hat{\rho}_{k+1}=\hat{\rho}_{k}\frac{1+\beta\gamma\int_{0}^{1}e^{\beta s\hat{H}_{k}}\hat{R}_{k}e^{-\beta s\hat{H}_{k}}\,\mathrm{d}s}{1+\beta\gamma}, \tag{6}$$

where we have used $\mathrm{tr}[\hat{\rho}_{k}\hat{R}_{k}]=1$ as a direct consequence of $\hat{R}_{k}$'s definition in Eq. 3.

Subsequently, after introducing the quantum likelihood gradient, the negative-log-likelihood function becomes:

$$\begin{aligned}
M(\hat{\rho}_{k+1}) &= -\frac{1}{N_{tot}}\log\mathcal{L}(\hat{\rho}_{k+1})\\
&= -\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}\log\mathrm{tr}[\hat{\rho}_{k+1}\hat{P}_{\lambda_{i}}]\\
&\approx M(\hat{\rho}_{k})+\beta\gamma(1-\Delta_{k}),
\end{aligned} \tag{7}$$

where we keep terms up to linear order of $\gamma$ in the $\log$ expansion.
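For readers who want the intermediate step behind the last line of Eq. 7, it can be spelled out as follows. The shorthand $\hat{K}_{k}$ is introduced here for convenience and does not appear in the original text; $\Delta_{k}$ is the quantity defined in Eq. 8.

```latex
\begin{aligned}
\hat{K}_{k} &\equiv \int_{0}^{1}e^{\beta s\hat{H}_{k}}\hat{R}_{k}e^{-\beta s\hat{H}_{k}}\,\mathrm{d}s,\\
\mathrm{tr}[\hat{\rho}_{k+1}\hat{P}_{\lambda_{i}}]
 &\approx \mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]
 +\beta\gamma\left(\mathrm{tr}[\hat{\rho}_{k}\hat{K}_{k}\hat{P}_{\lambda_{i}}]
 -\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]\right),\\
\log\mathrm{tr}[\hat{\rho}_{k+1}\hat{P}_{\lambda_{i}}]
 &\approx \log\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]
 +\beta\gamma\left(\frac{\mathrm{tr}[\hat{\rho}_{k}\hat{K}_{k}\hat{P}_{\lambda_{i}}]}
 {\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]}-1\right),\\
M(\hat{\rho}_{k+1})
 &\approx M(\hat{\rho}_{k})
 +\beta\gamma\left(\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}
 -\mathrm{tr}[\hat{\rho}_{k}\hat{K}_{k}\hat{R}_{k}]\right)
 = M(\hat{\rho}_{k})+\beta\gamma(1-\Delta_{k}).
\end{aligned}
```

The second line uses Eq. 6 with $1/(1+\beta\gamma)\approx 1-\beta\gamma$, and the last line uses $\sum_{i,\lambda_{i}}f_{\lambda_{i}}=N_{tot}$ together with the definition of $\hat{R}_{k}$ in Eq. 3.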

On the other hand, we can establish the following inequality:

$$\begin{aligned}
\Delta_{k} &= \mathrm{tr}\left[\hat{\rho}_{k}\int_{0}^{1}e^{\beta s\hat{H}_{k}}\hat{R}_{k}e^{-\beta s\hat{H}_{k}}\hat{R}_{k}\,\mathrm{d}s\right]\\
&= \int_{0}^{1}\mathrm{tr}[\hat{\rho}_{k}e^{\beta s\hat{H}_{k}}\hat{R}_{k}e^{-\beta s\hat{H}_{k}}\hat{R}_{k}]\,\mathrm{tr}[\hat{\rho}_{k}]\,\mathrm{d}s\\
&= \int_{0}^{1}||e^{-\beta s\hat{H}_{k}/2}\hat{R}_{k}e^{-\beta(1-s)\hat{H}_{k}/2}||_{F}^{2}\,||e^{-\beta\hat{H}_{k}/2}||_{F}^{2}\,\frac{\mathrm{d}s}{Z_{k}^{2}}\\
&\geq \int_{0}^{1}\mathrm{tr}[\hat{\rho}_{k}\hat{R}_{k}]^{2}\,\mathrm{d}s=1,
\end{aligned} \tag{8}$$

where $Z_{k}=\mathrm{tr}[e^{-\beta\hat{H}_{k}}]$ is the partition function, $||A||_{F}=\sqrt{\mathrm{tr}[A^{\dagger}A]}$ is the Frobenius norm of matrix $A$, and the non-negative definiteness of $\hat{\rho}_{k}$ allows $\hat{\rho}_{k}=(\hat{\rho}_{k}^{1/2})^{2}=(e^{-\beta\hat{H}_{k}/2})^{2}/Z_{k}$. The inequality in the fourth line follows from the Cauchy-Schwarz inequality.

We note that the equality, which is the convergence criterion of our MLE Hamiltonian learning protocol, is established if and only if:

$$e^{-\beta s\hat{H}_{k}/2}\hat{R}_{k}e^{-\beta(1-s)\hat{H}_{k}/2}=e^{-\beta\hat{H}_{k}/2}, \tag{9}$$

which implies the conventional MLE optimization target $\hat{R}\hat{\rho}=\hat{\rho}$ in Eq. 3. We can also establish such consistency from our iterative convergence following Eq. 4 (in practice, given sufficient measurements, we have $\hat{R}_{k}\sim\hat{I}$ dictating the quantum likelihood gradient at the iteration's convergence):

$$\hat{\rho}_{k+1}=\frac{e^{-\beta(\hat{H}_{k}-\gamma\hat{R}_{k})}}{\mathrm{tr}[e^{-\beta(\hat{H}_{k}-\gamma\hat{R}_{k})}]}=\frac{e^{\beta\gamma\hat{R}_{k}}\hat{\rho}_{k}}{\mathrm{tr}[e^{\beta\gamma\hat{R}_{k}}\hat{\rho}_{k}]}=\hat{\rho}_{k}, \tag{10}$$

where we have used the commutation relation $[\hat{R}_{k},\hat{H}_{k}]=0$ between the Hermitian operators $\hat{R}_{k}$ and $\hat{H}_{k}$ following $[\hat{R}_{k},\hat{\rho}_{k}]=[\hat{R}_{k},e^{-\beta\hat{H}_{k}}]=0$.

Finally, combining Eq. 7 and Eq. 8, we have shown that $M(\hat{\rho}_{k+1})-M(\hat{\rho}_{k})\leq 0$ is a negative semi-definite quantity, which proves the theorem.
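The statement can also be probed numerically on a small example. The following sketch is an illustration under assumptions (a hypothetical single-spin target, an arbitrary candidate $\hat{H}_{k}$, noiseless data with $N_{i}=N$, and helper names chosen here). It takes one step of Eq. 4 with a small $\gamma$ and compares the change in $M$ with the first-order prediction $\beta\gamma(1-\Delta_{k})$ of Eq. 7.

```python
import numpy as np

def gibbs_state(H, beta):
    E, U = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))
    return (U * (w / w.sum())) @ U.conj().T

def projectors(O, decimals=10):
    vals, vecs = np.linalg.eigh(O)
    vals = np.round(vals, decimals)
    return [vecs[:, vals == o] @ vecs[:, vals == o].conj().T
            for o in np.unique(vals)]

def R_operator(rho, data):
    """Eq. 3 with weights w = f / N_tot stored in `data` as (w, P) pairs."""
    return sum(w * P / np.trace(rho @ P).real for w, P in data)

def neg_log_likelihood(rho, data):
    """M(rho) = -sum (f / N_tot) log tr[rho P]."""
    return -sum(w * np.log(np.trace(rho @ P).real) for w, P in data)

def kubo_integral(H, R, beta):
    """int_0^1 exp(beta*s*H) R exp(-beta*s*H) ds in H's eigenbasis."""
    E, U = np.linalg.eigh(H)
    x = beta * (E[:, None] - E[None, :])
    small = np.abs(x) < 1e-12
    phi = np.where(small, 1.0, np.expm1(x) / np.where(small, 1.0, x))
    return U @ ((U.conj().T @ R @ U) * phi) @ U.conj().T

Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
beta, gamma = 1.0, 1e-4
rho_s = gibbs_state(0.3 * Sx + 0.8 * Sz, beta)            # hypothetical target
data = [(np.trace(rho_s @ P).real / 3, P)                 # N_i / N_tot = 1/3
        for O in (Sx, Sy, Sz) for P in projectors(O)]

H_k = -0.5 * Sx + 0.2 * Sy                                # arbitrary candidate
rho_k = gibbs_state(H_k, beta)
R_k = R_operator(rho_k, data)
Delta_k = np.trace(rho_k @ kubo_integral(H_k, R_k, beta) @ R_k).real   # Eq. 8
rho_next = gibbs_state(H_k - gamma * R_k, beta)                        # Eq. 4
dM = neg_log_likelihood(rho_next, data) - neg_log_likelihood(rho_k, data)
```

Here `Delta_k` should be at least 1 and `dM / (beta * gamma)` should be close to `1 - Delta_k`, with the agreement improving as `gamma` decreases.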

We conclude that the quantum likelihood gradient in Eq. 4 offers an efficient and collective optimization towards the MLE Hamiltonian, modifying all model parameters simultaneously. For each step of the quantum likelihood gradient, the most costly calculation is on $\hat{\rho}_{k+1}$, or more precisely, the expectation value $\mathrm{tr}[\hat{\rho}_{k+1}\hat{P}_{\lambda_{i}}]$ from $\hat{H}_{k+1}$. Fortunately, this is a routine calculation in quantum many-body physics and condensed matter physics, with various tailored candidate algorithms under different scenarios. For example, we may resort to the FTTN or the QMC approaches, which readily apply to much larger systems than brute-force exact diagonalization. Thus, we emphasize that MLE Hamiltonian learning works with evaluations of the expectation values of quantum states instead of the more expensive quantum states themselves in their entirety.

Interestingly, MLE Hamiltonian learning also allows a more local stance. For a given Hamiltonian, the necessary expectation value of its Gibbs state $\mathrm{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]$ takes the form:

$$\begin{aligned}
\mathrm{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}] &= \mathrm{tr}[\hat{\rho}^{A}_{eff}\hat{P}_{\lambda_{i}}],\\
\hat{\rho}^{A}_{eff}=\mathrm{tr}_{\bar{A}}[\hat{\rho}] &= \frac{e^{-\beta\hat{H}^{A}_{eff}}}{\mathrm{tr}[e^{-\beta\hat{H}^{A}_{eff}}]},
\end{aligned} \tag{11}$$

where $\hat{\rho}^{A}_{eff}$ is the reduced density operator defined upon a relatively local subsystem $A$ still containing $\hat{P}_{\lambda_{i}}$. The effective Hamiltonian $\hat{H}_{eff}^{A}=\hat{H}_{A}+\hat{V}_{eff}^{A}$ of the subregion $A$ contains the existing terms $\hat{H}_{A}$ within the subsystem and the effective interacting terms $\hat{V}_{eff}^{A}$ from the trace operation [39]. According to the conclusions of the quantum belief propagation theory [40, 39], the interaction in the latter term is local, $||\hat{V}_{eff}^{A}(l_{A})||_{F}\propto(\beta/\beta_{c})^{l_{A}/r}$, where $\beta$ ($\beta_{c}$) denotes the current (model-dependent critical) inverse temperature, $l_{A}$ is the distance between a specific site in the bulk of $A$ and the boundary of the subregion $A$, and $r$ is the maximum acting distance (diameter) of a single operator in the original Hamiltonian (similar to the $k$-local in the next section). Thus, when $\beta<\beta_{c}$ (especially when $\beta\ll\beta_{c}$), $\hat{V}_{eff}^{A}$ is exponentially localized around the boundary of $A$, and the effective Hamiltonian in the bulk of $A$ remains the same as that of the original $\hat{H}_{A}$ of the entire system. Therefore, we may further boost the efficiency of MLE Hamiltonian learning by redirecting the expectation-value evaluations of the global quantum system to that of a series of local patches, as we will show in the next section.

In summary, given the quantum measurements of a thermal (Gibbs) state: $\{\hat{O}_{i}\}$, $N_{i}$, and $f_{\lambda_{i}}$, we can perform MLE Hamiltonian learning to obtain the MLE Hamiltonian via the following steps (Fig. 1):

  • 1)

    Initialization/Update:

    For initialization, start with a random model Hamiltonian $\hat{H}_{0}$:

    $$\hat{H}_{0}=\sum_{i}\mu_{i}\hat{O}_{i}, \tag{12}$$

    or an identity Hamiltonian.

    For update, carry out the quantum likelihood gradient:

    $$\hat{H}_{k+1}=\hat{H}_{k}-\gamma\hat{R}_{k}, \tag{13}$$

    where $\hat{R}_{k}$ is defined in Eq. 3 or Eq. 16.

  • 2)

    Evaluate the properties $\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]$ of the quantum state:

    $$\hat{\rho}_{k}=\frac{e^{-\beta\hat{H}_{k}}}{\mathrm{tr}[e^{-\beta\hat{H}_{k}}]}, \tag{14}$$

    with suitable numerical methods.

  • 3)

    Check for convergence: loop back to step 1) to update, $k\rightarrow k+1$, if the relative entropy $M(\hat{\rho}_{k})-M_{0}\geq\epsilon$ is above a given threshold $\epsilon$; otherwise, terminate the process, and the final $\hat{H}_{k}$ is the result for the MLE Hamiltonian. Here, $M_{0}$ is the theoretical minimum of the negative-log-likelihood function:

    $$M_{0}=-\sum_{i,\lambda_{i}}\frac{N_{i}}{N_{tot}}p_{\lambda_{i}}\log{p_{\lambda_{i}}}. \tag{15}$$
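The three steps above can be sketched end to end with exact diagonalization on a two-site chain. This is a minimal illustration, not the implementation used for the paper's results: it assumes noiseless data, $N_{i}=N$ for all operators, the unscaled $\hat{R}_{k}$ of Eq. 3, and illustrative values for `gamma`, `eps`, and `max_iter`; the function name `mle_hamiltonian_learning` is chosen here.

```python
import itertools
import numpy as np

def gibbs_state(H, beta):
    E, U = np.linalg.eigh(H)
    w = np.exp(-beta * (E - E.min()))
    return (U * (w / w.sum())) @ U.conj().T

def projectors(O, decimals=10):
    vals, vecs = np.linalg.eigh(O)
    vals = np.round(vals, decimals)
    return [vecs[:, vals == o] @ vecs[:, vals == o].conj().T
            for o in np.unique(vals)]

def mle_hamiltonian_learning(ops, p_data, beta, gamma=1.0, eps=1e-12, max_iter=5000):
    """Iterate H <- H - gamma * R_k until M - M_0 < eps (steps 1 to 3)."""
    w = 1.0 / len(ops)                                   # N_i / N_tot for N_i = N
    P_sets = [projectors(O) for O in ops]
    M0 = -w * sum(p * np.log(p) for ps in p_data for p in ps)        # Eq. 15
    H = np.eye(ops[0].shape[0], dtype=complex)           # identity initialization
    for k in range(max_iter):
        rho = gibbs_state(H, beta)                       # step 2, Eq. 14
        q = [[np.trace(rho @ P).real for P in Ps] for Ps in P_sets]
        M = -w * sum(p * np.log(qq) for ps, qs in zip(p_data, q)
                     for p, qq in zip(ps, qs))
        if M - M0 < eps:                                 # step 3
            break
        R = sum(w * p / qq * P for ps, qs, Ps in zip(p_data, q, P_sets)
                for p, qq, P in zip(ps, qs, Ps))         # Eq. 3
        H = H - gamma * R                                # step 1, Eq. 13
    return H, M - M0, k

s = {"x": 0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
     "y": 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
     "z": 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)}
I2 = np.eye(2)
ops = [np.kron(s[a], s[b]) for a, b in itertools.product("xyz", repeat=2)]
ops += [np.kron(s[a], I2) for a in "xyz"] + [np.kron(I2, s[a]) for a in "xyz"]

rng = np.random.default_rng(1)
mu_s = rng.uniform(-1, 1, size=len(ops))                 # random target couplings
beta = 1.0
rho_s = gibbs_state(sum(m * O for m, O in zip(mu_s, ops)), beta)
p_data = [[np.trace(rho_s @ P).real for P in projectors(O)] for O in ops]

H_k, rel_entropy, n_iter = mle_hamiltonian_learning(ops, p_data, beta)
# Project back onto the operator basis; the constant (identity) part drops out.
mu_k = np.array([np.trace(H_k @ O).real / np.trace(O @ O).real for O in ops])
rel_error = np.linalg.norm(mu_s - mu_k) / np.linalg.norm(mu_s)
```

For larger systems, the `gibbs_state` and expectation-value evaluations would be replaced by a many-body method such as FTTN or QMC; the update logic itself is unchanged.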

In practice, $\hat{R}_{k}$ in Eq. 4 is singular for small values of $\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]$ and may become numerically unstable, which requires a minimal or dynamical learning rate $\gamma$ to maintain the range of the quantum likelihood gradient properly. Instead, we may employ a re-scaled version of $\hat{R}_{k}$:

$$\tilde{\hat{R}}_{k}=\sum_{i,\lambda_{i}}\frac{N_{i}}{N_{tot}}f_{g}(p_{\lambda_{i}}/\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}])\hat{P}_{\lambda_{i}}, \tag{16}$$

where $f_{g}$ is a monotonic tuning function:

$$f_{g}(x)=\frac{gx}{x+g-1},\quad g>1,\ g\in\mathbb{N}, \tag{17}$$

which maps its argument in $(0,\infty)$ to a finite range $(0,g)$. Such a re-scaled $\tilde{\hat{R}}_{k}$ regularizes the quantum likelihood gradient and allows a simple yet relatively larger learning rate $\gamma$ for more efficient MLE Hamiltonian learning. We also have $f_{g}(1)=1$; therefore, $\tilde{\hat{R}}_{k}\rightarrow\hat{R}_{k}$ as we approach convergence. We will mainly employ $\tilde{\hat{R}}_{k}$ for our examples in the following sections.
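The properties of the tuning function in Eq. 17 are easy to verify numerically. The sketch below is illustrative only; the sample ratios are arbitrary values chosen to span several orders of magnitude.

```python
import numpy as np

def f_g(x, g=2):
    """Tuning function of Eq. 17: maps (0, inf) monotonically into (0, g)."""
    x = np.asarray(x, dtype=float)
    return g * x / (x + g - 1)

# Arbitrary sample ratios p / tr[rho_k P], from strongly under- to over-estimated
ratios = np.array([1e-6, 0.1, 1.0, 10.0, 1e6])
rescaled = f_g(ratios, g=2)
```

Whereas the raw ratio in Eq. 3 grows without bound as $\mathrm{tr}[\hat{\rho}_{k}\hat{P}_{\lambda_{i}}]\rightarrow 0$, the rescaled weight saturates below $g$, and it equals 1 exactly when the model reproduces the measured statistics.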

In addition to the negative-log-likelihood function $M(\hat{\rho}_{k})$, we also consider the Hamiltonian distance as another criterion on the quality of Hamiltonian learning:

$$\Delta{\vec{\mu}}_{k}=\frac{||\vec{\mu}_{s}-\vec{\mu}_{k}||_{2}}{||\vec{\mu}_{s}||_{2}}, \tag{18}$$

where $\vec{\mu}_{s}$ and $\vec{\mu}_{k}$ are the (vectors of) coefficients of the target Hamiltonian and the learned Hamiltonian after $k$ iterations, respectively. (We typically perform MLE Hamiltonian learning and update the model Hamiltonian on the projection-operator basis; therefore, we transform the Hamiltonian back to the original, ordinary operator basis before $\Delta\vec{\mu}_{k}$ evaluations.) However, we do not recommend $\Delta{\vec{\mu}}_{k}$ ($\Delta\mu$ for short) as a convergence criterion, as $\vec{\mu}_{s}$ is generally unknown aside from benchmark scenarios [28].
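A minimal sketch of Eq. 18 follows, including one possible way to transform a learned Hamiltonian back to the ordinary operator basis. The trace-projection formula assumes the operators $\hat{O}_{j}$ are mutually orthogonal under the trace inner product (true for products of spin operators), and the numerical values are hypothetical.

```python
import numpy as np

def coefficients(H, ops):
    """mu_j = tr[H O_j] / tr[O_j O_j], valid for trace-orthogonal operators."""
    return np.array([np.trace(H @ O).real / np.trace(O @ O).real for O in ops])

def hamiltonian_distance(mu_s, mu_k):
    """Relative l2 distance between coefficient vectors (Eq. 18)."""
    mu_s, mu_k = np.asarray(mu_s, dtype=float), np.asarray(mu_k, dtype=float)
    return np.linalg.norm(mu_s - mu_k) / np.linalg.norm(mu_s)

Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ops = [Sx, Sy, Sz]

mu_s = np.array([0.3, 0.0, 0.8])                       # hypothetical target
# Hypothetical learned Hamiltonian, including an irrelevant constant shift
H_learned = 0.31 * Sx + 0.8 * Sz - 2.0 * np.eye(2)
mu_k = coefficients(H_learned, ops)
dist = hamiltonian_distance(mu_s, mu_k)
```

Because the operators are traceless, the constant shift does not affect the extracted coefficients, which matches the statement that the MLE Hamiltonian is determined up to a trivial constant.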

III Example models and results

In this section, we demonstrate the performance of the MLE Hamiltonian learning protocol. For better numerical simulations, we consider $k$-local Hamiltonians, with operators acting non-trivially on no more than $k$ contiguous sites in each direction. For example, for a 1-dimensional spin-$\frac{1}{2}$ system, a $k$-local operator for $k=2$ takes the form $\hat{S}_{i}^{\alpha}\hat{S}_{i+1}^{\beta}$ or $\hat{S}_{i}^{\alpha}$, $\alpha,\beta\in\{x,y,z\}$, where $\hat{S}_{i}^{\alpha}$ denotes the spin operator. In particular, we focus on general 1D quantum spin chains with $k=2$, taking the following form:

$$\hat{H}_{s}=\sum_{i,\alpha,\beta}^{L-1}J_{i}^{\alpha\beta}\hat{S}_{i}^{\alpha}\hat{S}_{i+1}^{\beta}+\sum_{i,\alpha}^{L}h_{i}^{\alpha}\hat{S}_{i}^{\alpha}, \tag{19}$$

where $\hat{S}_{i}^{\alpha}$ denotes the spin operator on site $i$, $\alpha,\beta\in\{x,y,z\}$. There are $12L-9$ 2-local operators under the open boundary condition, where $L$ is the system size. We generate the model parameters $\vec{\mu}_{s}=\{J_{i}^{\alpha\beta},h_{i}^{\alpha}\}$ randomly following a uniform distribution in $[-1,1]$. This Hamiltonian $\hat{H}_{s}$, specifically the model parameters $\vec{\mu}_{s}$, will be our target for MLE Hamiltonian learning. As the protocol's inputs, we simulate quantum measurements of all 2-local operators $\{\hat{O}_{i}\}$ on the Gibbs states of $\hat{H}_{s}$ numerically via exact diagonalization on small systems and FTTN for large systems. For the latter, we use a tensor network ansatz called the "ancilla" method [34], where we purify a Gibbs state with some auxiliary qubits, $\hat{\rho}_{s}=\mathrm{tr}_{aux}|\psi_{s}\rangle\langle\psi_{s}|$, and obtain $|\psi_{s}\rangle=e^{-\beta\hat{H}_{s}/2}|\psi_{0}\rangle$ from a maximally-entangled state $|\psi_{0}\rangle$ via imaginary time evolution. In addition, given a large number $n$ of Trotter steps, the imaginary time evolution operator $e^{-\beta\hat{H}_{s}/2}$ is decomposed into a product of Trotter gates as $(\prod_{i}e^{-\mu_{i}\hat{O}_{i}\beta/2n})^{n}+O(\beta^{2}/n)$. Here, we set the Trotter step $\delta t=\beta/n\in[0.01,0.1]$, for which the Trotter errors of order $O(\beta^{2}/n)$ show little impact on our protocol's accuracy. Without loss of generality, we employ the integrated FTTN algorithm in the ITensor numerical toolkit [35], and set the number of measurements $N_{i}=N$ for all operators in our examples for simplicity.
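The operator count $12L-9$ follows from $9(L-1)$ bond terms plus $3L$ single-site terms, which can be confirmed by enumeration. The label format and the function name `two_local_labels` below are illustrative choices, not part of the paper.

```python
import itertools
import numpy as np

def two_local_labels(L):
    """Labels of all 2-local operators on an open spin-1/2 chain of L sites."""
    bonds = [((i, a), (i + 1, b)) for i in range(L - 1)
             for a, b in itertools.product("xyz", repeat=2)]   # S_i^a S_{i+1}^b
    fields = [((i, a),) for i in range(L) for a in "xyz"]      # S_i^a
    return bonds + fields

L = 10
labels = two_local_labels(L)

# Random target parameters, uniform in [-1, 1] as described in the text
rng = np.random.default_rng(0)
mu_s = dict(zip(labels, rng.uniform(-1.0, 1.0, size=len(labels))))
```

For $L=10$ this gives 111 parameters, and for $L=100$ it gives 1191, so the number of model parameters grows only linearly with the system size.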

Figure 2: Both the Hamiltonian distance $\Delta\mu$ defined in Eq. 18 and the negative-log-likelihood function $M(\hat{\rho}_{k+1})$ (or relative entropy $M(\hat{\rho}_{k+1})-M_{0}$) show successful convergence of the iterations in MLE Hamiltonian learning across a variety of system sizes and temperature ranges. We simulate the target Hamiltonian and the iteration process by FTTN with Trotter step $\delta t=0.1$. Each curve is averaged over 10 trials of random $\hat{H}_{0}$ initializations. We set the learning rate $\gamma=0.1$. The maximum number of iterations here is 1000.

As we demonstrate in Fig. 2, MLE Hamiltonian learning obtains the target Hamiltonians with high accuracy and efficiency under various settings of system sizes and inverse temperatures $\beta$. Besides, instead of the original quantum likelihood gradient in Eq. 3, we may obtain a faster convergence with the re-scaled $\tilde{\hat{R}}_{k}$ in Eq. 16 and a larger learning rate, as we discuss in Appendix B. In the following numerical examples, we use the re-scaled quantum likelihood gradient $\tilde{\hat{R}}_{k}$ and set $g=2$ for the tuning function in Eq. 17. Within the given iterations, not only have we achieved results (Hamiltonian distance $\Delta\mu\sim O(10^{-12})$ and relative entropy $M(\hat{\rho}_{k})-M_{0}\sim O(10^{-16})$) comparable to, if not exceeding, previous methods [19] for $L=10$ systems and $\beta=1$ straightforwardly, but we have also achieved satisfactory consistency ($\Delta\mu\sim O(10^{-2})$ and $M(\hat{\rho}_{k})-M_{0}\sim O(10^{-9})$) for large systems $L=100$ and low temperatures $\beta=3$ that were previously inaccessible.

Figure 3: The performance of MLE Hamiltonian learning holds up relatively well against noise and, especially, across broader temperature ranges. Left: the Hamiltonian distance versus the inverse temperature $\beta$ shows a broader applicable temperature range. Each data point contains 10 trials. Right: the performance (left figure's data averaged over temperature) versus the noise strength $\delta$ shows the impact of noise and the protocol's relative robustness against it. The slope of the straight line is $\sim 1$, indicating a linear relationship between $\Delta\mu$ and $\delta$. Note the log scale $\log(\Delta\mu)$ for the vertical axis. We set $L=10$ for the system size, and learning rate $\gamma=1$.

MLE Hamiltonian learning is also relatively robust against temperature and noise, two key factors impacting accuracy in Hamiltonian learning. For illustration, we add random errors $\delta\langle\hat{O}_{i}\rangle$ following a Gaussian distribution with zero mean and standard deviation $\delta$ to all quantum measurements: $\langle\hat{O}_{i}\rangle\rightarrow\langle\hat{O}_{i}\rangle+\delta\langle\hat{O}_{i}\rangle$. We note that such $\delta$ may also depict the quantum fluctuations [19, 21] from a finite number of measurements, $\delta\propto N_{i}^{-1/2}$. We also focus on smaller systems with $L=10$ and employ exact diagonalization to avoid confusion from potential Trotter errors of the FTTN ansatz [34]. We summarize the results in Fig. 3.
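The two noise pictures mentioned above, additive Gaussian errors of strength $\delta$ and shot noise with $\delta\propto N_{i}^{-1/2}$, can be illustrated with a small simulation. This is a hedged sketch: the outcome probability, eigenvalue magnitude, shot numbers, and the function name `shot_noise_std` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def shot_noise_std(p_plus, o, N, trials=2000):
    """Spread of the estimated <O> for a two-outcome observable (+o, -o) from N shots."""
    counts = rng.binomial(N, p_plus, size=trials)     # occurrences of outcome +o
    estimates = o * (2 * counts / N - 1)              # <O> = o * (p_plus - p_minus)
    return estimates.std()

p_plus, o = 0.3, 0.5                                  # arbitrary example values
std_100 = shot_noise_std(p_plus, o, 100)
std_10000 = shot_noise_std(p_plus, o, 10_000)

# Additive Gaussian noise of strength delta on exact expectation values
delta = 1e-3
exact = np.array([0.12, -0.05, 0.0])                  # hypothetical <O_i> values
noisy = exact + rng.normal(0.0, delta, size=exact.shape)
```

Increasing the number of shots by a factor of 100 reduces the spread of the estimate by about a factor of 10, in line with the binomial prediction $2o\sqrt{p(1-p)/N}$.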

Most previous algorithms for Hamiltonian learning have a rather specific applicable temperature range. For example, the high-temperature expansion of $e^{-\beta\hat{H}}$ only works in the $\beta\ll 1$ limit [42, 43]. Besides, gradient descent on the log partition function, despite being a convex optimization, performs well only in a narrow temperature range [20]. The gradient of this algorithm is proportional to the inverse temperature, so the algorithm's convergence slows at high temperatures. Also, the gradient descent algorithm cannot extend to the $\beta\rightarrow\infty$ limit, i.e., the ground state, while our protocol is directly applicable to the ground states of quantum systems, as we will generalize and justify later.

MLE Hamiltonian learning is also more robust to noise, with an accuracy of Hamiltonian distance $\Delta\mu\sim O(10^{-11})$ across a broad temperature range at noise strength $\delta\sim O(10^{-12})$. Such a noise level is hard to realize in practice; nevertheless, it is necessary to safeguard the correlation matrix method [17, 19, 44]. Even so, due to the uncontrollable spectral gap, the correlation matrix method is susceptible to high temperature, and its accuracy drastically decreases to $\Delta\mu\sim O(10^{-3})$ at $\beta=0.01$. In comparison, MLE Hamiltonian learning is more versatile, with an approximately linear dependence between its accuracy $\Delta\mu$ and the noise strength $\delta$ across a broad range of temperatures and noise strengths, saturating the previous bound [20]; see the right panel of Fig. 3. We also provide more detailed comparisons between the algorithms in Appendix C.

Figure 4: Upper: MLE Hamiltonian learning's need for evaluations of local observables can be satisfied among local patches for thermal states at sufficiently high temperatures, where the effective potential $V^{A}_{eff}$ ($V^{B}_{eff}$) is weak and localized on the boundaries of subregion $A$ ($B$) within a cut-off range $\Lambda_{A}$ ($\Lambda_{B}$). Consequently, for $\hat{P}_{\lambda_{i}}$ defined sufficiently deep inside $A$ ($B$), we can estimate its expectation value via $\mathrm{tr}[\hat{\rho}^{A}\hat{P}_{\lambda_{i}}]$ ($\mathrm{tr}[\hat{\rho}^{B}\hat{P}_{\lambda_{i}}]$). Lower: the Hamiltonian distance $\Delta\mu$ of the results after MLE Hamiltonian learning indicates better validity of the local-patch approximation at higher temperatures. The total system size is $L=100$, while the local patches are of sizes $L_{A}=10,16$ with cut-offs $\Lambda=1,2,4$, respectively. Each data point contains 10 trials. We set $\delta t=0.1$ for the Trotter step in the FTTN ansatz and learning rate $\gamma=1$.

Despite the efficient quantum likelihood gradient and applicable quantum many-body ansatz, the computational cost of MLE Hamiltonian learning still increases rapidly with the system size $L$. Fortunately, as stated above in Eq. 11, we may resort to calculations on local patches, especially for low dimensions and high temperatures due to their quasi-Markov property. In particular, when $\beta<\beta_{c}$ ($T>T_{c}$), the difference between the cutoff Hamiltonian $\hat{H}_{A}$ and the effective Hamiltonian $\hat{H}_{eff}^{A}$ in a local subregion $A$, $V_{eff}^{A}$, should be weak, short-ranged, and localized at $A$'s boundary [40, 39]; therefore, for those operators $\hat{P}_{\lambda_{i}}$ adequately deep inside $A$, we can use $\hat{\rho}^{A}$, the Gibbs state defined by $\hat{H}_{A}$, to estimate the corresponding $\mathrm{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]$; see the illustration in the upper panel of Fig. 4.

For example, we apply MLE Hamiltonian learning on $L=100$ systems, where we iteratively calculate the necessary expectation values on different local patches of size $L_{A}=10,16$. We also choose different cut-offs $\Lambda$, and evaluate $\mbox{tr}[\hat{\rho}^{A}\hat{P}_{\lambda_{i}}]$ for those operators at least $\Lambda$ away from the boundaries and sufficiently deep inside the subregion $A$, so that the effective potential $V^{A}_{eff}$ may become negligible. We also employ a sufficient number of local patches to guarantee full coverage of the necessary observables - operators outside $A$ or in $\Lambda_{A}$ are obtainable from another local patch $B$, as shown in the upper panel of Fig. 4, and so on. Both the $L=100$ target system and the local patches for MLE Hamiltonian learning are simulated via FTTN. We have no problem achieving convergence, and the resulting Hamiltonians’ accuracy, the Hamiltonian distance $\Delta\mu$ versus the inverse temperature $\beta$, is summarized in the lower panel of Fig. 4. Indeed, the local-patch approximation is more reliable at higher temperatures, as well as with larger subsystems and cutoffs, albeit with rising costs. We also note that the local patches allow system sizes well beyond the $L=100$ demonstrated here.
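The patch bookkeeping described above may be sketched as follows. This is a minimal illustration rather than the implementation used for Fig. 4: the function names `plan_patches` and `covering_patch` are hypothetical, the operators are assumed to act on `support` consecutive sites of an open chain, and we assume that no margin is needed where a patch ends at the physical edge of the chain, since nothing is cut there.

```python
def plan_patches(L, L_A, cutoff, support=2):
    """Return (start, end, lo, hi) tuples with half-open site ranges.

    [start, end) is the patch; [lo, hi) is its trusted interior, i.e. the
    sites at least `cutoff` away from any artificial (cut) patch boundary.
    """
    if L_A > L:
        raise ValueError("patch larger than the system")
    interior = L_A - 2 * cutoff
    if interior < support:
        raise ValueError("cutoff leaves no room for the operators")
    # Neighbouring interiors share support-1 sites, so that every operator
    # lies entirely inside at least one interior.
    stride = interior - (support - 1)
    patches, start = [], 0
    while True:
        start = min(start, L - L_A)  # clamp the last patch to the chain end
        end = start + L_A
        # Assumption: a physical chain edge is not a cut, so no margin there.
        lo = start if start == 0 else start + cutoff
        hi = end if end == L else end - cutoff
        patches.append((start, end, lo, hi))
        if end == L:
            return patches
        start += stride


def covering_patch(patches, j, support=2):
    """Index of the first patch whose interior contains sites j..j+support-1."""
    for idx, (_, _, lo, hi) in enumerate(patches):
        if lo <= j and j + support <= hi:
            return idx
    return None


patches = plan_patches(L=100, L_A=10, cutoff=2)
```

Each operator starting at site `j` would then be evaluated as $\mbox{tr}[\hat{\rho}^{A}\hat{P}_{\lambda_{i}}]$ on the patch returned by `covering_patch(patches, j)`. A larger cutoff shrinks the interiors and therefore requires more patches, in line with the rising costs noted above.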

IV MLE Hamiltonian learning for pure eigenstates

In addition to the Gibbs states, MLE Hamiltonian learning also applies to measurements of certain eigenstates of target quantum systems:

1. The ground states are essentially the $\beta\rightarrow\infty$ limit of the Gibbs states. However, due to the order-of-limit issue, the $\gamma\rightarrow 0$ requirement of the theorem on Gibbs states forbids a direct extension to the ground states. In Appendix D, we offer a rigorous proof of the effectiveness of the quantum likelihood gradient based on ground-state measurements, along with several nontrivial MLE Hamiltonian learning examples on quantum critical and topological ground states. We note that Ref. 45 offers preliminary studies on pure-state quantum state tomography, inspiring this work.

2. A highly-excited eigenstate of a (non-integrable) quantum chaotic system $\hat{H}_{s}$ is believed to obey the eigenstate thermalization hypothesis (ETH), i.e., its density operator $\hat{\rho}_{s}=\ket{\psi_{s}}\bra{\psi_{s}}$ is locally indistinguishable from a Gibbs state $\hat{\rho}_{s,A}$ in thermal equilibrium [46]:

$$\hat{\rho}_{s,A}=\mbox{tr}_{\bar{A}}[\hat{\rho}_{s}]\approx\frac{e^{-\beta_{s}\hat{H}_{A}}}{\mbox{tr}[e^{-\beta_{s}\hat{H}_{A}}]}, \tag{20}$$

where $\beta_{s}$ is an effective (inverse) temperature determined by the energy expectation value $\bra{\psi_{s}}\hat{H}_{s}\ket{\psi_{s}}=\frac{\mbox{tr}[e^{-\beta_{s}\hat{H}_{s}}\hat{H}_{s}]}{\mbox{tr}[e^{-\beta_{s}\hat{H}_{s}}]}$. As MLE Hamiltonian learning only engages local operators, its applicability directly generalizes to such eigenstates $\ket{\psi_{s}}$ following ETH.

3. In general, ETH applies to eigenstates in the center of the spectrum of quantum chaotic systems, while low-lying eigenstates are too close to the ground state to exhibit ETH [47]. However, in the rest of the section, we demonstrate numerically that MLE Hamiltonian learning still works well for low-lying eigenstates.

We consider the 1D longitudinal-transverse-field Ising model [46, 47] as our target quantum system:

$$\hat{H}_{s}=J\sum_{j}^{L-1}\hat{\sigma}_{j}^{z}\hat{\sigma}_{j+1}^{z}+g_{z}\sum_{j}^{L}\hat{\sigma}_{j}^{z}+g_{x}\sum_{j}^{L}\hat{\sigma}_{j}^{x}, \tag{21}$$

where the system size is $L=80$. We set $J=1$, $g_{x}=0.9045$, and $g_{z}=0.8090$. The quantum system is strongly non-integrable under such settings. Previous studies mainly focused on eigenstates in the middle of the energy spectrum. In contrast, we pick the first excited state - a typical low-lying eigenstate considered asymptotically integrable and ETH-violating [47] - for quantum measurements (via DMRG) and then MLE Hamiltonian learning for its candidate Hamiltonian (via FTTN).
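To illustrate how an effective (inverse) temperature $\beta_{s}$ follows from the energy-matching condition below Eq. 20, the following sketch builds Eq. 21 for a small open chain by exact diagonalization and solves for $\beta_{s}$ by bisection. It is only an illustration under stated assumptions: we use $L=6$ instead of the $L=80$ DMRG/FTTN setting of the main text, so the resulting value need not coincide with the one reported there, and the helper names (`site_op`, `ising_hamiltonian`, `thermal_energy`, `effective_beta`) are hypothetical.

```python
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def site_op(op, j, L):
    """Embed a single-site operator at site j of an L-site chain."""
    out = np.array([[1.0]])
    for k in range(L):
        out = np.kron(out, op if k == j else I2)
    return out


def ising_hamiltonian(L, J=1.0, gx=0.9045, gz=0.8090):
    """Eq. 21 with open boundaries, as a dense 2^L x 2^L matrix."""
    H = np.zeros((2**L, 2**L))
    for j in range(L - 1):
        H += J * site_op(SZ, j, L) @ site_op(SZ, j + 1, L)
    for j in range(L):
        H += gz * site_op(SZ, j, L) + gx * site_op(SX, j, L)
    return H


def thermal_energy(evals, beta):
    """tr[e^{-beta H} H] / tr[e^{-beta H}] from the spectrum (shifted for stability)."""
    w = np.exp(-beta * (evals - evals.min()))
    return float(np.sum(w * evals) / np.sum(w))


def effective_beta(evals, energy, iters=200):
    """Bisection for beta_s; the thermal energy decreases monotonically with beta."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # grow the bracket until it contains the root
        if thermal_energy(evals, hi) < energy:
            break
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if thermal_energy(evals, mid) > energy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


evals = np.linalg.eigvalsh(ising_hamiltonian(L=6))
E1 = evals[1]  # energy of the first excited state
beta_s = effective_beta(evals, E1)
```

The bisection relies on the thermal energy interpolating monotonically between the infinite-temperature value (zero here, since all terms in Eq. 21 are traceless) and the ground-state energy, so any eigenstate energy in between corresponds to a unique $\beta_{s}>0$.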

Figure 5: The coefficients obtained via MLE Hamiltonian learning (green columns) compare well with those of the target Hamiltonian $\hat{H}_{s}$ even though the quantum measurements are based upon a low-lying (first) excited state. The red columns denote the coefficients of $\hat{H}_{s}$ in Eq. 21 multiplied by the effective (inverse) temperature $\beta_{s}=4$. The error bars demonstrate the variances over the lattices and trials. We set the system size $L=80$, learning rate $\gamma=0.1$, and the Trotter step $\delta t=0.1$.

We summarize the results in Fig. 5. Further, the model Hamiltonian we established is approximately equivalent to the target quantum Hamiltonian at an (inverse) temperature $\beta_{s}\approx 4$ [46], which we have absorbed into the unit of our $\hat{H}_{k}$. Therefore, we have accurately established the model Hamiltonian and derived the effective temperature consistent with previous results [46] for a low-lying excited eigenstate not necessarily following ETH. The physical reason for the quantum likelihood gradient’s applicability in such states is an interesting problem that deserves further study.

V Discussions

We have proposed a novel MLE Hamiltonian learning protocol to achieve the model Hamiltonian of the target quantum system based on quantum measurements of its Gibbs states. The protocol updates the model Hamiltonian iteratively with respect to the negative-log-likelihood function from the measurement data. We have theoretically proved the efficiency and convergence of the corresponding quantum likelihood gradient and demonstrated it numerically on multiple non-trivial examples, which show higher accuracy, better robustness against noise, and weaker temperature dependence. Indeed, the accuracy is almost linear in the imposed noise amplitude and thus inversely proportional to the square root of the number of samples, the asymptotic upper bound [20]. Further, MLE Hamiltonian learning directly rests on the Hamiltonians and their physical properties instead of direct and costly access to the quantum many-body states. Consequently, we can resort to various quantum many-body ansatzes in our systematic quantum toolbox and even the local-patch approximation when the situation allows. These advantages allow applications to larger systems and lower temperatures with better accuracy than previous approaches. On the other hand, while our protocol is generally applicable for learning any Hamiltonian, its advantages are most apparent for local Hamiltonians, where various quantum many-body ansatzes and the local-patch approximation shine. Despite such limitations, we note that physical systems are characterized by local Hamiltonians in a significant proportion of scenarios.
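The link between the noise amplitude and the number of samples quoted above can be made explicit with a short estimate. As a sketch, assume the $N_{i}$ measurements of each operator are independent and identically distributed, and write $q_{\lambda_{i}}=\mbox{tr}[\hat{\rho}_{s}\hat{P}_{\lambda_{i}}]$ for the exact outcome probability (a symbol introduced here only for illustration). The measured frequency then fluctuates binomially:

$$\mbox{Var}\left[\frac{f_{\lambda_{i}}}{N_{i}}\right]=\frac{q_{\lambda_{i}}(1-q_{\lambda_{i}})}{N_{i}}\quad\Rightarrow\quad\delta\sim\sqrt{\frac{q_{\lambda_{i}}(1-q_{\lambda_{i}})}{N_{i}}}\propto N_{i}^{-1/2},$$

so that an approximately linear dependence $\Delta\mu\propto\delta$ translates into $\Delta\mu\propto N_{i}^{-1/2}$.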

In addition to the Gibbs states, we have generalized the applicability of MLE Hamiltonian learning to eigenstates of the target quantum systems, including ground states, ETH states, and even selected cases of low-lying excited states. We have also provided a theoretical proof of the rigor and convergence of the quantum likelihood gradient in Appendix D, along with several other numerical examples.

Our strategy may apply to the entanglement Hamiltonians and the tomography of the quantum states under the maximum-likelihood-maximum-entropy assumption [48]. Besides, our algorithm may also provide insights into the quantum Boltzmann machine [36] - a quantum version of the classical Boltzmann machine with degrees of freedom that obey the distribution of a target quantum Gibbs state. Instead of brute-force calculations of the loss function derivatives with respect to the model parameters or approximations with the gradients’ upper bounds, our protocol provides an efficient optimization that updates the model parameters collectively.

Acknowledgement:- We thank Jia-Bao Wang for insightful discussions. We acknowledge support from the National Key R&D Program of China (No.2021YFA1401900) and the National Science Foundation of China (No.12174008 & No.92270102). The calculations of this work are supported by HPC facilities at Peking University.

References

  • Lanczos [1950] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards 45, 255 (1950).
  • White [1992] S. R. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett. 69, 2863 (1992).
  • Schollwöck [2011] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics 326 (2011), doi:10.1016/j.aop.2010.09.012.
  • Foulkes et al. [2001] W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, Quantum monte carlo simulations of solids, Rev. Mod. Phys. 73, 33 (2001).
  • Zhang and Liu [2019] X. W. Zhang and Y. L. Liu, Electronic transport and spatial current patterns of 2d electronic system: A recursive green’s function method study, AIP Advances 9, 115209 (2019).
  • Nielsen and Chuang [2002] M. A. Nielsen and I. Chuang, Quantum computation and quantum information (2002).
  • Nayak et al. [2008] C. Nayak, S. H. Simon, A. Stern, M. Freedman, and S. Das Sarma, Non-abelian anyons and topological quantum computation, Rev. Mod. Phys. 80, 1083 (2008).
  • Buluta and Nori [2009] I. Buluta and F. Nori, Quantum simulators, Science 326, 108 (2009).
  • Georgescu et al. [2014] I. M. Georgescu, S. Ashhab, and F. Nori, Quantum simulation, Rev. Mod. Phys. 86, 153 (2014).
  • Barthelemy and Vandersypen [2013] P. Barthelemy and L. M. K. Vandersypen, Quantum dot systems: a versatile platform for quantum simulations, Annalen der Physik 525, 808 (2013).
  • Browaeys and Lahaye [2020] A. Browaeys and T. Lahaye, Many-body physics with individually controlled rydberg atoms, Nature Physics 16, 132 (2020).
  • Scholl et al. [2021] P. Scholl, M. Schuler, H. J. Williams, A. A. Eberharter, D. Barredo, K.-N. Schymik, V. Lienhard, L.-P. Henry, T. C. Lang, T. Lahaye, A. M. Läuchli, and A. Browaeys, Quantum simulation of 2d antiferromagnets with hundreds of rydberg atoms, Nature 595, 233 (2021).
  • Ebadi et al. [2022] S. Ebadi, A. Keesling, M. Cain, T. T. Wang, H. Levine, D. Bluvstein, G. Semeghini, A. Omran, J.-G. Liu, R. Samajdar, X.-Z. Luo, B. Nash, X. Gao, B. Barak, E. Farhi, S. Sachdev, N. Gemelke, L. Zhou, S. Choi, H. Pichler, S.-T. Wang, M. Greiner, V. Vuletić, and M. D. Lukin, Quantum optimization of maximum independent set using rydberg atom arrays, Science 376, 1209 (2022).
  • Bluvstein et al. [2021] D. Bluvstein, A. Omran, H. Levine, A. Keesling, G. Semeghini, S. Ebadi, T. T. Wang, A. A. Michailidis, N. Maskara, W. W. Ho, S. Choi, M. Serbyn, M. Greiner, V. Vuletić, and M. D. Lukin, Controlling quantum many-body dynamics in driven rydberg atom arrays, Science 371, 1355 (2021).
  • Bistritzer and MacDonald [2011] R. Bistritzer and A. H. MacDonald, Moiré bands in twisted double-layer graphene, Proceedings of the National Academy of Sciences 108, 12233 (2011).
  • Bardeen et al. [1957] J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Theory of superconductivity, Phys. Rev. 108, 1175 (1957).
  • Qi and Ranard [2019] X.-L. Qi and D. Ranard, Determining a local Hamiltonian from a single eigenstate, Quantum 3, 159 (2019).
  • Dupont et al. [2019] M. Dupont, N. Macé, and N. Laflorencie, From eigenstate to hamiltonian: Prospects for ergodicity and localization, Phys. Rev. B 100, 134201 (2019).
  • Bairey et al. [2019] E. Bairey, I. Arad, and N. H. Lindner, Learning a local hamiltonian from local measurements, Phys. Rev. Lett. 122, 020504 (2019).
  • Anshu et al. [2021] A. Anshu, S. Arunachalam, T. Kuwahara, and M. Soleimanifar, Sample-efficient learning of interacting quantum systems, Nature Physics 17, 931 (2021).
  • Zhou and Zhou [2022] J. Zhou and D. L. Zhou, Recovery of a generic local hamiltonian from a steady state, Phys. Rev. A 105, 012615 (2022).
  • Turkeshi et al. [2019] X. Turkeshi, T. Mendes-Santos, G. Giudici, and M. Dalmonte, Entanglement-guided search for parent hamiltonians, Phys. Rev. Lett. 122, 150606 (2019).
  • Valenti et al. [2022] A. Valenti, G. Jin, J. Léonard, S. D. Huber, and E. Greplova, Scalable hamiltonian learning for large-scale out-of-equilibrium quantum dynamics, Phys. Rev. A 105, 023302 (2022).
  • Yu et al. [2022] W. Yu, J. Sun, Z. Han, and X. Yuan, Practical and efficient hamiltonian learning (2022), arXiv:2201.00190.
  • Huang et al. [2022] H.-Y. Huang, Y. Tong, D. Fang, and Y. Su, Learning many-body hamiltonians with heisenberg-limited scaling (2022), arXiv:2210.03030.
  • Wilde et al. [2022] F. Wilde, A. Kshetrimayum, I. Roth, D. Hangleiter, R. Sweke, and J. Eisert, Scalably learning quantum many-body hamiltonians from dynamical data (2022), arXiv:2209.14328.
  • Wang et al. [2017] J. Wang, S. Paesani, R. Santagati, S. Knauer, A. A. Gentile, N. Wiebe, M. Petruzzella, J. L. O’Brien, J. G. Rarity, A. Laing, and M. G. Thompson, Experimental quantum hamiltonian learning, Nature Physics 13, 551 (2017).
  • Carrasco et al. [2021] J. Carrasco, A. Elben, C. Kokail, B. Kraus, and P. Zoller, Theoretical and experimental perspectives of quantum verification, PRX Quantum 2, 010102 (2021).
  • Hradil [1997] Z. Hradil, Quantum-state estimation, Phys. Rev. A 55, R1561 (1997).
  • Řeháček et al. [2007] J. Řeháček, Z. Hradil, E. Knill, and A. I. Lvovsky, Diluted maximum-likelihood algorithm for quantum tomography, Phys. Rev. A 75, 042108 (2007).
  • Lvovsky [2004] A. I. Lvovsky, Iterative maximum-likelihood reconstruction in quantum homodyne tomography, Journal of Optics B: Quantum and Semiclassical Optics 6, S556 (2004).
  • Teo et al. [2012] Y. S. Teo, B. Stoklasa, B.-G. Englert, J. Řeháček, and Z. Hradil, Incomplete quantum state estimation: A comprehensive study, Phys. Rev. A 85, 042317 (2012).
  • Fiurášek and Hradil [2001] J. Fiurášek and Z. Hradil, Maximum-likelihood estimation of quantum processes, Phys. Rev. A 63, 020101 (2001).
  • Feiguin and White [2005] A. E. Feiguin and S. R. White, Finite-temperature density matrix renormalization using an enlarged hilbert space, Phys. Rev. B 72, 220401 (2005).
  • Fishman et al. [2022] M. Fishman, S. R. White, and E. M. Stoudenmire, The ITensor Software Library for Tensor Network Calculations, SciPost Phys. Codebases , 4 (2022).
  • Amin et al. [2018] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Quantum boltzmann machine, Phys. Rev. X 8, 021050 (2018).
  • Kimura [2017] T. Kimura, Explicit description of the Zassenhaus formula, Progress of Theoretical and Experimental Physics 2017, 041A03 (2017), doi:10.1093/ptep/ptx044.
  • Note [1] In practice, given sufficient measurements, we have $\hat{R}_{k}\sim\hat{I}$ dictating the quantum likelihood gradient at the iteration’s convergence.
  • Kuwahara et al. [2020] T. Kuwahara, K. Kato, and F. G. S. L. Brandão, Clustering of conditional mutual information for quantum gibbs states above a threshold temperature, Phys. Rev. Lett. 124, 220601 (2020).
  • Bilgin and Poulin [2010] E. Bilgin and D. Poulin, Coarse-grained belief propagation for simulation of interacting quantum systems at all temperatures, Phys. Rev. B 81, 054106 (2010).
  • Note [2] We typically perform MLE Hamiltonian learning and update the model Hamiltonian on the projection-operator basis; therefore, we transform the Hamiltonian back to the original, ordinary operator basis before $\Delta\vec{\mu}_{k}$ evaluations.
  • Haah et al. [2021] J. Haah, R. Kothari, and E. Tang, Optimal learning of quantum hamiltonians from high-temperature gibbs states (2021), arXiv:2108.04842.
  • Rudinger and Joynt [2015] K. Rudinger and R. Joynt, Compressed sensing for hamiltonian reconstruction, Phys. Rev. A 92, 052322 (2015).
  • Evans et al. [2019] T. J. Evans, R. Harper, and S. T. Flammia, Scalable bayesian hamiltonian learning (2019), arXiv:1912.07636.
  • Wang and Zhang [2022] J.-B. Wang and Y. Zhang, Single-shot quantum measurements sketch quantum many-body states (2022), arXiv:2203.01348.
  • Garrison and Grover [2018] J. R. Garrison and T. Grover, Does a single eigenstate encode the full hamiltonian?, Phys. Rev. X 8, 021026 (2018).
  • Kim et al. [2014] H. Kim, T. N. Ikeda, and D. A. Huse, Testing whether all eigenstates obey the eigenstate thermalization hypothesis, Phys. Rev. E 90, 052105 (2014).
  • Teo et al. [2011] Y. S. Teo, H. Zhu, B.-G. Englert, J. Řeháček, and Z. Hradil, Quantum-state reconstruction by maximizing likelihood and entropy, Phys. Rev. Lett. 107, 020404 (2011).
  • Rahmani et al. [2015] A. Rahmani, X. Zhu, M. Franz, and I. Affleck, Phase diagram of the interacting majorana chain model, Phys. Rev. B 92, 235123 (2015).
  • Gong et al. [2017] S.-S. Gong, W. Zhu, J.-X. Zhu, D. N. Sheng, and K. Yang, Global phase diagram and quantum spin liquids in a spin-$\frac{1}{2}$ triangular antiferromagnet, Phys. Rev. B 96, 075116 (2017).
  • Zhang et al. [2012] Y. Zhang, T. Grover, A. Turner, M. Oshikawa, and A. Vishwanath, Quasiparticle statistics and braiding from ground-state entanglement, Phys. Rev. B 85, 235151 (2012).

Appendix A Maximum condition for MLE

In this appendix, we review the derivation of the maximum condition [30] in Eq. 3 in the main text.

A general quantum state takes the form of a density operator:

$$\hat{\rho}=\sum_{j}p_{j}\ket{\psi_{j}}\bra{\psi_{j}}, \tag{22}$$

where $p_{j}\geq 0$, $\sum_{j}p_{j}=1$, and $\{\ket{\psi_{j}}\}$ is an orthonormal basis. The search for the quantum state that maximizes the likelihood function:

$$\mathcal{L}(\hat{\rho})=\prod_{i,\lambda_{i}}\{\mbox{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]^{\frac{f_{\lambda_{i}}}{N_{tot}}}\}^{N_{tot}}, \tag{23}$$

can be converted to the optimization problem:

$$\begin{split}&\min\limits_{\hat{\rho}\in\mathcal{D}}\quad M(\hat{\rho})=-\frac{1}{N_{tot}}\log[\mathcal{L}(\hat{\rho})]\\ &{\rm subject\enspace to}\quad\hat{\rho}\succeq 0,\ \mbox{tr}[\hat{\rho}]=1.\end{split} \tag{24}$$

It is hard to solve this semi-definite programming problem directly and numerically. Instead, forgoing the non-negative definiteness, we adopt the Lagrangian multiplier method:

$$\frac{\partial}{\partial\bra{\psi_{j}}}\{M(\hat{\rho})+\lambda\mbox{tr}[\hat{\rho}]\}=0, \tag{25}$$

where $\lambda$ is a Lagrangian multiplier. Given Eq. 22, we obtain the following solution:

$$\begin{split}&\hat{R}\ket{\psi_{j}}=\ket{\psi_{j}},\\ &\hat{R}=\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}\frac{\hat{P}_{\lambda_{i}}}{\mbox{tr}[\hat{\rho}\hat{P}_{\lambda_{i}}]},\end{split} \tag{26}$$

and $\lambda=1$. Combining Eq. 22 and Eq. 26, we obtain the maximum condition:

$$\hat{R}\hat{\rho}=\hat{\rho}. \tag{27}$$

We note that Eq. 26 does not guarantee the positive semi-definiteness of the density operator. Instead, one may search within the density-operator space (the space of positive semi-definite matrices with unit trace) to locate the MLE quantum state fulfilling Eq. 26 or Eq. 27. For the Hamiltonian learning task in this work, the search space is naturally the space of Gibbs states (under the selected quantum many-body ansatz).
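The maximum condition can be checked numerically on a toy example. The sketch below is a minimal illustration under our own assumptions, not part of the original derivation: a single qubit with a hypothetical Bloch vector, measured in the three Pauli bases with equal shares of $N_{tot}$ and ideal (noise-free) frequencies. With these inputs, $\hat{R}$ of Eq. 26 evaluated at the true state reduces to the identity, so Eq. 27 holds, while a different trial state yields $\hat{R}\neq\hat{I}$ and a larger $M(\hat{\rho})$.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def projectors(op):
    """Spectral projectors of a Hermitian operator with eigenvalues +1 and -1."""
    eye = np.eye(op.shape[0], dtype=complex)
    return [(eye + op) / 2, (eye - op) / 2]


def r_operator(rho, proj_sets, freqs):
    """R of Eq. 26; freqs[i][l] stands for f_{lambda_i} / N_tot."""
    R = np.zeros_like(rho)
    for projs, fs in zip(proj_sets, freqs):
        for P, f in zip(projs, fs):
            R += f * P / np.trace(rho @ P).real
    return R


def neg_log_likelihood(rho, proj_sets, freqs):
    """M(rho) of Eq. 24."""
    return -sum(f * np.log(np.trace(rho @ P).real)
                for projs, fs in zip(proj_sets, freqs)
                for P, f in zip(projs, fs))


# Hypothetical target: a mixed single-qubit state with Bloch vector (0.3, -0.2, 0.5).
bloch = np.array([0.3, -0.2, 0.5])
rho_s = (np.eye(2, dtype=complex) + sum(b * s for b, s in zip(bloch, PAULIS))) / 2

proj_sets = [projectors(s) for s in PAULIS]
# Ideal frequencies: each of the three operators receives N_tot / 3 shots.
freqs = [[np.trace(rho_s @ P).real / len(PAULIS) for P in projs]
         for projs in proj_sets]

R_true = r_operator(rho_s, proj_sets, freqs)
rho_trial = np.eye(2, dtype=complex) / 2  # maximally mixed trial state
R_trial = r_operator(rho_trial, proj_sets, freqs)
```

With finite sampling, the frequencies fluctuate around their ideal values and $\hat{R}$ at the optimum is only approximately the identity.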

Appendix B MLE Hamiltonian learning with rescaling function

In this appendix, we compare the MLE Hamiltonian learning with the quantum likelihood gradient $\hat{R}_{k}$ and the re-scaled counterpart $\tilde{\hat{R}}_{k}$. As we state in the main text, $\tilde{\hat{R}}_{k}$ regularizes the gradient, allowing us to employ a larger learning rate $\gamma=1$, which leads to a faster convergence (Fig. 6) and a higher accuracy (Tab. 1) given an identical number of iterations.

Figure 6: Both the Hamiltonian distance $\Delta\mu$ defined in Eq. 18 and the negative-log-likelihood function $M(\hat{\rho}_{k+1})$ (or relative entropy $M(\hat{\rho}_{k+1})-M_{0}$) show successful convergence of the iterations in MLE Hamiltonian learning across a variety of system sizes and temperature ranges. We simulate the target Hamiltonian and the iteration process by FTTN with Trotter step $\delta t=0.1$. Each curve is averaged over 10 trials of random $\hat{H}_{0}$ initializations. The maximum number of iterations here is 1000. In comparison with Fig. 2 in the main text, we use the re-scaled $\tilde{\hat{R}}_{k}$ in Eq. 16 in the main text with $g=2$, which allows us to employ a larger learning rate $\gamma=1$ and obtain a faster convergence.
Gradient | $L=10$, $\beta=1$ | $L=50$, $\beta=2$ | $L=100$, $\beta=3$
$\hat{R}_{k}$ | $O(10^{-6})$ | $O(10^{-2})$ | $O(10^{-1})$
$\tilde{\hat{R}}_{k}$ | $O(10^{-12})$ | $O(10^{-3})$ | $O(10^{-2})$
Table 1: The algorithm’s accuracy (Hamiltonian distance $\Delta\mu$) further improves with a re-scaled quantum likelihood gradient $\tilde{\hat{R}}_{k}$ under various system sizes $L$ and (inverse) temperatures $\beta$.

Appendix C Comparisons between Hamiltonian learning algorithms

In this appendix, we compare different Hamiltonian learning algorithms, including the correlation matrix (CM) method [17, 19], the gradient descent (GD) method [20], and the MLE Hamiltonian learning (MLEHL) algorithm, by looking into some of their numerical results and performances. We consider general 2-local Hamiltonians in Eq. 10 in the main text for demonstration and measurements $\{\hat{O}_{i}\}$ over all the 2-local operators (instead of all 4-local operators as in Ref. [19]).

We summarize the results in Fig. 7: the accuracy of CM is unstable and highly sensitive to temperature; while GD performs similarly to the proposed MLEHL algorithm at low temperatures, its descending gradient becomes too small at high temperatures to allow a satisfactory convergence within the given maximum iterations.

Figure 7: The performances (logarithm of Hamiltonian distances) of different algorithms versus the inverse temperature $\beta$ show the advantages of the MLEHL algorithm. Each data point contains 10 trials of random Hamiltonians. We also include noise following a normal distribution with zero mean and $O(10^{-12})$ standard deviation. For both GD and MLEHL algorithms, we employ a learning rate $\gamma=1$ and a maximum number of iterations of 7000. The system size is $L=7$.

We also compare the convergence rates of the MLEHL and GD algorithms with the same learning rate. As in Fig. 8, the MLEHL algorithm converges faster and thus at a smaller overall computational cost, since the cost per iteration is similar under both algorithms.

Figure 8: The logarithm of the Hamiltonian distances versus the number of iterations shows a faster convergence under the proposed MLEHL algorithm. We also include noise following a normal distribution with zero mean and $O(10^{-12})$ standard deviation. We employ a learning rate of $\gamma=1$ for both GD and MLEHL. We set the system size $L=7$ and the (inverse) temperature $\beta=1$.

Appendix D Hamiltonian learning from ground state

In this appendix, we prove the effectiveness of the quantum likelihood gradient based on measurements of the target quantum system’s ground state and provide several nontrivial numerical examples, including 1D quantum critical states and 2D topological states.

D.1 Proof for ground-state-based quantum likelihood gradient

Given a sufficient number $N_{i}$ of measurements of the operator $\hat{O}_{i}$ on the non-degenerate ground state $\ket{\psi_{s}}$ of a target system $\hat{H}_{s}$, the number of outcomes yielding the $\lambda_{i}^{th}$ eigenvalue of $\hat{O}_{i}$ is:

$$f_{\lambda_{i}}=p_{\lambda_{i}}N_{i}\approx\bra{\psi_{s}}\hat{P}_{\lambda_{i}}\ket{\psi_{s}}N_{i}, \tag{28}$$

where $p_{\lambda_{i}}=f_{\lambda_{i}}/N_{i}$, and $\hat{P}_{\lambda_{i}}$ is the projection operator of the eigenvalue $o_{\lambda_{i}}$.

Our MLE Hamiltonian learning follows the iterations:

$$\begin{split}&\hat{H}_{k+1}=\hat{H}_{k}-\gamma\hat{R}_{k},\\ &\hat{R}_{k}=\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}\frac{\hat{P}_{\lambda_{i}}}{\bra{\psi_{k}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k}^{gs}}},\end{split} \tag{29}$$

where $\ket{\psi_{k}^{gs}}$ is the non-degenerate ground state of $\hat{H}_{k}$.

Theorem: For $\gamma\ll 1,\gamma>0$, the quantum likelihood gradient in Eq. 29 yields a negative semi-definite contribution to the negative-log-likelihood function $M(\ket{\psi_{k+1}^{gs}})=-\frac{1}{N_{tot}}\log\mathcal{L}(\ket{\psi_{k+1}^{gs}})$ following Eq. 2 in the main text.

Proof: At the linear order in $\gamma$, we may treat the addition of $-\gamma\hat{R}_{k}$ to $\hat{H}_{k}$ at the $k^{th}$ iteration as a perturbation:

$$\ket{\psi_{k+1}^{gs}}=\ket{\psi_{k}^{gs}}-\gamma\hat{G}_{k}\hat{R}_{k}\ket{\psi_{k}^{gs}}+O(\gamma^{2}), \tag{30}$$

where $\hat{G}_{k}$ is the Green’s function in the $k^{th}$ iteration:

$$\hat{G}_{k}=\hat{Q}_{k}\frac{1}{E_{k}^{gs}-\hat{H}_{k}}\hat{Q}_{k}, \tag{31}$$

where $\hat{Q}_{k}=I-\ket{\psi_{k}^{gs}}\bra{\psi_{k}^{gs}}$ is the projection operator onto the subspace orthogonal to the ground space $\ket{\psi_{k}^{gs}}\bra{\psi_{k}^{gs}}$, and $E_{k}^{gs}$ is the ground state energy. Keeping terms up to the linear order of $\gamma$ in the log expansion of the negative-log-likelihood function, we have:

$$\begin{split}M(\ket{\psi_{k+1}^{gs}})&=-\frac{1}{N_{tot}}\log\mathcal{L}(\ket{\psi_{k+1}^{gs}})\\ &=-\sum_{i,\lambda_{i}}\frac{f_{\lambda_{i}}}{N_{tot}}\log\bra{\psi_{k+1}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k+1}^{gs}}\\ &\approx M(\ket{\psi_{k}^{gs}})+2\gamma\Delta_{k},\end{split} \tag{32}$$

where the difference takes the form:

$$\begin{split}\Delta_{k}&=\bra{\psi_{k}^{gs}}\hat{R}_{k}\hat{G}_{k}\hat{R}_{k}\ket{\psi_{k}^{gs}}\\ &=\sum_{l\neq gs}\frac{|\bra{\psi_{k}^{gs}}\hat{R}_{k}\ket{\psi_{k}^{l}}|^{2}}{E_{k}^{gs}-E_{k}^{l}}\leq 0.\end{split} \tag{33}$$

Here, $E_{k}^{l}>E_{k}^{gs}$ because $E_{k}^{l}$ denotes the energy of eigenstates other than the ground state. Our iteration converges when the equality in Eq. 33 is established. This happens when $\ket{\psi_{k}^{gs}}$ is an eigenstate of $\hat{R}_{k}$, consistent with the MLE condition $\hat{R}\ket{\psi}=\ket{\psi}$ (or $\hat{R}\hat{\rho}=\hat{\rho}$).
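For completeness, the step from Eq. 30 to Eq. 32 can be spelled out as follows; this is a sketch at linear order in $\gamma$ that only uses the Hermiticity of $\hat{R}_{k}$, $\hat{G}_{k}$, and $\hat{P}_{\lambda_{i}}$. Inserting Eq. 30 into the expectation values and expanding the logarithm gives:

$$\begin{split}\bra{\psi_{k+1}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k+1}^{gs}}&\approx\bra{\psi_{k}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k}^{gs}}-\gamma\bra{\psi_{k}^{gs}}(\hat{P}_{\lambda_{i}}\hat{G}_{k}\hat{R}_{k}+\hat{R}_{k}\hat{G}_{k}\hat{P}_{\lambda_{i}})\ket{\psi_{k}^{gs}},\\ \log\bra{\psi_{k+1}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k+1}^{gs}}&\approx\log\bra{\psi_{k}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k}^{gs}}-\gamma\frac{\bra{\psi_{k}^{gs}}(\hat{P}_{\lambda_{i}}\hat{G}_{k}\hat{R}_{k}+\hat{R}_{k}\hat{G}_{k}\hat{P}_{\lambda_{i}})\ket{\psi_{k}^{gs}}}{\bra{\psi_{k}^{gs}}\hat{P}_{\lambda_{i}}\ket{\psi_{k}^{gs}}}.\end{split}$$

Multiplying by $-f_{\lambda_{i}}/N_{tot}$ and summing over $i,\lambda_{i}$, the definition of $\hat{R}_{k}$ in Eq. 29 recombines the weighted projectors into $\hat{R}_{k}$ itself:

$$M(\ket{\psi_{k+1}^{gs}})\approx M(\ket{\psi_{k}^{gs}})+\gamma\bra{\psi_{k}^{gs}}(\hat{R}_{k}\hat{G}_{k}\hat{R}_{k}+\hat{R}_{k}\hat{G}_{k}\hat{R}_{k})\ket{\psi_{k}^{gs}}=M(\ket{\psi_{k}^{gs}})+2\gamma\Delta_{k}.$$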

Finally, combining Eq. 32 and Eq. 33, we have shown that $M(\ket{\psi_{k+1}^{gs}})-M(\ket{\psi_{k}^{gs}})$ is a negative semi-definite quantity, which proves the theorem.
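The theorem can also be probed numerically on a small system. The following sketch is an illustration under our own assumptions rather than one of the examples in this appendix: three qubits, random dense Hermitian matrices as the target and model Hamiltonians, ideal frequencies with equal shots for every observable, and measurements of all single-site Paulis and nearest-neighbour Pauli products. It evaluates $\Delta_{k}$ from the spectral sum in Eq. 33 and compares the negative-log-likelihood before and after one step of Eq. 29.

```python
import numpy as np

rng = np.random.default_rng(0)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
N_SITES = 3
DIM = 2**N_SITES


def embed(ops):
    """Tensor product of single-site operators given as {site: matrix}."""
    out = np.array([[1.0 + 0j]])
    for k in range(N_SITES):
        out = np.kron(out, ops.get(k, np.eye(2, dtype=complex)))
    return out


def random_hamiltonian():
    """Random dense Hermitian matrix (a stand-in, not a local Hamiltonian)."""
    A = rng.normal(size=(DIM, DIM)) + 1j * rng.normal(size=(DIM, DIM))
    return (A + A.conj().T) / 2


# Measured operators and their projectors (I +/- O)/2 onto the +/-1 eigenspaces.
observables = [embed({j: p}) for j in range(N_SITES) for p in PAULI.values()]
observables += [embed({j: p, j + 1: q}) for j in range(N_SITES - 1)
                for p in PAULI.values() for q in PAULI.values()]
projs = [(np.eye(DIM) + s * O) / 2 for O in observables for s in (+1, -1)]


def expect(psi, P):
    return np.vdot(psi, P @ psi).real


def neg_log_likelihood(psi, freqs):
    return -sum(f * np.log(expect(psi, P)) for f, P in zip(freqs, projs))


H_s, H_k = random_hamiltonian(), random_hamiltonian()
psi_s = np.linalg.eigh(H_s)[1][:, 0]
# Ideal frequencies f / N_tot, with equal shots for every observable.
freqs = [expect(psi_s, P) / len(observables) for P in projs]

E_k, V_k = np.linalg.eigh(H_k)
psi_k = V_k[:, 0]
R_k = sum(f * P / expect(psi_k, P) for f, P in zip(freqs, projs))

# Eq. 33 as a spectral sum over the excited states of H_k.
amps = V_k[:, 1:].conj().T @ (R_k @ psi_k)
delta_k = float(np.sum(np.abs(amps) ** 2 / (E_k[0] - E_k[1:])))

gamma = 1e-4
psi_next = np.linalg.eigh(H_k - gamma * R_k)[1][:, 0]
m_old = neg_log_likelihood(psi_k, freqs)
m_new = neg_log_likelihood(psi_next, freqs)
```

For sufficiently small `gamma`, the change `m_new - m_old` should be close to `2 * gamma * delta_k`, in line with Eq. 32; larger steps pick up the $O(\gamma^{2})$ corrections neglected in the proof.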

One potential complication to the proof is that Eq. 30 needs to assume there is no ground-state level crossing or degeneracy after adding the quantum likelihood gradient. A potential remedy is to keep some low-lying excited states together with the ground state and compare them for maximum likelihood, especially for steps with singular behaviors. Otherwise, we can only hope such transitions are sparse, especially near convergence, and they establish a new line of iterations heading toward the same convergence. A more detailed discussion is available in Ref. 45.

D.2 Example: $c=\frac{3}{2}$ CFT ground state of Majorana fermion chain

Here, we consider the spinless 1D Majorana fermion chain model of length $2L$ as an example [49]:

$$\hat{H}_{s}=\sum_{j}it\hat{\gamma}_{j}\hat{\gamma}_{j+1}+g\hat{\gamma}_{j}\hat{\gamma}_{j+1}\hat{\gamma}_{j+2}\hat{\gamma}_{j+3}, \tag{34}$$

where $\hat{\gamma}_{j}$ is the Majorana fermion operator obeying:

$$\hat{\gamma}_{j}^{\dagger}=\hat{\gamma}_{j},\quad\{\hat{\gamma}_{i},\hat{\gamma}_{j}\}=\delta_{ij}, \tag{35}$$

and $t$ and $g=-1$ are model parameters. This model presents a wealth of nontrivial quantum phases under different $t/g$. We focus on the model parameters in $t/g\in(-2.86,-0.28)$, where the ground state of Eq. 34 is a $c=\frac{3}{2}$ CFT composed of a critical Ising theory ($c=\frac{1}{2}$) and a Luttinger liquid ($c=1$).

Figure 9: Both the Hamiltonian distance $\Delta\mu$ and the relative entropy $M(\ket{\psi_{k+1}^{gs}})-M_{0}$ (inset) as defined in the main text indicate successful convergence of the iterations during MLE Hamiltonian learning. We set $g=-1$, $t=0.5$ (red curve) or $t=1.5$ (blue curve), and system size $L=12$ for the target quantum system, and learning rate $\gamma=0.1$ ($\gamma=0.05$) before (after) the 490th iteration.

Through the definition of the complex fermions followed by the Jordan-Wigner transformation:

$$\begin{split}\hat{c}_{j}&=\frac{\hat{\gamma}_{2j}+i\hat{\gamma}_{2j+1}}{2},\\ \hat{\sigma}_{j}^{z}&=2\hat{n}_{j}-1,\\ \hat{\sigma}_{j}^{+}&=e^{-i\pi\sum_{i<j}\hat{n}_{i}}\hat{c}_{j}^{\dagger},\end{split} \tag{36}$$

where $\hat{n}_{j}=\hat{c}_{j}^{\dagger}\hat{c}_{j}$ is the complex fermion number operator, we map Eq. 34 to a 3-local spin chain of length $L$:

$$\begin{split}\hat{H}_{s}=&t\sum_{j}\hat{\sigma}_{j}^{z}-t\sum_{j}\hat{\sigma}_{j}^{x}\hat{\sigma}_{j+1}^{x}\\ -&g\sum_{j}\hat{\sigma}_{j}^{z}\hat{\sigma}_{j+1}^{z}-g\sum_{j}\hat{\sigma}_{j}^{x}\hat{\sigma}_{j+2}^{x}.\end{split} \tag{37}$$

We employ quantum measurements on the ground state $\ket{\psi_{s}}$ of this Hamiltonian, based on which we carry out our MLE Hamiltonian learning protocol. Here, we evaluate the ground-state properties via exact diagonalization. The numerical results for the two cases $t=0.5,1.5$ are in Fig. 9. We achieve successful convergence and satisfactory accuracy on the target Hamiltonian. The relative entropy’s instabilities are mainly due to the ground state’s level crossing and degeneracy.

D.3 Example: alternative Hamiltonian for ground state

We have seen that MLE Hamiltonian learning can retrieve the unknown target Hamiltonians via quantum measurements of their Gibbs states, and even their ground states. For pure states, however, one interesting byproduct is that the relation between Hamiltonians and eigenstates is essentially many-to-one. Therefore, it is possible to obtain various candidate Hamiltonians $\hat{H}_{k}$ sharing the same ground state as the original target $\hat{H}_{s}$, especially by controlling the operator/observable set. Here, we show such numerical examples.

Figure 10: Both the relative entropy and the fidelity $f_{gs}=\braket{\psi_{s}}{\psi_{k}^{gs}}$ (inset) indicate successful convergence of the iterations during MLE Hamiltonian learning, yielding a consistent yet different Hamiltonian from the original quantum system. We set the system size $L=15$ and learning rate $\gamma=0.005$.

As our target quantum system, we consider the transverse field Ising model (TFIM) of length $L=15$:

$$\hat{H}_{s}=J\sum_{j}\hat{S}_{j}^{z}\hat{S}_{j+1}^{z}+g\sum_{j}\hat{S}_{j}^{x}, \tag{38}$$

at its critical point $J=g=1$. Its ground state is $\ket{\psi_{s}}$. However, instead of the operators present in $\hat{H}_{s}$, we employ a different operator set for $\ket{\psi_{s}}$’s quantum measurements:

$$\{\hat{O}_{i}\}=\{\hat{S}_{i}^{z}\hat{S}_{i+1}^{z},\hat{S}_{i}^{x}\hat{S}_{i+1}^{x}\}. \tag{39}$$

We evaluate the ground-state properties via DMRG.

The subsequent MLE Hamiltonian learning results are in Fig. 10. Since the candidate Hamiltonian is built from the operators in Eq. 39 and is destined to differ from $\hat{H}_{s}$, the Hamiltonian distance is no longer a viable measure of its accuracy. Instead, we introduce the ground-state fidelity $f_{gs}=\braket{\psi_{s}}{\psi_{k}^{gs}}$, where $\ket{\psi_{s}}$ ($\ket{\psi_{k}^{gs}}$) is the ground state of $\hat{H}_{s}$ ($\hat{H}_{k}$). Interestingly, while the relative entropy shows full convergence, the fidelity $f_{gs}$ jumps between $\sim 99.5\%$ and $\sim 10^{-3}\%$. This is understandable, as the quantum system is gapless, and the ground and low-lying excited states have similar properties under quantum measurements.

D.4 Example: two-dimensional topological states

Figure 11: The relative entropy, the fidelity $f_{gs}=\langle\psi_{s}|\psi_{k}^{gs}\rangle$, and the Hamiltonian distance $\Delta\mu$ (inset) show distinct convergence behaviors in the iterations of MLE Hamiltonian learning for a 2D topological CSL system. Our system size is $4\times 4$, and we set the learning rate $\gamma=0.1$.

Here, we consider MLE Hamiltonian learning on two-dimensional topological quantum systems. In particular, we consider the chiral spin liquid (CSL) on a triangular lattice:

$$\hat{H}_{s}=J_{1}\sum_{\langle ij\rangle}\vec{S}_{i}\cdot\vec{S}_{j}+J_{2}\sum_{\langle\langle ij\rangle\rangle}\vec{S}_{i}\cdot\vec{S}_{j}+K\sum_{i,j,k\in\bigtriangledown/\bigtriangleup}\vec{S}_{i}\cdot\left(\vec{S}_{j}\times\vec{S}_{k}\right), \tag{40}$$

where the first and second terms are Heisenberg interactions, and the last term is a three-spin chiral interaction. Previous DMRG studies have established the ground state of $\hat{H}_{s}$ as a CSL under the model parameters $J_{1}=1.0$, $J_{2}=0.1$, and $K=0.2$ [50], which we set as the parameters of the target Hamiltonian. Here, we employ exact diagonalization on a $4\times 4$ system. Based upon entanglement studies of the lowest-energy eigenstates, we verify that both the modular $SU$ matrix corresponding to $C_{6}$ rotations and the entanglement entropy fit well with a CSL topological phase [51]. Subsequently, we perform MLE Hamiltonian learning based on quantum measurements of the ground state, focusing on the operators present in $\hat{H}_{s}$. We summarize the results in Fig. 11. The Hamiltonian distance indicates a stable converging accuracy, yet the relative entropy and the fidelity $f_{gs}=\langle\psi_{s}|\psi_{k}^{gs}\rangle$ exhibit certain instabilities. Indeed, a topological phase implies ground-state degeneracy, i.e., competing low-energy eigenstates with global distinctions yet similar local properties.
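The three-spin chiral term in Eq. 40 can be made concrete on a single triangle. The following is a minimal sketch assuming spin-1/2 operators and dense NumPy matrices; the helper names (`embed`, `levi_civita`, `chirality`) are illustrative and do not come from the exact-diagonalization code used above. It builds $\vec{S}_{i}\cdot(\vec{S}_{j}\times\vec{S}_{k})$ from the Levi-Civita contraction and checks that the operator is Hermitian and changes sign when two sites are exchanged, which is why the orientation of each triangle matters.

```python
import numpy as np

# Spin-1/2 operators (hbar = 1)
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
SPIN = (SX, SY, SZ)
ID = np.eye(2, dtype=complex)

def embed(op, site, n=3):
    """Embed a single-site operator at `site` in an n-spin Hilbert space."""
    out = np.array([[1.0 + 0.0j]])
    for k in range(n):
        out = np.kron(out, op if k == site else ID)
    return out

def levi_civita(a, b, c):
    """Sign of the permutation (a, b, c) of (0, 1, 2); 0 if an index repeats."""
    return (a - b) * (b - c) * (c - a) / 2

def chirality(i, j, k, n=3):
    """S_i . (S_j x S_k) = sum_{abc} eps_{abc} S_i^a S_j^b S_k^c."""
    dim = 2 ** n
    chi = np.zeros((dim, dim), dtype=complex)
    for a in range(3):
        for b in range(3):
            for c in range(3):
                eps = levi_civita(a, b, c)
                if eps != 0:
                    chi += eps * (embed(SPIN[a], i, n)
                                  @ embed(SPIN[b], j, n)
                                  @ embed(SPIN[c], k, n))
    return chi

chi_123 = chirality(0, 1, 2)   # one orientation of the triangle
chi_132 = chirality(0, 2, 1)   # opposite orientation
eigs = np.linalg.eigvalsh(chi_123)
print(np.round(eigs, 4))
```

For three spin-1/2 sites, the spectrum of this operator consists of $0$ (fourfold, the total-spin-3/2 sector) and $\pm\sqrt{3}/4$ (twofold each, the two spin-1/2 doublets of opposite chirality).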