
Effective temperature in approximate quantum many-body states

Yu-Qin Chen ([email protected]), Graduate School of China Academy of Engineering Physics, Beijing 100193, China
Shi-Xin Zhang ([email protected]), Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
Abstract

In the pursuit of numerically identifying the ground state of quantum many-body systems, approximate quantum wavefunction ansatzes are commonly employed. This study focuses on the spectral decomposition of these approximate quantum many-body states into exact eigenstates of the target Hamiltonian. The energy spectral decomposition can reflect the intricate physics arising from the interplay between quantum systems and numerical algorithms. Here we examine various parameterized wavefunction ansatzes constructed from neural networks, tensor networks, and quantum circuits, employing differentiable programming to numerically approximate ground states and imaginary-time evolved states. Our findings reveal a consistent exponential decay pattern in the spectral contributions of approximate quantum states across different ansatzes, optimization objectives, and quantum systems, characterized by small decay rates denoted as inverse effective temperatures. The effective temperature is related to ansatz expressiveness and accuracy and shows phase transition behaviors in learning imaginary-time evolved states. The universal picture and unique features suggest the significance and potential of the effective temperature in characterizing approximate quantum states.

Introduction

Understanding the ground state properties of quantum many-body systems is a central challenge in modern physics, with broad implications ranging from fundamental principles of quantum complexity theories to the design of novel materials and quantum technologies [1, 2, 3, 4]. The Hilbert space grows exponentially with the system size, which often precludes analytical solutions and exact numerical methods and necessitates the development of approximate numerical methods. Among these, approximate quantum wavefunction ansatzes have proven powerful for their good trade-off between expressivity and complexity.

In this work, we investigate a profound and previously unexplored aspect of approximate quantum states: the effective temperature that characterizes their spectral properties. The concept of temperature arises from statistical mechanics [5]. Here, we apply this concept to quantum many-body pure states, specifically to the approximate states obtained through optimizing different ansatzes. We investigate the effective temperature and its dynamics during variational training, using both energy expectation and fidelity as objectives.

Our work reveals a surprising universality in the effective temperatures of approximate quantum ground states, irrespective of the ansatz structure, objective function, training quality, or underlying physical system. By decomposing the approximate states into the exact eigenstates of the system Hamiltonian, we observe a spectrum that extends to the high-energy end and decays exponentially with small decay factors, interpreted as inverse effective temperatures. The effective temperature also shows an intriguing two-stage behavior when approximating pure states of finite "temperature". The universal pattern could play an important role in understanding complex quantum systems or designing new classical and quantum algorithms. The effective temperature offers a new lens to assess the accuracy and reliability of different variational ansatzes and algorithms, potentially guiding the design of more efficient methods.

Figure 1: Sketch of the effective temperature as a metric characterizing different systems and numerical methods. The effective temperature ($1/\tilde{\beta}$) can be extracted from the estimated slope of the approximate quantum state spectral decomposition. The metric varies with different systems, methods, objectives, and training quality. However, universal features remain the same and are valuable in diagnosing the capacity of different numerical methods. In general, $\tilde{\beta}$ is small and can even be negative, indicating a nearly flat excited-state spectrum of approximate ground states.

Setups

A wavefunction ansatz comprises a family of parameterized quantum states $|\psi(\theta)\rangle$ with tunable parameters $\theta$. Such quantum states usually admit a compact and scalable representation on classical computers or quantum computers to circumvent the challenge of the exponentially large Hilbert space for generic quantum many-body systems. The approximate state within the manifold of a given ansatz can be obtained through variational optimization [6], $|\psi\rangle=\min_{\theta}\mathcal{L}(|\psi(\theta)\rangle)$, where $\mathcal{L}$ is the objective function built on top of the wavefunction. Specifically,

$$\mathcal{L}_{E}=\langle\psi(\theta)|H|\psi(\theta)\rangle \qquad (1)$$

is often utilized to obtain the ground state of the system Hamiltonian $H$. The Hamiltonian has an exact energy spectral decomposition $H=\sum_{i=0}\varepsilon_{i}|\varepsilon_{i}\rangle\langle\varepsilon_{i}|$ with $|\varepsilon_{0}\rangle$ being the exact ground state. Additionally, we also frequently utilize the fidelity objective

$$\mathcal{L}_{F}=1-|\langle\varepsilon_{0}|\psi(\theta)\rangle|^{2} \qquad (2)$$

for numerically approaching ground states. Throughout this work, we perform optimization via gradient descent, namely, the variational parameters are updated in each step according to $\Delta\theta\propto-\frac{\partial\mathcal{L}}{\partial\theta}$, where the gradients can be efficiently evaluated using automatic differentiation [7, 8, 9, 10].

For an approximate state $|\psi\rangle$, we decompose it in the basis of the exact eigenstates $|\varepsilon_{i}\rangle$ of the Hamiltonian $H$:

$$|\psi\rangle=\sum_{i}c_{i}|\varepsilon_{i}\rangle. \qquad (3)$$

The spectral decomposition coefficients $|c_{i}|^{2}$ are the central focus of our study. We find that the pairs $(\varepsilon_{i},|c_{i}|^{2})$ roughly follow an exponential decay for the approximate ground state, analogous to the Boltzmann distribution,

$$|c_{i}|^{2}\propto e^{-\tilde{\beta}\varepsilon_{i}}, \qquad (4)$$

where we denote the decay factor $\tilde{\beta}$ as the inverse effective temperature for the pure state $|\psi\rangle$ under investigation.
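In practice, the fit in Eq. (4) amounts to a linear regression of $\log|c_i|^2$ against $\varepsilon_i$. A minimal sketch (the function name and cutoff are ours, not from the paper's code):

```python
import numpy as np

def effective_beta(energies, weights, cutoff=1e-12):
    """Fit log|c_i|^2 ~ -beta_tilde * eps_i + const and return beta_tilde."""
    energies = np.asarray(energies, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mask = weights > cutoff  # drop numerically vanishing overlaps
    slope, _ = np.polyfit(energies[mask], np.log(weights[mask]), 1)
    return -slope  # the inverse effective temperature is minus the slope

# Synthetic check: weights drawn from an exact Boltzmann-like decay
eps = np.linspace(-10.0, 10.0, 50)
w = np.exp(-0.25 * eps)
w /= w.sum()
beta_tilde = effective_beta(eps, w)  # recovers 0.25
```

On real data, the same routine would take the exact eigenvalues and the squared overlaps of the optimized ansatz state.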

To gain deeper insights into the spectrum pattern of approximate quantum states, we evaluate additional target states beyond the ground state $|\varepsilon_{0}\rangle$. This set of states is directly parameterized with an inverse effective temperature $\beta$ as

$$|\phi(\beta)\rangle=\frac{1}{Z(\beta)}\sum_{i}e^{-\frac{\beta\varepsilon_{i}}{2}}|\varepsilon_{i}\rangle, \qquad (5)$$

where $Z(\beta)=\sqrt{\sum_{i}e^{-\beta\varepsilon_{i}}}$ is the normalization factor. These states can be understood as imaginary-time evolved states with time $\beta$ from the initial state $|\phi(0)\rangle=\frac{1}{Z(0)}\sum_{i}|\varepsilon_{i}\rangle$. Therefore, we call the target states imaginary-time evolved states (ITES). An exact ITES by construction admits an exponential decay in its spectral decomposition with inverse temperature $\beta$. We investigate how the spectral properties change for approximate ITES $|\psi\rangle$, which are obtained by optimizing the fidelity objective $\mathcal{L}_{F}(\beta)=1-|\langle\phi(\beta)|\psi(\theta)\rangle|^{2}$. This objective reduces to the ground-state target in the $\beta\rightarrow\infty$ limit. We find that the approximate ITES spectrum shows an evident two-stage behavior for all ansatzes: $\tilde{\beta}$ from approximate ITES successfully matches $\beta$ for small $\beta$ (high-temperature regime) and shows a deviation $\tilde{\beta}<\beta$ for large $\beta$ (low-temperature regime). More importantly, the transition point $\beta^{*}$ characterizes the lowest effective temperature a given ansatz can reach and implies the intrinsic power of the corresponding ansatz. Notably, the conclusion remains qualitatively the same when the coefficient of each eigenstate in the ITES acquires an extra random phase for generality, i.e., $|\phi(0)\rangle=\frac{1}{Z(0)}\sum_{i}e^{i\omega_{i}}|\varepsilon_{i}\rangle$.
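Once the exact spectrum is known, the target states of Eq. (5) are cheap to build. A small sketch (our own helper, including the optional random phases $e^{i\omega_i}$ mentioned above):

```python
import numpy as np

def ites(energies, beta, phases=None):
    """Coefficients of |phi(beta)> in the energy eigenbasis:
    c_i = e^{-beta*eps_i/2} / Z(beta), optionally with phases e^{i omega_i}."""
    c = np.exp(-0.5 * beta * np.asarray(energies, dtype=float)).astype(complex)
    if phases is not None:
        c *= np.exp(1j * np.asarray(phases))
    return c / np.linalg.norm(c)  # Z(beta) = sqrt(sum_i e^{-beta*eps_i})

eps = np.array([-2.0, -1.0, 0.5, 1.5])
phi = ites(eps, beta=1.0)
# beta -> infinity limit: the state collapses onto the ground state |eps_0>
phi_cold = ites(eps, beta=50.0)
```

The random phases leave every $|c_i|^2$, and hence the spectral decomposition, unchanged.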

In this work, we employ various quantum wavefunction ansatzes to demonstrate the universality of our findings, including (1) tensor-network-based ansatzes such as matrix product states (MPS) [11, 12, 13] and projected entangled-pair states (PEPS) [14, 15], (2) neural quantum states (NQS) built on top of neural networks [16, 17], (3) output states from variational quantum circuits [18, 19] such as variational quantum eigensolvers (VQE) [20, 21], and (4) the vector state ansatz (VEC), where each wavefunction component is directly modeled as a trainable parameter. We also comment on the relevance of our results to quantum approximate optimization algorithms (QAOA) [22]. We apply these ansatzes to different system Hamiltonians on different lattice geometries and optimize both objectives $\mathcal{L}_{E}$ and $\mathcal{L}_{F}$. The main findings of this work are summarized in Fig. 1.

Figure 2: Spectral decomposition of approximate states. The 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$ and an MPS ansatz with bond dimension $\chi=32$ are employed. The optimization objective is the infidelity between approximate states and the target states. The target states are chosen as the ground states for (a)(b) and imaginary-time evolved states $|\phi(\beta)\rangle$ with $\beta=0.3$ (c) and $\beta=0.6$ (d). Overlaps with different energy eigenstates are depicted as scatter plots in red and gray for approximate and target states, respectively. The optimization is at the early stage for (a) and converged for (b)(c)(d) at the steady stage toward the target state. The logarithmic overlaps with excited eigenstates show a linear pattern during training with varying fitted slope $\tilde{\beta}$, i.e., the inverse effective temperature. The fitted line is shown in dash-dot. When the target state is an ITES, there is a phase transition in the spectrum patterns of the approximate state. For small $\beta$ in (c), the optimized approximate state exhibits a near-perfect correspondence with the target state overlap, with a fitted slope $\tilde{\beta}$ matching $\beta=0.3$. Conversely, for large $\beta$ in (d), the overlaps from approximate states exhibit distinct behaviors: an exponential decay in the lower energy regime and a plateau in the higher energy regime. This behavior leads to a poor linear fit, characterized by a deviating slope $\tilde{\beta}<\beta$.

Results for approximate ground states

Since producing the exact spectrum requires a full diagonalization of the Hamiltonian, our numerical study is limited to small systems of size up to $L=16$. We use TensorCircuit-NG [23] (https://github.com/tensorcircuit/tensorcircuit-ng) for the numerical simulation of variational training and QuSpin [24] for the exact diagonalization of the Hamiltonian within the full Hilbert space or specific charge sectors.

We focus on the two-dimensional XXZ model on a square lattice with spin-$1/2$ degrees of freedom:

$$H=\sum_{\langle ij\rangle}J_{x}X_{i}X_{j}+J_{y}Y_{i}Y_{j}+J_{z}Z_{i}Z_{j}, \qquad (6)$$

where $X_{i}$ ($Y_{i}$, $Z_{i}$) are Pauli matrices on lattice site $i$ and $\langle ij\rangle$ denotes nearest-neighbor sites. The phase diagram of this model, as given in Ref. [25], features a critical point $J_{z}^{c}=1$ that separates the spin-flipping phase and the antiferromagnetic phase when $J_{x}=J_{y}=1$. We impose periodic boundary conditions throughout this work. For the $4\times 4$ lattice, we focus on the half-filling charge sector for spectral decomposition, while the variational wavefunction ansatzes are still expressed and optimized in the full Hilbert space for simplicity. For the $4\times 3$ lattice, we utilize the full spectral decomposition for most of the ansatzes while restricting to the half-filling sector for VQE states (see SM for details).
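For exact-diagonalization sizes, the Hamiltonian of Eq. (6) can be assembled by Kronecker products. A minimal dense sketch on a tiny $2\times 2$ lattice (for simplicity each geometric bond is counted once here; the paper's lattices and the QuSpin-based diagonalization are larger):

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def two_site(op, i, j, L):
    """Tensor product with `op` on sites i and j and identity elsewhere."""
    ops = [I2] * L
    ops[i] = op
    ops[j] = op
    return reduce(np.kron, ops)

def xxz(bonds, L, jx=1.0, jy=1.0, jz=0.8):
    """H = sum_<ij> Jx X_i X_j + Jy Y_i Y_j + Jz Z_i Z_j over the given bonds."""
    H = np.zeros((2 ** L, 2 ** L), dtype=complex)
    for i, j in bonds:
        for coup, op in ((jx, X), (jy, Y), (jz, Z)):
            H += coup * two_site(op, i, j, L)
    return H

# 2x2 square lattice, sites 0 1 / 2 3
H = xxz([(0, 1), (2, 3), (0, 2), (1, 3)], L=4)
eps, V = np.linalg.eigh(H)  # exact eigenbasis for the spectral decomposition
```

With $J_x=J_y$ the model conserves total magnetization, which is what allows the restriction to the half-filling charge sector mentioned above.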

Figure 3: Effective temperature during variational training targeting the ground states. The objectives are energy $\mathcal{L}_{E}$ (a) and fidelity $\mathcal{L}_{F}$ (b), respectively. The results are obtained from the 2D XXZ model on a $4\times 4$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. During training, points move to the left toward higher accuracy, and $\tilde{\beta}$ generally first increases and then decreases.

Using different ansatzes and variational optimization against $\mathcal{L}_{E}$ or $\mathcal{L}_{F}$, we obtain different approximate ground states. The typical spectral decomposition patterns $(\varepsilon_{i},|c_{i}|^{2})$ of these states are shown in Fig. 2 (a)(b). (See SM for more spectral decomposition results from other variational ansatzes.) We find a robust exponential relation in the spectral decomposition across different ansatzes and training stages. The exponential decay factor $\tilde{\beta}$ extracted from the fitting plays the role of the inverse effective temperature. The training dynamics of $\tilde{\beta}$ for different ansatzes and objectives are shown in Fig. 3. We find that the effective temperature $1/\tilde{\beta}$ is high during training; this is particularly true for optimized approximate ground states, where $\tilde{\beta}<0.3$ in most cases. Such a small $\tilde{\beta}$ contrasts sharply with imaginary-time-evolution-based approaches, where a large $\tilde{\beta}$ is expected. During variational training, a general trend emerges in which $\tilde{\beta}$ first increases and then decreases, implying an upper bound on the possible $\tilde{\beta}$ of a given ansatz. Moreover, the training dynamics of $\tilde{\beta}$ show similar patterns for both objectives within the same ansatz, while $\tilde{\beta}$ for optimizing the energy objective $\mathcal{L}_{E}$ is typically higher due to the unequal penalty on different excited states. Conversely, the fidelity objective penalizes all excited states equally, and the finite $\tilde{\beta}$ in this case is attributed to the physical prior encoded in the ansatz structures. This understanding is validated by the near-zero $\tilde{\beta}$ for the VEC ansatz, which contains the least physical prior in its structure. Furthermore, the $\tilde{\beta}$ trends are more closely aligned within the same family of ansatzes, reflecting its potential power to diagnose intrinsic properties of these ansatzes. The universality and validity of the results are also confirmed with other system sizes, models, and lattice geometries (see SM for details).

Results for approximate imaginary-time evolved states

To better understand the spectrum pattern and the underlying mechanism of the effective temperature in approximate ground states, we evaluate approximate ITES whose targets have strictly exponential-decay spectrum patterns. For different target values of $\beta$, we identify a two-stage behavior with a putative phase transition or crossover in between. In the high-temperature regime of small $\beta$, the approximate ITES conforms to the strict exponential-decay spectrum pattern with $\tilde{\beta}=\beta$, as shown in Fig. 2 (c). On the contrary, in the low-temperature regime of large $\beta$, the spectral decomposition follows the exponential decay only in the low-energy part, while a nearly flat spectrum emerges from the high-energy contributions. If we attempt to extract a single exponential decay factor $\tilde{\beta}$ in this case, its value is significantly smaller than the exact $\beta$, as shown in Fig. 2 (d). It is worth noting that the nearly flat spectrum deviation is not due to numerical machine precision, as points $|c_{i}|^{2}$ of the same magnitude conform to the exponential decay for small $\beta$ but fail to do so for large $\beta$ (see SM for details).

Figure 4: Effective temperature and fidelity of approximate ITES with different values of $\beta$. The results are obtained from the 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. (a) The inverse effective temperature $\tilde{\beta}$ shows a two-stage behavior with varying $\beta$. $\beta^{*}$ (insets) marks the deviation of $\tilde{\beta}$ from the target $\beta$, separating the two stages. (b) The fidelity of approximate ITES with different values of $\beta$. For high-accuracy methods, the fidelity decreases with increasing $\beta$ due to the increased optimization hardness. For low-accuracy methods, the fidelity increases with increasing $\beta$ as the target states are less entangled and are better suited to the limited ansatz.

The results for $\tilde{\beta}$ of approximate ITES with different target values of $\beta$ are shown in Fig. 4 (a), where the two-stage behavior is evident. For each ansatz, there is a transition point $\beta^{*}$ at which $\tilde{\beta}$ begins to deviate from $\beta$ and the associated spectrum pattern moves from Fig. 2 (c) to Fig. 2 (d). $\beta^{*}$ is a figure of merit, as it marks the critical temperature that the ansatz can accurately represent and sets an upper bound on the $\tilde{\beta}$ that an ansatz can reach. More importantly, $\beta^{*}$ is also correlated with the expressive capacity of the ansatz, as a higher $\beta^{*}$ generally implies a better fidelity. In the low-temperature limit $\beta\rightarrow\infty$, the results for approximate ITES reduce to the results for approximate ground states obtained by optimizing $\mathcal{L}_{F}$.

We further remark on the fidelity results shown in Fig. 4 (b) with varying methods and values of $\beta$. For low-accuracy ansatzes, the fidelity improves with increasing $\beta$, as $|\phi(\beta)\rangle$ with larger $\beta$ is closer to the ground state and has smaller entanglement. Such less entangled states are better suited to ansatzes of limited expressiveness. Conversely, for high-accuracy ansatzes such as tensor network ansatzes with large bond dimensions or VEC, the expressive capacity is sufficient for targeting any ITES, and the bottleneck is the optimization difficulty. Since the number of optimization steps is fixed in our simulation, a slower optimization speed results in a worse final fidelity with increasing $\beta$. In other words, when the target is closer to the ground state manifold, the optimization landscape is harder to navigate (see quantitative analysis of the expressiveness and optimization hardness in the SM).

Discussions and Conclusion

Some hints of the general picture presented in this work were previously reported in QAOA [26, 27] and MPS cases [28], where the optimization of $\mathcal{L}_{E}$ is considered. In the QAOA case, which can be regarded as a special limit of the VQE ansatz, the output states from the QAOA circuits were found to approximate a Gibbs distribution with $\tilde{\beta}\approx 0.2\sim 0.3$ [26], consistent with the findings in this study. In the MPS case, a nearly flat spectrum was identified [28]. Given that the density of states was effectively considered with Gaussian broadening in their work, the spectrum appeared flatter, as the exponential decay with a small positive $\tilde{\beta}$ was compensated by the high density of states in the middle-energy regime.

The effective temperature has profound implications for understanding numerical methods and many-body systems. One practical implication of such a high-temperature tail in the spectrum is mentioned in Ref. [28], where sampling-based energy variance estimation becomes more fragile than expected. The variance, involving $H^{2}$, is just one example of an operator $f(H)$, with expectation $\langle\psi|f(H)|\psi\rangle=\sum_{i}|c_{i}|^{2}f(\varepsilon_{i})$. The operator expectation is highly sensitive to small high-energy components when $f(\varepsilon)$ is significantly larger for high-energy inputs. The effective temperature captures finer details than fidelity, as demonstrated in Fig. 4: while the fidelity remains the same for larger $\beta$, the effective temperature continues to vary. This quantity also has strong relevance to the expressive capacity of ansatzes, as $\beta^{*}$ values differ for different ansatzes, which further confirms that the two-stage behavior in approximating ITES is not a numerical precision issue. Consequently, the metric and the methodology explored in this work serve as a guiding principle to help design better approximate quantum ansatzes and variational algorithms.
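A toy numerical illustration of this sensitivity (synthetic spectral weights of our own choosing, not data from the paper): attaching a flat high-energy tail of total weight $10^{-4}$ barely moves the energy expectation but inflates the variance severalfold.

```python
import numpy as np

def expect_fH(weights, energies, f):
    """<psi|f(H)|psi> = sum_i |c_i|^2 f(eps_i) from the spectral weights."""
    return float(np.sum(weights * f(energies)))

eps = np.linspace(0.0, 100.0, 201)
w = np.exp(-2.0 * eps)
w /= w.sum()  # sharply peaked near the ground state (beta_tilde = 2)

def variance(weights):
    e1 = expect_fH(weights, eps, lambda e: e)
    e2 = expect_fH(weights, eps, lambda e: e ** 2)
    return e2 - e1 ** 2

# Add a flat tail of total weight 1e-4 over the high-energy half
w_tail = w.copy()
high = eps > 50.0
w_tail[high] += 1e-4 / high.sum()
w_tail /= w_tail.sum()

shift = abs(expect_fH(w_tail, eps, lambda e: e) - expect_fH(w, eps, lambda e: e))
ratio = variance(w_tail) / variance(w)  # variance inflated severalfold
```

Since $f(\varepsilon)=\varepsilon^2$ weights the tail quadratically, a spectrally tiny component dominates the variance while leaving the fidelity and energy nearly untouched.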

The effective temperature metric opens up several intriguing future research directions. We can quantitatively investigate the scaling of the effective temperature $\tilde{\beta}$ and its critical value $\tilde{\beta}^{*}$ with respect to varying system sizes, Hamiltonian parameters, and ansatz structure parameters, to gain deeper insights into the physics of the system and the intrinsic patterns in numerical methods. It is also interesting to explore whether approximation methods can be developed to estimate the effective temperature beyond exact diagonalization sizes. The ITES proposed in this work are also of independent academic interest, as they provide a platform to explore the trade-off between expressiveness and optimization hardness, as well as host a putative phase transition at $\beta^{*}$, the nature and universal behavior of which merit future exploration. It is also crucial to validate the general picture of spectrum patterns on more ansatzes, including mean-field ansatzes, physics-inspired ansatzes [29], and advanced hybrid variational ansatzes [30, 31], and more models, including integrable systems, many-body localized systems, and fermionic systems.

In this work, we have identified a universal pattern in the spectrum of approximate ground states. We define the effective temperature $\tilde{\beta}$ to characterize the exponential decay in the spectrum and investigate the behavior of $\tilde{\beta}$ for different systems, ansatzes, objectives, and training steps. To better understand the underlying mechanism of this metric, we further propose ITES targets and identify the two-stage behavior in approximating ITES, with the critical inverse effective temperature $\beta^{*}$ as a figure of merit. We believe that the universal picture presented here provides a fresh and powerful perspective on benchmarking numerical algorithms and understanding quantum many-body systems.

Methods

We describe the details of the different quantum many-body ansatzes employed in this work. Additional optimization hyperparameters can be found in the Supplemental Materials.

Matrix product states. We use MPS with periodic boundary conditions. The amplitude for the ansatz is given as

$$\langle i_{1}i_{2}\cdots i_{L}|\psi\rangle=\mathrm{Tr}\Big(\prod_{j=1}^{L}A_{i_{j}}^{(j)}\Big), \qquad (7)$$

where $i_{j}=0$ or $1$ represents the computational basis and $A^{(j)}$ are $L$ different tensors of shape $(\chi,\chi,2)$. $\chi$ is the bond dimension controlling the expressiveness of the MPS ansatz. For $\chi=2^{L/2}$, the ansatz can in principle exactly represent any wavefunction, even with open boundary conditions. The tensor network graphical representation of the ansatz in Eq. (7) is shown in Fig. 5. Note that we do not explicitly impose the normalization conditions by canonicalization but leave the MPS ansatz unnormalized. The Hamiltonian expectation is evaluated as

$$E=\frac{\langle\psi|H|\psi\rangle}{\langle\psi|\psi\rangle}, \qquad (8)$$

where the normalization is explicitly accounted for. For 2D systems, we use the natural order from site $1$ to $L$ to encode the system sites into the 1D MPS. Since we need to consider the full spectrum properties in the calculation, the accessible system size is limited to $L=16$. Therefore, we do not rely on the density matrix renormalization group optimization algorithm. Instead, we directly obtain the full wavefunction by contracting the MPS according to Eq. (7) and directly evaluate the Hamiltonian expectation by sparse matrix-vector multiplication, so that the objective can be differentiated and variationally optimized via gradient descent. For variational optimization, each element in $A^{(j)}$ is randomly initialized according to a Gaussian distribution with mean $0$ and standard deviation $0.1$. The total number of variational parameters for the MPS ansatz is $2L\chi^{2}$.
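The contraction of a periodic MPS into the full $2^L$-dimensional wavefunction of Eq. (7) can be sketched as follows (our own minimal numpy implementation; the paper uses TensorCircuit-NG):

```python
import numpy as np

def mps_to_vector(tensors):
    """Contract a periodic MPS into the full wavefunction:
    <i1...iL|psi> = Tr(prod_j A^{(j)}_{i_j}), tensors of shape (chi, chi, 2)."""
    chi = tensors[0].shape[0]
    res = np.transpose(tensors[0], (2, 0, 1))  # (2, chi, chi), res[s] = A_s^{(1)}
    for A in tensors[1:]:
        M = np.transpose(A, (2, 0, 1))
        # extend the basis prefix: (B, chi, chi) x (2, chi, chi) -> (2B, chi, chi)
        res = np.einsum('bij,sjk->bsik', res, M).reshape(-1, chi, chi)
    return np.trace(res, axis1=1, axis2=2)  # trace closes the periodic bond

rng = np.random.default_rng(0)
L, chi = 4, 3
tensors = [rng.normal(0.0, 0.1, (chi, chi, 2)) for _ in range(L)]
psi = mps_to_vector(tensors)  # unnormalized; Eq. (8) divides by <psi|psi>
```

The returned vector is ordered with $i_1$ as the most significant bit, matching the computational-basis convention.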

Figure 5: The tensor network graphical representation of the periodic MPS ansatz used in this work. Each blue square corresponds to a tensor of shape $(\chi,\chi,2)$. The red legs correspond to physical indices of dimension $2$ and all blue legs are virtual bonds of dimension $\chi$. The dashed blue line is the bond connecting the tensors on the last and first sites for periodic boundary conditions.

Projected entangled-pair states. The tensor network graphical representation of the PEPS ansatz with periodic boundary conditions is shown in Fig. 6. Similarly, the ansatz is unnormalized by default. During optimization, $L=L_{x}\times L_{y}$ tensors of shape $(\chi,\chi,\chi,\chi,2)$ are randomly initialized, with each element following a Gaussian distribution with mean $0$ and standard deviation $0.1$. Hence, the total number of variational parameters for PEPS is $2L\chi^{4}$. In the numerical calculation, we first contract the entire PEPS tensor network with physical legs open to directly obtain the full unnormalized wavefunction, since the system size is small. We can then compute the fidelity or Hamiltonian expectation directly and integrate the whole computation graph with automatic differentiation and variational optimization.

Figure 6: The tensor network graphical representation of the two-dimensional PEPS ansatz with periodic boundary conditions used in this work. Each blue square corresponds to a tensor of shape $(\chi,\chi,\chi,\chi,2)$. The red legs correspond to physical indices of dimension $2$ on each site and all blue legs are virtual bonds of dimension $\chi$. Dashed blue lines are the periodic boundary connections in both the x and y dimensions.

Neural quantum states. For NQS, the probability amplitude is given by the output of a neural network whose input is the computational basis, i.e.

$$\langle i_{1}i_{2}\cdots i_{L}|\psi\rangle=f(i_{1}i_{2}\cdots i_{L}), \qquad (9)$$

where $f$ corresponds to the neural network. In this work, we employ a deep neural network with the architecture shown in Fig. 7. Given the small system size, we do not run the variational Monte Carlo method with sampling to optimize the NQS. Instead, similar to the MPS and PEPS cases, we directly evaluate the neural network output on all $2^{L}$ computational basis inputs to first obtain the (unnormalized) full wavefunction and then optimize the infidelity or energy based on the wavefunction. The total number of trainable parameters is of order $O(LWD)$, where $D$ is the number of repetitions of the ResBlock and $W$ is the width of the intermediate fully connected layers. The network width $W$ controls the expressiveness of the NQS ansatz.
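The "evaluate on all basis states" step can be sketched as follows (a toy single-ResBlock network with random weights, real-valued for brevity; names and shapes are illustrative, not the paper's exact architecture hyperparameters):

```python
import numpy as np

def all_basis(L):
    """All 2^L computational basis states as rows of a (2^L, L) array."""
    idx = np.arange(2 ** L)
    return ((idx[:, None] >> np.arange(L - 1, -1, -1)) & 1).astype(float)

def res_block(x, W1, b1, W2, b2):
    """FC layer L->W, nonlinearity, FC layer W->L, plus residual connection."""
    return x + np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
L, W = 4, 8
W1, b1 = rng.normal(0, 0.1, (L, W)), np.zeros(W)
W2, b2 = rng.normal(0, 0.1, (W, L)), np.zeros(L)
w_out = rng.normal(0, 0.1, L)

x = all_basis(L)                             # every input i1...iL at once
amps = res_block(x, W1, b1, W2, b2) @ w_out  # f(i1...iL) for each basis state
psi = amps / np.linalg.norm(amps)            # normalize the full wavefunction
```

In the paper's architecture, amplitude and phase go through separate ResBlock stacks; one real-valued stack suffices to show the batched evaluation.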

Figure 7: Neural network architecture for the NQS used in this work. The wavefunction amplitude and phase are evaluated separately by going through a stack of residual blocks (ResBlock). In each ResBlock, the input vector of dimension $L$ is first mapped to a vector of dimension $W$ via a fully connected (FC) layer and then projected back to dimension $L$ via another FC layer. The output vector is then added to the input vector, forming a residual connection. The hyperparameter width $W$ controls the expressiveness of the NQS ansatz.

Variational quantum eigensolver states. In VQE methods, we use the output quantum states from parameterized quantum circuits (PQC), and the trainable parameters correspond to rotation angles of quantum gates in the PQC. The PQC $U=U_{v}U_{i}$ consists of an initialization block $U_{i}$ and $d$ variational blocks $U_{v}=\prod_{i}^{d}U_{vi}$. The initialization block prepares the Bell pair $1/\sqrt{2}(|01\rangle-|10\rangle)$ on nearest-neighbor qubits so that the initial state is in the $J_{tot}=0$ sector. Each variational block is given by $U_{vi}=\prod_{j}^{L}e^{-i\theta^{(0)}_{j}Z_{j}}e^{-i\theta^{(1)}_{j}Y_{j}}\prod_{\langle jk\rangle}e^{-i\theta^{(2)}_{jk}\mathrm{SWAP}_{jk}}$, where the total number of circuit parameters $\theta$ is of order $O(Ld)$ and $\mathrm{SWAP}_{jk}=1/2(X_{j}X_{k}+Y_{j}Y_{k}+Z_{j}Z_{k}+I_{j}I_{k})$ is the swap gate on qubits $j$ and $k$. These parameterized swap gates are applied on the bonds between nearest-neighbor sites according to the lattice geometry. For the 2D square lattice, these two-qubit gates are first applied on all rows, followed by all columns, without periodic bonds across the opposite boundaries. For the 1D lattice, these two-qubit gates are applied in a ladder fashion with a final periodic bond gate, i.e., the entangling gates act on the qubit pairs $(1,2),(2,3),\cdots,(L-1,L),(L,1)$. A schematic of the PQC structure investigated in this work is shown in Fig. 8. The approximate state we use is accordingly $|\psi\rangle=U|0^{L}\rangle$. Our numerical simulation of expectation values and fidelity omits the quantum noise and shot sampling errors that are inevitable on real quantum devices. The circuit parameter gradients are obtained via automatic differentiation in our simulation and can be obtained via the parameter shift rule [32] on quantum processors.
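For the parameterized swap gates, note that $\mathrm{SWAP}^2=I$, so $e^{-i\theta\,\mathrm{SWAP}}=\cos\theta\,I-i\sin\theta\,\mathrm{SWAP}$ in closed form. A small sketch checking its action on the Bell pair from the initialization block (a SWAP eigenstate with eigenvalue $-1$):

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def pswap(theta):
    """e^{-i theta SWAP} = cos(theta) I - i sin(theta) SWAP, since SWAP^2 = I."""
    return np.cos(theta) * np.eye(4, dtype=complex) - 1j * np.sin(theta) * SWAP

# Bell pair prepared by the initialization block: (|01> - |10>)/sqrt(2)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
out = pswap(0.7) @ singlet  # the singlet only picks up the phase e^{+i*0.7}
```

Because the gate commutes with total magnetization, layers of such gates keep the state inside the symmetry sector prepared by the initialization block.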

Figure 8: VQE ansatz employed in this work. For the 1D lattice, a parameterized swap gate on qubits $1$ and $L$ is omitted. For 2D lattices, these parameterized swap gates are aligned with the lattice bonds between nearest-neighbor sites. The initialization block prepares the initial states as copies of Bell pairs $1/\sqrt{2}(|01\rangle-|10\rangle)$. The variational blocks are repeated $d$ times with different parameters. Ry and Rz denote the $e^{-i\theta Y}$ and $e^{-i\theta Z}$ gates, respectively. The output state from the PQC is employed for optimization and for approximating target states.

Vector states. Since the system size under investigation is small, we can directly build up the so-called vector state ansatz, where the wavefunction amplitude of each basis state is directly modeled as a trainable parameter. For a spin-1/2 system of size $L$, $2^{L}$ complex parameters are required, namely, we treat $\langle i_{1}i_{2}\cdots i_{L}|\psi\rangle$ as variational parameters. In our work, the real and imaginary parts of these complex parameters are independently initialized (Gaussian distribution with mean $0$ and standard deviation $0.1$) and optimized, because the Adam optimizer does not work as expected with complex training parameters. This ansatz is expected to encode the least physical prior and be the most unbiased.
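A self-contained toy of this ansatz and the gradient-descent loop on the fidelity objective, using central finite differences as a stand-in for the automatic differentiation used in the paper (the target state and hyperparameters here are illustrative):

```python
import numpy as np

def infidelity(params, target):
    """L_F = 1 - |<target|psi>|^2 for the vector-state (VEC) ansatz, with the
    real and imaginary parts stored as separate real parameters."""
    dim = target.size
    psi = params[:dim] + 1j * params[dim:]
    psi = psi / np.linalg.norm(psi)  # the raw ansatz is unnormalized
    return 1.0 - abs(np.vdot(target, psi)) ** 2

def num_grad(f, params, h=1e-6):
    """Central finite differences standing in for automatic differentiation."""
    g = np.zeros_like(params)
    for k in range(params.size):
        p, m = params.copy(), params.copy()
        p[k] += h
        m[k] -= h
        g[k] = (f(p) - f(m)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
dim = 8
target = rng.normal(size=dim) + 1j * rng.normal(size=dim)
target /= np.linalg.norm(target)

# Gaussian initialization with mean 0 and std 0.1, as in the text
params = rng.normal(0.0, 0.1, 2 * dim)
for _ in range(500):
    params = params - 0.05 * num_grad(lambda p: infidelity(p, target), params)
```

Splitting each complex amplitude into two real parameters is exactly the workaround described above for optimizers that assume real-valued parameters.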

Acknowledgments We thank Yi-Fan Jiang, Zi-Xiang Li, and Lei Wang for their valuable discussions. SXZ acknowledges support from a startup grant at CAS-IOP.

References

  • Kitaev et al. [2002] A. Kitaev, A. H. Shen,  and M. N. Vyalyi, Classical and quantum computation (American Mathematical Soc., Providence, 2002).
  • Kempe et al. [2006] J. Kempe, A. Kitaev,  and O. Regev, The complexity of the local Hamiltonian problem, SIAM J. Comput. 35, 1070 (2006).
  • Zheng et al. [2017] B.-X. Zheng, C.-M. Chung, P. Corboz, G. Ehlers, M.-P. Qin, R. M. Noack, H. Shi, S. R. White, S. Zhang,  and G. K.-L. Chan, Stripe order in the underdoped region of the two-dimensional Hubbard model, Science 358, 1155 (2017).
  • Lee et al. [2023] S. Lee, J. Lee, H. Zhai, Y. Tong, A. M. Dalzell, A. Kumar, P. Helms, J. Gray, Z.-H. Cui, W. Liu, M. Kastoryano, R. Babbush, J. Preskill, D. R. Reichman, E. T. Campbell, E. F. Valeev, L. Lin,  and G. K.-L. Chan, Evaluating the evidence for exponential quantum advantage in ground-state quantum chemistry, Nat. Commun. 14, 1952 (2023).
  • Greiner et al. [1995] W. Greiner, L. Neise,  and H. Stöcker, Thermodynamics and Statistical Mechanics (Springer New York, New York, NY, 1995).
  • Wu et al. [2024] D. Wu, R. Rossi, F. Vicentini, N. Astrakhantsev, F. Becca, X. Cao, J. Carrasquilla, F. Ferrari, A. Georges, M. Hibat-Allah, M. Imada, A. M. Läuchli, G. Mazzola, A. Mezzacapo, A. Millis, J. Robledo Moreno, T. Neupert, Y. Nomura, J. Nys, O. Parcollet, R. Pohle, I. Romero, M. Schmid, J. M. Silvester, S. Sorella, L. F. Tocchio, L. Wang, S. R. White, A. Wietek, Q. Yang, Y. Yang, S. Zhang,  and G. Carleo, Variational benchmarks for quantum many-body problems, Science 386, 296 (2024).
  • Rumelhart et al. [1986] D. E. Rumelhart, G. E. Hinton,  and R. J. Williams, Learning representations by back-propagating errors, Nature 323, 533 (1986).
  • Bartholomew-Biggs et al. [2000] M. Bartholomew-Biggs, S. Brown, B. Christianson,  and L. Dixon, Automatic differentiation of algorithms, J. Comput. Appl. Math. 124, 171 (2000).
  • Liao et al. [2019] H.-J. Liao, J.-G. Liu, L. Wang,  and T. Xiang, Differentiable Programming Tensor Networks, Phys. Rev. X 9, 31041 (2019).
  • Zhang et al. [2023a] S.-X. Zhang, Z.-Q. Wan,  and H. Yao, Automatic differentiable Monte Carlo: Theory and application, Phys. Rev. Res. 5, 033041 (2023a).
  • White [1992] S. R. White, Density Matrix Formulation for Quantum Renormalization Groups, Phys. Rev. Lett. 69, 2863 (1992).
  • Schollwöck [2011] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Phys. 326, 96 (2011).
  • Stoudenmire and White [2012] E. Stoudenmire and S. R. White, Studying Two-Dimensional Systems with the Density Matrix Renormalization Group, Annu. Rev. Condens. Matter Phys. 3, 111 (2012).
  • Verstraete and Cirac [2004a] F. Verstraete and J. I. Cirac, Renormalization algorithms for Quantum-Many Body Systems in two and higher dimensions, arXiv:cond-mat/0407066  (2004a).
  • Verstraete and Cirac [2004b] F. Verstraete and J. I. Cirac, Valence-bond states for quantum computation, Phys. Rev. A 70, 060302 (2004b).
  • Carleo and Troyer [2017] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
  • Deng et al. [2017] D.-L. Deng, X. Li,  and S. Das Sarma, Quantum Entanglement in Neural Network States, Phys. Rev. X 7, 021021 (2017).
  • Cerezo et al. [2021] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio,  and P. J. Coles, Variational quantum algorithms, Nat. Rev. Phys. 3, 625 (2021).
  • Bharti et al. [2022] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-C. Kwek,  and A. Aspuru-Guzik, Noisy intermediate-scale quantum algorithms, Rev. Mod. Phys. 94, 015004 (2022).
  • Peruzzo et al. [2014] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik,  and J. L. O’Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. 5, 4213 (2014).
  • Tilly et al. [2022] J. Tilly, H. Chen, S. Cao, D. Picozzi, K. Setia, Y. Li, E. Grant, L. Wossnig, I. Rungger, G. H. Booth,  and J. Tennyson, The Variational Quantum Eigensolver: A review of methods and best practices, Phys. Rep. 986, 1 (2022).
  • Farhi et al. [2014] E. Farhi, J. Goldstone,  and S. Gutmann, A Quantum Approximate Optimization Algorithm, arXiv:1411.4028  (2014).
  • Zhang et al. [2023b] S.-X. Zhang, J. Allcock, Z.-Q. Wan, S. Liu, J. Sun, H. Yu, X.-H. Yang, J. Qiu, Z. Ye, Y.-Q. Chen, C.-K. Lee, Y.-C. Zheng, S.-K. Jian, H. Yao, C.-Y. Hsieh,  and S. Zhang, TensorCircuit: a Quantum Software Framework for the NISQ Era, Quantum 7, 912 (2023b).
  • Weinberg and Bukov [2017] P. Weinberg and M. Bukov, QuSpin: a Python package for dynamics and exact diagonalisation of quantum many body systems part I: spin chains, SciPost Phys. 2, 003 (2017).
  • Yunoki [2002] S. Yunoki, Numerical study of the spin-flop transition in anisotropic spin-1/2 antiferromagnets, Phys. Rev. B 65, 092402 (2002).
  • Díez-Valle et al. [2023] P. Díez-Valle, D. Porras,  and J. J. García-Ripoll, Quantum Approximate Optimization Algorithm Pseudo-Boltzmann States, Phys. Rev. Lett. 130, 050601 (2023).
  • Lotshaw et al. [2023] P. C. Lotshaw, G. Siopsis, J. Ostrowski, R. Herrman, R. Alam, S. Powers,  and T. S. Humble, Approximate Boltzmann distributions in quantum approximate optimization, Phys. Rev. A 108, 042411 (2023).
  • Silvester et al. [2024] J. M. Silvester, G. Carleo,  and S. R. White, Unusual energy spectra of matrix product states, arXiv:2408.13616  (2024).
  • Yokoyama and Shiba [1987] H. Yokoyama and H. Shiba, Variational Monte-Carlo Studies of Hubbard Model. I, J. Phys. Soc. Japan 56, 1490 (1987).
  • Zhang et al. [2022] S.-X. Zhang, Z.-Q. Wan, C.-K. Lee, C.-Y. Hsieh, S. Zhang,  and H. Yao, Variational Quantum-Neural Hybrid Eigensolver, Phys. Rev. Lett. 128, 120502 (2022).
  • Yuan et al. [2021] X. Yuan, J. Sun, J. Liu, Q. Zhao,  and Y. Zhou, Quantum Simulation with Hybrid Tensor Networks, Phys. Rev. Lett. 127, 040501 (2021).
  • Crooks [2019] G. E. Crooks, Gradients of parameterized quantum gates using the parameter-shift rule and gate decomposition, arXiv:1905.13311  (2019).

Supplemental Material for “Effective Temperature in Approximate Quantum Many-body States”

I Training dynamics for effective temperature toward the ground state

We conduct numerical experiments on different models and find that the dynamics of $\tilde{\beta}$ show a universal pattern. Fig. S1 and Fig. S2 show the training dynamics for the 1D XXZ model with $\mathcal{L}_{E}$ and $\mathcal{L}_{F}$ as objectives, respectively. The system Hamiltonian for the 1D XXZ model with periodic boundary conditions is defined as:

$H=\sum_{i}\left(J_{x}X_{i}X_{i+1}+J_{y}Y_{i}Y_{i+1}+J_{z}Z_{i}Z_{i+1}+h_{z}Z_{i}\right).$ (S1)
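For the small sizes considered here, Eq. (S1) can be built directly as a matrix. A naive dense sketch follows; in practice a sparse representation or a dedicated package such as QuSpin is preferable for $L=12$, and the small $L=6$ used below is only for illustration.

```python
import numpy as np

# Dense construction of the 1D XXZ chain of Eq. (S1) with periodic
# boundary conditions.
I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]], dtype=complex)
Y = np.array([[0., -1j], [1j, 0.]])
Z = np.diag([1.0, -1.0]).astype(complex)

def op_at(op, site, L):
    """Embed a single-qubit operator at `site` in an L-qubit Hilbert space."""
    out = np.array([[1.0 + 0j]])
    for k in range(L):
        out = np.kron(out, op if k == site else I2)
    return out

def xxz_hamiltonian(L, jx=1.0, jy=1.0, jz=0.8, hz=0.02):
    H = np.zeros((2 ** L, 2 ** L), dtype=complex)
    for i in range(L):
        j = (i + 1) % L  # periodic boundary condition
        H += jx * op_at(X, i, L) @ op_at(X, j, L)
        H += jy * op_at(Y, i, L) @ op_at(Y, j, L)
        H += jz * op_at(Z, i, L) @ op_at(Z, j, L)
        H += hz * op_at(Z, i, L)
    return H

H = xxz_hamiltonian(L=6)
evals = np.linalg.eigvalsh(H)
print(evals[0])  # ground-state energy of the L = 6 chain
```

The full eigendecomposition computed here is also what the spectral decomposition of approximate states is measured against.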

Similarly, the training dynamics for the 2D XXZ model in the antiferromagnetic phase is shown in Fig. S3, which complements the results in the main text where the target ground state is in the spin-flop phase. Besides, we also evaluate $\tilde{\beta}$ for a different 2D system size, $4\times 3$, as shown in Fig. S4.

From these training dynamics toward ground states in different dimensions and phases, several important features remain consistent as universal patterns for approximate ground states.

1. Optimized approximate ground states admit very small $\tilde{\beta}$, indicating a nearly flat spectrum across excited states.

2. During training, $\tilde{\beta}$ generally first increases and then decreases, implying an upper bound for the possible $\tilde{\beta}$ of a given ansatz. Moreover, a smaller $\tilde{\beta}$ often indicates better accuracy at the late stage of training. Besides, initially $\tilde{\beta}\approx 0$ unless the ansatz has a strong physical prior. For example, in the VQE case, the PQC structure and the initialization render the state very close to the valence bond state at the beginning, resulting in a higher decay factor $\tilde{\beta}$.

3. For each method, the training dynamics is qualitatively similar for both energy and fidelity objectives, though $\tilde{\beta}$ values are typically smaller when optimizing fidelity. The energy objective imposes unequal penalties on different excited-state components: the higher the energy of an excited state, the larger the penalty it receives. On the contrary, the fidelity objective penalizes all excited states equally, which can result in a smaller or even negative $\tilde{\beta}$. Since the fidelity objective shows no preference among excited states, the finite $\tilde{\beta}$ value in this case can be attributed to the physical prior encoded in the ansatzes. Indeed, for the most unbiased ansatz VEC, $\tilde{\beta}$ is very close to 0 when optimizing $\mathcal{L}_{F}$. In summary, both the objective structure and the ansatz structure determine the effective temperature of the resultant approximate ground states.

4. The same family of ansatzes with different hyperparameters, e.g. VMC of different widths or tensor networks with different bond dimensions, usually displays similar training dynamics. This observation further supports the conclusion that the effective temperature reflects intrinsic properties of different numerical methods.

In addition, we show the error bar $\Delta\tilde{\beta}$ given by the exponential fitting in Fig. S5. The uncertainty of the fitted decay factor is small compared to the $\tilde{\beta}$ value itself, implying that the exponential scaling is a very good description of the spectral decomposition.
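The exponential fit used throughout amounts to a linear regression of $\log|c_i|^2$ against $\varepsilon_i$: the slope gives $-\tilde{\beta}$, the intercept gives $\log\Lambda$, and the fit covariance yields the uncertainty $\Delta\tilde{\beta}$. A minimal sketch on synthetic data follows; the actual fitting pipeline of this work may differ in details.

```python
import numpy as np

# Spectral weights of an approximate state are assumed to follow
# |c_i|^2 ≈ Lambda * exp(-beta_tilde * eps_i), so log|c_i|^2 is linear
# in energy. Synthetic data stand in for the measured overlaps.
rng = np.random.default_rng(1)
eps = np.sort(rng.uniform(0.0, 10.0, 200))   # excited-state energies
beta_true, lam_true = 1.5, 0.3
logc2 = np.log(lam_true) - beta_true * eps + 0.05 * rng.normal(size=eps.size)

coeffs, cov = np.polyfit(eps, logc2, deg=1, cov=True)
beta_fit = -coeffs[0]            # beta_tilde: minus the fitted slope
lam_fit = np.exp(coeffs[1])      # Lambda: exponential of the intercept
dbeta = np.sqrt(cov[0, 0])       # Delta beta_tilde: 1-sigma slope uncertainty
print(beta_fit, lam_fit, dbeta)
```

A small $\Delta\tilde{\beta}$ relative to $\tilde{\beta}$, as reported in Fig. S5, indicates that the linear (i.e. exponential) model describes the spectrum well.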

Figure S1: Training dynamics of $\tilde{\beta}$ for the 1D XXZ model with $L=12$, $J_{x}=J_{y}=1$, $J_{z}=0.8$, $h_{z}=0.02$ and the energy objective.
Figure S2: Training dynamics of $\tilde{\beta}$ for the 1D XXZ model with $L=12$, $J_{x}=J_{y}=1$, $J_{z}=0.8$, $h_{z}=0.02$ and the ground-state fidelity objective.
Figure S3: Training dynamics of $\tilde{\beta}$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=1.5$ on a $4\times 4$ lattice with the energy (a) and ground-state fidelity (b) objectives.
Figure S4: Training dynamics of $\tilde{\beta}$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=0.8$ on a $4\times 3$ lattice with the energy (a) and ground-state fidelity (b) objectives.
Figure S5: Training dynamics of the uncertainty $\Delta\tilde{\beta}$ of the fitted $\tilde{\beta}$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=0.8$ on a $4\times 4$ lattice with the energy (a) and ground-state fidelity (b) objectives.

II Training dynamics for the scaling factor toward the ground state

When fitting the exponentially decaying spectrum, two parameters are required in the form $\Lambda e^{-\tilde{\beta}\varepsilon}$. We focus on the training dynamics of $\Lambda$ in this section. The parameter $\Lambda$ is more directly related to the overall approximation accuracy, as the infidelity can be roughly estimated as $1-F=\int_{E_{min}}^{E_{max}}d\varepsilon\,\rho(\varepsilon)\Lambda e^{-\tilde{\beta}\varepsilon}\propto\Lambda$, where $\rho(\varepsilon)$ is the density of states of the system. Some typical training dynamics of $\Lambda$ toward the ground state are shown in Figs. S6, S7, and S8. We observe an evident linear correlation between $\Lambda$ and $1-F$ during training.

Figure S6: Training dynamics of $\Lambda$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=0.8$ on a $4\times 3$ lattice with the energy (a) and ground-state fidelity (b) objectives.
Figure S7: Training dynamics of $\Lambda$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=0.8$ on a $4\times 4$ lattice with the energy (a) and ground-state fidelity (b) objectives.
Figure S8: Training dynamics of $\Lambda$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=1.5$ on a $4\times 4$ lattice with the energy (a) and ground-state fidelity (b) objectives.

III Training dynamics for effective temperature toward imaginary-time evolved states

We report training dynamics toward ITES in this section; the limit $\beta\rightarrow\infty$ reduces to the results for approximate ground states. A typical instance of such training dynamics with varying $\beta$ is shown in Fig. S9. For optimization targeting high-temperature ITES (low $\beta$), $\tilde{\beta}$ first increases and then saturates at the target $\beta$ value. Conversely, for low-temperature (high $\beta$) ITES targets, $\tilde{\beta}$ first increases and then decreases during training, and the value of $\tilde{\beta}$ that can be reached is upper bounded by $\beta^{*}$. The distinct training dynamics are another reflection of the two-stage behavior. The training dynamics in the low-temperature regime is consistent with that of approximating ground states. Viewed through ITES of different temperatures, the ground-state results can be taken as the $\beta\rightarrow\infty$ limit of ITES training, and the maximal $\tilde{\beta}$ reached during training toward the ground state is also connected to $\beta^{*}$ in ITES training.
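A minimal sketch of how an ITES target can be constructed by exact eigendecomposition follows; a toy single-qubit Hamiltonian stands in for the XXZ models used in this work, and the energy shift is a standard numerical-stability trick rather than part of the definition.

```python
import numpy as np

# Imaginary-time evolved state (ITES) target via exact eigendecomposition,
# feasible for the small systems studied here:
#   |phi(beta)> = e^{-beta H} |phi_0> / || e^{-beta H} |phi_0> ||.
H = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy Hamiltonian (Pauli X)
evals, evecs = np.linalg.eigh(H)

def ites(beta, phi0):
    c = evecs.T @ phi0                          # expand |phi_0> in eigenbasis
    c = c * np.exp(-beta * (evals - evals[0]))  # energy shift for stability
    phi = evecs @ c
    return phi / np.linalg.norm(phi)

phi0 = np.array([1.0, 0.0])    # initial state
phi_hot = ites(0.1, phi0)      # high-temperature ITES, close to |phi_0>
phi_cold = ites(50.0, phi0)    # low-temperature ITES, close to ground state
overlap = abs(evecs[:, 0] @ phi_cold)
print(overlap)  # approaches 1: large-beta ITES converges to the ground state
```

Excited-state components are suppressed by $e^{-\beta(\varepsilon_i-\varepsilon_0)}$, which is why the ground-state results are recovered in the $\beta\rightarrow\infty$ limit.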

Figure S9: Training dynamics of $\tilde{\beta}$ for the 2D XXZ model with $J_{x}=J_{y}=1$, $J_{z}=0.8$ on a $4\times 3$ lattice and ITES targets of varying $\beta$. MPS ansatzes with bond dimensions $\chi=16$ (a) and $\chi=32$ (b) are employed.

IV Spectrum pattern for approximate states from different ansatzes

In this section, we show more scatter plots of $(\varepsilon_{i},|c_{i}|^{2})$ pairs for approximate states based on different forms of ansatzes. Figs. S10, S11, and S12 show the spectral decompositions of approximate states generated by MPS, VMC, and PEPS ansatzes, respectively. The approximate ground states in the former two cases are obtained by minimizing the infidelity objective, while for the latter the energy objective is employed. For ground-state targets, the fitted factor $\tilde{\beta}$ varies continuously with the optimization progress. For ITES targets, the spectrum of each type of approximate state exhibits the two-stage behavior with a different critical temperature $\beta^{*}$.

We emphasize that the effective temperature is meaningful and argue against the misconception that the spectral plateau is merely an artifact of numerical precision. Our reasoning is as follows:

1. Many flat spectral plateaus give $|c_{i}|^{2}$ on the order of $10^{-4}\sim 10^{-8}$, which is numerically significant and much higher than the machine precision floor of $10^{-16}$ for double-precision (complex128) arithmetic.

2. Many approximate ITES instances exhibiting the two-stage behavior, such as those in Fig. S10, demonstrate distinct overlap behaviors for the same energy levels across different target $\beta$ values. For example, points with target overlap $|c_{i}|^{2}=10^{-10}$ strictly follow the exponential decay in approximate high-temperature ITES, while overlap points of the same order fail to do so and form a plateau in approximate low-temperature ITES. In other words, the plateau formed in the higher-energy regime for approximate low-temperature ITES does not admit the naive interpretation that the overlap magnitudes fall below the machine precision floor.

3. The critical inverse effective temperatures $\beta^{*}$ differ among ansatzes. This further implies that the effective temperature is numerically meaningful: if the underlying mechanism were solely the machine precision floor, $\beta^{*}$ would be the same across different methods.

Figure S10: 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. An MPS ansatz with bond dimension $\chi=64$ is employed. The optimization objective is the infidelity between the approximate and target states. The target states are the ground states for (a)(b) and imaginary-time evolved states $|\phi(\beta)\rangle$ with $\beta=0.6$ (c) and $\beta=0.9$ (d). The overlaps with different energy eigenstates are depicted as scatter plots in red and gray for the approximate and target states, respectively. The optimization is at an intermediate stage for (a) (less trained) and converged for (b) (well trained) toward the ground state. The logarithmic overlaps with excited eigenstates show a linear pattern during training with a varying fitted slope $\tilde{\beta}$, aka the inverse effective temperature. The fitted line is shown in dash-dot. When the target state is an imaginary-time evolved state, there is a phase transition in the spectral behavior of the approximate state. For small $\beta$ in (c), the optimized approximate state exhibits a near-perfect correspondence with the target-state overlaps, with a fitted slope $\tilde{\beta}$ matching $\beta=0.6$. Conversely, for large $\beta$ in (d), the overlaps from the approximate state exhibit a distinct behavior: an exponential decay in the lower-energy regime and a plateau in the higher-energy regime. This behavior leads to a poor linear fit, characterized by a deviating slope $\tilde{\beta}<\beta$. The transition is not induced by numerical machine precision, since the alignment is perfect down to $|c_{i}|^{2}\sim 10^{-15}$ in (c) while the plateau is around $10^{-10}$ in (d), much higher than the machine precision floor.
Figure S11: 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. A VMC ansatz with neural network width $W=128L$ is employed. The optimization objective is the infidelity between the approximate and target states. The target states are the ground states for (a)(b) and imaginary-time evolved states $|\phi(\beta)\rangle$ with $\beta=0.1$ (c) and $\beta=0.6$ (d). The overlaps with different energy eigenstates are depicted as scatter plots in red and gray for the approximate and target states, respectively. The optimization is at an intermediate stage for (a) (less trained) and converged for (b)(c)(d). The logarithmic overlaps with excited eigenstates show a linear pattern during training with a varying fitted slope $\tilde{\beta}$, aka the inverse effective temperature. The fitted line is shown in dash-dot. When the target state is an imaginary-time evolved state, there is a phase transition in the spectral behavior of the approximate state. For small $\beta$ in (c), the optimized approximate state exhibits a near-perfect correspondence with the target-state overlaps, with a fitted slope $\tilde{\beta}$ matching $\beta=0.1$. Conversely, for large $\beta$ in (d), the overlaps from the approximate state exhibit a distinct behavior: an exponential decay in the lower-energy regime and a plateau in the higher-energy regime. This behavior leads to a poor linear fit, characterized by a deviating slope $\tilde{\beta}<\beta$.
Figure S12: 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. A PEPS ansatz with bond dimension $\chi=4$ is employed. The optimization objective is the energy expectation for (a)(b), where the target states are the ground states with minimal energy. The training objective is the infidelity against imaginary-time evolved states $|\phi(\beta)\rangle$ with $\beta=0.4$ (c) and $\beta=0.8$ (d). The overlaps with different energy eigenstates are depicted as scatter plots in red and gray for the approximate and target states, respectively. The optimization is at an intermediate stage for (a) (less trained) and converged for (b)(c)(d). The logarithmic overlaps with excited eigenstates show a linear pattern during training with a varying fitted slope $\tilde{\beta}$, aka the inverse effective temperature. The fitted line is shown in dash-dot. When the target state is an imaginary-time evolved state, there is a phase transition in the spectral behavior of the approximate state. For small $\beta$ in (c), the optimized approximate state exhibits a near-perfect correspondence with the target-state overlaps, with a fitted slope $\tilde{\beta}$ matching $\beta$. Conversely, for large $\beta$ in (d), the overlaps from the approximate state exhibit a two-stage behavior: an exponential decay in the lower-energy regime and a plateau in the higher-energy regime. This distinct behavior leads to a poor linear fit, characterized by a deviating slope $\tilde{\beta}<\beta$.

The spectral decomposition for VQE states is special due to the specific circuit structure and state preparation protocol used in this work. The corresponding scatter plots at the early and late stages of ground-state training are shown in Fig. S13. As we can see, several branches emerge in the spectral decomposition with more training steps. These branches are labeled by different charge numbers, as the system Hamiltonian is $U(1)$ symmetric. Therefore, the decay factor obtained by considering spectral contributions from all charge sectors is larger than the one fitted only in the half-filling sector, which is highlighted in orange in the figure. We always present the decay factor extracted from the half-filling sector for VQE cases in this work.
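The sector-resolved analysis can be sketched as follows. Since the XXZ Hamiltonian commutes with the total magnetization $\sum_i Z_i$, every exact eigenstate carries a definite $U(1)$ charge, and overlap points can be grouped by sector before fitting. A toy $L=4$ chain stands in for the $4\times 3$ lattice here; the tiny field along $\sum_i Z_i$ is a numerical trick to make `eigh` return charge-pure eigenvectors in degenerate subspaces, not part of the model.

```python
import numpy as np

L = 4  # toy chain; the same logic applies to the 2D lattices in the text
I2 = np.eye(2, dtype=complex)
X = np.array([[0., 1.], [1., 0.]], dtype=complex)
Y = np.array([[0., -1j], [1j, 0.]])
Z = np.diag([1.0, -1.0]).astype(complex)

def op_at(op, site):
    out = np.array([[1.0 + 0j]])
    for k in range(L):
        out = np.kron(out, op if k == site else I2)
    return out

# XXZ chain with periodic boundary conditions: conserves total S^z.
H = sum(op_at(X, i) @ op_at(X, (i + 1) % L)
        + op_at(Y, i) @ op_at(Y, (i + 1) % L)
        + 0.8 * op_at(Z, i) @ op_at(Z, (i + 1) % L)
        for i in range(L))
Ztot = sum(op_at(Z, i) for i in range(L))
assert np.allclose(H @ Ztot, Ztot @ H)  # U(1) symmetry

# A tiny field along Ztot lifts accidental degeneracies between different
# charge sectors, so eigh returns eigenvectors of definite charge.
evals, evecs = np.linalg.eigh(H + 1e-6 * Ztot)
charges = np.round(np.real(
    np.einsum("ij,jk,ki->i", evecs.conj().T, Ztot, evecs))).astype(int)
half_filling = charges == 0  # the sector hosting the exact ground state
print(sorted(set(charges.tolist())), int(half_filling.sum()))
```

With the `half_filling` mask, the exponential fit of the previous sections can be restricted to the half-filling sector, mirroring how the VQE decay factor is extracted in this work.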

In addition, we stack the spectral decompositions of a series of ansatzes in one figure to better illustrate the two-stage behavior when approximating ITES; see Fig. S14.

Figure S13: 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. A VQE ansatz with depth $d=6$ is employed. The optimization objective is the energy expectation targeting the ground state. The overlaps with different energy eigenstates are depicted as scatter plots in red and gray for the optimized approximate states and target states, respectively. Specifically, we highlight the overlaps with eigenstates in the half-filling sector in orange. As we can see, the overlap points separate into different charge sectors during VQE training, a unique feature absent in other ansatzes. The optimization is at an intermediate stage for (a) (less trained) and converged for (b) (well trained). The overlaps with excited eigenstates show a linear pattern during training with a varying fitted slope $\tilde{\beta}$, aka the inverse effective temperature. The fitted line for all overlap points is shown as a dash-dot black line, while the fitted line for overlap points within the half-filling sector is shown as a dash-dot gray line. The half-filling sector dominates the spectral contribution since the exact ground state lives in this sector.
Figure S14: 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. The spectral decompositions of MPS ansatzes with different bond dimensions are shown in the same figure. The training targets are ITES with $\beta=0.5$ (a) and $\beta=0.9$ (b).

V Other metrics for approximate imaginary-time evolved states

In this section, we report other metrics for approximate ITES as further evidence of the two-stage behavior. To diagnose the deviation from the exponential decay in the spectrum, we introduce a mean squared error (MSE) inspired by machine learning. Specifically, we construct the MSE as $\frac{1}{2^{L}}\sum_{i=1}^{2^{L}}(y_{i}-y^{\prime}_{i})^{2}$, where the target sequence $y$ and the approximate sequence $y^{\prime}$ are the spectral decomposition overlaps $\log|c^{0}_{i}|^{2}$ for exact ITES and $\log|c_{i}|^{2}$ for approximate ITES, respectively. When the MSE begins to deviate from 0 with increasing $\beta$, the training enters the low-temperature stage where $\tilde{\beta}<\beta$. The MSE results are shown in Fig. S15.
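As a minimal sketch on synthetic data (not the data of this work), the MSE construction reads:

```python
import numpy as np

# Exact ITES weights decay perfectly exponentially, so log|c_i^0|^2 is
# linear in energy; an approximate low-temperature ITES instead flattens
# into a plateau at high energies (the two-stage behavior).
eps = np.linspace(0.0, 10.0, 2 ** 6)   # stand-in for the 2^L energy levels
beta = 1.0
y_target = -beta * eps                  # log|c_i^0|^2 of the exact ITES
y_approx = np.maximum(y_target, -6.0)   # plateau mimics the two-stage shape

mse = np.mean((y_target - y_approx) ** 2)  # (1/2^L) sum_i (y_i - y_i')^2
print(mse)  # nonzero MSE signals the low-temperature stage
```

For a high-temperature target, `y_approx` would track `y_target` over the whole spectrum and the MSE would stay near zero.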

We also report the squared Pearson coefficients and the uncertainty $\Delta\tilde{\beta}$ in Fig. S16 and Fig. S17. A Pearson coefficient close to 1 and an uncertainty of $\tilde{\beta}$ close to 0 are both strong signals of a good exponential fit and of the high-temperature stage. Notably, the $\beta^{*}$ values extracted from different metrics are consistent. It is an interesting future direction to explore whether universal critical behaviors can be identified around $\beta^{*}$ with these metrics.

Figure S15: MSE for approximate ITES of different $\beta$. The system is the 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. The MSE begins to increase beyond $\beta^{*}$ for each ansatz.
Figure S16: Squared Pearson coefficients for approximate ITES of different $\beta$. The system is the 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. A Pearson coefficient close to 1 indicates a good linear fit.
Figure S17: Uncertainty $\Delta\tilde{\beta}$ for approximate ITES of different $\beta$. The system is the 2D XXZ model on a $4\times 3$ square lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$.

VI Expressiveness and trainability targeting imaginary-time evolved states

In this section, we provide a quantitative analysis of how the required expressive capacity and optimization effort change with $\beta$ when approximating ITES. For the numerical simulations in this section, we use an MPS ansatz with bond dimension $\chi=32$ to approximate ITES of the 2D XXZ model on a $4\times 3$ lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. The optimization employs the Adam optimizer with an exponential-decay learning schedule (initial learning rate $10^{-3}$, halved every 1000 steps). We calculate the entanglement entropy for ITES of different $\beta$, where the subsystem partition contains all sites in the left half. This quantity characterizes the expressive capacity required to accurately approximate the target states, as higher entanglement requires a larger bond dimension to capture. We also conduct the optimization toward ITES and record the number of training steps required to reach a given infidelity threshold of $10^{-7}$. Though the initialization and the optimizer are the same for different ITES, we find that the required number of training steps is much larger for low-temperature (high $\beta$) targets. In other words, through the lens of ITES, we identify that the optimization becomes more difficult and slower for low-temperature ITES near the ground-state manifold. The expressive capacity and optimization time required for approximating ITES of different temperatures are summarized in Fig. S18, where the changes in the required expressiveness and optimization effort are evident.
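The half-system entanglement entropy plotted in Fig. S18 can be computed from a state vector by a Schmidt (SVD) decomposition across the cut. A minimal sketch, verified on a Bell pair and a product state:

```python
import numpy as np

def half_chain_entropy(psi, l_left, l_right):
    """Von Neumann entropy of the left block of a pure state vector."""
    m = psi.reshape(2 ** l_left, 2 ** l_right)  # (left, right) bipartition
    s = np.linalg.svd(m, compute_uv=False)      # Schmidt coefficients
    p = s ** 2                                  # Schmidt probabilities
    p = p[p > 1e-12]                            # drop numerical zeros
    return -np.sum(p * np.log(p))

# Bell pair across the cut: maximally entangled, S = log 2
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
print(half_chain_entropy(bell, 1, 1))  # log(2) ≈ 0.693

# Product state across the cut: zero entanglement
prod = np.kron(np.array([1.0, 0.0]), np.array([0.6, 0.8]))
print(half_chain_entropy(prod, 1, 1))  # ≈ 0
```

The number of sizable Schmidt probabilities also bounds the MPS bond dimension needed across that cut, which is why this entropy tracks the required expressive capacity.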

Figure S18: Expressive capacity and optimization hardness for approximating ITES of different $\beta$. The MPS ansatz with bond dimension $\chi=32$ is employed. The system is the 2D XXZ model on a $4\times 3$ lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. The orange line shows the entanglement entropy of the half system, related to the expressiveness required for an accurate approximation. The green line shows the optimization steps required to reach a given fidelity accuracy. At lower temperature (higher $\beta$), the requirements on expressiveness are lower while the requirements on optimization are higher.

Based on the above observations, we can explain the different trends of approximation accuracy with varying target $\beta$. For low-accuracy ansatzes, the bottleneck is the expressive capacity to accurately capture the target state. With larger $\beta$, the state is closer to the ground state with smaller entanglement entropy, loosening the expressiveness requirement and improving the performance of these low-accuracy ansatzes. On the contrary, for high-accuracy ansatzes, the ansatz is powerful enough to represent ITES of any temperature and the bottleneck lies in optimization. As larger $\beta$ slows down the optimization, the fidelity obtained within a fixed number of training steps becomes worse. This understanding of the competition between expressiveness and optimization explains Fig. 4(b) in the main text.

We also conduct the optimization using a more suitable optimizer, L-BFGS (default hyperparameters provided by SciPy, with function-value and gradient tolerance $10^{-22}$ and a maximum of 4000 iterations), which greatly improves the fidelity for high-accuracy ansatzes in the low-temperature (high $\beta$) regime. This further supports the claim that the expressive capacity of these high-accuracy ansatzes is sufficient; the fidelity drop is mainly due to optimization hardness and an insufficient number of training steps. The training dynamics for the MPS ansatz ($\chi=32$) toward ITES with both the Adam and L-BFGS optimizers are shown in Fig. S19.

Figure S19: Fidelity during training when approximating ITES. The MPS ansatz with bond dimension $\chi=32$ is employed. The system is the 2D XXZ model on a $4\times 3$ lattice with $J_{x}=J_{y}=1$, $J_{z}=0.8$. The blue lines are training curves for the Adam optimizer with identical hyperparameters, while the red lines are training curves for the L-BFGS optimizer with identical hyperparameters. L-BFGS greatly outperforms Adam, but the optimization-hardness trend remains for low-temperature targets.

VII Hyperparameters for variational optimization

For the optimization hyperparameters of each method, we use the settings in Table 1 unless explicitly specified.

Table 1: Hyperparameters for optimization with different ansatzes.
 Method  Optimizer  Initial learning rate  Learning schedule
PEPS  Adam  $8\times 10^{-3}$  constant
MPS  Adam  $3\times 10^{-3}$  exponential decay: halved every 1000 steps
VMC  Adam  $1\times 10^{-3}$  exponential decay (halved every 200 steps) for the first 800 steps, then constant
VQE  Adam  $1\times 10^{-2}$  exponential decay: halved every 2000 steps
VEC  Adam  $2\times 10^{-3}$  constant

In terms of the number of training steps, we continue the optimization until convergence when approximating ground states. When targeting ITES, the total number of training steps for each method is listed in Table 2.

Table 2: The number of training steps in ITES training.
 Method  Number of training steps
PEPS 600
MPS 400
VMC 3500
VQE 2000
VEC 600