
Markov Chain Monte Carlo in Tensor Network Representation

Synge Todo [email protected] Department of Physics, The University of Tokyo, Tokyo 113-0033, Japan; Institute for Physics of Intelligence, The University of Tokyo, Tokyo 113-0033, Japan; Institute for Solid State Physics, The University of Tokyo, Kashiwa 277-8581, Japan
Abstract

Markov chain Monte Carlo (MCMC) is a powerful tool for sampling from complex probability distributions. Despite its versatility, MCMC often suffers from strong autocorrelation and the negative sign problem, which slow down the convergence of the statistical error. We propose a novel MCMC formulation based on tensor network representations to reduce the population variance and mitigate these issues systematically. By introducing stochastic projectors into the tensor network framework and employing Markov chain sampling, our method eliminates the systematic error associated with low-rank approximation in tensor contraction while maintaining the high accuracy of the tensor network method. We demonstrate the effectiveness of the proposed method on the two-dimensional Ising model, achieving an exponential reduction in statistical error with increasing bond dimension cutoff. Furthermore, we address the sign problem in systems with negative weights, showing significant improvements in the average sign as the bond dimension cutoff increases. The proposed framework provides a robust solution for accurate statistical estimation in complex systems, paving the way for broader applications in computational physics and beyond.

Markov chain Monte Carlo (MCMC) is a flexible algorithm for generating random samples according to an arbitrary multidimensional probability distribution [1, 2]. It has been widely applied to simulations of many-body systems in statistical physics, particle physics, and biophysics, as well as to statistical estimation, Bayesian inference, optimization problems, etc. MCMC generates random samples using a stochastic process called a Markov chain. The Markov chain must satisfy the global balance with respect to the target distribution and ergodicity in the phase space. Although an MCMC method that satisfies these two conditions guarantees unbiased results in infinite time in principle, the MCMC dynamics in the phase space must be fast for the technique to work in practice [3, 4, 5]. The variance of an averaged quantity, $\sigma^{2}$, along a Markov chain is generally given by

\sigma^{2}\approx\frac{2\sigma_{0}^{2}\tau_{\text{int}}}{M}, \qquad (1)

where $\sigma_{0}^{2}$ is the population variance of the raw time-series data, $\tau_{\text{int}}$ is the autocorrelation time, and $M$ is the number of Monte Carlo (MC) steps.

The MCMC method has an intrinsic difficulty that is even more severe than autocorrelation, called the negative sign problem. For example, the Boltzmann weights sometimes become negative, or even complex, in the path-integral representation for frustrated quantum magnets, and they can no longer be regarded as (unnormalized) probabilities for Monte Carlo sampling; one has to simulate an alternative system, whose weight function is the absolute value of the original one, and take the sign into account via the reweighting technique [6, 7, 8]. In many interesting systems, such as frustrated quantum magnets, itinerant electron systems, and real-time unitary evolution, however, the average sign, $S$, i.e., the average of the reweighting factor, becomes exponentially small for larger system sizes, lower temperatures, or longer real-time integration. As a result, the r.h.s. of Eq. 1 acquires an extra factor $S^{-2}$, which ultimately leads to an explosive growth of the statistical error.
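
To spell out the reweighting step (a textbook identity, stated here only to fix notation): if the weight $W(x)$ can be negative, one samples configurations with probability proportional to $|W(x)|$ and estimates

\langle O\rangle=\frac{\sum_{x}O(x)W(x)}{\sum_{x}W(x)}=\frac{\langle O\operatorname{sgn}W\rangle_{|W|}}{\langle\operatorname{sgn}W\rangle_{|W|}}=\frac{\langle O\operatorname{sgn}W\rangle_{|W|}}{S},

so the relative error of the ratio is amplified by roughly $1/S$, which is the origin of the $S^{-2}$ factor mentioned above.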

The increase of variance can be interpreted differently; due to the correlations, the number of samples that can be considered statistically independent decreases from $M$ to $MS^{2}/2\tau_{\text{int}}$. So far, many efforts have been made to improve the statistics in MCMC sampling: non-local block updates [9, 10, 11, 12, 13, 14], extended ensemble methods [15, 16, 17, 18], irreversible transition kernels [19, 20, 21, 22], mitigation of the negative sign problem [23, 24, 25, 26], etc. However, almost all previous attempts have focused on recovering the effective sample size.

In this paper, we propose a novel approach for improving the efficiency of MCMC from a different perspective; we consider systematically reducing the population variance of physical quantities. One might think that the population variance is not affected by the details of the sampling scheme, since it is determined by the target distribution, e.g., the canonical ensemble, which reflects the system's physical characteristics. However, this intuition is not entirely correct. The population variance can change depending on the representation of the partition function, that is, the definition of configurations and associated weights on which MCMC sampling is performed. One example is the so-called improved estimator in the cluster and worm algorithms, in which the partition function and the estimators of physical quantities are defined in terms of graphs, and, in some cases, the population variance is reduced drastically [27, 28, 29, 30, 31, 32, 33]. Unfortunately, there has been no systematic research in this direction so far.

In the following, we propose an MCMC formulation in a tensor network representation that systematically improves the population variance. Although our method relies on the low-rank approximation in the conversion to a tensor network representation, the systematic error due to that approximation is eliminated by MCMC sampling. By adopting the tensor network representation, we demonstrate that the statistical error decreases exponentially with increasing bond dimension cutoff in the low-rank approximation and that even the negative sign problem is avoided at polynomial computational cost.

Recently, tensor network methods have attracted attention as a promising numerical technique. Tensor networks, a versatile data structure for compactly representing large matrices or tensors, have applications in various fields [34, 35, 36]. Beyond their first applications in statistical mechanics for many-body systems [37, 38, 39], they are employed for diverse tasks such as compressing machine learning models [40, 41, 42], simulating complex quantum circuits [43, 44, 45, 46], and providing high-precision approximate solutions for partial differential equations [47, 48, 49, 50].

There are two main approaches in the application of tensor networks. One is the variational (or Hamiltonian) approach, where tensor networks are used as approximate functions with high representability. They are used as variational wave functions for quantum many-body systems or as architectures for generative models in machine learning. The other is called the renormalization (or Lagrangian) approach. For example, the partition function of various classical lattice models and quantum circuits can be expressed precisely as tensor networks. Then, the partition function or quantum amplitude can be obtained as the contraction of the whole network.

However, efficient tensor network contraction is a non-trivial task in either approach [51, 52]. Generally, exact contraction requires exponential computational cost for high-dimensional tensor networks. Therefore, a low-rank approximation based on the singular value decomposition (SVD) is usually employed during the contraction. Since the introduction of the renormalization group methods such as the density-matrix renormalization group [37], corner transfer matrix renormalization group [38], and tensor renormalization group method [39], various improvements have been proposed, such as generalization to higher-dimensional systems [53, 54, 55] and fermionic systems [56, 57, 58, 59], improvements in computational cost [54, 55], incorporation of environmental effects [60, 61, 62], and removal of local correlations [63, 64, 65]. The bond dimension cutoff, $d$, which denotes the number of singular values retained in the low-rank approximation, is crucial in controlling the systematic error in the approximate contraction of tensor networks. The sum of squares of the discarded singular values gives the squared Frobenius norm of the difference between the original and the approximated tensors. The systematic error decreases rapidly as the bond dimension cutoff increases, but the computational cost grows as some power of $d$. In addition, since multiple approximations are combined during the renormalization, the decrease of the systematic error is often non-monotonic, making the extrapolation to the $d\rightarrow\infty$ limit difficult.
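
To make the role of $d$ concrete, here is a minimal numpy sketch (generic linear algebra, not the TRG itself) verifying that the squared Frobenius error of the best rank-$d$ approximation equals the sum of the squared discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(42)
# a 64x64 matrix with rapidly decaying singular values, mimicking a typical tensor unfolding
M = rng.standard_normal((64, 64)) @ np.diag(0.5 ** np.arange(64)) @ rng.standard_normal((64, 64))

U, s, Vh = np.linalg.svd(M, full_matrices=False)

for d in (4, 8, 16):
    M_d = (U[:, :d] * s[:d]) @ Vh[:d, :]          # best rank-d approximation (Eckart-Young)
    err2 = np.linalg.norm(M - M_d, "fro") ** 2    # squared Frobenius error of the truncation
    print(d, err2, np.sum(s[d:] ** 2))            # the two error measures agree
```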

Figure 1: Tensor network of the Levin-Nave TRG in the projector formulation. Black circles represent the tensors after the initial (exact) SVD. Red, blue, and orange pairs of three-leg tensors represent the projectors for the successive low-rank decompositions. See Appendix C for more details.

Let’s consider the tensor renormalization group (TRG) method proposed by Levin and Nave in 2007 [39] as an example. In TRG, at each renormalization step, a four-leg tensor is decomposed into two three-leg tensors, and then four three-leg tensors are contracted to form a renormalized four-leg tensor. By repeating this procedure, all tensors are contracted to produce the partition function (see Fig. 6 in Appendix C).

The successive low-rank decompositions in the renormalization process can be replaced by the insertion of projectors, as shown in Fig. 1, where each step is described by

A^{*}B\approx A^{*}PB, \qquad (2)

where $P=W_{R}W_{L}^{*}=\sum_{i=1}^{d}\eta_{i}\xi_{i}^{*}\in\mathbb{C}^{r\times r}$ is a projector onto $d$ dimensions, $A$ and $B$ are the tensors to be approximated, and $r$ denotes the bond dimension between $A^{*}$ and $B$. By choosing the largest $d$ singular values in the SVD of $A^{*}B=U\Sigma V^{*}=\sum_{i=1}^{r}c_{i}\mu_{i}\nu_{i}^{*}$, the projector corresponding to the best rank-$d$ approximation is defined by $\xi_{i}=c_{i}^{-1/2}A\mu_{i}$ and $\eta_{i}=c_{i}^{-1/2}B\nu_{i}$ (here we consider the full-rank case, i.e., we assume that the rank of $A^{*}B$ equals the bond dimension $r$ between $A^{*}$ and $B$; see Appendix B for rank-deficient cases and for more details). The total number of projectors, denoted by $N_{\text{p}}$ hereafter, equals the number of low-rank approximations performed and is of the same order as the number of initial tensors. Note that a pair of $W_{R}$ and $W_{L}$ that make up a single projector may be far apart in the final tensor network, as the projectors are inserted recursively (Fig. 1). Other SVD-based algorithms, such as TEBD [67, 68], ATRG [54], BTRG [62], and CATN [55], can be rewritten similarly in the projector formulation.

The core technique in our proposal is to replace the projectors with stochastic ones and to sample them using MCMC. In Ref. 69, Ferris proposed a seminal sampling scheme that, instead of choosing the optimal rank-$d$ projector in the tensor renormalization group process, selects $d$ rank-1 projectors out of $r$ according to a certain weight, where the number of possible combinations is $n_{\text{c}}=\binom{r}{d}$. The sampling weights are set so that the rank-1 projectors corresponding to the larger singular values are selected with higher probabilities. A scale factor is introduced for each rank-1 projector so that the random average of the stochastic projectors becomes the identity operator (although the inserted tensor is not exactly a projector due to the scale factors, we refer to it as a "projector" in this paper for simplicity):

\langle W_{R}(\theta)W_{L}^{*}(\theta)\rangle_{\theta}=\frac{1}{n_{\text{c}}}\sum_{\theta=1}^{n_{\text{c}}}W_{R}(\theta)W_{L}^{*}(\theta)\,p(\theta)=I_{r}, \qquad (3)

where $\theta$ denotes an index specifying the set of chosen rank-1 projectors, and $p(\theta)$ is the probability of choosing $\theta$ (see Appendix D for more details). Thus, the resulting tensor network with stochastic projectors defines an unbiased estimator for the partition function [69].
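
The unbiasedness in Eq. 3 can be checked numerically for a small example. The sketch below enumerates all $\binom{r}{d}$ subsets, selects them with probability proportional to the product of their weights, and rescales each selected rank-1 projector by the inverse of its marginal inclusion probability; this particular scale-factor convention is an assumption made for the illustration (the precise bookkeeping used in the simulations follows Appendix D):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
r, d = 6, 3

# a complete dual (bi-orthonormal) basis, xi_i^* eta_j = delta_ij, as in Appendix B;
# real matrices are used, so the conjugate transpose is just the transpose
E = rng.standard_normal((r, r))
eta = E                      # eta_i is the i-th column of E
xi = np.linalg.inv(E).T      # xi_i is the i-th column of inv(E)^T

w = rng.random(r) + 0.1      # weights of the rank-1 projectors, e.g. w_i = c_i^omega (cf. Eq. 31)

# a subset theta of d rank-1 projectors is proposed with probability ~ product of its weights
subsets = list(combinations(range(r), d))
probs = np.array([np.prod(w[list(s)]) for s in subsets])
probs /= probs.sum()

# marginal inclusion probability q_k of each rank-1 projector (cf. Eq. 33)
q = np.zeros(r)
for s, p in zip(subsets, probs):
    q[list(s)] += p

# stochastic projector: the selected rank-1 terms, each rescaled by 1/q_k (assumed scale factor)
avg = np.zeros((r, r))
for s, p in zip(subsets, probs):
    P_theta = sum(np.outer(eta[:, k], xi[:, k]) / q[k] for k in s)
    avg += p * P_theta

print(np.allclose(avg, np.eye(r)))   # True: the random average of the stochastic projectors is I_r
```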

However, there is a fatal flaw in the approach of Ref. 69. The number of projectors, $N_{\text{p}}$, is of the same order as the number of tensors in the original network. The variance of the contraction of a tensor network containing numerous random tensors explodes exponentially with $N_{\text{p}}$, as the contraction is essentially a product of an extensive number of random numbers. Such naive importance sampling only works for relatively small systems, since the estimator's variance diverges exponentially with the system size (see Appendix A for more details). We must therefore introduce a method that can control the variance, such as sequential Monte Carlo [71, 72] with particle resampling, or MCMC [1, 2]. In many cases, we are interested in the expectation values of physical quantities rather than in the partition function itself, so the latter, MCMC, is the method of choice.

In the following, we denote the state of each stochastic projector by $\theta_{i}$. We aim to implement MCMC based on a simple Metropolis-Hastings (MH) scheme [73, 74] with $\{\theta_{i}\}$ as the state variables. To do this, we need to introduce the following additional techniques.

Independent proposal distribution: In the original proposal by Ferris [69], a set of rank-1 projectors is obtained based on the results of the previous short-scale renormalization step, and then importance sampling is performed. In this method, when the short-scale projectors (e.g., red projectors in Fig. 1) are updated, the definition of longer-scale projectors (blue and orange ones) changes, so it is difficult to implement MCMC. In our method, on the other hand, we first execute the conventional deterministic tensor renormalization procedure (in the projector formulation) as an initialization process. At the same time, we determine all the rank-1 projectors and their weights using the results of the SVD in that process. During the MCMC process, we do not change the projector sets and the weights. This initialization scheme makes the proposal distribution of the MCMC projectors independent for each projector, and the partition function can be written as

Z=\sum_{\{\theta_{i}\}}g(\theta_{1},\theta_{2},\ldots,\theta_{N_{\text{p}}})\,p(\theta_{1})p(\theta_{2})\cdots p(\theta_{N_{\text{p}}}). \qquad (4)

In addition, since the SVD is performed only during initialization, the MCMC process consists solely of selecting rank-1 projectors and contracting the tensor network with the embedded projectors; this makes the sampling stage highly efficient and well suited to modern computer architectures.
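
A minimal sketch of the resulting MH sweep is shown below (schematic only; `contract_with_projectors` and `propose` stand for the tensor network contraction with the embedded projectors and the independent proposal of Appendix D, and are assumed rather than defined here):

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_sweep(theta, g, contract_with_projectors, propose):
    """One MCMC sweep over all N_p stochastic projectors.

    The target distribution is pi(theta) ~ |g(theta)| * prod_i p(theta_i); since the
    proposal for projector i is the independent distribution p(theta_i'), the factors
    p(theta_i) cancel and the acceptance probability reduces to min(1, |g'| / |g|).
    """
    for i in range(len(theta)):
        theta_new = list(theta)
        theta_new[i] = propose(i)                      # draw theta_i' with probability p(theta_i')
        g_new = contract_with_projectors(theta_new)    # re-contract (only ancestors in practice)
        if rng.random() < min(1.0, abs(g_new) / abs(g)):
            theta, g = theta_new, g_new
    return theta, g
```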

Computational graph: The weight factor $g(\{\theta_{i}\})$ in Eq. 4 is the contraction of the tensor network with the embedded projectors. When the projector configuration changes, the contraction of the whole network must be evaluated again. If we re-evaluate the entire tensor network every time, the computational complexity is $O(d^{5}N)$ per update; that is, the cost of one MCMC sweep becomes $O(d^{5}N^{2})$, which is very inefficient. To reduce this computational complexity, we introduce the concept of a computational graph. The series of contractions can be represented as a tree structure, whose leaves correspond to the initial tensors and the projectors. When a projector, i.e., a pair of $W_{L}(\theta)$ and $W_{R}(\theta)$, is updated, only its ancestors need to be re-evaluated. In this way, in the case of TRG, the computational complexity of a single MCMC sweep is reduced to $O(d^{5}N\log N)$ (see the sketch below).
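
A minimal sketch of the caching idea (hypothetical helper class, not the production implementation): each internal node stores its partially contracted tensor, and updating one projector marks only the nodes on the path to the root as dirty.

```python
class Node:
    """Node of the contraction tree: a leaf stores a fixed tensor or a stochastic projector;
    an internal node caches the contraction of its two children."""

    def __init__(self, left=None, right=None, tensor=None, contract=None):
        self.left, self.right = left, right
        self.tensor = tensor        # cached result (or the leaf tensor itself)
        self.contract = contract    # callable combining the two child tensors (internal nodes only)
        self.parent = None
        self.dirty = False
        for child in (left, right):
            if child is not None:
                child.parent = self

    def value(self):
        if self.contract is None:                 # leaf: return the stored tensor
            return self.tensor
        if self.dirty or self.tensor is None:     # recompute only if something below has changed
            self.tensor = self.contract(self.left.value(), self.right.value())
            self.dirty = False
        return self.tensor

    def invalidate(self):
        node = self                               # mark this node and all of its ancestors dirty:
        while node is not None:                   # O(log N) nodes for a balanced contraction tree
            node.dirty = True
            node = node.parent

# updating one stochastic projector stored in `leaf`:
#   leaf.tensor = new_projector_tensor
#   leaf.invalidate()
#   Z = root.value()    # only the ancestors of `leaf` are re-contracted
```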


Figure 2: Temperature dependence of the specific heat (a) and the magnetization squared (b) of the Ising model on the $N=16\times 16$ square lattice. Dark blue curves represent the exact results of the transfer matrix calculation. Red diamonds represent the results of MCMC in the tensor network representation with $d=6$. We also show the results of the Levin-Nave TRG with impurity tensors with $d=2$ (purple), 4 (green), 6 (cyan), 8 (orange), 10 (dark green), $\ldots$, 16 (black). The inset in (b) shows the $d$-dependence of the asymptotic variance of the energy (purple), the specific heat (green), and the magnetization squared (cyan), together with those of the standard MH method (horizontal lines).

Physical quantities: In standard tensor network algorithms, finite differences or automatic differentiation of the free energy is used to evaluate the expectation values of physical quantities, such as the internal energy and the magnetization. In our MCMC algorithm, we adopt the impurity tensor method [75, 76, 77] instead. In the impurity tensor method, we prepare "impurities," which are obtained by differentiating the initial tensors with respect to external variables, such as the temperature and the magnetic field, and then recursively evaluate the contraction of the tensor network that contains only one or two impurities in total, depending on the order of differentiation. This method is easily implemented in projector-based tensor network renormalization and is compatible with our MCMC formalism. One drawback of the impurity tensor method in ordinary tensor renormalization groups is that it does not take the derivatives of the projectors into account, which introduces extra systematic errors. In our MCMC method, however, the projectors become identity operators after taking the random average and do not depend on the external variables, so the impurity tensor method introduces no systematic error.

To demonstrate the effectiveness of the proposed method, we present simulation results for the two-dimensional square-lattice Ising model, whose Hamiltonian is given by

H=-J\sum_{\langle i,j\rangle}\sigma_{i}\sigma_{j}-h\sum_{i}\sigma_{i}, \qquad (5)

where $\sigma_{i}=\pm 1$ is the Ising spin variable at the $i$-th site, $J$ is the coupling constant, and $h$ is the magnetic field.

Figure 3: Temperature dependence of the average sign. Blue filled symbols represent the standard MCMC results for $N=8\times 8$ (squares), $16\times 16$ (circles), and $32\times 32$ (diamonds). Open diamonds represent those of MCMC in the tensor network representation for $N=32\times 32$. The bond dimension cutoff is $d=2$ (purple), 3 (green), 4 (orange), and 6 (red).

First, we present the results at zero magnetic field, $h=0$. In Fig. 2, we show the specific heat and the magnetization squared as functions of temperature. The system size is $N=16\times 16$, and the bond dimension cutoff is set to $d=6$. The total number of MC steps is $2^{14}$, of which the first $2^{11}$ steps are discarded as burn-in. For comparison, we also show the exact results obtained by the transfer matrix method and the results of the Levin-Nave TRG with impurity tensors with $d=2,4,6,8,\ldots,16$. Note that $d=256$ is required for TRG to obtain the exact result for $N=16\times 16$. The TRG results suffer from a systematic error due to the finite bond dimension cutoff and the impurity tensor method. Furthermore, the convergence to the exact result is non-monotonic. On the other hand, the MCMC results using $d=6$ are fully consistent with the exact results, confirming that the systematic error is eliminated.

In the inset of Fig. 2, we show the $d$-dependence of the asymptotic variance, $2\sigma_{0}^{2}\tau_{\text{int}}$ in Eq. 1, of the internal energy, the specific heat, and the magnetization squared at the critical temperature, $T_{c}=2J/\log(1+\sqrt{2})$ [78]. The asymptotic variance is evaluated using binning analysis. The variance is already smaller than that of the conventional MH method at $d=4$ and decreases exponentially with the bond dimension cutoff $d$. Given that the computational complexity of the proposed method is $O(d^{5}N\log N)$, the present results indicate that an exponential acceleration has been achieved.

Next, we present the results at a pure imaginary magnetic field, $h=i\pi/2\beta$, or a negative fugacity, $z=e^{-2\beta h}=-1$. This magnetic field is the intersection of the unit circle $|z|=1$ of the Yang-Lee zeros [79] and the negative real axis of $z$. At $z=-1$, the Boltzmann weight in the original spin-based representation is given by $W(\{\sigma_{i}\})=(-1)^{m/2}e^{\beta\sum\sigma_{i}\sigma_{j}}$, with $m$ being the total magnetization, and the standard MCMC method suffers from a severe sign problem due to the factor $(-1)^{m/2}$. In Fig. 3, we show the temperature dependence of the average sign. Our proposed method, in which the average sign is given by $\langle\operatorname{sgn}g(\{\theta_{i}\})\rangle$, also yields negative signs at small $d$. In contrast to the standard MCMC method, however, the average sign improves systematically and approaches unity rapidly as $d$ increases. Thus, the negative sign problem is also overcome at polynomial cost.

To summarize, we have introduced a novel MCMC framework in tensor network representations. By combining the projector formulation of tensor network renormalization with MCMC sampling, we eliminate the systematic error arising from the finite bond dimension cutoff while maintaining the computational efficiency of tensor network techniques. We demonstrated the effectiveness of our approach through simulations of the two-dimensional Ising model, showing exponential reductions in statistical error and a systematic resolution of the negative sign problem at polynomial computational cost. This framework is versatile and applicable to various sophisticated tensor network algorithms, such as HOTRG [53], TEBD [67, 68], ATRG [54], BTRG [62], and CATN [55].

The proposed method facilitates accurate evaluation of partition functions, physical quantities, and tensor network optimization, making it a powerful tool for tackling complex physics problems. Future work could explore extensions to higher-dimensional systems, applications in real-time unitary dynamics, and integration with advanced MCMC techniques, such as the extended ensemble methods [15, 16, 17, 18] and irreversible transition kernels [19, 20, 21, 22]. By offering both accuracy and computational efficiency, the proposed method represents a significant step forward in developing robust and scalable sampling techniques for computational science.

Acknowledgements.
The author thanks Hidemaro Suwa and Tsuyoshi Okubo for fruitful discussions and comments. This work was supported by JSPS KAKENHI Grant Numbers 20H01824 and 24K00543, the Center of Innovation for Sustainable Quantum AI (SQAI), JST Grant Number JPMJPF2221, and JST CREST Grant Number JPMJCR24I1.

Appendix A Product of Random Numbers

In this Appendix, we focus on the problem of sampling the product of $N$ independent random numbers. Suppose that we have $N$ independent random variables, $\{X_{i}\}_{i=1}^{N}$, each distributed according to a probability distribution with mean $\mu_{i}$ and variance $\sigma_{i}^{2}$. We are interested in estimating the expectation of the product of these random variables, $\text{E}[X_{1}X_{2}\cdots X_{N}]$. As the $\{X_{i}\}$ are independent of each other, the expectation of the product is given by the product of their expectations:

\text{E}[X_{1}X_{2}\cdots X_{N}]=\text{E}[X_{1}]\text{E}[X_{2}]\cdots\text{E}[X_{N}]=\prod_{i}\mu_{i}\equiv\Gamma. \qquad (6)

The sample mean of the product of $N$ random variables is an unbiased estimator of $\Gamma$. However, the relative statistical error diverges exponentially with $N$:

\begin{split}\text{Var}[X_{1}X_{2}\cdots X_{N}]&=\text{E}[X_{1}^{2}X_{2}^{2}\cdots X_{N}^{2}]-\text{E}[X_{1}X_{2}\cdots X_{N}]^{2}\\ &=\text{E}[X_{1}^{2}]\text{E}[X_{2}^{2}]\cdots\text{E}[X_{N}^{2}]-\Gamma^{2}\\ &=\prod_{i}(\mu_{i}^{2}+\sigma_{i}^{2})-\Gamma^{2}\\ &\approx\Gamma^{2}\prod_{i}(1+\sigma_{i}^{2}/\mu_{i}^{2})\qquad\text{for $N\gg 1$}.\end{split} \qquad (7)

The divergence of the relative error can also be understood from a different viewpoint. The multiplication of $N$ random variables can be interpreted as an $N$-step random walk on the logarithmic axis:

\log(X_{1}X_{2}\cdots X_{N})=\sum_{i=1}^{N}\log X_{i}. \qquad (8)

The linear increase of the variance of the random walk causes an exponential divergence of the relative error in the estimate of $\Gamma$.

Figure 4: Histogram of $\log(x_{1}x_{2}\cdots x_{N})$ for $N=4$ (purple), 8 (green), 16 (cyan), 32 (orange), and 64 (red), where the $x_{i}$ are sampled independently from the uniform distribution on $(0,2]$. The distributions exhibit biased random-walk behavior with a peak at $(\log 2-1)N$ and a width of order $\sqrt{N}$.
Figure 5: $N$-dependence of the sample mean of the product of $N$ random variables for the naive sequential sampling (purple) and for sampling with multinomial resampling (green). The number of samples is $M=2^{24}$ in both cases. The inset shows the $N$-dependence of the variance of the sample mean on a logarithmic scale. After introducing resampling, the variance is suppressed by orders of magnitude.

Let's consider a simple example: $\{X_{i}\}_{i=1}^{N}$ are independent and identically distributed (i.i.d.), uniformly on $(0,2]$:

P(X=x)=\begin{cases}1/2&\text{for $0<x\leq 2$}\\ 0&\text{otherwise}.\end{cases} \qquad (9)

The mean and the variance are $\mu=1$ and $\sigma^{2}=1/3$, respectively. The probability density function of $Y=\log X$ is given by

P(Y=y)=\begin{cases}e^{y}&\text{for $y\leq\log 2$}\\ 0&\text{otherwise},\end{cases} \qquad (10)

and the mean and the variance are $\text{E}[Y]=\log 2-1$ and $\text{Var}[Y]=1$, respectively.

In Fig. 4, we show the histogram of the samples $\{\log(x_{1}x_{2}\cdots x_{N})\}$ for $N=4,8,\ldots,64$. The number of samples is $M=2^{24}$. As expected from Eq. 8, a biased random-walk behavior is observed; the peak position shifts as $(\log 2-1)N$, while the width of the distribution grows as $\sqrt{N}$. In other words, on the original linear scale, the typical value of $x_{1}x_{2}\cdots x_{N}$ is exponentially small, of order $\exp[(\log 2-1)N]$, and rare events with extremely large contributions dominate the mean $\Gamma=1$. In Fig. 5, we plot the variance as a function of $N$, which diverges exponentially as $(4/3)^{N}$, as predicted by Eq. 7. When $N$ is large, the mean and the variance are greatly underestimated because the sample size is insufficient to pick up the rare events.

Thus, even though the sample mean of the product of the random variables is a "perfect" estimator of the product of the means, i.e., each sample is statistically independent and unbiased, it suffers from exponential growth of the statistical error when $N$ is large. This is precisely the issue that surfaced in Ref. 69.

The naive perfect sampling above is equivalent to the sequential Monte Carlo (SMC) method without resampling. In the SMC method [71, 72], many particles, or walkers, are simulated simultaneously, and a weight is assigned to each particle. The particles are updated according to the model. However, a problem called "weight degeneracy" occurs, where a few particles gradually end up with extremely high weights while many others contribute almost nothing. To prevent this phenomenon, we perform resampling during the particle updates. Resampling prunes particles with small weights and duplicates those with high weights to even out the overall distribution. This technique prevents the information from being concentrated in a few particles and increases the effective sample size.

Here, we adopt one of the simplest schemes, multinomial resampling, for the present problem, where the weight of each particle is the product of the random numbers [80]. First, we prepare $M$ particles and initialize their weights to unity. Then, each particle's weight is multiplied by a random number $x$ at each step. After each update, we perform resampling so that the weights of all the particles become the same. In the present problem, since a particle has no internal state other than its weight, we simply replace the weight of every particle with the average weight. This set of update and resampling steps is repeated $N$ times to estimate the mean of the product. In Fig. 5, we also plot the results obtained after introducing resampling. The suppression of the variance is remarkable. The variance is now given by replacing $\sigma^{2}$ in Eq. 7 by $\sigma^{2}/M$, which is almost linear in $N$:

\text{Var}[X_{1}X_{2}\cdots X_{N}]=\Big(1+\frac{\sigma^{2}}{M}\Big)^{N}-1\approx\frac{\sigma^{2}N}{M} \qquad (11)

as long as $M\gg N$.
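
A compact numerical illustration of both effects discussed above (a small-scale sketch of the experiment behind Figs. 4 and 5, with reduced sample sizes):

```python
import numpy as np

rng = np.random.default_rng(7)
M, N = 2**16, 64          # number of samples/particles and number of factors (smaller than in the text)

# naive estimator: sample mean of the product of N i.i.d. uniforms on (0, 2]
x = rng.uniform(0.0, 2.0, size=(M, N))
naive = x.prod(axis=1)
print("naive    :", naive.mean())       # unbiased for Gamma = 1, but with huge (~(4/3)^N) variance

# multinomial-resampling version: after every factor, replace each particle's weight
# by the average weight (no internal state besides the weight in this toy problem)
weights = np.ones(M)
log_scale = 0.0
for _ in range(N):
    weights *= rng.uniform(0.0, 2.0, size=M)
    mean_w = weights.mean()
    log_scale += np.log(mean_w)         # accumulate the normalization to avoid under/overflow
    weights = np.ones(M)                # "resampling" to equal weights
print("resampled:", np.exp(log_scale))  # close to 1, with variance ~ sigma^2 N / M
```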

Another established approach for controlling the variance of weights is the MCMC method [1, 2]. If we are interested in the weighted average of quantities, not the average weight, the MCMC method is more suitable. In the main text, we adopt the MCMC method to estimate the average physical quantities in the canonical ensemble.

Appendix B Projector formulation of low-rank approximation

Let's consider the SVD, or rank-1 decomposition, of the product of two matrices, $A\in\mathbb{C}^{r\times p}$ and $B\in\mathbb{C}^{r\times q}$:

A^{*}B=U\Sigma V^{*}=\sum_{i=1}^{s}c_{i}\mu_{i}\nu_{i}^{*}, \qquad (12)

where ${}^{*}$ denotes the matrix conjugate transpose, $s\leq\min(p,q,r)$ is the rank of $A^{*}B$, $U\in\mathbb{C}^{p\times s}$ and $V\in\mathbb{C}^{q\times s}$ are isometries ($U^{*}U=V^{*}V=I_{s}$), $\Sigma\in\mathbb{R}^{s\times s}$ is a diagonal matrix whose diagonal elements are the singular values $\{c_{i}\}_{i=1}^{s}$, and $\mu_{i}\in\mathbb{C}^{p}$ and $\nu_{i}\in\mathbb{C}^{q}$ denote the $i$-th column vectors of $U$ and $V$, respectively. We assume that the singular values in $\Sigma$ are sorted in descending order. For later convenience, we define the diagonal matrix $\Lambda=\Sigma^{1/2}$.

The rank-$d$ approximation gives a $p\times q$ matrix of rank $d$ ($\leq s$) that approximates the input $A^{*}B$. Using the SVD introduced above, the optimal rank-$d$ approximation that minimizes the Frobenius norm of the difference from the input is given by

A^{*}B\approx U\bar{\Lambda}(V\bar{\Lambda})^{*}. \qquad (13)

Here, $\bar{\Lambda}\in\mathbb{R}^{s\times d}$ is the matrix consisting of the first $d$ columns of $\Lambda$. This procedure also defines the low-rank decomposition $A^{*}B\approx\bar{A}^{*}\bar{B}$ with $\bar{A}=(U\bar{\Lambda})^{*}\in\mathbb{C}^{d\times p}$ and $\bar{B}=(V\bar{\Lambda})^{*}\in\mathbb{C}^{d\times q}$.

The low-rank approximation can be represented in a different form. We define $P=W_{R}W_{L}^{*}\in\mathbb{C}^{r\times r}$ with

W_{L}=AU(\bar{\Lambda}^{+})^{*}, \qquad (14)
W_{R}=BV(\bar{\Lambda}^{+})^{*}, \qquad (15)

where $\bar{\Lambda}^{+}\in\mathbb{R}^{d\times s}$ is the Moore-Penrose inverse of $\bar{\Lambda}$. By noticing that $\bar{\Lambda}^{+}\Sigma=\bar{\Lambda}^{*}$, one can confirm that $P$ satisfies

P^{2}=(W_{R}W_{L}^{*})^{2}=W_{R}W_{L}^{*}=P, \qquad (16)
A^{*}PB=A^{*}W_{R}W_{L}^{*}B=\bar{A}^{*}\bar{B}. \qquad (17)

Thus, $P$ is a projector and its insertion between $A^{*}$ and $B$ yields the rank-$d$ approximation, Eq. 13.
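
The algebra of Eqs. 12-17 can be checked directly in a few lines of numpy (a generic verification sketch; real matrices are used, so ${}^{*}$ is just the transpose):

```python
import numpy as np

rng = np.random.default_rng(3)
r, p, q, d = 8, 10, 12, 4

A = rng.standard_normal((r, p))
B = rng.standard_normal((r, q))

U, c, Vh = np.linalg.svd(A.T @ B, full_matrices=False)     # A* B = U Sigma V*, Eq. (12)
best = (U[:, :d] * c[:d]) @ Vh[:d, :]                       # optimal rank-d approximation, Eq. (13)

Lam_bar_pinv = np.diag(1.0 / np.sqrt(c[:d]))                # Moore-Penrose inverse of Lambda_bar
W_L = A @ U[:, :d] @ Lam_bar_pinv                           # Eq. (14)
W_R = B @ Vh[:d, :].T @ Lam_bar_pinv                        # Eq. (15)
P = W_R @ W_L.T                                             # P = W_R W_L*

print(np.allclose(P @ P, P))             # P is a projector, Eq. (16)
print(np.allclose(A.T @ P @ B, best))    # inserting P reproduces the rank-d truncation, Eq. (17)
```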

The projector $P$ can be decomposed into a sum of rank-1 projectors as

P=\sum_{i=1}^{d}\eta_{i}\xi_{i}^{*}, \qquad (18)

where $\xi_{i}$ and $\eta_{i}$ are the $i$-th column vectors of $W_{L}$ and $W_{R}$, respectively, i.e.,

\xi_{i}=c_{i}^{-1/2}A\mu_{i}, \qquad (19)
\eta_{i}=c_{i}^{-1/2}B\nu_{i}, \qquad (20)

The sets $\{\xi_{i}\}_{i=1}^{d}$ and $\{\eta_{i}\}_{i=1}^{d}$ form a dual orthonormal basis:

\xi_{i}^{*}\eta_{j}=\delta_{ij}\qquad\forall i,j\in[1,d], \qquad (21)

or equivalently,

W_{L}^{*}W_{R}=I_{d}. \qquad (22)

Note that by setting $d=s$, one recovers the input matrix:

A^{*}PB=A^{*}B\qquad\text{when $d=s$}. \qquad (23)

However, $P$ is not necessarily an identity matrix even in that case; if $s<r$, the rank of $P$ is smaller than $r$, and it cannot be the identity.

In such a case, we augment the basis vectors to form a complete dual orthonormal basis set, $\{\xi_{i}\}_{i=1}^{r}$ and $\{\eta_{i}\}_{i=1}^{r}$; we add $2(r-s)$ vectors to the basis set by Gram-Schmidt orthonormalization. After the augmentation, $W_{L}$ and $W_{R}$ become square, and thus

P=W_{R}W_{L}^{*}=I_{r} \qquad (24)

follows from $W_{L}^{*}W_{R}=I_{r}$.

In our MCMC method, we record these complete dual basis sets $\{(\xi_{i},\eta_{i})\}$ together with the corresponding singular values $\{c_{i}\}$ during the initialization stage, in which the standard tensor renormalization group is performed with optimal low-rank approximations. The completeness property, Eq. 24, is essential when introducing MC sampling of the projectors, as the projectors will be inserted between matrices (or tensors) that are generated stochastically and differ from those used in the initialization stage.

Appendix C Tensor Renormalization Group in Projector Formulation

Figure 6: Original Levin-Nave TRG procedure [39]. Tensors in the original tensor network (a) are iteratively replaced by decomposition (red arrows) and contraction (blue arrows). We perform the low-rank approximation at the decomposition steps, keeping the bond dimension at or below the cutoff $d$ (red, blue, and orange bonds). After each set of decomposition and contraction, the number of tensors is halved.
Figure 7: TRG in the projector formulation. The first step (b) is the same as in the original TRG. The next steps, (c) and (d) in Fig. 6, are then replaced by the insertion of projectors (c), where pairs of three-leg tensors (red) are inserted between the tensors. We repeat the insertion of projectors ((d) and (e)). Note that the three-leg tensors in (b), obtained by the initial (exact) decomposition, do not change during the renormalization steps.

The partition function of the Ising model, Eq. 5, can be represented by a tensor network. First, we regard the Boltzmann weight of each bond term as a $2\times 2$ matrix:

W=\begin{bmatrix}e^{\beta J}&e^{-\beta J}\\ e^{-\beta J}&e^{\beta J}\end{bmatrix}, \qquad (25)

which is positive definite and can be decomposed as

W=K^{*}K, \qquad (26)
where
K=\begin{bmatrix}\cosh^{1/2}\beta J&\cosh^{1/2}\beta J\\ \sinh^{1/2}\beta J&-\sinh^{1/2}\beta J\end{bmatrix}. \qquad (27)

Using $K$, we define a four-leg site tensor as

T_{ijkl}=\sum_{m}K_{im}K_{jm}K_{km}K_{lm}e^{\beta h\sigma_{m}}. \qquad (28)

The partition function is then represented by the tensor network of $T$ on the square lattice, as shown in Fig. 6(a).
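
As a concrete reference point, the initial tensor of Eqs. 25-28 can be built in a few lines (a numpy sketch; the function name and the tiny 1x1-torus sanity check are ours):

```python
import numpy as np

def site_tensor(beta, J=1.0, h=0.0):
    """Initial four-leg tensor of the square-lattice Ising model, Eqs. (25)-(28).
    Assumes a ferromagnetic coupling, beta * J > 0, so that sinh(beta J) >= 0."""
    K = np.array([[np.sqrt(np.cosh(beta * J)),  np.sqrt(np.cosh(beta * J))],
                  [np.sqrt(np.sinh(beta * J)), -np.sqrt(np.sinh(beta * J))]])
    field = np.exp(beta * h * np.array([+1.0, -1.0]))      # e^{beta h sigma_m}, sigma_m = +-1
    # T_{ijkl} = sum_m K_{im} K_{jm} K_{km} K_{lm} e^{beta h sigma_m}
    return np.einsum("im,jm,km,lm,m->ijkl", K, K, K, K, field)

# sanity check: a single site on a 1x1 torus (both bonds connect the site to itself)
beta, J = 0.5, 1.0
T = site_tensor(beta, J)
Z = np.einsum("ijij->", T)
print(np.isclose(Z, 2.0 * np.exp(2.0 * beta * J)))   # Z = sum_{sigma=+-1} e^{2 beta J} at h = 0
```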

At each iteration step of the Levin-Nave TRG [39], we decompose a four-leg tensor into two three-leg tensors [Fig. 6(b), (d), (f), (h)] and then contract four three-leg tensors to form a renormalized four-leg tensor [Fig. 6(c), (e), (g), (i)]. By repeating this procedure, all tensors are finally contracted to produce a scalar, the partition function. The decomposition is performed by SVD with the optimal low-rank truncation of the singular values (Appendix B). In particular, the first decomposition step, Fig. 6(b), is performed explicitly as

T_{ijkl}=\sum_{m}R_{ijm}R_{klm}, \qquad (29)
where
R_{ijm}=K_{im}K_{jm}e^{\beta h\sigma_{m}/2}. \qquad (30)

The impurity tensors for calculating the internal energy and the specific heat (the magnetization and the susceptibility) are obtained by taking derivatives of $R$ with respect to the inverse temperature $\beta$ (the external field $h$).

The above tensor renormalization procedure can be rewritten in the projector formulation (Appendix B), where the insertion of projectors replaces the contraction-and-decomposition of tensors. The first step of the projector-based TRG is the same as in the original TRG [Fig. 7(b)]. The next steps, Fig. 6(c) and (d), are then replaced by Fig. 7(c), where projectors, i.e., pairs of three-leg tensors (red), are inserted between the tensors. This procedure is repeated as shown in Fig. 7(d) and (e). Note that in the original Levin-Nave TRG, the tensors (denoted by black circles in Fig. 6) are transformed successively as the renormalization proceeds. In the projector-based TRG, on the other hand, the three-leg tensors obtained in the first decomposition step [Fig. 7(b)] remain unchanged until the final tensor network [Fig. 7(e)].

In the original TRG and its projector representation, we can use the translational symmetry of the system to reduce the $N$-dependence of the computational complexity from $O(N)$ to $O(\log N)$. It should be noted, however, that when sampling projectors, the projectors at all locations must be treated as distinct, or a systematic bias will be introduced [81].

As for the $d$-dependence of the computational complexity, the original TRG has $O(d^{6})$ complexity for the SVD of a four-leg tensor into two three-leg tensors. By using the randomized SVD or the partial SVD, the complexity can be reduced to $O(d^{5})$ [82, 83]. In the initialization process of the proposed MCMC method, however, we need the full SVD to obtain all rank-1 projectors, whose complexity is still $O(d^{6})$. In the subsequent MCMC sampling stage, the complexity of the tensor network contraction is $O(d^{5})$.

Appendix D Sampling Projectors

At each step of the MCMC, a projector, i.e., a set of rank-1 projectors, is proposed as a candidate according to predetermined trial probabilities; we choose a projector in $r$ dimensions projecting onto a $d$-dimensional subspace. A particular set of rank-1 projectors, $\{k_{1},k_{2},\ldots,k_{d}\}$ ($1\leq k_{1}<k_{2}<\cdots<k_{d}\leq r$), is chosen with probability proportional to the product of the weights of its rank-1 projectors, $\prod_{i=1}^{d}w_{k_{i}}$. In this work, the weight of each rank-1 projector is defined using the corresponding singular value $c_{i}$ as

w_{i}=[\max(c_{i},c_{\text{max}}\times 10^{-12})]^{\omega}, \qquad (31)

where $c_{\text{max}}$ is the largest singular value and the exponent $\omega$ is a hyperparameter that controls the distribution of the weights. For the numerical demonstration in the main text, we use $\omega=1$; the optimal value of $\omega$ should be studied in more detail in the future. There are $\binom{r}{d}$ possible combinations of choosing $d$ out of $r$. In typical applications, $r\approx d^{2}$. For example, for $d=32$, the number of combinations exceeds $10^{60}$. Thus, naive tower sampling does not work at all.

For the present purpose, we can employ dynamic programming [84]. Note that if $m$ elements have already been chosen from $\{1,\ldots,k-1\}$, the probability of choosing the $k$-th element is given by

q^{(m)}_{k}=\frac{w_{k}R_{k,m+1}}{w_{k}R_{k,m+1}+R_{k,m}}, \qquad (32)

where $R_{k,m}$ ($k=0,\ldots,r$; $m=0,\ldots,d$) is the sum of weight products over the $\binom{r-k}{d-m}$ combinations of choosing $(d-m)$ elements from $\{k+1,\ldots,r\}$. We can use this conditional probability to decide whether to adopt each element, $k=1,\ldots,r$, one by one. Also, the probability that a chosen set includes the $k$-th element ($k=1,\ldots,r$) can be written as

q_{k}=\frac{s_{k}}{z_{k}} \qquad (33)

with

s_{k}=\sum_{m=0}^{d-1}L_{k-1,m}w_{k}R_{k,m+1}, \qquad (34)
z_{k}=\sum_{m=0}^{d-1}L_{k-1,m}w_{k}R_{k,m+1}+\sum_{m=0}^{d}L_{k-1,m}R_{k,m}, \qquad (35)

where $L_{k,m}$ ($k=0,\ldots,r$; $m=0,\ldots,d$) is the sum of weight products over the $\binom{k}{m}$ combinations of choosing $m$ elements from $\{1,\ldots,k\}$. $\{R_{k,m}\}$ and $\{L_{k,m}\}$ are calculated using the recursion relations

R_{k,m}=\begin{cases}\delta_{m,d}&\text{for $k=r$}\\ w_{k+1}R_{k+1,m+1}+R_{k+1,m}&\text{otherwise,}\end{cases} \qquad (36)
L_{k,m}=\begin{cases}\delta_{m,0}&\text{for $k=0$}\\ w_{k}L_{k-1,m-1}+L_{k-1,m}&\text{otherwise,}\end{cases} \qquad (37)

respectively, in advance at a cost of $O(rd)$. In applying Eqs. 36 and 37, terms with out-of-range indices (e.g., $R_{k,d+1}$ or $L_{k,-1}$) should be regarded as zero. It is immediately clear from the above definitions that $R_{k,d}=L_{k,0}=1$ for $k=1,\ldots,r$. By using Eqs. 36 and 37, $z_{k}$ can be rewritten as

\begin{split}z_{k}&=\sum_{m=0}^{d-1}L_{k-1,m}(w_{k}R_{k,m+1}+R_{k,m})+L_{k-1,d}R_{k,d}\\ &=\sum_{m=0}^{d}L_{k-1,m}R_{k-1,m}.\end{split} \qquad (38)

Note that $z_{k}$, which is nothing but the partition function, does not depend on $k$ and is equal to $R_{0,0}$ or $L_{r,d}$. In the Appendix of Ref. 69, an equivalent formulation has been presented in the tensor network language.
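
A direct transcription of this sampling procedure (a sketch that stores the raw $R_{k,m}$ of Eq. 36 and therefore ignores the overflow issue addressed below; 0-based indices are used instead of the 1-based $k$ of the text):

```python
import numpy as np

def sample_projector_indices(w, d, rng):
    """Draw a size-d subset of {0, ..., r-1} with probability proportional to the product
    of the selected weights, using the dynamic programming of Eqs. (32) and (36).
    Note: for large r this would over/underflow; the rescaled quantities of
    Eqs. (39)-(44) should then be used instead."""
    r = len(w)
    # R[k, m]: sum of weight products over all ways of choosing (d - m) elements
    # from the elements {k, ..., r-1}; R[k, d] = 1 (empty product), R[k, d+1] = 0.
    R = np.zeros((r + 1, d + 2))
    R[r, d] = 1.0
    for k in range(r - 1, -1, -1):
        for m in range(d + 1):
            R[k, m] = w[k] * R[k + 1, m + 1] + R[k + 1, m]
    chosen, m = [], 0
    for k in range(r):
        # conditional probability of including element k, given m elements already chosen
        prob = w[k] * R[k + 1, m + 1] / (w[k] * R[k + 1, m + 1] + R[k + 1, m])
        if rng.random() < prob:
            chosen.append(k)
            m += 1
    return chosen

rng = np.random.default_rng(5)
w = rng.random(12)
print(sample_projector_indices(w, d=4, rng=rng))   # prints d distinct indices drawn as desired
```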

When applying the above expressions, additional care must be taken, since $\{R_{k,m}\}$ and $\{L_{k,m}\}$ are sums of products of many weights, and overflow or underflow easily occurs when $r$ becomes large. One possible workaround is to use multi-precision floating-point arithmetic. Alternatively, one can store the logarithm of each value and proceed with the log-sum-exp trick. In the following, we present a more natural implementation. As we see below, our method is portable, as it does not require multi-precision floating-point arithmetic, and faster, as it does not require logarithmic and exponential operations.

First, we introduce new quantities $\{S_{k,m}\}$ and $\{T_{k,m}\}$, defined as ratios of the $\{R_{k,m}\}$ and the $\{L_{k,m}\}$, respectively:

S_{k,m}=\begin{cases}R_{r,m}=\delta_{m,d}&\text{for $k=r$}\\ 1&\text{for $m=d$}\\ R_{k,m}/R_{k,m+1}&\text{for $m<d$ and $R_{k,m+1}>0$}\\ 0&\text{otherwise,}\end{cases} \qquad (39)
T_{k,m}=\begin{cases}L_{0,m}=\delta_{m,0}&\text{for $k=0$}\\ 1&\text{for $m=0$}\\ L_{k,m}/L_{k,m-1}&\text{if $m>0$ and $L_{k,m-1}>0$}\\ 0&\text{otherwise.}\end{cases} \qquad (40)

Note that $\{S_{k,m}\}$ and $\{T_{k,m}\}$ are quantities of the order of $w_{k}$, unlike $\{R_{k,m}\}$ and $\{L_{k,m}\}$. The following recursion relations make this fact manifest:

S_{k,m}=\begin{cases}w_{k+1}+S_{k+1,d-1}&\text{for $m=d-1$}\\ S_{k+1,m+1}\displaystyle\frac{w_{k+1}+S_{k+1,m}}{w_{k+1}+S_{k+1,m+1}}&\text{otherwise,}\end{cases} \qquad (41)
T_{k,m}=\begin{cases}w_{k-1}+T_{k-1,1}&\text{for $m=1$}\\ T_{k-1,m-1}\displaystyle\frac{w_{k-1}+T_{k-1,m}}{w_{k-1}+T_{k-1,m-1}}&\text{otherwise.}\end{cases} \qquad (42)

By using $\{S_{k,m}\}$, the conditional probability $q_{k}^{(m)}$ [Eq. 32] is represented as

q_{k}^{(m)}=\begin{cases}\displaystyle\frac{w_{k}}{w_{k}+S_{k,m}}&\text{if $w_{k}>0$}\\ 0&\text{otherwise.}\end{cases} \qquad (43)

Similarly, the marginal probability $q_{k}$ [Eq. 33] is represented using $\{S_{k,m}\}$ and $\{T_{k,m}\}$ as

\begin{split}q_{k}&=\frac{\displaystyle\sum_{m=0}^{d-1}\Big[\prod_{j=1}^{m}T_{k-1,j}\Big]w_{k}\Big[\prod_{j=m+1}^{d}S_{k,j}\Big]}{\displaystyle\Big[\prod_{j=0}^{d}S_{0,j}\Big]^{1/2}\Big[\prod_{j=0}^{d}T_{r,j}\Big]^{1/2}}\\ &=w_{k}\sum_{m=0}^{d-1}\Big[\prod_{j=1}^{m}\frac{T_{k-1,j}}{S_{0,j}^{1/2}T_{r,j}^{1/2}}\Big]\Big[\prod_{j=m+1}^{d}\frac{S_{k,j}}{S_{0,j}^{1/2}T_{r,j}^{1/2}}\Big],\end{split} \qquad (44)

which can be evaluated at a cost of $O(d^{2})$ for each $k$ and is free from overflow or underflow.

References