Free Energy Evaluation Using Marginalized Annealed Importance Sampling
Abstract
The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because the free energy expression includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method that is similar to simulated annealing and can effectively approximate the free energy. This study proposes an AIS-based approach, which is referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on the investigation, it is proved that mAIS is more effective than AIS under a certain condition.
I Introduction
The evaluation of the free energy (i.e., the negative log of the partition function) of a stochastic model is considered to be an important issue in various fields of physics. Moreover, the free energy plays an important role in machine learning models, e.g., the Boltzmann machine Ackley et al. (1985). The Boltzmann machine and its variants, e.g., the restricted Boltzmann machine (RBM) Smolensky (1986); Hinton (2002) and the deep Boltzmann machine (DBM) Salakhutdinov and Hinton (2009), have been actively investigated in the fields of machine learning and physics Roudi et al. (2009); Decelle and Furtlehner (2021); Chen et al. (2018); Nomura and Imada (2021); Torlai et al. (2018); Carleo and Troyer (2017); Gao and Duan (2017). However, an exact free energy evaluation is computationally infeasible because the expression for the free energy includes an intractable partition function.
Annealed importance sampling (AIS) Neal (2001) is a type of importance sampling based on the Markov chain Monte Carlo (MCMC) method that is similar to simulated annealing, and can effectively approximate the free energy Neal (2001); Salakhutdinov and Murray (2008). In AIS, a sequential sampling (or ancestral sampling) is executed from a tractable initial distribution to the target distribution, in which the transitions between the distributions are performed using, for example, Gibbs sampling Geman and Geman (1984). The AIS-based free energy evaluation is essentially the same as the method proposed by Jarzynski (the so-called Jarzynski equality) Jarzynski (1997). Several researchers have addressed the development of AIS Burda et al. (2015); Liu et al. (2015); Carlson et al. (2016); Mazzanti and Romero (2020); Krause et al. (2020); Yasuda and Sekimoto (2021). Recently, the relationship between AIS and the normalizing flows proposed in the deep-learning field has been investigated Caselle et al. (2022).
In AIS, the free energy is estimated as follows: first, the estimator of the partition function is obtained as the sample average of the importance weights collected during the sequential sampling process; subsequently, the free energy estimator is obtained as the negative log of this partition function estimator. The partition function estimator is known to be unbiased, whereas the free energy estimator is biased Burda et al. (2015). To evaluate the free energy, we propose an AIS-based approach that we refer to as marginalized AIS (mAIS). The concept of mAIS is simple: mAIS corresponds to AIS in a marginalized model, i.e., mAIS can be regarded as a special case of AIS; therefore, the basic principle of mAIS is similar to that of AIS. Suppose that our target model represents a distribution in an n-dimensional space. In AIS, the sampling procedure used to evaluate the free energy is performed in this n-dimensional space. In mAIS, by contrast, the space in which the sampling procedure is performed is smaller because the dimension of the model is reduced through marginalization. Intuitively, mAIS therefore seems to be more effective than AIS. This intuition is valid under a certain condition, under which the following two statements can be proved (see section III.2): (1) the partition function estimator obtained from mAIS is more accurate than that obtained from AIS, and (2) the bias of the free energy estimator obtained from mAIS is lower than that of the estimator obtained from AIS. However, as discussed in section III.2, this condition limits the range in which the effectiveness of mAIS is assured. As discussed in section III.3, the condition can be satisfied when AIS and mAIS are used on Markov random fields (MRFs) defined on bipartite graphs, on which a layer-wise marginalization can be performed.
Bipartite-type MRFs include important applications, e.g., RBM and DBM.
The remainder of this paper is organized as follows: Section II explains the AIS-based free energy evaluation. Section III introduces mAIS: section III.1 details mAIS, section III.2 describes its theoretical analysis, and section III.3 discusses its application to bipartite-type MRFs. Section IV numerically demonstrates the validity of mAIS. Section V summarizes the study, along with discussions.
II Free Energy Evaluation using Annealed Importance Sampling
Consider a distribution over n random variables, x = {x_1, x_2, …, x_n}, of the form

  P(x) = exp(−E(x)) / Z,  (1)

where X is the continuous or discrete sample space of each variable and E(x) is the energy function (or the Hamiltonian); Z is the partition function and is defined as

  Z = ∑_{x ∈ X^n} exp(−E(x)),  (2)

where ∑_x denotes the multiple summation over all realizations of x. When X is a continuous space, the summation is replaced by an integration over X^n.
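As a concrete illustration (not part of the original derivation), the summation in equation (2) can be evaluated by brute force for a small model; the Ising-chain energy below is an assumption chosen purely for the example.

```python
import itertools
import math

def partition_function(energy, n):
    """Exact Z = sum over x in {-1,+1}^n of exp(-E(x)), as in equation (2)."""
    return sum(math.exp(-energy(x))
               for x in itertools.product((-1, 1), repeat=n))

def chain_energy(x):
    """Toy energy function: one-dimensional Ising chain, E(x) = -sum_i x_i x_{i+1}."""
    return -sum(x[i] * x[i + 1] for i in range(len(x) - 1))

Z = partition_function(chain_energy, n=4)
F = -math.log(Z)  # free energy, F = -ln Z
```

The cost of this brute-force sum grows as 2^n, which is exactly why AIS-type estimators are needed for realistic model sizes.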
This study evaluates the free energy F = −ln Z. The exact evaluation of the free energy is infeasible because it requires the evaluation of the intractable partition function. AIS is a type of importance sampling based on the MCMC method that is similar to simulated annealing, and can evaluate the free energy Neal (2001); Salakhutdinov and Murray (2008). This free energy evaluation method is essentially the same as the method proposed by Jarzynski Jarzynski (1997). The AIS-based free energy evaluation is briefly explained in the following.
First, we design a sequence of distributions as

  P_0(x) → P_1(x) → ⋯ → P_K(x),  (3)

where P_K(x) = P(x), and each P_k(x) is expressed as

  P_k(x) = exp(−E_k(x)) / Z_k,  (4)

where Z_k is the partition function of the kth distribution. Because P_K(x) = P(x), E_K(x) = E(x) and Z_K = Z are assumed. In this sequence, the initial distribution P_0(x) is set to a tractable distribution (e.g., a uniform distribution). For example, the sequence expressed as

  P_k(x) ∝ exp(−β_k E(x)),  (5)

for 0 = β_0 < β_1 < ⋯ < β_K = 1, is frequently used, where β_k corresponds to the (inverse) annealing temperature. The kth distribution expressed by equation (5) corresponds to equation (4) with E_k(x) = β_k E(x). However, we pursue the following arguments without specifying the form of E_k(x).
On the kth distribution, a transition probability T_k(x′ | x), which satisfies the condition

  ∑_x T_k(x′ | x) P_k(x) = P_k(x′),  (6)

is defined. Using the transition probabilities, the annealing process from the initial state x_0 to the final state x_{K−1} is defined as

  Q(x_0, x_1, …, x_{K−1}) = P_0(x_0) ∏_{k=1}^{K−1} T_k(x_k | x_{k−1}),  (7)

where x_k ∈ X^n; Q denotes the joint distribution over x_0, …, x_{K−1}. For Q, the corresponding “reverse” process,

  R(x_0, x_1, …, x_{K−1}) = P_K(x_{K−1}) ∏_{k=1}^{K−1} T̃_k(x_{k−1} | x_k),  (8)

is defined, where

  T̃_k(x | x′) = T_k(x′ | x) P_k(x) / P_k(x′)  (9)

is the reverse transition. T_k(x′ | x) expresses the transition process from x to x′, while its reverse T̃_k(x | x′) expresses the reverse transition from x′ to x. Because the normalization condition, ∑_x T̃_k(x | x′) = 1, is ensured by the condition in equation (6), R can be considered as a joint distribution over x_0, …, x_{K−1}; it satisfies ∑_{x_0, …, x_{K−1}} R(x_0, …, x_{K−1}) = 1.
AIS is regarded as importance sampling in which R and Q are considered the target and corresponding proposal distributions, respectively. Using equations (7), (8), and (9), the ratio between the target and proposal distributions, i.e., the (normalized) importance weight, is obtained as

  W(x_0, …, x_{K−1}) = R(x_0, …, x_{K−1}) / Q(x_0, …, x_{K−1}) = (Z_0 / Z) w(x_0, …, x_{K−1}),  (10)

where

  w(x_0, …, x_{K−1}) = ∏_{k=0}^{K−1} exp(E_k(x_k) − E_{k+1}(x_k)),  (11)

i.e.,

  ln w(x_0, …, x_{K−1}) = ∑_{k=0}^{K−1} (E_k(x_k) − E_{k+1}(x_k)).  (12)

Here, Z_0 is the partition function of P_0 (i.e., a tractable distribution) and Z is the true partition function of the objective distribution. From the normalization condition of R, the relation

  ∑_{x_0, …, x_{K−1}} Q(x_0, …, x_{K−1}) W(x_0, …, x_{K−1}) = 1

is obtained. Substituting equation (10) into this relation yields

  Z = Z_0 ∑_{x_0, …, x_{K−1}} Q(x_0, …, x_{K−1}) w(x_0, …, x_{K−1}),  (13)

where the summation denotes the multiple summation over all realizations of x_0, …, x_{K−1}.
Sampling from the proposal distribution Q can be performed using ancestral sampling, from x_0 to x_{K−1}, i.e., the sequence of sample points, x_0, x_1, …, x_{K−1}, is generated by the following sampling process:

  x_0 ∼ P_0(x_0),  x_k ∼ T_k(x_k | x_{k−1})  (k = 1, 2, …, K−1).  (14)

Using N independent sample sequences {S^(μ) = (x_0^(μ), …, x_{K−1}^(μ))}, μ = 1, …, N, obtained by conducting this ancestral sampling process N times, equation (13) is approximated as

  Z ≈ Ẑ_AIS = (Z_0 / N) ∑_{μ=1}^{N} w(S^(μ)).  (15)

Using equation (15), the free energy is approximated as

  F ≈ F̂_AIS = −ln Ẑ_AIS.  (16)
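The pipeline of equations (14)–(16) can be sketched on a toy model for which the exact partition function is available; the Ising-chain energy, the linear β schedule, and the single-site Gibbs transitions below are illustrative assumptions, not choices made in the paper.

```python
import math
import random

def chain_energy(x):
    # toy target energy: one-dimensional Ising chain (illustration only)
    return -sum(x[i] * x[i + 1] for i in range(len(x) - 1))

def gibbs_sweep(x, beta, energy, rng):
    """One single-site Gibbs sweep; leaves P_beta(x) ∝ exp(-beta*E(x)) invariant."""
    x = list(x)
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i], xm[i] = 1, -1
        ep, em = energy(xp), energy(xm)
        p_plus = 1.0 / (1.0 + math.exp(-beta * (em - ep)))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

def ais_partition_estimate(energy, n, K=100, N=200, seed=0):
    """AIS estimator of Z (equation (15)) with E_k = beta_k * E and uniform P_0."""
    rng = random.Random(seed)
    betas = [k / K for k in range(K + 1)]
    z0 = 2.0 ** n                      # Z_0 of the uniform initial distribution
    total = 0.0
    for _ in range(N):
        x = [rng.choice((-1, 1)) for _ in range(n)]   # x_0 ~ P_0 (eq. (14))
        logw = 0.0
        for k in range(K):
            # importance-weight term (eq. (12)): E_k(x_k) - E_{k+1}(x_k)
            logw += (betas[k] - betas[k + 1]) * energy(x)
            if k + 1 < K:
                x = gibbs_sweep(x, betas[k + 1], energy, rng)
        total += math.exp(logw)
    return z0 * total / N

z_hat = ais_partition_estimate(chain_energy, n=4)
f_hat = -math.log(z_hat)   # free energy estimate (eq. (16))
```

For this 4-variable chain the exact value Z = 2(2 cosh 1)^3 is available by transfer matrix, so the estimator can be checked directly.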
Ẑ_AIS is an unbiased estimator of the true partition function: E[Ẑ_AIS] = Z, because E_Q[Z_0 w] = Z from equation (13), where E_Q[⋯] denotes the expectation with respect to Q, i.e.,

  E_Q[⋯] = ∑_{x_0, …, x_{K−1}} (⋯) Q(x_0, …, x_{K−1}).  (17)

However, F̂_AIS is not an unbiased estimator of the true free energy Burda et al. (2015). Using Jensen’s inequality,

  E[F̂_AIS] = −E[ln Ẑ_AIS] ≥ −ln E[Ẑ_AIS] = F  (18)

is obtained, which implies that the expectation of F̂_AIS provides an upper bound of the true free energy.
III Proposed Method
Suppose that x is divided into two mutually disjoint sets, v and h, i.e., x = v ∪ h and v ∩ h = ∅; therefore, P(x) can be considered the joint distribution over v and h: P(x) = P(v, h). The method discussed in Section II is regarded as AIS based on this joint distribution. In contrast, mAIS, proposed in this section, is regarded as AIS based on a marginal distribution of P(v, h).
III.1 Marginalized annealed importance sampling
For the sequence of the distributions in equation (3), we introduce the sequence of their marginal distributions as

  P_0(v) → P_1(v) → ⋯ → P_K(v),  (19)

where P_k(v) is the marginal distribution of v, i.e.,

  P_k(v) = ∑_h P_k(v, h) = exp(−E^m_k(v)) / Z_k,  (20)

where

  E^m_k(v) = −ln ∑_h exp(−E_k(v, h))  (21)

is the energy function of the marginal distribution. From this definition, the partition functions of P_k(v, h) and P_k(v) are the same.
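To make equations (20) and (21) concrete, the marginal energy can be computed by an explicit sum over h when the hidden space is small; the toy bipartite energy below is an illustrative assumption. The check at the end confirms that the joint and marginal models share the same partition function, as stated above.

```python
import itertools
import math

def marginal_energy(energy, v, m):
    """E^m(v) = -ln sum_h exp(-E(v,h)) over h in {-1,+1}^m, as in equation (21)."""
    total = sum(math.exp(-energy(v, h))
                for h in itertools.product((-1, 1), repeat=m))
    return -math.log(total)

def pair_energy(v, h):
    # toy bipartite energy with uniform couplings (illustration only)
    return -0.5 * sum(vi * hj for vi in v for hj in h)

n, m = 3, 2
z_joint = sum(math.exp(-pair_energy(v, h))
              for v in itertools.product((-1, 1), repeat=n)
              for h in itertools.product((-1, 1), repeat=m))
z_marginal = sum(math.exp(-marginal_energy(pair_energy, v, m))
                 for v in itertools.product((-1, 1), repeat=n))
```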
Based on the sequence of the marginal distributions in equation (19), in a manner similar to equation (7), the annealing process from the initial state v_0 to the final state v_{K−1} can be defined as

  q(v_0, v_1, …, v_{K−1}) = P_0(v_0) ∏_{k=1}^{K−1} t_k(v_k | v_{k−1}),  (22)

where t_k(v′ | v) is the transition probability on the marginal distribution in equation (20). The transition probability satisfies the following condition:

  ∑_v t_k(v′ | v) P_k(v) = P_k(v′).  (23)
Using almost the same derivation as that used to obtain equation (13), we derive

  Z = Z_0 ∑_{v_0, …, v_{K−1}} q(v_0, …, v_{K−1}) w^m(v_0, …, v_{K−1}),  (24)

where w^m is the importance weight of mAIS, defined as

  w^m(v_0, …, v_{K−1}) = ∏_{k=0}^{K−1} exp(E^m_k(v_k) − E^m_{k+1}(v_k)),  (25)

i.e.,

  ln w^m(v_0, …, v_{K−1}) = ∑_{k=0}^{K−1} (E^m_k(v_k) − E^m_{k+1}(v_k)).  (26)
In mAIS, the free energy can be evaluated using a technique similar to the derivation of equation (16). The sequence of sample points, v_0, v_1, …, v_{K−1}, is generated by ancestral sampling on q:

  v_0 ∼ P_0(v_0),  v_k ∼ t_k(v_k | v_{k−1})  (k = 1, 2, …, K−1).  (27)

By conducting this ancestral sampling process N times, N independent sample sequences {s^(μ) = (v_0^(μ), …, v_{K−1}^(μ))}, μ = 1, …, N, are obtained, and subsequently, using the sample sequences, equation (24) is approximated as

  Z ≈ Ẑ_mAIS = (Z_0 / N) ∑_{μ=1}^{N} w^m(s^(μ)).  (28)

Therefore, the free energy is evaluated as

  F ≈ F̂_mAIS = −ln Ẑ_mAIS.  (29)
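Under illustrative assumptions (a toy bipartite energy, a linear β schedule, uniform P_0, and single-site Gibbs transitions in the marginal model — none of which are prescribed by the text), equations (27)–(29) can be sketched as follows; the marginal energy of equation (21) is evaluated by an explicit sum over the small hidden space.

```python
import itertools
import math
import random

N_VIS, N_HID = 3, 2   # layer sizes of the toy model (illustration only)

def joint_energy(v, h):
    # toy bipartite energy (illustration only)
    return -0.4 * sum(vi * hj for vi in v for hj in h)

def marg_energy(v, beta):
    """E^m_k(v) = -ln sum_h exp(-beta * E(v,h)) (equation (21) with E_k = beta*E)."""
    return -math.log(sum(math.exp(-beta * joint_energy(v, h))
                         for h in itertools.product((-1, 1), repeat=N_HID)))

def gibbs_sweep(v, beta, rng):
    # single-site Gibbs sweep leaving the marginal distribution at beta invariant
    v = list(v)
    for i in range(len(v)):
        vp, vm = v[:], v[:]
        vp[i], vm[i] = 1, -1
        ep, em = marg_energy(vp, beta), marg_energy(vm, beta)
        v[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(ep - em)) else -1
    return v

def mais_partition_estimate(K=60, N=200, seed=0):
    """mAIS estimator of Z (equation (28))."""
    rng = random.Random(seed)
    betas = [k / K for k in range(K + 1)]
    z0 = 2.0 ** (N_VIS + N_HID)   # Z_0 of the uniform joint model (beta = 0)
    total = 0.0
    for _ in range(N):
        v = [rng.choice((-1, 1)) for _ in range(N_VIS)]   # v_0 ~ P_0 (eq. (27))
        logw = 0.0
        for k in range(K):
            # weight term of equation (26)
            logw += marg_energy(v, betas[k]) - marg_energy(v, betas[k + 1])
            if k + 1 < K:
                v = gibbs_sweep(v, betas[k + 1], rng)
        total += math.exp(logw)
    return z0 * total / N

z_hat = mais_partition_estimate()
```

Note that the annealing chain lives only in the v-space; the h-variables enter only through the marginal energy, which is the essential difference from the AIS sketch above.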
Similar to Ẑ_AIS, Ẑ_mAIS acts as an unbiased estimator of the true partition function because E_q[Z_0 w^m] = Z, where E_q[⋯] denotes the expectation with respect to q, i.e.,

  E_q[⋯] = ∑_{v_0, …, v_{K−1}} (⋯) q(v_0, …, v_{K−1}).  (30)

Similar to equation (18), based on Jensen’s inequality,

  E[F̂_mAIS] ≥ −ln E[Ẑ_mAIS] = F  (31)

can be obtained; therefore, F̂_mAIS is not an unbiased estimator of the true free energy, as is the case with F̂_AIS.
In the aforementioned discussion, we have considered mAIS based on the sequence of P_k(v) (we refer to this mAIS as “mAIS-V”). The opposite mAIS (referred to as “mAIS-H”), which is based on the sequence of

  P_0(h) → P_1(h) → ⋯ → P_K(h),  (32)

where

  P_k(h) = ∑_v P_k(v, h),  (33)

can be constructed in the same manner. However, we do not need to consider mAIS-H separately because the difference between the two methods lies only in the way the variables are labelled, i.e., mAIS-V is identified with mAIS-H by exchanging the labels of the variable sets. Therefore, we refer to mAIS-V as mAIS in this paper.
Theoretically, mAIS can be applied to any model. However, whether it is practical or not strongly depends on the structure of P(v, h) because mAIS requires the marginalization expressed in equation (20). mAIS can be efficiently applied to a bipartite-type MRF (i.e., an MRF defined on a bipartite undirected graph), which will be discussed in Section III.3.
Suppose that we have a model to which mAIS can be efficiently applied. The free energy of the model can then be evaluated using either of the two estimators, F̂_AIS and F̂_mAIS. Intuitively, mAIS seems to be better because the sample space of the variables is reduced through the marginalization, which is similar to the concept of Rao–Blackwellization Liu (2001).
Here, we briefly explain Rao–Blackwellization. Suppose that we wish to evaluate the expectation of a function f(a, b) over a distribution P(a, b): E[f] = ∑_{a, b} f(a, b) P(a, b). Based on a simple sampling approximation (or Monte Carlo integration), it is approximated by

  E[f] ≈ (1/N) ∑_{μ=1}^{N} f(a^(μ), b^(μ)),  (34)

where {(a^(μ), b^(μ))} is the sample set generated from P(a, b). Using the decomposition P(a, b) = P(b | a) P(a), E[f] can be transformed as

  E[f] = ∑_a g(a) P(a),  (35)

where g(a) is the conditional expectation expressed as

  g(a) = ∑_b f(a, b) P(b | a).

Therefore, based on equation (35), E[f] is approximated by

  E[f] ≈ (1/N) ∑_{μ=1}^{N} g(a^(μ)),  (36)

where {a^(μ)} is the corresponding subset of the sample set. The Rao–Blackwell theorem guarantees that the estimator in equation (36) is more effective than the estimator in equation (34) from the perspective of the variance. The transformation in equation (35) is called Rao–Blackwellization. The applications of Rao–Blackwellization have been widely investigated; for example, its application to MRFs, spatial Monte Carlo integration, has been developed Yasuda (2015); Yasuda and Uchizawa (2021); Yasuda and Sekimoto (2021).
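A minimal numerical illustration of the Rao–Blackwell argument (the joint distribution and the function f below are arbitrary choices made for this demonstration): replacing the sampled b with its exact conditional mean, as in equation (36), reduces the variance of the estimator while leaving its mean unchanged.

```python
import random
import statistics

# toy joint distribution over (a, b) with a, b in {0, 1}:
# P(a = 1) = 0.7; P(b = 1 | a) = 0.9 if a == 1 else 0.2
def sample_pair(rng):
    a = 1 if rng.random() < 0.7 else 0
    b = 1 if rng.random() < (0.9 if a == 1 else 0.2) else 0
    return a, b

def cond_mean_f(a):
    # g(a) = E[f(a, b) | a] for f(a, b) = a + b, computed exactly
    return a + (0.9 if a == 1 else 0.2)

rng = random.Random(1)
plain, rao_blackwell = [], []
for _ in range(20000):
    a, b = sample_pair(rng)
    plain.append(a + b)                   # simple estimator, equation (34)
    rao_blackwell.append(cond_mean_f(a))  # Rao-Blackwellized estimator, equation (36)

# both estimators target the same mean, but the Rao-Blackwellized samples vary less
var_plain = statistics.pvariance(plain)
var_rb = statistics.pvariance(rao_blackwell)
```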
III.2 Statistical efficiency of mAIS
First, we compare the statistical efficiencies of the two estimators of the partition function, Ẑ_AIS and Ẑ_mAIS. The variance of Ẑ_AIS is

  Var[Ẑ_AIS] = (1/N) (E_Q[(Z_0 w)²] − Z²),  (37)

where E_Q is the expectation in equation (17). The variance of Ẑ_mAIS is

  Var[Ẑ_mAIS] = (1/N) (E_q[(Z_0 w^m)²] − Z²),  (38)

where E_q is the expectation in equation (30). As discussed in Sections II and III.1, both estimators are unbiased; therefore, the estimator with the smaller variance is more effective.

Assume that the transition probability of AIS is expressed as

  T_k(x′ | x) = t_k(v′ | v) P_k(h′ | v′),  (39)

where t_k(v′ | v) is the transition probability of mAIS satisfying equation (23), and

  P_k(h | v) = P_k(v, h) / P_k(v)  (40)

is the conditional distribution of h given v. According to equation (39), the state transition from x = (v, h) to x′ = (v′, h′) is performed as follows: first, the state of v is updated to v′ according to t_k(v′ | v); subsequently, the state of h is updated to h′ according to P_k(h′ | v′) (see figure 1). The use of the transition probability in equation (39) is allowed in AIS because it satisfies the condition of equation (6) as follows:

  ∑_{v, h} t_k(v′ | v) P_k(h′ | v′) P_k(v, h) = P_k(h′ | v′) ∑_v t_k(v′ | v) P_k(v) = P_k(h′ | v′) P_k(v′) = P_k(v′, h′).
Based on this assumption, the following theorem can be obtained.
Theorem 1
Under the assumption of equation (39), Var[Ẑ_mAIS] ≤ Var[Ẑ_AIS] holds.
Based on this theorem, Ẑ_mAIS is ensured to be statistically more efficient than Ẑ_AIS. The proof of this theorem is described in Appendix A.1. As mentioned in the final part of Appendix A.1, the claim of this theorem is essentially the same as that of the Rao–Blackwell theorem.
Next, the statistical efficiencies of the two estimators of the free energy, F̂_AIS and F̂_mAIS, are compared. As expressed in equations (18) and (31), E[F̂_AIS] and E[F̂_mAIS] are upper bounds of the true free energy. Based on the assumption of equation (39), the following theorem can be obtained.
Theorem 2
Under the assumption of equation (39), F ≤ E[F̂_mAIS] ≤ E[F̂_AIS] holds.
Based on this theorem, although F̂_AIS and F̂_mAIS are both biased, the bias of F̂_mAIS is ensured to be smaller. The proof of this theorem is presented in Appendix A.2. The aforementioned two theorems do not depend on the details of the design of the distribution sequence or on the values of N and K; only the condition in equation (39) is required.
We have proved that mAIS is statistically more efficient than AIS with regard to both the partition function and the free energy when the condition of equation (39) is satisfied. The remaining issue is determining whether this condition is satisfied in practical situations. This condition significantly limits the modeling of the transition probability; e.g., the standard (synchronous or asynchronous) Gibbs sampling on P_k(v, h) does not satisfy the condition because it uses the states of all the variables to obtain the subsequent states. As explained in Section III.3, the state transition according to equation (39) can be natural and practical when P(v, h) is a bipartite-type MRF.
III.3 Application to bipartite-type MRFs
The proposed mAIS is applicable when the marginalization in equation (20) is feasible. Owing to this restriction, the range of application of mAIS is limited. As discussed below, mAIS is practical on bipartite-type MRFs, which include important applications such as RBM and DBM (a DBM can be viewed as a bipartite-type MRF Gao and Duan (2017)). Moreover, a square-grid-type MRF is regarded as a bipartite-type MRF (see Fig. 2).

Suppose that P_k(v, h) is a bipartite-type MRF and that v = {v_1, …, v_n} and h = {h_1, …, h_m} are the variables of the two layers. In this case, the energy function can be expressed as

  E_k(v, h) = ∑_{i=1}^{n} φ_{k,i}(v_i) + ∑_{j=1}^{m} ξ_{k,j}(h_j) + ∑_{i=1}^{n} ∑_{j=1}^{m} ψ_{k,ij}(v_i, h_j),  (41)

where φ_{k,i} and ξ_{k,j} are one-variable functions and ψ_{k,ij} are two-variable functions, determined based on the design of E_k; the marginalization in equation (20) is feasible and leads to

  E^m_k(v) = ∑_{i=1}^{n} φ_{k,i}(v_i) − ∑_{j=1}^{m} ln ∑_{h_j} exp(−ξ_{k,j}(h_j) − ∑_{i=1}^{n} ψ_{k,ij}(v_i, h_j)).  (42)

On the bipartite-type MRF, the layer-wise blocked Gibbs sampling based on the conditional distributions P_k(h | v) and P_k(v | h) can be easily implemented, where P_k(h | v) is the conditional distribution presented in equation (40) and

  P_k(v | h) = P_k(v, h) / P_k(h),  (43)

because the v_i s are conditionally independent of each other in P_k(v | h) and the h_j s are also conditionally independent of each other in P_k(h | v), i.e.,

  P_k(v | h) = ∏_{i=1}^{n} P_k(v_i | h),  P_k(h | v) = ∏_{j=1}^{m} P_k(h_j | v).
Based on the layer-wise blocked Gibbs sampling, the transition probability of mAIS can be modeled as

  t_k(v′ | v) = ∑_h P_k(v′ | h) P_k(h | v),  (44)

which satisfies the condition in equation (23) as follows:

  ∑_v t_k(v′ | v) P_k(v) = ∑_h P_k(v′ | h) ∑_v P_k(h | v) P_k(v) = ∑_h P_k(v′ | h) P_k(h) = P_k(v′).

This transition probability can be considered as a collapsed Gibbs sampling Liu (1994). From equations (39) and (44), when the transition probability of AIS is modeled by

  T_k(x′ | x) = (∑_{h″} P_k(v′ | h″) P_k(h″ | v)) P_k(h′ | v′),  (45)

the theoretical results mentioned in Section III.2 (Theorems 1 and 2) are applicable. Approximating the expectations over h in equations (44) and (45) by a sampling approximation with one sample point leads to the widely used sampling procedure on bipartite-type MRFs, i.e., the blocked Gibbs sampling (see figure 3).
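The one-sample approximation of the collapsed kernel in equation (44) — sample h from P_k(h | v), then v′ from P_k(v | h) — is the usual blocked Gibbs step; a minimal sketch for a small ±1 RBM-style model follows (the energy convention and parameter values are assumptions for illustration).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_layer(fields, rng):
    # conditionally independent +/-1 units: P(s = +1 | field) = sigmoid(2 * field)
    return [1 if rng.random() < sigmoid(2.0 * f) else -1 for f in fields]

def blocked_gibbs_step(v, W, b, c, rng):
    """One v -> h -> v' transition; a one-sample approximation of equation (44).

    Assumed energy convention: E(v, h) = -(sum_i b_i v_i + sum_j c_j h_j
    + sum_ij W_ij v_i h_j), with v_i, h_j in {-1, +1}.
    """
    n, m = len(b), len(c)
    h_fields = [c[j] + sum(W[i][j] * v[i] for i in range(n)) for j in range(m)]
    h = sample_layer(h_fields, rng)          # h ~ P(h | v)
    v_fields = [b[i] + sum(W[i][j] * h[j] for j in range(m)) for i in range(n)]
    return sample_layer(v_fields, rng)       # v' ~ P(v | h)
```

Exact layer-wise sampling is possible precisely because the h_j (and v_i) are conditionally independent given the opposite layer, as stated above.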

IV Numerical Experiments
The performance of mAIS (i.e., mAIS-V) is examined using numerical experiments on an RBM whose energy function is defined as

  E(v, h) = −(1/T) (∑_{i=1}^{n} b_i v_i + ∑_{j=1}^{m} c_j h_j + ∑_{i=1}^{n} ∑_{j=1}^{m} w_{ij} v_i h_j),  (46)

where v_i and h_j are binary variables and T > 0 is the temperature of the RBM. The temperature controls the complexity of the distribution: a lower T expresses a more highly clustered multimodal distribution. On the RBM, the distribution sequence is designed according to equation (5), i.e., E_k(v, h) = β_k E(v, h), and the initial distribution is fixed to a uniform distribution; the corresponding one- and two-variable functions in equations (41) and (42) are obtained by scaling the bias and coupling terms of equation (46) by β_k.
IV.1 Quantitative investigation of theoretical results
The theoretical results obtained in section III.2 revealed the qualitative effectiveness of mAIS. Here, we investigate the quantitative effectiveness of mAIS using numerical experiments. In the experiments, the bias parameters b_i and c_j were independently drawn from a uniform distribution, and the couplings w_{ij} were independently drawn from a normal distribution with zero mean; the sizes of the sample set and the two layers were fixed throughout. The blocked Gibbs sampling shown in figure 3 was used for the state transitions of AIS and mAIS.

Figure 4 shows the absolute percentage error (APE) between the true free energy, F, and its approximation, F̂, obtained using AIS or mAIS, against the inverse temperature, where the APE is defined as

  APE = |F̂ − F| / |F| × 100.  (47)

In this experiment, the value of K was fixed. We observe that the APEs obtained using mAIS are always lower than those obtained using AIS, as supported by Theorem 1. mAIS is particularly effective in the high-temperature region.
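As a small helper (assuming the common definition of the absolute percentage error; the exact normalization in equation (47) is taken from context):

```python
def absolute_percentage_error(f_true, f_hat):
    """APE between the true free energy f_true and an estimate f_hat, in percent."""
    return 100.0 * abs(f_hat - f_true) / abs(f_true)
```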
Table 1. The true free energy compared with the trial averages of the AIS and mAIS estimates for three annealing lengths (10, 30, and 60) per method, at several values of the temperature parameter (0.2, 0.4, 0.8, 1, 2, 4, and 8).
Next, we compare the true free energy with the trial averages of its approximations, where each trial average was estimated from 30 independent trials. The trial average can be regarded as an approximation of E[F̂_AIS] in AIS or E[F̂_mAIS] in mAIS. The results are listed in table 1. In this experiment, the size of the sample set was fixed. The estimates of both AIS and mAIS are higher than the true free energy, and the estimates of mAIS are lower than those of AIS, as supported by Theorem 2.
IV.2 Dependency on the size ratio of two layers
We investigate the relative accuracy of mAIS compared with AIS as a function of the ratio of the two layer sizes. The experiments in section IV.1 correspond to the case of equally sized layers. Intuitively, mAIS is more efficient for a larger ratio because, as the ratio increases, the fraction of variables remaining after marginalization decreases (in other words, the dimension of the space evaluated by the sampling approximation relatively shrinks). However, the theoretical results obtained in section III.2 do not directly support this assumption. Thus, we check the relative accuracy for several values of the ratio using numerical experiments on the RBM. In the experiments, the relative accuracy is measured by the ratio of the APEs obtained using AIS and mAIS, so that a larger value indicates that mAIS is relatively more accurate.
In the experiments, the sizes of the sample set and K were fixed; the parameter setting of the RBM was the same as that in section IV.1, and the blocked Gibbs sampling was used for the state transitions of AIS and mAIS. The distributions of the relative accuracy for three layer-size settings are shown in figures 5–7, respectively. Each distribution in these figures was created by kernel density estimation Bishop (2006) using the results obtained from 4000 experiments, in which a Gaussian kernel with a bandwidth (or smoothing parameter) of 0.25 was used. In all cases, the distributions shift in the positive direction as the layer-size ratio increases, which implies that the relative accuracy of mAIS monotonically improves with the ratio. This result implies that the larger layer should be marginalized out: mAIS-V should be used when the hidden layer is the larger one, and mAIS-H in the opposite case.



IV.3 Convergence property with respect to K

The experiments in sections IV.1 and IV.2 were conducted for relatively small values of K to emphasize the performance difference between AIS and mAIS under less-than-ideal conditions. In practice, however, K is set to a sufficiently large value to obtain precise estimations; the value of K should be larger than the mixing time (or relaxation time), which tends to increase as the complexity of the distribution increases. In this section, the convergence property of the APE in equation (47) over a wider range of K is demonstrated using numerical experiments.
Detailed discussions of the mixing time in RBMs have been provided from both the inference and learning perspectives Roussel et al. (2021); Decelle et al. (2021). Roussel et al. showed that the standard blocked Gibbs sampling shown in figure 3 is not efficient, from the perspective of the mixing time, in RBMs having highly clustered distributions, and they proposed a more efficient sampling method that combines the blocked Gibbs sampling with the Metropolis–Hastings (MH) method, as illustrated in figure 8 Roussel et al. (2021). Roussel’s sampling method can be employed as the transition in our framework. For the experiments, we used two different RBMs: the RBM with Gaussian interactions, which is the same model used in sections IV.1 and IV.2, and the RBM with Hopfield-type interactions, whose couplings were constructed from binary patterns determined uniformly at random; the Hopfield-type interactions can exhibit a clustered distribution Roussel et al. (2021). In both RBMs, the bias parameters, b_i and c_j, were independently drawn from a uniform distribution, and the layer sizes were fixed. The size of the sample set was also fixed.


Figures 9 and 10 show the behavior of the APE as K increases. In the figures, “AIS” and “mAIS” indicate the results obtained using the standard blocked Gibbs sampling, while “AIS+MH” and “mAIS+MH” indicate the results obtained using Roussel’s sampling method. mAIS exhibits faster convergence than AIS in all experiments, and Roussel’s sampling method improves the convergence properties of both AIS and mAIS, although the improvement is very small in figure 10(b). mAIS+MH achieved the best performance in all experiments. The convergence of mAIS seems to be faster than that of AIS+MH, which implies that the improvement gained by mAIS is larger than that gained by Roussel’s sampling method. Finally, we comment on the computational efficiency of the four methods: AIS, AIS+MH, mAIS, and mAIS+MH. The computational costs of the four methods are of the same order; however, mAIS is remarkably faster than AIS+MH in terms of CPU time (around 10 times faster in our implementation). The MH procedure is the bottleneck of Roussel’s sampling method; thus, mAIS+MH is as slow as AIS+MH.
V Summary and Discussion
In this study, we have proposed mAIS, which is identified as AIS on a marginalized model. The proposed method is regarded as a special case of AIS that can be used when a partial marginalization can be evaluated exactly; it is not a modification of AIS itself. This study therefore complements the related AIS studies listed in section I. Two important theorems for mAIS (i.e., Theorems 1 and 2) have been obtained: when the transition probabilities of AIS and mAIS satisfy the condition of equation (39), (a) the partition function estimator of mAIS is more accurate than that of AIS, and (b) the bias of the free energy estimator of mAIS is lower than that of AIS. These results theoretically support the qualitative effectiveness of mAIS. Furthermore, the numerical results presented in section IV empirically support the quantitative effectiveness of mAIS.
mAIS can be applied to any model in which a partial marginalization can be evaluated exactly. However, the effectiveness of the resultant mAIS in comparison with AIS cannot be theoretically guaranteed when the transition probabilities of AIS and mAIS do not satisfy the condition of equation (39). This condition significantly limits the applicability of the theory obtained in this study. As mentioned in section III.2, the condition of equation (39) is sufficient for Theorems 1 and 2, but it has not been shown to be necessary. If it is not necessary, it may be relaxed; a relaxation of the condition will be investigated in our future studies.
Appendix A Proofs
This appendix demonstrates the proofs of the two theorems presented in Section III.2.
A.1 Proof of Theorem 1
The relationship between the annealing processes in equations (7) and (22) is considered. Based on the assumption of equation (39), the marginal distribution of v′ under the transition can be expressed as ∑_{h′} T_k(x′ | x) = t_k(v′ | v). Therefore,

  ∑_{h_0, …, h_{K−1}} Q(x_0, …, x_{K−1}) = q(v_0, …, v_{K−1})  (48)

is obtained. Moreover, the assumption of equation (39) leads to

  T_k(x_k | x_{k−1}) = t_k(v_k | v_{k−1}) P_k(h_k | v_k).  (49)

From equations (48) and (49), the annealing process of AIS can be factorized as

  Q(x_0, …, x_{K−1}) = q(v_0, …, v_{K−1}) ∏_{k=0}^{K−1} P_k(h_k | v_k),  (50)

where P_k(h_k | v_k) is the conditional distribution in equation (40). Using equation (50), the difference between the variances in equations (37) and (38) can be expressed as

  Var[Ẑ_AIS] − Var[Ẑ_mAIS] = (Z_0² / N) (E_Q[w²] − E_q[(w^m)²]).  (51)

From equation (50), the conditional expectation of w given v_0, …, v_{K−1} is

  ∑_{h_0, …, h_{K−1}} (∏_{k=0}^{K−1} P_k(h_k | v_k)) w(x_0, …, x_{K−1}) = w^m(v_0, …, v_{K−1}).  (52)

Substituting equation (52) into equation (51) and applying Jensen’s inequality to the conditional expectation yields

  E_Q[w²] = E_q[E[w² | v_0, …, v_{K−1}]] ≥ E_q[(w^m)²].

Therefore, Var[Ẑ_mAIS] ≤ Var[Ẑ_AIS] is ensured.
The above result can be understood in terms of Rao–Blackwellization. Ẑ_AIS is the simple sampling approximation of equation (13). From equation (50), equation (13) can be rewritten as

  Z = Z_0 ∑_{v_0, …, v_{K−1}} q(v_0, …, v_{K−1}) ∑_{h_0, …, h_{K−1}} (∏_{k=0}^{K−1} P_k(h_k | v_k)) w(x_0, …, x_{K−1}).

From this equation and equation (52), equation (24) can be seen as the expectation of the conditional expectation of w. mAIS is therefore identified with a Rao–Blackwellization of AIS, and thus Var[Ẑ_mAIS] ≤ Var[Ẑ_AIS] is ensured by the Rao–Blackwell theorem.
A.2 Proof of Theorem 2
As shown in equations (18) and (31), E[F̂_AIS] and E[F̂_mAIS] are upper bounds of the true free energy F, i.e., E[F̂_AIS] ≥ F and E[F̂_mAIS] ≥ F. Using equation (50), E[F̂_AIS] can be rewritten as

  E[F̂_AIS] = −E_q[E_{h|v}[ln Ẑ_AIS]],  (53)

where E_{h|v}[⋯] denotes the conditional expectation over the hidden sequences given the visible sequences. Using Jensen’s inequality, the following inequality is obtained:

  E_{h|v}[ln Ẑ_AIS] ≤ ln E_{h|v}[Ẑ_AIS] = ln Ẑ_mAIS.  (54)

Equation (52) is used in the derivation of the last equality in equation (54). From equations (53) and (54),

  E[F̂_AIS] ≥ E_q[−ln Ẑ_mAIS] = E[F̂_mAIS] ≥ F

is ensured; here, the final inequality is obtained from equation (31).
Acknowledgment
This work was partially supported by JSPS KAKENHI (grant numbers 18K11459, 18H03303, 20K23342, 21K11778, and 21K17804).
References
- Ackley et al. (1985) D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, Cognitive Science 9, 147 (1985).
- Smolensky (1986) P. Smolensky, Parallel distributed processing: Explorations in the microstructure of cognition 1, 194 (1986).
- Hinton (2002) G. E. Hinton, Neural Computation 14, 1771 (2002).
- Salakhutdinov and Hinton (2009) R. Salakhutdinov and G. E. Hinton, In Proc. of the 12th International Conference on Artificial Intelligence and Statistics , 448 (2009).
- Roudi et al. (2009) Y. Roudi, E. Aurell, and J. Hertz, Frontiers in Computational Neuroscience 3, 1 (2009).
- Decelle and Furtlehner (2021) A. Decelle and C. Furtlehner, Chinese Physics B 30, 040202 (2021).
- Chen et al. (2018) J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Physical Review B 97, 085104 (2018).
- Nomura and Imada (2021) Y. Nomura and M. Imada, Physical Review X 11, 031034 (2021).
- Torlai et al. (2018) G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics 14, 447 (2018).
- Carleo and Troyer (2017) G. Carleo and M. Troyer, Science 355, 602 (2017).
- Gao and Duan (2017) X. Gao and L. Duan, Nature Communications 8, 1 (2017).
- Neal (2001) R. M. Neal, Statistics and Computing 11, 125 (2001).
- Salakhutdinov and Murray (2008) R. Salakhutdinov and I. Murray, In Proc. of the 25th International Conference on Machine Learning 25, 872 (2008).
- Geman and Geman (1984) S. Geman and D. Geman, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721 (1984).
- Jarzynski (1997) C. Jarzynski, Phys. Rev. E 56, 5018 (1997).
- Burda et al. (2015) Y. Burda, R. B. Grosse, and R. Salakhutdinov, In Proc. of the 18th International Conference on Artificial Intelligence and Statistics , 102 (2015).
- Liu et al. (2015) Q. Liu, A. Ihler, J. Peng, and J. Fisher, In Proc. of the 31st Conference on Uncertainty in Artificial Intelligence , 514 (2015).
- Carlson et al. (2016) D. E. Carlson, P. Stinson, A. Pakman, and L. Paninski, In Proc. of the 33rd International Conference on Machine Learning 6, 4248 (2016).
- Mazzanti and Romero (2020) F. Mazzanti and E. Romero, arXiv:2007.11926 (2020).
- Krause et al. (2020) O. Krause, A. Fischer, and C. Igel, Artificial Intelligence 278, 103195 (2020).
- Yasuda and Sekimoto (2021) M. Yasuda and K. Sekimoto, Physical Review E 103, 052118 (2021).
- Caselle et al. (2022) M. Caselle, E. Cellini, A. Nada, and M. Panero, Journal of High Energy Physics 2022, 015 (2022).
- Liu (2001) J. S. Liu, Monte Carlo strategies in scientific computing (Springer, 2001).
- Yasuda (2015) M. Yasuda, Journal of the Physical Society of Japan 84, 034001 (2015).
- Yasuda and Uchizawa (2021) M. Yasuda and K. Uchizawa, Neural Computation 33, 1037 (2021).
- Liu (1994) J. S. Liu, Journal of the American Statistical Association 89, 958 (1994).
- Bishop (2006) C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
- Roussel et al. (2021) C. Roussel, S. Cocco, and R. Monasson, Phys. Rev. E 104, 034109 (2021).
- Decelle et al. (2021) A. Decelle, C. Furtlehner, and B. Seoane, In Proc. of the 35th Conference on Neural Information Processing Systems (2021).