
Counting Phases and Faces Using
Bayesian Thermodynamic Integration

Alexander Lobashev1,∗, Mikhail V. Tamm2
1Skolkovo Institute of Science and Technology, Skolkovo, Russia
2ERA Chair for Cultural Data Analytics, School of Digital Technologies, Tallinn University, Tallinn, Estonia
Abstract

We introduce a new approach to the reconstruction of thermodynamic functions and phase boundaries in two-parametric statistical mechanics systems. Our method is based on expressing the Fisher metric in terms of the posterior distributions over a space of external parameters and approximating the metric field by the Hessian of a convex function. We use the proposed approach to accurately reconstruct the partition functions and phase diagrams of the Ising model and the exactly solvable non-equilibrium TASEP without any a priori knowledge of the microscopic rules of the models. We also demonstrate how our approach can be used to visualize the latent space of StyleGAN models and to evaluate the variability of the generated images.


e-mail: [email protected]

While machine learning methods are effective at analyzing statistical physics systems, their application has mainly been limited to determining the boundaries of phase transitions and extracting learned order parameters. However, in many cases it remains unclear how to relate these learned order parameters to the known physical quantities that are commonly used to determine phase transitions.

We address here the general problem of reconstructing both the phase boundaries and the thermodynamic functions of a given statistical physics model. A model in this context is understood as a stochastic mapping from a low-dimensional space of external parameters t to a multi-dimensional space of microstates s. We do not assume any prior knowledge of the number of phases or their location in the space of external parameters. The only labels available to us are the values of the external parameters at which each microstate was generated.

Our contributions. The contributions of this paper are as follows:

  • We propose a new approach, which we call Bayesian Thermodynamic Integration, to approximate the Fisher information metric for distributions over a high-dimensional microstate space depending on several external parameters. The approach is based on approximating the probability distribution with a function from the exponential family

    \mathbb{P}(s|\textbf{t})=\frac{e^{f(s)^{T}\textbf{t}}}{Z(\textbf{t})}

    where the normalization factor (the partition function in the terminology of statistical physics) is a convex function of the parameters. For statistical physics problems this approach gives access to the partition function and other thermodynamic functions without a priori knowledge of the system Hamiltonian;

  • We apply the suggested approach to several two-parametric systems and demonstrate that it allows a satisfactory reproduction of the thermodynamic functions and phase transition lines. In the case of a two-parametric model of statistical mechanics based on the StyleGAN v3 generator, we observe signs of a second-order phase transition corresponding to sharp changes in the identity of the generated human faces.

I Bayesian Thermodynamic Integration

The main problem of interest in statistical physics is the study of phase transitions, which correspond to sharp changes in typical microstates under a gradual change of external parameters. We use the Jensen-Shannon divergence (JSD) to measure the distance between probability distributions corresponding to different values of the external parameters:

\begin{split}\text{JSD}(\mathbb{P}(s|\textbf{t}+\delta\textbf{t}),\mathbb{P}(s|\textbf{t}))=\mathcal{H}\left[\frac{1}{2}(\mathbb{P}(s|\textbf{t}+\delta\textbf{t})+\mathbb{P}(s|\textbf{t}))\right]-\\ -\frac{1}{2}\left(\mathcal{H}[\mathbb{P}(s|\textbf{t}+\delta\textbf{t})]+\mathcal{H}[\mathbb{P}(s|\textbf{t})]\right),\end{split} (1)

where \mathcal{H}[\mathbb{P}] denotes the entropy of the corresponding distribution; the square root of the JSD is a proper metric on the manifold of probability distributions parameterized by t. For small variations of t

\text{JSD}(\mathbb{P}(s|\textbf{t}+\delta\textbf{t}),\mathbb{P}(s|\textbf{t}))=\delta\textbf{t}^{T}G(\textbf{t})\delta\textbf{t}+O(\delta\textbf{t}^{3}) (2)

where G(t) is a Riemannian metric on the manifold of probability distributions known as the Fisher information metric amari2000methods.
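
For distributions discretized on a finite grid of parameter values, as used below, the divergence (1) can be evaluated directly from the entropies; a minimal sketch, assuming both input arrays are non-negative and sum to one:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a distribution discretized on a grid (p sums to 1)."""
    p = np.clip(p, eps, None)
    return -np.sum(p * np.log(p))

def jsd(p, q):
    """Jensen-Shannon divergence (1) between two discretized distributions."""
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))
```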

The Fisher information metric is defined as the second-order term in the Taylor expansion of the KL divergence between close probability distributions; explicitly,

G(\textbf{t})=\int\mathbb{P}(s|\textbf{t})\nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t})(\nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t}))^{T}ds (3)

At phase transition points the Fisher metric is expected to have singularities in the limit of large system size (i.e., as the dimensionality of the space of s approaches infinity). Applying Bayes' formula to \mathbb{P}(s|\textbf{t}) gives the posterior distribution on the space of external parameters given an observed microstate,

\mathbb{P}(\textbf{t}|s)=\frac{\mathbb{P}(s|\textbf{t})\mathbb{P}(\textbf{t})}{\mathbb{P}(s)} (4)

Thus, if the prior distribution \mathbb{P}(\textbf{t}) is uniform, then

\nabla_{\textbf{t}}\log\mathbb{P}(\textbf{t}|s)=\nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t}) (5)

This implies that, if one knows the posterior distribution \mathbb{P}(\textbf{t}|s) (for example, via some sort of Monte-Carlo approximation), then the Fisher information metric can be expressed as

\begin{split}G(\textbf{t})\approx\frac{1}{K}\sum_{k=1}^{K}\nabla_{\textbf{t}}\log\mathbb{P}(\textbf{t}|s_{k})(\nabla_{\textbf{t}}\log\mathbb{P}(\textbf{t}|s_{k}))^{T},\\ s_{k}\sim\mathbb{P}(s|\textbf{t})\end{split} (6)
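
A minimal sketch of the Monte-Carlo estimate (6), assuming a placeholder routine grad_log_posterior(t, s) that returns the gradient of the log-posterior with respect to t (for instance, obtained by automatic differentiation of an approximate posterior):

```python
import numpy as np

def fisher_metric_mc(grad_log_posterior, microstates, t):
    """Monte-Carlo estimate (6) of the Fisher metric at external parameters t."""
    G = np.zeros((len(t), len(t)))
    for s in microstates:                       # microstates s_k sampled at parameters t
        g = np.asarray(grad_log_posterior(t, s))
        G += np.outer(g, g)                     # outer product of score vectors
    return G / len(microstates)
```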

I.1 Approximation of the Posterior

The posterior distribution \mathbb{P}(\textbf{t}|s) works as a sort of “probabilistic thermometer”: it looks at a microstate s and gives a probabilistic forecast of the external parameters (for example, the temperature) at which this microstate was generated.

By the definition of the problem, it is possible to sample from \mathbb{P}(s|\textbf{t}), giving pairs

(s_{i},\textbf{t}_{i})\sim\mathbb{P}(s|\textbf{t})\mathbb{P}(\textbf{t}),

sampled from the joint distribution, where \mathbb{P}(\textbf{t}) is chosen to be uniform on some compact domain. These samples can be used to fit a parametric family of distributions \mathbb{P}_{\theta}(\textbf{t}|s) approximating the posterior \mathbb{P}(\textbf{t}|s). Here \theta are the parameters of the family of distributions, which can be obtained by maximizing the likelihood of the true external parameters \textbf{t}_{i} given s_{i} over all our samples or, equivalently, by minimizing the negative log-likelihood

\text{NegativeLogLikelihood}(\theta)=-\sum_{i=1}^{N}\log(\mathbb{P}_{\theta}(\textbf{t}_{i}|s_{i})) (7)

Minimization of the negative log-likelihood (7) is equivalent to minimization of the KL divergence between the distribution of target labels and the predicted distribution,

\mathcal{L}(\theta)=\mathbb{E}_{\textbf{t}^{\prime}\sim\mathbb{P}(\textbf{t})}\text{KL}(\mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime})||\mathbb{P}_{\theta}(\textbf{t}|s(\textbf{t}^{\prime}))), (9)

where s(\textbf{t}^{\prime}) is a microstate of the physical model sampled from the conditional distribution \mathbb{P}(s|\textbf{t}^{\prime}). If the labels are considered hard, the target distribution is \mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime})=\delta(\textbf{t}-\textbf{t}^{\prime}). In order to avoid divergences, it is convenient to use the smoothed distribution

\mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime})=C\cdot\exp\left(-\frac{1}{2\sigma^{2}}||\textbf{t}^{\prime}-\textbf{t}||^{2}\right), (10)

instead. Here C is the normalizing constant obtained by integrating over the domain of interest in the space of macroscopic parameters. The standard deviation \sigma is set equal to the mean distance from a point t in the training dataset to its K-th nearest neighbor.
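
A minimal sketch of this choice of \sigma (brute-force pairwise distances, which is sufficient for moderate dataset sizes):

```python
import numpy as np

def knn_sigma(params, k):
    """Mean distance from each training point t_i to its k-th nearest neighbour,
    used as the width sigma of the target distribution (10).
    params: array of shape (N, dim_t)."""
    d = np.linalg.norm(params[:, None, :] - params[None, :, :], axis=-1)
    d.sort(axis=1)             # column 0 is the zero distance of a point to itself
    return d[:, k].mean()      # distance to the k-th nearest neighbour
```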

The resulting parametric distribution can be used as an input to formula (6) in order to get an approximation of the Fisher metric. However, this approach has two important drawbacks.

First, in order to obtain a decent approximation, the parametric family \mathbb{P}_{\theta} should be flexible enough, implying a large number of parameters \theta. However, if \dim(\theta)\gg N, i.e. if the number of parameters is much larger than the number of examples in the training set, the problem of overfitting arises.

Second, the Fisher metric depends on the derivative of the approximated function, and derivatives of approximated functions are typically much noisier than the functions themselves. As a result, without additional efforts to smooth the posterior distribution, approximation (6) turns out to be too noisy to be useful in practice.

In order to resolve the first problem we artificially augment the training set. Instead of using a single microstate s as an input for the prediction of the external parameters t, we use a randomly shuffled tuple of K microstates s_{1},\dots,s_{K} generated at external parameters close to t. That is, we start with a set of N training pairs (s_{i},\textbf{t}_{i}); for each \textbf{t}_{i} we choose the K values \textbf{t}_{j} with the smallest distances ||\textbf{t}_{i}-\textbf{t}_{j}||, form a randomly shuffled tuple out of the corresponding microstates, and use the set (\{s_{i,1},\dots,s_{i,K}\},\textbf{t}_{i}) as the training set, so that instead of \mathbb{P}_{\theta}(\textbf{t}_{i}|s_{i}) we are now training the model to predict \mathbb{P}_{\theta}(\textbf{t}_{i}|s_{i,1},\dots,s_{i,K}) (see the sketch below). This trick makes it possible to increase the size of the training set by a factor of K! without generating any new microstates. The drawback is that the resolution with respect to t decreases with growing K. However, this decrease is much slower than the factorial growth of the size of the training set.
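
A minimal sketch of the bundle construction (returning, for every training point, the indices of the microstates that form its shuffled bundle; the index-based bookkeeping here is an illustration rather than the exact implementation used in the experiments):

```python
import numpy as np

def make_bundles(params, k_bundle, seed=0):
    """For every training point t_i return the indices of the k_bundle closest
    parameter values (including i itself) in a random order; the corresponding
    microstates are then concatenated and fed to the posterior network."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(params[:, None, :] - params[None, :, :], axis=-1)
    neighbours = np.argsort(d, axis=1)[:, :k_bundle]   # includes the point itself
    return [rng.permutation(row) for row in neighbours]
```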

The resulting target distribution \mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime}) is obtained by averaging the target distributions (10) over the K nearest neighbours,

\mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime})=\frac{1}{K}\sum_{\hat{\textbf{t}}\in\text{Neib}(\textbf{t}^{\prime})}C\cdot\exp\left(-\frac{1}{2\sigma^{2}}||\hat{\textbf{t}}-\textbf{t}||^{2}\right), (11)

In order to resolve the second issue outlined above, we approximate the posterior distribution with a distribution from the exponential family parametrized by a convex function, as described in the next section.

I.2 Free Energy Approximation via Convex Neural Network

We know that the Fisher metric is a positive-definite matrix, \delta\textbf{t}^{T}G(\textbf{t})\delta\textbf{t}\geq 0\ \forall\ \delta\textbf{t}. The main idea of our approach is to approximate this positive-definite matrix field by the Hessian of a convex function. Note that if \dim(\textbf{t})=1, i.e. if there is a single scalar external macroscopic parameter, this approximation is exact.

More precisely, we approximate the posterior distribution as a function of the external parameters by a distribution from the so-called exponential family. It is instructive to introduce this family in the following way. Suppose that for each value of t we fix the average values of some macroscopic observables, \int\mathbb{P}(s|\textbf{t})f(s)ds=\Gamma(\textbf{t}). We are interested in the probability distribution \mathbb{P}(s) which maximizes the entropy

\mathcal{H}[\mathbb{P}(s)]=-\int\mathbb{P}(s)\log\mathbb{P}(s)ds (12)

while respecting this constraint. Maximizing the functional

\begin{split}L[\mathbb{P}(s)]=\mathcal{H}[\mathbb{P}(s)]-\textbf{t}^{T}\left(\int\mathbb{P}(s)f(s)ds-\Gamma(\textbf{t})\right)-\\ -\lambda_{0}\left(\int\mathbb{P}(s)ds-1\right),\end{split} (13)

where t and \lambda_{0} are Lagrange multipliers, gives the exponential family distributions of the form

\mathbb{P}(s|\textbf{t})=\frac{e^{f(s)^{T}\textbf{t}}}{Z(\textbf{t})}=\underset{\mathbb{P}(s)}{\text{argmax}}L[\mathbb{P}(s)] (14)

where the normalization factor

Z(\textbf{t})=\int e^{f(s)^{T}\textbf{t}}ds (15)

is known as “the partition function” in equilibrium statistical mechanics.

For distributions from the exponential family the Fisher metric reduces to the Hessian of the logarithm of the partition function,

\begin{split}G(\textbf{t})=\int\mathbb{P}(s|\textbf{t})\nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t})(\nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t}))^{T}ds=\\ =\nabla_{\textbf{t}\textbf{t}}\log Z(\textbf{t}).\end{split} (16)

while its first derivatives are the mean values of the function f(s), known as “thermodynamic forces”:

\int f(s)\mathbb{P}(s|\textbf{t})ds=\nabla_{\textbf{t}}\log Z(\textbf{t}) (17)

The Fisher metric tensor is known to be positive-definite, and thus the function \log Z(\textbf{t}) is convex.
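
This identity can be verified directly: for a distribution of the form (14) one has \nabla_{\textbf{t}}\log\mathbb{P}(s|\textbf{t})=f(s)-\nabla_{\textbf{t}}\log Z(\textbf{t}), which has zero mean over \mathbb{P}(s|\textbf{t}) by (17). The integral in (3) is therefore the covariance matrix of f(s), and differentiating (17) once more shows that this covariance equals \nabla_{\textbf{t}\textbf{t}}\log Z(\textbf{t}), which makes the convexity of \log Z(\textbf{t}) explicit.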

Now, in order to solve the problem of approximating the derivatives of the posterior distribution \mathbb{P}(\textbf{t}|s), we look for a function Z_{w}(\textbf{t}), depending on a set of parameters w, which is, on the one hand, convex, and, on the other hand, such that the Jensen-Shannon divergence between the approximate posterior \mathbb{P}_{\theta}(\textbf{t}|s) and the posterior distribution generated from the exponential family with partition function Z_{w}(\textbf{t}) is minimized.

If the prior is uniform, then the exponential-family posterior distribution takes the form

\mathbb{P}_{w}^{(\text{eq})}(\textbf{t}|s)=\frac{\mathbb{P}_{w}(s|\textbf{t})\mathbb{P}(\textbf{t})}{\int\mathbb{P}_{w}(s|\textbf{t})\mathbb{P}(\textbf{t})d\textbf{t}}=\frac{e^{f(s)^{T}\textbf{t}-\log Z_{w}(\textbf{t})}}{\int e^{f(s)^{T}\textbf{t}-\log Z_{w}(\textbf{t})}d\textbf{t}} (18)

The function f(s) is unknown, but since we use a tuple of microstates to estimate a single posterior, we replace f(s) by its expectation (17),

\frac{1}{K}\sum_{k=1}^{K}f(s_{k})\approx\nabla_{\textbf{t}}\log Z(\textbf{t}),\ s_{k}\sim\mathbb{P}(s|\textbf{t}).

To find the parameters w we minimize the Jensen-Shannon divergence between the posterior \mathbb{P}_{\theta}(\textbf{t}|s) predicted using maximum likelihood and the posterior \mathbb{P}_{w}(\textbf{t}|s) with f(s) replaced by the gradient of the logarithm of the partition function:

\mathbb{P}_{w}^{(\text{eq})}(\textbf{t}|s(\textbf{t}^{\prime}))=\frac{\exp\left[(\nabla_{\textbf{t}^{\prime}}\log Z_{w}(\textbf{t}^{\prime}))^{T}\textbf{t}-\log Z_{w}(\textbf{t})\right]}{\int\exp\left[(\nabla_{\textbf{t}^{\prime}}\log Z_{w}(\textbf{t}^{\prime}))^{T}\textbf{t}-\log Z_{w}(\textbf{t})\right]d\textbf{t}} (19)
L(w)=\mathbb{E}_{\textbf{t}^{\prime}\sim\mathbb{P}(\textbf{t})}\text{JSD}\left(\mathbb{P}_{\theta}(\textbf{t}|s(\textbf{t}^{\prime})),\mathbb{P}_{w}^{(\text{eq})}(\textbf{t}|s(\textbf{t}^{\prime}))\right) (20)
w_{\text{optimal}}=\underset{w}{\text{argmin}}\ L(w) (21)

The resulting approximation for the Fisher metric is

G(\textbf{t})=\nabla_{\textbf{t}\textbf{t}}\log Z_{w_{\text{optimal}}}(\textbf{t}).
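
A minimal sketch of this step, assuming a small fully connected ICNN with softplus activations (the network used in our experiments is larger, and the JSD-based training loop (19)-(21) is omitted here); the Fisher metric is then read off as the Hessian of the learned log-partition function:

```python
import torch
import torch.nn as nn

class ICNN(nn.Module):
    """Minimal input convex neural network approximating log Z_w(t)."""
    def __init__(self, dim_t=2, hidden=64, depth=3):
        super().__init__()
        self.wx = nn.ModuleList([nn.Linear(dim_t, hidden) for _ in range(depth)])
        self.wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                 for _ in range(depth - 1)])
        self.out = nn.Linear(hidden, 1)
        self.act = nn.Softplus()                 # convex, non-decreasing activation

    def forward(self, t):
        z = self.act(self.wx[0](t))
        for wx, wz in zip(self.wx[1:], self.wz):
            z = self.act(wx(t) + wz(z))          # wz weights kept non-negative below
        return self.out(z).squeeze(-1)

    def clamp_weights(self):
        """Project the z-to-z and output weights onto the non-negative cone
        after every optimizer step, which keeps the network convex in t."""
        for wz in self.wz:
            wz.weight.data.clamp_(min=0.0)
        self.out.weight.data.clamp_(min=0.0)

log_Z = ICNN()
t0 = torch.tensor([1.0, 0.5])
G = torch.autograd.functional.hessian(log_Z, t0)   # Fisher metric estimate at t0
```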

We summarize the procedure described in this section with the two algorithms outlined below: the first generates the training dataset (Algorithm 1), and the second finds the approximations to the partition function and the Fisher metric (Algorithm 2). A Python sketch of the dataset-generation step is given after the listings.

Algorithm 1 Dataset Generation for Bayesian Thermodynamic Integration
  Input:
    \mathbb{P}(s|\textbf{t}) - likelihood function represented by an algorithm to sample data,
    \mathbb{P}(\textbf{t}) - uniform prior distribution on the space of external parameters,
    N_{\text{dataset}} - dataset length,
    \sigma^{2} - variance of the target Gaussian distribution,
    \mathcal{D}=\{\hat{\textbf{t}}_{1},\dots,\hat{\textbf{t}}_{N_{\mathcal{D}}}\} - a discretization of the support of the uniform prior \mathbb{P}(\textbf{t}),
    \text{L}_{\text{sam}} = [] - list of input samples,
    \text{L}_{\text{par}} = [] - list of external parameters,
    \text{L}_{\text{tar}} = [] - list of target posterior distributions
  for i=1 to N_{\text{dataset}} do
     sample \textbf{t}_{i}\sim\mathbb{P}(\textbf{t}) and append \textbf{t}_{i} to \text{L}_{\text{par}}
     sample s_{i}\sim\mathbb{P}(s|\textbf{t}_{i}) and append s_{i} to \text{L}_{\text{sam}}
     p_{\text{tar}}^{(i)} = [] - discretized target probability distribution
     for j=1 to N_{\mathcal{D}} do
        p_{\text{tar}}^{(i)}[j]=\exp\left(-\frac{1}{2\sigma^{2}}||\hat{\textbf{t}}_{j}-\textbf{t}_{i}||^{2}\right)
     end for
     p_{\text{tar}}^{(i)}=\text{normalize}(p_{\text{tar}}^{(i)}) and append p_{\text{tar}}^{(i)} to \text{L}_{\text{tar}}
  end for
  Output:
    \text{L}_{\text{sam}} - dataset of input samples,
    \text{L}_{\text{par}} - dataset of external parameters,
    \text{L}_{\text{tar}} - dataset of target posterior distributions
Algorithm 2 Bayesian Thermodynamic Integration
  Input:
    \text{L}_{\text{sam}} - train dataset of input samples,
    \text{L}_{\text{par}} - train dataset of external parameters,
    \text{L}_{\text{tar}} - train dataset of target posterior distributions,
    K_{\text{bundle}} - bundle size (the number of nearest neighbours),
    N_{\text{U2Net steps}} - steps for posterior approximation,
    N_{\text{ICNN steps}} - steps for free energy approximation,
    \mathbb{P}_{\theta}(\textbf{t}|s_{1},\dots,s_{K_{\text{bundle}}}) - neural network with output shape equal to the shape of elements of \text{L}_{\text{tar}},
    \text{L}_{\text{neib}} = [] - list of K_{\text{bundle}} nearest neighbors of points in \text{L}_{\text{par}}
  \text{L}_{\text{neib}}=\text{KNN}(\text{L}_{\text{par}},K_{\text{bundle}},\text{include\_self=True})
  for i=1 to N_{\text{U2Net steps}} do
     choose \textbf{t}^{\prime} randomly from \text{L}_{\text{par}}
     s(\textbf{t}^{\prime})=(s_{1},\dots,s_{K_{\text{bundle}}}): shuffle and concatenate the K_{\text{bundle}} microstates corresponding to the nearest neighbors of \textbf{t}^{\prime} taken from \text{L}_{\text{neib}}
     \mathbb{P}_{\text{target}}=\frac{1}{K_{\text{bundle}}}\sum_{\hat{\textbf{t}}\in\text{Neib}(\textbf{t}^{\prime})}\text{L}_{\text{tar}}[\text{index}(\hat{\textbf{t}})]
     \mathcal{L}(\theta)=\mathbb{E}_{\textbf{t}^{\prime}\sim\mathbb{P}(\textbf{t})}\text{KL}(\mathbb{P}_{\text{target}}(\textbf{t}|\textbf{t}^{\prime})||\mathbb{P}_{\theta}(\textbf{t}|s(\textbf{t}^{\prime})))
     \theta\leftarrow\text{Adam}(\theta,\nabla_{\theta}\mathcal{L}(\theta))
  end for
  Z_{w}(\textbf{t}) - input convex neural network
  for i=1 to N_{\text{ICNN steps}} do
     \mathbb{P}_{w}^{(\text{eq})}(\textbf{t}|\textbf{t}^{\prime})=\frac{e^{(\nabla_{\textbf{t}^{\prime}}\log Z_{w}(\textbf{t}^{\prime}))^{T}\textbf{t}-\log Z_{w}(\textbf{t})}}{\int e^{(\nabla_{\textbf{t}^{\prime}}\log Z_{w}(\textbf{t}^{\prime}))^{T}\textbf{t}-\log Z_{w}(\textbf{t})}d\textbf{t}}
     \mathcal{L}(w)=\mathbb{E}_{\textbf{t}^{\prime}\sim\mathbb{P}(\textbf{t})}\text{JSD}\left(\mathbb{P}_{\theta}(\textbf{t}|s(\textbf{t}^{\prime})),\mathbb{P}_{w}^{(\text{eq})}(\textbf{t}|\textbf{t}^{\prime})\right)
     w\leftarrow\text{Adam}(w,\nabla_{w}\mathcal{L}(w))
  end for
  Output:
    F(\textbf{t})=\log Z_{w}(\textbf{t}) - approximated free energy,
    G(\textbf{t})=\nabla_{\textbf{t}\textbf{t}}\log Z_{w}(\textbf{t}) - approximated Fisher metric
  Abbreviations: KNN = k-nearest neighbors, KL = Kullback–Leibler divergence, JSD = Jensen–Shannon divergence
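
For concreteness, a compact Python sketch of the dataset-generation step (Algorithm 1); sample_microstate(t) stands for the problem-specific sampler of \mathbb{P}(s|\textbf{t}) and is a placeholder:

```python
import numpy as np

def generate_dataset(sample_microstate, n_dataset, grid, sigma, seed=0):
    """Sketch of Algorithm 1: draw (s_i, t_i) pairs and build the discretized
    Gaussian target posteriors on the grid discretizing the prior support."""
    rng = np.random.default_rng(seed)
    lo, hi = grid.min(axis=0), grid.max(axis=0)    # bounding box of the uniform prior
    L_sam, L_par, L_tar = [], [], []
    for _ in range(n_dataset):
        t = rng.uniform(lo, hi)
        p = np.exp(-np.sum((grid - t) ** 2, axis=1) / (2.0 * sigma ** 2))
        L_par.append(t)
        L_sam.append(sample_microstate(t))
        L_tar.append(p / p.sum())                  # normalized target distribution
    return L_sam, L_par, L_tar
```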

II Numerical experiments

II.1 Ising model



Figure 1: Examples of the microstates of the 2D Ising model, temperature increases from right to left, magnetic field increases from bottom to top.

In this section we test whether our approach is capable of reconstructing thermodynamic functions for an equilibrium statistical mechanics model in which the distribution of microstates belongs to the exponential family. We consider the 2D Ising model ising1925beitrag, an archetypal model of phase transitions in statistical mechanics. A microstate of this model is a set s of spin variables s_{i}=\pm 1 defined on the vertices of a square lattice of size L\times L. At equilibrium, the probability distribution over the space of microstates is

\mathbb{P}(s|H,T)=\frac{1}{Z(H,T)}e^{-\beta\sum_{\langle i,j\rangle}s_{i}s_{j}-\beta H\sum_{i}s_{i}} (22)

where H and T=1/\beta are external parameters called the magnetic field and temperature, respectively, the first sum runs over all edges of the lattice, and Z(H,T) is a normalization constant known as the partition function:

Z(H,T)=\sum_{s_{1}=\pm 1}\dots\sum_{s_{N}=\pm 1}e^{-\beta\sum_{\langle i,j\rangle}s_{i}s_{j}-\beta H\sum_{i}s_{i}}. (23)

This model is exactly solvable for H=0 onsager1944crystal ; kac1952combinatorial ; baxter1978399th. In particular, it is known that at T_{\text{cr}}=\frac{2}{\log(1+\sqrt{2})}\approx 2.269 a transition occurs between the high-temperature disordered state, where the spin variables are on average equal to zero, and the low-temperature ordered state, in which the average value of the spin becomes distinct from zero. For general values of H\neq 0 the likelihood function (22) is intractable.

Our dataset consists of N=540000 samples of spin configurations on a square lattice of size L\times L=128\times 128 with periodic boundary conditions. We consider the parameter ranges T\in[T_{\min},T_{\max}]=[1,5], H\in[H_{\min},H_{\max}]=[-2,2], similar to the ranges used in (Walker, 2019). A point (T,H) is sampled uniformly from this rectangle, and then a sample spin configuration is created for these values of temperature and external field by starting with a random initial condition and equilibrating it with Glauber (one-spin Metropolis) dynamics for 10^{4}\times 128\times 128\approx 1.64\times 10^{8} iterations. We represent a spin configuration as a single-channel image with values +1 and -1. When constructing target probability distributions we choose \sigma=\frac{1}{50} and set the discretization \mathcal{D} of the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}]=[1,5]\times[-2,2] to be a uniform grid with L\times L=128\times 128 grid cells.
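
A minimal sketch of such a sampler, assuming the standard ferromagnetic convention E=-\sum_{\langle i,j\rangle}s_{i}s_{j}-H\sum_{i}s_{i} for the energy (the actual runs used far more iterations than one would attempt with this pure-Python version):

```python
import numpy as np

def sample_ising(L, T, H, n_sweeps, seed=0):
    """Single-spin Metropolis sampler for the 2D Ising model on an L x L
    periodic lattice; returns one (approximately) equilibrated microstate."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(L, L))
    beta = 1.0 / T
    for _ in range(n_sweeps * L * L):
        i, j = rng.integers(L), rng.integers(L)
        nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2.0 * s[i, j] * (nb + H)          # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] *= -1
    return s
```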



Figure 2: 2D Ising model. Train (left) and test (right) loss dynamics for different values of K_{\text{bundle}} in the bundle augmentation procedure.

An image-to-image network with the U2-Net architecture qin2020u2 is used to approximate the posterior \mathbb{P}_{\theta}(\textbf{t}|s). The network takes as input a bundle of K_{\text{bundle}} images concatenated along the channel dimension and outputs a categorical distribution representing density values at the discrete grid points. For simplicity we choose the discretization \mathcal{D} to have the same spatial dimensions as the input image. For all our numerical experiments the training was performed on a single Nvidia HGX compute node with 8 A100 GPUs. We trained U2-Net using the Adam optimizer with learning rate 0.00001 and batch size 2048 for N_{\text{U2Net steps}}=20000 gradient update steps. In all our experiments the training set consists of 80% of the samples and the remaining 20% are used for testing.

Bundle augmentation. To explore how the test loss depends on the bundle size K_{\text{bundle}}, which determines the number of microstates used to evaluate a single posterior distribution, we performed four training runs with K_{\text{bundle}}\in\{1,2,3,8\}. The results are shown in Fig. 2. For K_{\text{bundle}}=1, we observed overfitting after about the 12000th training iteration. However, already for K_{\text{bundle}}=2 overfitting is significantly reduced, and it decreases further for larger values of K_{\text{bundle}}. The downside of this approach is that the effective grid resolution of the posterior distribution decreases with growing K_{\text{bundle}} as

L_{\text{eff}}\approx\sqrt{\frac{N}{K_{\text{bundle}}}},

where N is the dataset size. In what follows, we set K_{\text{bundle}}=4. Also, since we use a fully convolutional architecture, increasing the number of input channels by a factor of K_{\text{bundle}} has almost no effect on the overall GPU memory consumption during training.



Figure 3: 2D Ising model. (a) Partial derivative of the reconstructed free energy with respect to temperature, \frac{\partial F_{\text{rec}}(T,H)}{\partial T}; (b) partial derivative of the reconstructed free energy with respect to magnetic field, \frac{\partial F_{\text{rec}}(T,H)}{\partial H}; (c) energy of the Ising model, E(H,T)=\sum_{\langle i,j\rangle}s_{i}(H,T)s_{j}(H,T); (d) magnetization of the Ising model, M(H,T)=\sum_{i}s_{i}; (e) reconstructed free energy.

Free energy approximation. We use the Input Convex Neural Network amos2017input architecture to approximate the free energy. The fully connected ICNN has 5 layers with hidden dimension 512. We train the network using the Adam optimizer with learning rate 0.001.

The reconstructed free energy is shown in Fig. 3. The model was able to correctly determine the 1st order phase transition line T<T_{\text{cr}}, H=0.

II.2 Totally asymmetric simple exclusion process



Figure 4: Examples of microstates of the stationary-state TASEP. The microstate is presented in raster ordering: cell numbers increase downwards from row to row and rightwards within a row. \alpha and \beta increase from bottom to top and from left to right, respectively. One can clearly see the high-density (left), low-density (bottom) and maximal-current (density = 1/2, top right) phases.

The totally asymmetric simple exclusion process (TASEP) is a simple model of one-dimensional transport phenomena. A microscopic configuration is a set of particles on a 1d lattice respecting the condition that there can be no more than one particle in each lattice cell. Each particle can move to the site to its right with probability p\,dt per time dt, provided that this site is empty (we set p=1 without loss of generality). When complemented with boundary conditions, the TASEP attains a stationary state as time goes to infinity.

One particular case is open boundary conditions, when a particle is added with probability \alpha\,dt per time dt to the leftmost site provided that it is empty, and removed with probability \beta\,dt per time dt from the rightmost site provided that it is occupied. For these boundary conditions the probability distribution is known exactly dehp ; evans_review ; krapivsky_book and it once again takes the form

\mathbb{P}(s|\alpha,\beta)=\frac{f(s|\alpha,\beta)}{Z(\alpha,\beta)};\;\;Z(\alpha,\beta)=\sum_{s}f(s|\alpha,\beta); (24)

where the microstate s is a concrete sequence of filled and empty cells, and f(s|\alpha,\beta) is some function of s and the external parameters \alpha,\beta. Importantly, however, the function f, which is known exactly for all system sizes M and all values of s,\alpha,\beta, does not take the form (14). TASEP with open boundaries exhibits a rich phase behavior: for large system sizes three distinct phases (the low-density phase, the high-density phase and the maximal-current phase) are possible depending on the values of \alpha,\beta, and the asymptotic “free energy”, which is defined as

F_{\text{TASEP}}(\alpha,\beta)=\lim_{M\to\infty}M^{-1}\log Z(\alpha,\beta) (25)

coincides with the average flow per unit time and equals

F_{\text{TASEP}}(\alpha,\beta)=\begin{cases}\frac{1}{4},\ \alpha>\frac{1}{2},\ \beta>\frac{1}{2};\\ \alpha(1-\alpha),\ \alpha<\beta,\ \alpha<\frac{1}{2};\\ \beta(1-\beta),\ \beta<\alpha,\ \beta<\frac{1}{2}.\\ \end{cases} (26)

where the first, second and third cases correspond to the maximal-current, low-density and high-density phases, respectively.

We generate a dataset of N=150000 stationary TASEP configurations on a 1d lattice with M=16384 sites. The rates \alpha (\beta) of adding (removing) particles at the left (right) boundary are sampled from the uniform prior distribution over the square [0,1]\times[0,1]. To reach the stationary state we start from a random initial condition and perform N_{\text{steps}}=2\times 10^{9}\approx 8M^{2} move attempts, which is known to be enough to achieve the stationary state except in the narrow vicinity of the transition line \alpha=\beta<1/2 between the high-density and low-density phases (in this case the stationary state contains a slowly diffusing shock front: one needs of order M^{2} move attempts to form the shock but of order M^{3} move attempts for it to diffusively explore all possible positions).
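
A minimal sketch of this stationary-state sampling (random-sequential updates with unit bulk hopping rate; boundary moves attempted with probabilities \alpha and \beta):

```python
import numpy as np

def sample_tasep(M, alpha, beta, n_attempts, seed=0):
    """TASEP with open boundaries on M sites: run n_attempts random-sequential
    move attempts starting from a random occupation and return the configuration."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=M)             # random initial occupation
    for _ in range(n_attempts):
        i = int(rng.integers(-1, M))           # -1 encodes the injection move
        if i == -1:                            # inject a particle at the left boundary
            if s[0] == 0 and rng.random() < alpha:
                s[0] = 1
        elif i == M - 1:                       # remove a particle at the right boundary
            if s[i] == 1 and rng.random() < beta:
                s[i] = 0
        elif s[i] == 1 and s[i + 1] == 0:      # bulk hop to the empty site on the right
            s[i], s[i + 1] = 0, 1
    return s
```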

We reshape the 1d lattice with M=16384 sites into an image of size L\times L=128\times 128 using raster scan ordering. To construct the target probability distributions we set \sigma=\frac{1}{150} and define the discretization \mathcal{D} as a uniform grid on [\alpha_{\min},\alpha_{\max}]\times[\beta_{\min},\beta_{\max}]=[0,1]\times[0,1] with L\times L=128\times 128 grid cells.



Figure 5: TASEP. Left: reconstructed free energy (red) compared to the exact solution (blue), right: reconstruction error (red) vs. exact solution.

The comparison of the reconstructed free energy and the exact analytical solution is shown in Fig. 5.

II.3 Image space of the StyleGAN



Figure 6: StyleGAN v3 pretrained on FFHQ dataset. Landscape of generated images.

Consider a synthetic two-parametric statistical mechanics model in which microstates are sampled using the StyleGAN v3 generator karras2021alias. Let z_{1}, z_{2} and z_{3} be vectors from the latent space of dimension 512. Consider a 2d section of the latent space parameterized by a and b as follows:

z(a,b)=z_{1}+a(z_{2}-z_{1})+b(z_{3}-z_{1})+\xi, (27)

where \xi is a normally distributed random vector with zero mean and small enough standard deviation (we set it to 1/5 of the standard deviation of the standard normal prior). The external parameters a and b lie in the rectangle [0,1]\times[0,1], so that z(a,b) interpolates between three base latent vectors at the corners z(0,0)=z_{1}, z(1,0)=z_{2}, z(0,1)=z_{3} of the rectangle. By feeding the latent code z(a,b) into the generator of StyleGAN we can sample images as functions of the two external parameters:

s=G(z(a,b)) (28)
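
A minimal sketch of the construction (27)-(28); here generator stands for a pretrained StyleGAN v3 generator and is a placeholder, since the exact call depends on the implementation at hand:

```python
import numpy as np

def latent_code(a, b, z1, z2, z3, noise_scale=0.2, seed=0):
    """2D section (27) of the 512-dimensional latent space: interpolation between
    three base latent vectors plus a small Gaussian perturbation xi
    (noise_scale is relative to the unit standard deviation of the prior)."""
    rng = np.random.default_rng(seed)
    xi = noise_scale * rng.standard_normal(z1.shape)
    return z1 + a * (z2 - z1) + b * (z3 - z1) + xi

# A microstate is then obtained as s = generator(latent_code(a, b, z1, z2, z3)).
```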

Using StyleGAN pretrained on the FFHQ dataset we generate a dataset of N=500000 human faces at a resolution of 1024\times 1024 and resize them to 128\times 128 before feeding them to the U2-Net model which approximates the posterior distribution on the space of external parameters. The experimental setup was similar to that of the 2D Ising model.

The reconstructed Fisher metric is shown in Fig. 7. Unexpectedly, we found two diagonal lines in the components of the Fisher metric which, from the point of view of statistical mechanics, are signs of second-order phase transitions. Comparing with the faces presented in Figs. 6 and 13, we observe that the phase transition lines correspond to sharp changes of identity of the generated faces. In the next section we investigate whether this behaviour is determined by the mapping network of StyleGAN or by the synthesis network.



Figure 7: StyleGAN v3 with images as microstates. Components of the Fisher information metric G(a,b): (top left) G_{aa}(a,b)=\frac{\partial^{2}\log Z(a,b)}{\partial a^{2}}, (top right) G_{ab}(a,b)=\frac{\partial^{2}\log Z(a,b)}{\partial a\partial b}, (bottom left) G_{ba}(a,b)=G_{ab}(a,b), (bottom right) G_{bb}(a,b)=\frac{\partial^{2}\log Z(a,b)}{\partial b^{2}}. Unexpectedly, we found two diagonal lines in the components of the Fisher metric, which, from the point of view of statistical mechanics, are signs of second-order phase transitions.

II.4 Intermediate latent space 𝒲\mathcal{W} of the StyleGAN



Figure 8: StyleGAN v3 pretrained on FFHQ dataset. Landscape of intermediate latent codes from the \mathcal{W} space.

Given a latent vector z, StyleGAN v3 generates images in two steps. First, the mapping network is applied to produce a vector (code) in the intermediate latent space \mathcal{W}. For the StyleGAN v3 model pretrained on the FFHQ dataset this latent code is a matrix of shape 16\times 512 karras2021alias ; karras2020analyzing. Second, this code is used by the weight demodulation layers of the synthesis network to generate the image itself.

A lot of attention has been devoted to the improvement of the synthesis network, while the effect of the mapping network on the generated images has attracted less attention. To study which network, mapping or synthesis, is responsible for the typical shape of the Fisher metric field of StyleGAN, we introduce another synthetic two-parametric statistical mechanics model in which microstates are generated by the mapping network:

w=G_{\text{mapping}}(z(a,b)) (29)

Similarly to the previous section, we generate N=500000 intermediate latent codes of shape 16\times 512 for the same latent vectors z(a,b) and resize each of them into an image of size 128\times 128.

As can be seen from Figs. 9 and 7, the Fisher metric fields for the image space and for the \mathcal{W} space almost coincide, which means that it is mostly the mapping network that defines the variability of the generated faces, while the synthesis network does not change the distribution of output images. In other words, the synthesis network performs an almost bijective transformation of the vectors from the intermediate latent space \mathcal{W}.



Figure 9: StyleGAN v3 with latent vectors from the \mathcal{W} space as microstates. Components of the Fisher information metric G(a,b): (top left) G_{aa}(a,b), (top right) G_{ab}(a,b), (bottom left) G_{ba}(a,b)=G_{ab}(a,b), (bottom right) G_{bb}(a,b). Compared to the components of the Fisher metric obtained in the previous section, the two lines are closer to each other, but their general arrangement remains the same.

III Previous work

Machine learning methods have been actively used in the study of models of classical and quantum statistical physics van2017learning ; carrasquilla2017machine ; rem2019identifying ; wang2021unsupervised ; canabarro2019unveiling. The main problem of concern was to determine the boundaries of phase transitions and to extract learned “order parameters” which can be used to distinguish one phase from another. Since the equilibrium distribution of microstates of a statistical mechanics system belongs to the exponential family (usually with an intractable normalization constant), the problem of determining “order parameters” is equivalent to the search for sufficient statistics jiang2017learning ; chen2020neural. In walker2020deep it was observed that the principal components of the mean and standard deviation vectors learned by a variational autoencoder trained on configurations of the 2D Ising model were highly correlated with the known physical quantities, indicating that the variational autoencoder implicitly extracts sufficient statistics from the data.

In contrast to chen2020neural and walker2020deep, where neural networks were used to extract sufficient statistics from the raw microstates, our approach is based on approximating the partition function directly and expressing the mean values of the sufficient statistics through the derivatives of the logarithm of the partition function.

Our approach to the visualization of the StyleGAN latent space in terms of the Fisher metric can be thought of as an unsupervised alternative to the approach used in tran2018dist to study the GAN mode collapse problem, where the authors applied a classification network trained on MNIST to classify the generated images and constructed “phase diagrams” of predicted classes in the space of two-dimensional latent parameters. A similar approach was used in carrasquilla2017machine to extract the phase diagram of the Ising model by training a network to classify the low-temperature and high-temperature phases and drawing the predictions on the space of external parameters. In our method, the singularities of the Fisher metric automatically highlight the boundaries of second-order phase transitions without an additional classification network or any other a priori knowledge about the number of phases and their location in the parameter space. However, we assume that all integrals over the space of external parameters are tractable, which is not the case for the full 512-dimensional latent space of StyleGAN. Since 512-dimensional integrals cannot be computed efficiently using domain discretization, other approaches are needed to parameterize the predicted posterior distributions on the parameter space, such as the tensor-train density estimation method novikov2021tensor, which admits a tractable partition function.

IV Discussion

We propose a new approach to the reconstruction of thermodynamic functions (the partition function, the free energy and their derivatives as functions of the external parameters) and apply it to several two-parametric statistical mechanics models. Our method is based on expressing the Fisher metric on the manifold of probability distributions over a high-dimensional space of microstates through the posterior distributions over the space of external parameters. The log-partition function is obtained by approximating the metric field by the Hessian of a convex function parametrized by an Input Convex Neural Network (ICNN).

The proposed approach is used to reconstruct the partition functions and phase diagrams of the equilibrium Ising model and the exactly solvable non-equilibrium TASEP without any a priori knowledge of the microscopic rules of the models. The only information we need is an algorithm allowing us to sample microstates for given values of the external parameters.

We also demonstrate how our approach can be used to visualize the latent space of a StyleGAN v3 model and to evaluate the variability of the generated images. The singularities of the Fisher metric in a two-dimensional section of the latent space are signs of a second-order phase transition, corresponding to a sharp change in the identity of the generated faces under a gradual change of the latent code. It is shown that the phase diagram of the generated images is mostly determined by the mapping network, so the synthesis network does not change the position of the phase boundaries. Potentially, this means that it is possible to extract semantically meaningful features by studying only the mapping network, since the most significant changes occur in directions orthogonal to the phase transition boundaries. In general, the reasons for the existence of the observed phase boundaries in StyleGAN models remain largely unclear, and this phenomenon requires further research.

Acknowledgements

The authors acknowledge the use of Zhores HPC zacharov2019zhores for obtaining the results presented in this paper. We are grateful to P. Krapivsky, V. Palyulin, A. Iakovlev and D. Egorov for interesting discussions. This work is supported by the RSF Grant No. 21-73-00176.

References

  • (1) Amari, S.-I., Nagaoka, H. Methods of information geometry, American Mathematical Soc., Providence, RI, 2000.
  • (2) Ising, E. Beitrag zur theorie des ferromagnetismus. Zeitschrift für Physik, 31, 253 (1925).
  • (3) Onsager, L. Crystal statistics. I. a two-dimensional model with an order-disorder transition. Physical Review, 65, 117 (1944).
  • (4) Kac, M. and Ward, J. C. A combinatorial solution of the two-dimensional Ising model. Physical Review, 88, 1332 (1952).
  • (5) Baxter, R. J. and Enting, I. G. 399th solution of the Ising model. J. Phys. A: Math. and Gen., 11, 2463 (1978).
  • (6) Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., and Jagersand, M. U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recognition, 106, 107404 (2020).
  • (7) Amos, B., Xu, L., and Kolter, J. Z. Input convex neural networks. In International Conference on Machine Learning, p. 146. (2017).
  • (8) Derrida, B., Evans, M. R., Hakim, V., and Pasquier, V. Exact solution of a 1d asymmetric exclusion model using a matrix formulation. J. Phys. A: Math. and Gen., 26, 1493 (1993).
  • (9) Blythe, R. A. and Evans, M. R. Nonequilibrium steady states of matrix-product form: a solver’s guide. J. Phys. A: Math. and Gen., 40, R333 (2007).
  • (10) Krapivsky, P. L., Redner, S., and Ben-Naim, E. A kinetic view of statistical physics. Cambridge University Press, 2010.
  • (11) Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J., and Aila, T. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34 (2021).
  • (12) Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., and Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110 (2020).
  • (13) Van Nieuwenburg, E. P., Liu, Y.-H., and Huber, S. D. Learning phase transitions by confusion. Nature Physics, 13, 435 (2017).
  • (14) Carrasquilla, J. and Melko, R. G. Machine learning phases of matter. Nature Physics, 13, 431 (2017).
  • (15) Rem, B. S., Käming, N., Tarnowski, M., Asteria, L., Fläschner, N., Becker, C., Sengstock, K., and Weitenberg, C. Identifying quantum phase transitions using artificial neural networks on experimental data. Nature Physics, 15, 917 (2019).
  • (16) Wang, J., Zhang, W., Hua, T., and Wei, T.-C. Unsupervised learning of topological phase transitions using the Calinski-Harabaz index. Physical Review Research, 3, 013074 (2021).
  • (17) Canabarro, A., Fanchini, F. F., Malvezzi, A. L., Pereira, R., and Chaves, R. Unveiling phase transitions with machine learning. Physical Review B, 100, 045129 (2019).
  • (18) Jiang, B., Wu, T.-Y., Zheng, C., and Wong, W. H. Learning summary statistic for approximate bayesian computation via deep neural network. Statistica Sinica, p. 1595 (2017).
  • (19) Chen, Y., Zhang, D., Gutmann, M., Courville, A., and Zhu, Z. Neural approximate sufficient statistics for implicit models, arXiv:2010.10079.
  • (20) Walker, N., Tam, K.-M., and Jarrell, M. Deep learning on the 2-dimensional Ising model to extract the crossover region with a variational autoencoder. Scientific reports, 10, 1 (2020).
  • (21) Tran, N.-T., Bui, T.-A., and Cheung, N.-M. Dist-gan: An improved GAN using distance constraints. In Proceedings of the European Conference on Computer Vision (ECCV), p. 370 (2018).
  • (22) Novikov, G. S., Panov, M. E., and Oseledets, I. V. Tensor-train density estimation. In Uncertainty in Artificial Intelligence, pp. 1321, PMLR, 2021.
  • (23) Zacharov, I., Arslanov, R., Gunin, M., Stefonishin, D., Bykov, A., Pavlov, S., Panarin, O., Maliutin, A., Rykovanov, S., and Fedorov, M. “Zhores”—petaflops supercomputer for data-driven modeling, machine learning and artificial intelligence installed in Skolkovo Institute of Science and Technology. Open Engineering, 9, 512 (2019).

Appendix A Learned posterior distributions for 2D Ising model



Figure 10: (left) Uniform prior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}]=[1,5]\times[-2,2]. (center) Observed microstate of the Ising model generated for T=2.04, H=-0.613. (right) Posterior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}] predicted by U2-Net.



Figure 11: (left) Uniform prior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}]=[1,5]\times[-2,2]. (center) Observed microstate of the Ising model generated for T=3.32, H=0.0884. (right) Posterior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}] predicted by U2-Net.


Figure 12: (left) Uniform prior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}]=[1,5]\times[-2,2]. (center) Observed microstate of the Ising model generated for T=3.01, H=-1.09. (right) Posterior distribution on the square [T_{\min},T_{\max}]\times[H_{\min},H_{\max}] predicted by U2-Net.

Appendix B StyleGAN v3 linear interpolation between three base faces



Figure 13: Landscape of generated faces on the plane of parameters a,b of the linear interpolation between three base faces. The discovered phase transitions occur between three macroscopic phases corresponding to the faces generated for a=0, b=0; a+b\approx 1; and a=1, b=1.

Appendix C Reconstructed Fisher metrics



Figure 14: Components of the reconstructed Fisher metric. (a) 2D Ising model; (b) TASEP; (c) StyleGAN v3 trained on FFHQ with faces as microstates; (d) StyleGAN v3 trained on FFHQ with vectors from the \mathcal{W}-space as microstates (only the mapping network is used).

Appendix D StyleGAN v3 latent space visualization

D.1 Density-based visualization of the posterior distribution on the space of macroscopic parameters



Figure 15: Posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=1 distribution per plot. For every pixel the maximum value of the density across all M distributions was taken.


Figure 16: Posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=25 distributions per plot. For every pixel the maximum value of the density across all M distributions was taken.


Figure 17: Posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=50 distributions per plot. For every pixel the maximum value of the density across all M distributions was taken.


Figure 18: Posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=100 distributions per plot. For every pixel the maximum value of the density across all M distributions was taken.


Figure 19: Posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=200 distributions per plot. For every pixel the maximum value of the density across all M distributions was taken.

D.2 Log-density-based visualization of the posterior distribution on the space of macroscopic parameters




Figure 20: Logarithm of the posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=1 distribution per plot. For every pixel the maximum value of the log-density across all M distributions was taken.


Figure 21: Logarithm of the posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=25 distributions per plot. For every pixel the maximum value of the log-density across all M distributions was taken.


Figure 22: Logarithm of the posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=50 distributions per plot. For every pixel the maximum value of the log-density across all M distributions was taken.


Figure 23: Logarithm of the posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=100 distributions per plot. For every pixel the maximum value of the log-density across all M distributions was taken.


Figure 24: Logarithm of the posterior distributions on the space of parameters a,b of the linear interpolation between three base faces. M=200 distributions per plot. For every pixel the maximum value of the log-density across all M distributions was taken.