
Quantum field theories, Markov random fields and machine learning

Dimitrios Bachtis¹, Gert Aarts²,³ and Biagio Lucini¹,⁴

¹Department of Mathematics, Swansea University, Bay Campus, SA1 8EN, Swansea, Wales, United Kingdom
²Department of Physics, Swansea University, Singleton Campus, SA2 8PP, Swansea, Wales, United Kingdom
³European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT*) & Fondazione Bruno Kessler, Strada delle Tabarelle 286, 38123 Villazzano (TN), Italy
⁴Swansea Academy of Advanced Computing, Swansea University, Bay Campus, SA1 8EN, Swansea, Wales, United Kingdom

[email protected], [email protected], [email protected]
Abstract

The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning within quantum field theory. Here, we will discuss how discretized Euclidean field theories, such as the $\phi^{4}$ lattice field theory on a square lattice, are mathematically equivalent to Markov fields, a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. The results are established based on the Hammersley-Clifford theorem. We will then derive neural networks from quantum field theories and discuss applications pertinent to the minimization of the Kullback-Leibler divergence between the probability distribution of the $\phi^{4}$ machine learning algorithms and other probability distributions.

1 Introduction

To construct a probability distribution in a high-dimensional space one can turn to the framework of probabilistic graphical models. Probabilistic graphical models comprise a set of random variables, positioned within a graph-based representation, that satisfy certain factorization as well as conditional dependence and independence properties. A notable class of probabilistic graphical models is the Markov random field, in which the random variables are connected through undirected edges and satisfy an important locality condition, called the Markov property. Markov properties emerge as important mathematical conditions across distinct research fields, such as in machine learning [1] or in constructive quantum field theory [2].

In this contribution, we discuss the proof of Markov properties for discretized Euclidean field theories [3]. Specifically, we demonstrate, through the Hammersley-Clifford theorem, that the $\phi^{4}$ scalar field theory on a square lattice satisfies the local Markov property and is therefore mathematically equivalent to a Markov field. Based on this equivalence, we introduce algorithms which generalize a notable class of neural networks, namely restricted Boltzmann machines. Finally, we present applications pertinent to the minimization of the Kullback-Leibler divergence between the probability distribution of the $\phi^{4}$ machine learning algorithms and other probability distributions.

2 The $\phi^{4}$ Markov field

We denote by $\Lambda$ a finite set which is equivalently expressed as a graph $\mathcal{G}=(\Lambda,e)$, where the points of $\Lambda$ correspond to the vertices of $\mathcal{G}$ and $e$ denotes the edges of the graph. Two vertices $i,j\in\Lambda$ which are connected by an edge are neighbours. A clique is a set of vertices that are pairwise neighbours; it is called maximal if no additional vertex can be included that is simultaneously a neighbour of all vertices already present in the clique, see Fig. 1. We now assign to each vertex $i$ of the graph $\mathcal{G}$ a continuous-valued random variable, denoted by $\phi_{i}$.

Figure 1: A bipartite graph (a) and a square lattice (b). Examples of maximal cliques are $\{\phi_{1},h_{1}\}$ and $\{\phi_{3},\phi_{4}\}$, respectively.

A Markov random field is defined as a set of random variables on a graph $\mathcal{G}=(\Lambda,e)$ whose associated probability distribution $p(\phi)$ satisfies the local Markov property:

p(\phi_{i}\,|\,(\phi_{j})_{j\in\Lambda-i}) = p(\phi_{i}\,|\,(\phi_{j})_{j\in\mathcal{N}_{i}}),    (1)

where $\mathcal{N}_{i}$ is the set of neighbours of a given point $i$. The local Markov property can be proven, through the Hammersley-Clifford theorem, for a probability distribution that is encoded in a graph:

Theorem 1 (Hammersley-Clifford)

A probability distribution $p$, satisfying the condition of positivity, is associated with the events generated by a Markov network, iff $p$ can be factorized as a product of positive factors, or potential functions $\psi_{c}$, over the cliques of the associated graph structure $\mathcal{G}$:

p(\phi)=\frac{1}{Z}\prod_{c\in C}\psi_{c}(\phi),    (2)

where $Z=\int_{\bm{\phi}}\prod_{c\in C}\psi_{c}(\bm{\phi})\,d\bm{\phi}$ is a normalization constant, $c\in C$ is a maximal clique, and $\bm{\phi}$ denotes all configurations of the system.

The Euclidean action of the two-dimensional $\phi^{4}$ scalar field theory is

S_{E}=-\kappa_{L}\sum_{\langle ij\rangle}\phi_{i}\phi_{j}+\frac{(\mu_{L}^{2}+4\kappa_{L})}{2}\sum_{i}\phi_{i}^{2}+\frac{\lambda_{L}}{4}\sum_{i}\phi_{i}^{4},    (3)

where $\kappa_{L},\mu_{L}^{2},\lambda_{L}$ are dimensionless parameters. We redefine $w=\kappa_{L}$, $a=(\mu_{L}^{2}+4\kappa_{L})/2$, $b=\lambda_{L}/4$ for simplicity, and consider them as inhomogeneous:

S(\phi;\theta)=-\sum_{\langle ij\rangle}w_{ij}\phi_{i}\phi_{j}+\sum_{i}a_{i}\phi_{i}^{2}+\sum_{i}b_{i}\phi_{i}^{4}.    (4)

The inhomogeneous $\phi^{4}$ action is described by the coupling constants $\theta=\{w_{ij},a_{i},b_{i}\}$ and gives rise to the Boltzmann probability distribution:

p(\phi;\theta)=\frac{\exp\big[-S(\phi;\theta)\big]}{\int_{\bm{\phi}}\exp[-S(\bm{\phi};\theta)]\,d\bm{\phi}}.    (5)
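To make the construction concrete, the following minimal sketch (ours, using NumPy; the function and variable names, the lattice size and the example couplings are illustrative assumptions, not part of the original work) evaluates the inhomogeneous action of Eq. (4) and the unnormalized Boltzmann weight of Eq. (5) on a periodic square lattice:

```python
import numpy as np

def phi4_action(phi, w_h, w_v, a, b):
    """Inhomogeneous phi^4 action of Eq. (4) on a periodic square lattice.

    phi      : (L, L) field configuration
    w_h, w_v : (L, L) couplings on the horizontal/vertical nearest-neighbour edges
    a, b     : (L, L) quadratic and quartic on-site couplings
    """
    nn = w_h * phi * np.roll(phi, -1, axis=1) + w_v * phi * np.roll(phi, -1, axis=0)
    return -nn.sum() + (a * phi**2).sum() + (b * phi**4).sum()

def boltzmann_weight(phi, w_h, w_v, a, b):
    """Unnormalized weight exp[-S(phi; theta)] appearing in Eq. (5)."""
    return np.exp(-phi4_action(phi, w_h, w_v, a, b))

# Example with homogeneous couplings kappa_L = 1, mu_L^2 = -0.5, lambda_L = 0.7
# on a 4x4 lattice (values chosen purely for illustration).
L = 4
rng = np.random.default_rng(0)
phi = rng.normal(size=(L, L))
kappa, mu2, lam = 1.0, -0.5, 0.7
w = np.full((L, L), kappa)
a = np.full((L, L), (mu2 + 4 * kappa) / 2)
b = np.full((L, L), lam / 4)
print(phi4_action(phi, w, w, a, b))
```

Here w_h and w_v carry one coupling per horizontal and per vertical nearest-neighbour edge, so each edge of the lattice is counted exactly once.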

The lattice version of the $\phi^{4}$ theory is, by definition, expressed as a graph. To verify that the $\phi^{4}$ theory satisfies Markov properties, we define the following potential function $\psi_{c}$, which factorizes the probability distribution in terms of maximal cliques $c\in C$:

\psi_{c}=\exp\bigg[w_{ij}\phi_{i}\phi_{j}-\frac{1}{4}\big(a_{i}\phi_{i}^{2}+a_{j}\phi_{j}^{2}+b_{i}\phi_{i}^{4}+b_{j}\phi_{j}^{4}\big)\bigg],    (6)

where $i,j$ are nearest neighbours; the factor of $1/4$ accounts for each site of the periodic square lattice belonging to four nearest-neighbour edges, so that $\prod_{c\in C}\psi_{c}=\exp[-S(\phi;\theta)]$.
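Continuing the sketch above, a short numerical check (ours) of the Hammersley-Clifford factorization: with the potential of Eq. (6), the product of $\psi_{c}$ over all nearest-neighbour edges reproduces $\exp[-S(\phi;\theta)]$ up to floating-point error.

```python
def clique_potential(phi_i, phi_j, w_ij, a_i, a_j, b_i, b_j):
    """Potential function psi_c of Eq. (6) for the maximal clique {phi_i, phi_j}."""
    return np.exp(w_ij * phi_i * phi_j
                  - 0.25 * (a_i * phi_i**2 + a_j * phi_j**2
                            + b_i * phi_i**4 + b_j * phi_j**4))

def product_over_cliques(phi, w_h, w_v, a, b):
    """Product of psi_c over all nearest-neighbour edges of the periodic lattice."""
    L = phi.shape[0]
    prod = 1.0
    for x in range(L):
        for y in range(L):
            xp, yp = (x + 1) % L, (y + 1) % L
            prod *= clique_potential(phi[x, y], phi[x, yp], w_h[x, y],
                                     a[x, y], a[x, yp], b[x, y], b[x, yp])
            prod *= clique_potential(phi[x, y], phi[xp, y], w_v[x, y],
                                     a[x, y], a[xp, y], b[x, y], b[xp, y])
    return prod

# Hammersley-Clifford check: the product of clique potentials equals exp(-S).
assert np.isclose(product_over_cliques(phi, w, w, a, b),
                  boltzmann_weight(phi, w, w, a, b))
```

As a consequence, the conditional distribution of a single variable $\phi_{i}$ given all others involves only the potentials of the cliques containing $i$, i.e. only its nearest neighbours, which is precisely the local Markov property of Eq. (1).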

3 Machine learning with $\phi^{4}$ Markov random fields

3.1 Learning without predefined data

To compare the probability distribution $p(\phi;\theta)$ of the Markov field with another probability distribution $q(\phi)$, we define the Kullback-Leibler divergence, a nonnegative quantity, as

KL(p\,||\,q)=\int_{-\infty}^{\infty}p(\bm{\phi};\theta)\ln\frac{p(\bm{\phi};\theta)}{q(\bm{\phi})}\,d\bm{\phi}\geq 0.    (7)

We now consider a target Boltzmann probability distribution $q(\phi)=\exp[-\mathcal{A}]/Z_{\mathcal{A}}$ that describes an arbitrary statistical system and substitute the two probability distributions in the Kullback-Leibler divergence to obtain:

F_{\mathcal{A}}\leq\langle\mathcal{A}-S\rangle_{p(\phi;\theta)}+F\equiv\mathcal{F},    (8)

where $\mathcal{F}$ is the variational free energy, $F_{\mathcal{A}}=-\ln Z_{\mathcal{A}}$, and $\langle O\rangle_{p(\phi;\theta)}$ denotes the expectation value of an observable $O$ under the probability distribution $p(\phi;\theta)$. By minimizing this quantity the two probability distributions $p(\phi;\theta)$ and $q(\phi)$ will become equal and we can therefore use the distribution of the $\phi^{4}$ theory to draw samples from the target probability distribution $q(\phi)$.
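For completeness, Eq. (8) follows directly from the nonnegativity of the Kullback-Leibler divergence in Eq. (7). Writing $p(\phi;\theta)=\exp[-S]/Z$ with $F=-\ln Z$, and $q(\phi)=\exp[-\mathcal{A}]/Z_{\mathcal{A}}$ with $F_{\mathcal{A}}=-\ln Z_{\mathcal{A}}$, one has

0\leq KL(p\,||\,q)=\big\langle \ln p(\phi;\theta)-\ln q(\phi)\big\rangle_{p(\phi;\theta)}=\langle\mathcal{A}-S\rangle_{p(\phi;\theta)}+F-F_{\mathcal{A}},

which rearranges to the bound of Eq. (8).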

Using a gradient-based approach, we can minimize Eq. (8) via

\frac{\partial\mathcal{F}}{\partial\theta_{i}}=\langle\mathcal{A}\rangle\Big\langle\frac{\partial S}{\partial\theta_{i}}\Big\rangle-\Big\langle\mathcal{A}\frac{\partial S}{\partial\theta_{i}}\Big\rangle+\Big\langle S\frac{\partial S}{\partial\theta_{i}}\Big\rangle-\langle S\rangle\Big\langle\frac{\partial S}{\partial\theta_{i}}\Big\rangle,    (9)

and the coupling constants $\theta$ are updated at each epoch $t$ as:

\theta^{(t+1)}=\theta^{(t)}-\eta\,\mathcal{L},    (10)

where $\eta$ is the learning rate and $\mathcal{L}=\partial\mathcal{F}/\partial\theta^{(t)}$.
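Schematically, the training step of Eqs. (9) and (10) can be sketched as follows (ours; draw_samples, action_S, action_A and grad_S are hypothetical user-supplied routines, with all expectation values in Eq. (9) estimated over Monte Carlo samples of $p(\phi;\theta)$):

```python
import numpy as np

def free_energy_gradient(samples, theta, action_S, action_A, grad_S):
    """Monte Carlo estimate of Eq. (9); all expectation values are taken
    with respect to samples drawn from p(phi; theta)."""
    S  = np.array([action_S(phi, theta) for phi in samples])
    A  = np.array([action_A(phi) for phi in samples])
    dS = np.array([grad_S(phi, theta) for phi in samples])   # shape (N, n_params)
    return (A.mean() * dS.mean(axis=0) - (A[:, None] * dS).mean(axis=0)
            + (S[:, None] * dS).mean(axis=0) - S.mean() * dS.mean(axis=0))

def train(theta, draw_samples, action_S, action_A, grad_S,
          eta=1e-3, epochs=1000, n_samples=10_000):
    """Update rule of Eq. (10): theta <- theta - eta * dF/dtheta at each epoch."""
    for _ in range(epochs):
        samples = draw_samples(theta, n_samples)   # e.g. Markov chain Monte Carlo
        theta = theta - eta * free_energy_gradient(samples, theta,
                                                   action_S, action_A, grad_S)
    return theta
```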

We now consider a variation of the $\phi^{4}$ theory which includes next-nearest neighbour ($nnn$) interactions, with a complex action $\mathcal{A}$ defined as

\mathcal{A}=\sum_{k=1}^{5}g_{k}\mathcal{A}^{(k)}=g_{1}\sum_{\langle ij\rangle_{nn}}\phi_{i}\phi_{j}+g_{2}\sum_{i}\phi_{i}^{2}+g_{3}\sum_{i}\phi_{i}^{4}+g_{4}\sum_{\langle ij\rangle_{nnn}}\phi_{i}\phi_{j}+ig_{5}\sum_{i}\phi_{i}^{2},    (11)

where $i$ in the last term denotes the imaginary unit. The coupling constants can take arbitrary values, but for this example we consider $g_{1}=g_{4}=-1$, $g_{2}=1.52425$, $g_{3}=0.175$ and $g_{5}=0.15$, see Ref. [3].
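For illustration, a short sketch (ours; it assumes periodic boundary conditions and takes the next-nearest neighbours $\langle ij\rangle_{nnn}$ to be the diagonal sites of the square lattice) that evaluates the five terms $\mathcal{A}^{(k)}$ and the complex action of Eq. (11) on a single configuration:

```python
import numpy as np

def action_terms(phi):
    """The five terms A^(k) entering Eq. (11) on a periodic square lattice;
    next-nearest neighbours are taken to be the diagonal sites."""
    nn  = (phi * np.roll(phi, -1, axis=0) + phi * np.roll(phi, -1, axis=1)).sum()
    nnn = (phi * np.roll(np.roll(phi, -1, axis=0), -1, axis=1)
           + phi * np.roll(np.roll(phi, -1, axis=0), 1, axis=1)).sum()
    return np.array([nn, (phi**2).sum(), (phi**4).sum(), nnn, 1j * (phi**2).sum()])

g = np.array([-1.0, 1.52425, 0.175, -1.0, 0.15])   # g_1, ..., g_5 from the text
rng = np.random.default_rng(1)
phi = rng.normal(size=(8, 8))                      # an arbitrary configuration
A = np.dot(g, action_terms(phi))                   # complex action of Eq. (11)
```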

Figure 2: $\Re[\mathcal{A}]$ (left) and $\Re[m]$ (right) versus $g_{4}$. The statistical errors are comparable with the width of the line.

We now utilize the $\phi^{4}$ Markov field with action $S$ to approximate a probability distribution described by the action $\mathcal{A}_{\{4\}}=\sum_{k=1}^{4}g_{k}\mathcal{A}^{(k)}$. To investigate how accurate the equivalence between the two probability distributions is, we implement reweighting based on the probability distribution of the $\phi^{4}$ Markov field to calculate expectation values for the full complex action $\mathcal{A}$. The reweighting relation is given by

\langle O\rangle=\frac{\sum_{l=1}^{N}O_{l}\exp\big[S_{l}-g_{j}^{\prime}\mathcal{A}_{l}^{(j)}-\sum_{k=1,k\neq j}^{5}g_{k}\mathcal{A}_{l}^{(k)}\big]}{\sum_{l=1}^{N}\exp\big[S_{l}-g_{j}^{\prime}\mathcal{A}_{l}^{(j)}-\sum_{k=1,k\neq j}^{5}g_{k}\mathcal{A}_{l}^{(k)}\big]},    (12)

details of which can be found in Refs. [3, 4, 5, 6].
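A minimal sketch of the estimator in Eq. (12) (ours; the arrays holding the measured observable, the Markov-field action and the terms $\mathcal{A}^{(k)}_{l}$ for each of the $N$ sampled configurations are assumed to be precomputed):

```python
import numpy as np

def reweight(O, S, A_terms, g, j, g_j_prime):
    """Reweighting estimator of Eq. (12), extrapolating in the coupling g_j.

    O         : (N,)   observable measured on each sampled configuration
    S         : (N,)   Markov-field action S_l of each configuration
    A_terms   : (N, 5) the terms A_l^(k) of Eq. (11) for each configuration
    g         : (5,)   couplings g_k of the target action
    j         : index of the extrapolated coupling (0-based, so g_4 is j=3)
    g_j_prime : new value g'_j of that coupling
    """
    others = np.delete(np.arange(len(g)), j)
    log_w = S - g_j_prime * A_terms[:, j] - A_terms[:, others] @ g[others]
    log_w = log_w - log_w.real.max()               # stabilize the exponentials
    w = np.exp(log_w)
    return (O * w).sum() / w.sum()

# e.g. <A> at g'_4 = -0.9:  reweight(A_vals, S_vals, A_terms, g, j=3, g_j_prime=-0.9)
```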

We consider $j=4$ and calculate the expectation values of the action $\mathcal{A}$ and the magnetization $m$ by extrapolating with reweighting in the range $g_{4}^{\prime}\in[-1.15,-0.85]$. The results, shown in Fig. 2, overlap within statistical uncertainty with independent calculations, verifying that observables corresponding to the probability distribution of the full action $\mathcal{A}$ can be accurately calculated based on reweighting from the inhomogeneous action $S$.

3.2 Learning with predefined data

A different class of machine learning applications considers the case where the form of the target probability distribution is unknown, but there exists a set of available data in which an empirical probability distribution is encoded. To explore this type of application we consider the following expression of the Kullback-Leibler divergence:

KL(q\,||\,p)=\int_{-\infty}^{\infty}q(\bm{\phi})\ln\frac{q(\bm{\phi})}{p(\bm{\phi};\theta)}\,d\bm{\phi}\geq 0.    (13)

By substituting and taking the derivative with respect to the variational parameters $\theta$ we obtain:

\frac{\partial\ln p(\phi;\theta)}{\partial\theta}=\Big\langle\frac{\partial S}{\partial\theta}\Big\rangle_{p(\phi;\theta)}-\frac{\partial S}{\partial\theta},    (14)

and the update rule of the parameters $\theta$ at each epoch $t$ is given by Eq. (10), where $\mathcal{L}=-\partial\ln p(\phi;\theta^{(t)})/\partial\theta^{(t)}$.
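A corresponding sketch for this data-driven setting (ours; draw_samples and grad_S are again hypothetical routines, and averaging the gradient of Eq. (14) over the dataset is our reading of the standard maximum-likelihood procedure):

```python
import numpy as np

def loss_gradient(data, model_samples, grad_S, theta):
    """L = -<d ln p(phi; theta)/d theta> from Eq. (14), averaged over the data:
    the data expectation of dS/dtheta minus the model expectation under p(phi; theta)."""
    dS_data  = np.mean([grad_S(phi, theta) for phi in data], axis=0)
    dS_model = np.mean([grad_S(phi, theta) for phi in model_samples], axis=0)
    return dS_data - dS_model

def train_on_data(theta, data, draw_samples, grad_S,
                  eta=1e-3, epochs=500, n_samples=1000):
    """Update rule of Eq. (10) with L = -d ln p / d theta."""
    for _ in range(epochs):
        model_samples = draw_samples(theta, n_samples)   # samples from p(phi; theta)
        theta = theta - eta * loss_gradient(data, model_samples, grad_S, theta)
    return theta
```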

As an example of a distribution to be learned by the machine learning algorithm, we consider the simple case of a Gaussian distribution with $\mu=-0.5$ and $\sigma=0.05$. Since the lattice action is invariant under the $Z_{2}$ symmetry, we expect the symmetric values of the dataset to be reproduced with equal probability. This invariance can be removed via the inclusion of a term $\sum_{i}r_{i}\phi_{i}$, which breaks the symmetry of the action explicitly. The results are shown in Fig. 3 (left), where the anticipated behaviour is observed.

Finally, we illustrate the approach on an image from the CIFAR-10 dataset. The thermalization of the trained Markov field is depicted in Fig. 3 (right), where the image emerges within the equilibrium probability distribution. Since the machine learning algorithm learns the values of the coupling constants in the action that solve the considered problem, extensions towards learning the appropriate coupling constants which describe renormalized systems can potentially be explored [7].

Figure 3: Probability density function (PDF) versus $\phi_{i}$ (left). The thermalization of the trained Markov field (right).

3.3 Machine learning with $\phi^{4}$ neural networks

To increase the expressivity of the machine learning algorithm, we will introduce a new set of latent or hidden variables $h_{j}$ within the graph structure. In addition, we will restrict the interactions to be exclusively between the visible $\phi_{i}$ and the hidden $h_{j}$ variables, giving rise to the lattice action

S(\phi,h;\theta)=-\sum_{i,j}w_{ij}\phi_{i}h_{j}+\sum_{i}r_{i}\phi_{i}+\sum_{i}a_{i}\phi_{i}^{2}+\sum_{i}b_{i}\phi_{i}^{4}+\sum_{j}s_{j}h_{j}+\sum_{j}m_{j}h_{j}^{2}+\sum_{j}n_{j}h_{j}^{4}.    (15)

Figure 4: Extracted dependencies from the trained $\phi^{4}$ neural network.

The $\phi^{4}$ neural network implemented above is a generalization of other neural network architectures: if $b_{i}=n_{j}=0$ the $\phi^{4}$ neural network reduces to a Gaussian-Gaussian restricted Boltzmann machine, and if $b_{i}=n_{j}=m_{j}=0$ and $h_{j}\in\{-1,1\}$ one obtains a Gaussian-Bernoulli restricted Boltzmann machine, see Refs. [8, 9]. Restricted Boltzmann machines were inspired by Ising models, and since the $\phi^{4}$ theory can become equivalent to an Ising model [10], novel physical insights into how these machine learning algorithms work might emerge from the perspective of quantum field theory. We now train the $\phi^{4}$ neural network on the Olivetti faces dataset. Some representative features, which resemble face structures, are shown in Fig. 4.
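For concreteness, a minimal sketch (ours; the dictionary layout and function name are illustrative assumptions) of the action of Eq. (15) as a function of the visible and hidden variables, with the reductions to restricted Boltzmann machines noted in comments:

```python
import numpy as np

def phi4_network_action(phi, h, theta):
    """Lattice action of Eq. (15) for visible variables phi and hidden variables h.
    theta is a dict holding the couplings {w, r, a, b, s, m, n}; w has shape
    (len(phi), len(h)), and all other couplings are site-wise vectors."""
    w, r, a, b, s, m, n = (theta[k] for k in ("w", "r", "a", "b", "s", "m", "n"))
    return (-phi @ w @ h + r @ phi + a @ phi**2 + b @ phi**4
            + s @ h + m @ h**2 + n @ h**4)

# With b = n = 0 the action is quadratic in both phi and h (Gaussian-Gaussian RBM);
# with b = n = m = 0 and h restricted to {-1, +1} one recovers a Gaussian-Bernoulli RBM.
```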

4 Conclusions

In this contribution we have shown that discretized Euclidean field theories are Markov fields, hence it is now possible to investigate machine learning directly within quantum field theory [3]. Based on the Hammersley-Clifford theorem, we demonstrated that the $\phi^{4}$ theory, formulated on a square lattice, satisfies the local Markov property. In addition, we have derived $\phi^{4}$ neural networks which generalize restricted Boltzmann machines (see also Refs. [11, 12]). Finally, we have presented applications pertinent to physics and computer science, opening up the opportunity for nonperturbative investigations of machine learning within lattice field theory.

Acknowledgements

The authors received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 813942. The work of GA and BL has been supported in part by the UKRI Science and Technology Facilities Council (STFC) Consolidated Grant ST/T000813/1. The work of BL is further supported in part by the Royal Society Wolfson Research Merit Award WM170010 and by the Leverhulme Foundation Research Fellowship RF-2020-461\9. Numerical simulations have been performed on the Swansea SUNBIRD system. This system is part of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.

References

  • [1] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques (Cambridge, MA: The MIT Press, 2009).
  • [2] E. Nelson, Construction of quantum fields from Markoff fields, Journal of Functional Analysis 12, 97 (1973).
  • [3] D. Bachtis, G. Aarts and B. Lucini, Quantum field-theoretic machine learning, Phys. Rev. D 103, 074510 (2021).
  • [4] D. Bachtis, G. Aarts and B. Lucini, Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration, Phys. Rev. Research 3, 013134 (2021).
  • [5] D. Bachtis, G. Aarts and B. Lucini, Mapping distinct phase transitions to a neural network, Phys. Rev. E 102, 053306 (2020).
  • [6] D. Bachtis, G. Aarts and B. Lucini, Extending machine learning classification capabilities with histogram reweighting, Phys. Rev. E 102, 033303 (2020).
  • [7] D. Bachtis, G. Aarts, F. Di Renzo and B. Lucini, Inverse renormalization group in quantum field theory, arXiv:2107.00466 [hep-lat] (2021).
  • [8] G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14, 1771 (2002).
  • [9] A. Fischer and C. Igel, Training restricted Boltzmann machines: An introduction, Pattern Recognition 47, 25 (2014).
  • [10] A. Milchev, D. W. Heermann and K. Binder, Finite-size scaling analysis of the $\phi^{4}$ field theory on the square lattice, J. Stat. Phys. 44, 749 (1986).
  • [11] K. Hashimoto, S. Sugishita, A. Tanaka and A. Tomiya, Deep learning and the AdS/CFT correspondence, Phys. Rev. D 98, 046019 (2018).
  • [12] K. Hashimoto, AdS/CFT as a deep Boltzmann machine, Phys. Rev. D 99, 106017 (2019).