
Quantum field theories, Markov random fields and machine learning

Dimitrios Bachtis¹, Gert Aarts²,³ and Biagio Lucini¹,⁴

¹Department of Mathematics, Swansea University, Bay Campus, SA1 8EN, Swansea, Wales, United Kingdom
²Department of Physics, Swansea University, Singleton Campus, SA2 8PP, Swansea, Wales, United Kingdom
³European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT*) & Fondazione Bruno Kessler, Strada delle Tabarelle 286, 38123 Villazzano (TN), Italy
⁴Swansea Academy of Advanced Computing, Swansea University, Bay Campus, SA1 8EN, Swansea, Wales, United Kingdom

[email protected], [email protected], [email protected]
Abstract

The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning within quantum field theory. Here, we will discuss how discretized Euclidean field theories, such as the $\phi^{4}$ lattice field theory on a square lattice, are mathematically equivalent to Markov fields, a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. The results are established based on the Hammersley-Clifford theorem. We will then derive neural networks from quantum field theories and discuss applications pertinent to the minimization of the Kullback-Leibler divergence between the probability distribution of the $\phi^{4}$ machine learning algorithms and other probability distributions.

1 Introduction

To construct a probability distribution in a high-dimensional space one can turn to the framework of probabilistic graphical models. Probabilistic graphical models comprise a set of random variables, positioned within a graph-based representation, that satisfy certain factorization as well as conditional dependence and independence properties. A notable class of probabilistic graphical models is the Markov random field, in which the random variables are connected through undirected edges and satisfy an important locality condition, called the Markov property. Markov properties emerge as important mathematical conditions across distinct research fields, such as in machine learning [1] or in constructive quantum field theory [2].

In this contribution, we discuss the proof of Markov properties for discretized Euclidean field theories [3]. Specifically, we demonstrate, through the Hammersley-Clifford theorem, that the $\phi^{4}$ scalar field theory on a square lattice satisfies the local Markov property and is therefore mathematically equivalent to a Markov field. Based on this equivalence, we introduce algorithms which generalize a notable class of neural networks, namely restricted Boltzmann machines. Finally, we present applications pertinent to the minimization of the Kullback-Leibler divergence between the probability distribution of the $\phi^{4}$ machine learning algorithms and other probability distributions.

2 The $\phi^{4}$ Markov field

We denote by $\Lambda$ a finite set which is equivalently expressed as a graph $\mathcal{G}=(\Lambda,e)$, where the points of $\Lambda$ correspond to the vertices of $\mathcal{G}$ and $e$ denotes the edges of the graph. Two vertices $i,j\in\Lambda$ which are connected by an edge are neighbours. A clique is a set of vertices that are pairwise neighbours; it is called maximal if no additional vertex can be included that is simultaneously a neighbour of all vertices already present in the clique, see Fig. 1. We now assign to each vertex $i$ of the graph $\mathcal{G}$ a continuous-valued random variable, denoted by $\phi_{i}$.

Figure 1: A bipartite graph (a) and a square lattice (b). Examples of maximal cliques are $\{\phi_{1},h_{1}\}$ and $\{\phi_{3},\phi_{4}\}$, respectively.

A Markov random field is defined as a set of random variables on a graph $\mathcal{G}=(\Lambda,e)$ whose associated probability distribution $p(\phi)$ satisfies the local Markov property:

p(\phi_{i}\,|\,(\phi_{j})_{j\in\Lambda-i}) = p(\phi_{i}\,|\,(\phi_{j})_{j\in\mathcal{N}_{i}}),    (1)

where $\mathcal{N}_{i}$ is the set of neighbours of a given point $i$. The local Markov property can be proven, through the Hammersley-Clifford theorem, for a probability distribution that is encoded in a graph:

Theorem 1 (Hammersley-Clifford)

A probability distribution $p$, satisfying the condition of positivity, is associated with the events generated by a Markov network, iff $p$ can be factorized as a product of positive factors, or potential functions $\psi_{c}$, over the cliques of the associated graph structure $\mathcal{G}$:

p(\phi)=\frac{1}{Z}\prod_{c\in C}\psi_{c}(\phi),    (2)

where $Z=\int_{\bm{\phi}}\prod_{c\in C}\psi_{c}(\bm{\phi})\,d\bm{\phi}$ is a normalization constant, $c\in C$ is a maximal clique, and $\bm{\phi}$ denotes all configurations of the system.

The Euclidean action of the two-dimensional $\phi^{4}$ scalar field theory is

S_{E}=-\kappa_{L}\sum_{\langle ij\rangle}\phi_{i}\phi_{j}+\frac{(\mu_{L}^{2}+4\kappa_{L})}{2}\sum_{i}\phi_{i}^{2}+\frac{\lambda_{L}}{4}\sum_{i}\phi_{i}^{4},    (3)

where $\kappa_{L},\mu_{L}^{2},\lambda_{L}$ are dimensionless parameters. We redefine $w=\kappa_{L}$, $a=(\mu_{L}^{2}+4\kappa_{L})/2$, $b=\lambda_{L}/4$ for simplicity, and consider them as inhomogeneous:

S(\phi;\theta)=-\sum_{\langle ij\rangle}w_{ij}\phi_{i}\phi_{j}+\sum_{i}a_{i}\phi_{i}^{2}+\sum_{i}b_{i}\phi_{i}^{4}.    (4)

The inhomogeneous $\phi^{4}$ action is described by the coupling constants $\theta=\{w_{ij},a_{i},b_{i}\}$ and gives rise to the Boltzmann probability distribution:

p(\phi;\theta)=\frac{\exp\big[-S(\phi;\theta)\big]}{\int_{\bm{\phi}}\exp[-S(\bm{\phi};\theta)]\,d\bm{\phi}}.    (5)
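To make the construction concrete, the following minimal sketch (ours, using NumPy; the function and variable names, the lattice size and the example couplings are illustrative assumptions, not part of the original work) evaluates the inhomogeneous action of Eq. (4) and the unnormalized Boltzmann weight of Eq. (5) on a periodic square lattice:

```python
import numpy as np

def phi4_action(phi, w_h, w_v, a, b):
    """Inhomogeneous phi^4 action of Eq. (4) on a periodic square lattice.

    phi      : (L, L) field configuration
    w_h, w_v : (L, L) couplings on the horizontal/vertical nearest-neighbour edges
    a, b     : (L, L) quadratic and quartic on-site couplings
    """
    nn = w_h * phi * np.roll(phi, -1, axis=1) + w_v * phi * np.roll(phi, -1, axis=0)
    return -nn.sum() + (a * phi**2).sum() + (b * phi**4).sum()

def boltzmann_weight(phi, w_h, w_v, a, b):
    """Unnormalized weight exp[-S(phi; theta)] appearing in Eq. (5)."""
    return np.exp(-phi4_action(phi, w_h, w_v, a, b))

# Example with homogeneous couplings kappa_L = 1, mu_L^2 = -0.5, lambda_L = 0.7
# on a 4x4 lattice (values chosen purely for illustration).
L = 4
rng = np.random.default_rng(0)
phi = rng.normal(size=(L, L))
kappa, mu2, lam = 1.0, -0.5, 0.7
w = np.full((L, L), kappa)
a = np.full((L, L), (mu2 + 4 * kappa) / 2)
b = np.full((L, L), lam / 4)
print(phi4_action(phi, w, w, a, b))
```

Here w_h and w_v carry one coupling per horizontal and per vertical nearest-neighbour edge, so each edge of the lattice is counted exactly once.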

The lattice version of the $\phi^{4}$ theory is, by definition, expressed as a graph. To verify that the $\phi^{4}$ theory satisfies Markov properties, we define the following potential function $\psi_{c}$, which factorizes the probability distribution in terms of maximal cliques $c\in C$:

\psi_{c}=\exp\bigg[w_{ij}\phi_{i}\phi_{j}-\frac{1}{4}\big(a_{i}\phi_{i}^{2}+a_{j}\phi_{j}^{2}+b_{i}\phi_{i}^{4}+b_{j}\phi_{j}^{4}\big)\bigg],    (6)

where $i,j$ are nearest neighbours; the factor of $1/4$ accounts for each site of the periodic square lattice belonging to four nearest-neighbour edges, so that $\prod_{c\in C}\psi_{c}=\exp[-S(\phi;\theta)]$.
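Continuing the sketch above, a short numerical check (ours) of the Hammersley-Clifford factorization: with the potential of Eq. (6), the product of $\psi_{c}$ over all nearest-neighbour edges reproduces $\exp[-S(\phi;\theta)]$ up to floating-point error.

```python
def clique_potential(phi_i, phi_j, w_ij, a_i, a_j, b_i, b_j):
    """Potential function psi_c of Eq. (6) for the maximal clique {phi_i, phi_j}."""
    return np.exp(w_ij * phi_i * phi_j
                  - 0.25 * (a_i * phi_i**2 + a_j * phi_j**2
                            + b_i * phi_i**4 + b_j * phi_j**4))

def product_over_cliques(phi, w_h, w_v, a, b):
    """Product of psi_c over all nearest-neighbour edges of the periodic lattice."""
    L = phi.shape[0]
    prod = 1.0
    for x in range(L):
        for y in range(L):
            xp, yp = (x + 1) % L, (y + 1) % L
            prod *= clique_potential(phi[x, y], phi[x, yp], w_h[x, y],
                                     a[x, y], a[x, yp], b[x, y], b[x, yp])
            prod *= clique_potential(phi[x, y], phi[xp, y], w_v[x, y],
                                     a[x, y], a[xp, y], b[x, y], b[xp, y])
    return prod

# Hammersley-Clifford check: the product of clique potentials equals exp(-S).
assert np.isclose(product_over_cliques(phi, w, w, a, b),
                  boltzmann_weight(phi, w, w, a, b))
```

As a consequence, the conditional distribution of a single variable $\phi_{i}$ given all others involves only the potentials of the cliques containing $i$, i.e. only its nearest neighbours, which is precisely the local Markov property of Eq. (1).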

3 Machine learning with $\phi^{4}$ Markov random fields

3.1 Learning without predefined data

To compare the probability distribution $p(\phi;\theta)$ of the Markov field with another probability distribution $q(\phi)$, we define the Kullback-Leibler divergence, a nonnegative quantity, as

KL(p\,||\,q)=\int_{-\infty}^{\infty}p(\bm{\phi};\theta)\ln\frac{p(\bm{\phi};\theta)}{q(\bm{\phi})}\,d\bm{\phi}\geq 0.    (7)

We now consider a target Boltzmann probability distribution $q(\phi)=\exp[-\mathcal{A}]/Z_{\mathcal{A}}$ that describes an arbitrary statistical system and substitute the two probability distributions in the Kullback-Leibler divergence to obtain:

F_{\mathcal{A}}\leq\langle\mathcal{A}-S\rangle_{p(\phi;\theta)}+F\equiv\mathcal{F},    (8)

where $\mathcal{F}$ is the variational free energy, $F_{\mathcal{A}}=-\ln Z_{\mathcal{A}}$, and $\langle O\rangle_{p(\phi;\theta)}$ denotes the expectation value of an observable $O$ under the probability distribution $p(\phi;\theta)$. By minimizing this quantity the two probability distributions $p(\phi;\theta)$ and $q(\phi)$ will become equal and we can therefore use the distribution of the $\phi^{4}$ theory to draw samples from the target probability distribution $q(\phi)$.
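For completeness, Eq. (8) follows directly from the nonnegativity of the Kullback-Leibler divergence in Eq. (7). Writing $p(\phi;\theta)=\exp[-S]/Z$ with $F=-\ln Z$, and $q(\phi)=\exp[-\mathcal{A}]/Z_{\mathcal{A}}$ with $F_{\mathcal{A}}=-\ln Z_{\mathcal{A}}$, one has

0\leq KL(p\,||\,q)=\big\langle \ln p(\phi;\theta)-\ln q(\phi)\big\rangle_{p(\phi;\theta)}=\langle\mathcal{A}-S\rangle_{p(\phi;\theta)}+F-F_{\mathcal{A}},

which rearranges to the bound of Eq. (8).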

Using a gradient-based approach, we can minimize Eq. (8) via

\frac{\partial\mathcal{F}}{\partial\theta_{i}}=\langle\mathcal{A}\rangle\Big\langle\frac{\partial S}{\partial\theta_{i}}\Big\rangle-\Big\langle\mathcal{A}\frac{\partial S}{\partial\theta_{i}}\Big\rangle+\Big\langle S\frac{\partial S}{\partial\theta_{i}}\Big\rangle-\langle S\rangle\Big\langle\frac{\partial S}{\partial\theta_{i}}\Big\rangle,    (9)

and the coupling constants $\theta$ are updated at each epoch $t$ as:

\theta^{(t+1)}=\theta^{(t)}-\eta\,\mathcal{L},    (10)

where $\eta$ is the learning rate and $\mathcal{L}=\partial\mathcal{F}/\partial\theta^{(t)}$.
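Schematically, the training step of Eqs. (9) and (10) can be sketched as follows (ours; draw_samples, action_S, action_A and grad_S are hypothetical user-supplied routines, with all expectation values in Eq. (9) estimated over Monte Carlo samples of $p(\phi;\theta)$):

```python
import numpy as np

def free_energy_gradient(samples, theta, action_S, action_A, grad_S):
    """Monte Carlo estimate of Eq. (9); all expectation values are taken
    with respect to samples drawn from p(phi; theta)."""
    S  = np.array([action_S(phi, theta) for phi in samples])
    A  = np.array([action_A(phi) for phi in samples])
    dS = np.array([grad_S(phi, theta) for phi in samples])   # shape (N, n_params)
    return (A.mean() * dS.mean(axis=0) - (A[:, None] * dS).mean(axis=0)
            + (S[:, None] * dS).mean(axis=0) - S.mean() * dS.mean(axis=0))

def train(theta, draw_samples, action_S, action_A, grad_S,
          eta=1e-3, epochs=1000, n_samples=10_000):
    """Update rule of Eq. (10): theta <- theta - eta * dF/dtheta at each epoch."""
    for _ in range(epochs):
        samples = draw_samples(theta, n_samples)   # e.g. Markov chain Monte Carlo
        theta = theta - eta * free_energy_gradient(samples, theta,
                                                   action_S, action_A, grad_S)
    return theta
```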

We now consider a variation of the $\phi^{4}$ theory which includes next-nearest neighbour ($nnn$) interactions, with a complex action $\mathcal{A}$ defined as

\mathcal{A}=\sum_{k=1}^{5}g_{k}\mathcal{A}^{(k)}=g_{1}\sum_{\langle ij\rangle_{nn}}\phi_{i}\phi_{j}+g_{2}\sum_{i}\phi_{i}^{2}+g_{3}\sum_{i}\phi_{i}^{4}+g_{4}\sum_{\langle ij\rangle_{nnn}}\phi_{i}\phi_{j}+ig_{5}\sum_{i}\phi_{i}^{2},    (11)

where $i$ in the last term denotes the imaginary unit. The coupling constants can take arbitrary values, but for this example we consider $g_{1}=g_{4}=-1$, $g_{2}=1.52425$, $g_{3}=0.175$ and $g_{5}=0.15$, see Ref. [3].
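For illustration, a short sketch (ours; it assumes periodic boundary conditions and takes the next-nearest neighbours $\langle ij\rangle_{nnn}$ to be the diagonal sites of the square lattice) that evaluates the five terms $\mathcal{A}^{(k)}$ and the complex action of Eq. (11) on a single configuration:

```python
import numpy as np

def action_terms(phi):
    """The five terms A^(k) entering Eq. (11) on a periodic square lattice;
    next-nearest neighbours are taken to be the diagonal sites."""
    nn  = (phi * np.roll(phi, -1, axis=0) + phi * np.roll(phi, -1, axis=1)).sum()
    nnn = (phi * np.roll(np.roll(phi, -1, axis=0), -1, axis=1)
           + phi * np.roll(np.roll(phi, -1, axis=0), 1, axis=1)).sum()
    return np.array([nn, (phi**2).sum(), (phi**4).sum(), nnn, 1j * (phi**2).sum()])

g = np.array([-1.0, 1.52425, 0.175, -1.0, 0.15])   # g_1, ..., g_5 from the text
rng = np.random.default_rng(1)
phi = rng.normal(size=(8, 8))                      # an arbitrary configuration
A = np.dot(g, action_terms(phi))                   # complex action of Eq. (11)
```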

Figure 2: $\Re[\mathcal{A}]$ (left) and $\Re[m]$ (right) versus $g_{4}$. The statistical errors are comparable with the width of the line.

We now utilize the $\phi^{4}$ Markov field with action $S$ to approximate a probability distribution described by the action $\mathcal{A}_{\{4\}}=\sum_{k=1}^{4}g_{k}\mathcal{A}^{(k)}$. To investigate how accurate the equivalence between the two probability distributions is, we implement reweighting based on the probability distribution of the $\phi^{4}$ Markov field to calculate expectation values for the full complex action $\mathcal{A}$. The reweighting relation is given by

\langle O\rangle=\frac{\sum_{l=1}^{N}O_{l}\exp\big[S_{l}-g_{j}^{\prime}\mathcal{A}_{l}^{(j)}-\sum_{k=1,k\neq j}^{5}g_{k}\mathcal{A}_{l}^{(k)}\big]}{\sum_{l=1}^{N}\exp\big[S_{l}-g_{j}^{\prime}\mathcal{A}_{l}^{(j)}-\sum_{k=1,k\neq j}^{5}g_{k}\mathcal{A}_{l}^{(k)}\big]},    (12)

details of which can be found in Refs. [3, 4, 5, 6].
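A minimal sketch of the estimator in Eq. (12) (ours; the arrays holding the measured observable, the Markov-field action and the terms $\mathcal{A}^{(k)}_{l}$ for each of the $N$ sampled configurations are assumed to be precomputed):

```python
import numpy as np

def reweight(O, S, A_terms, g, j, g_j_prime):
    """Reweighting estimator of Eq. (12), extrapolating in the coupling g_j.

    O         : (N,)   observable measured on each sampled configuration
    S         : (N,)   Markov-field action S_l of each configuration
    A_terms   : (N, 5) the terms A_l^(k) of Eq. (11) for each configuration
    g         : (5,)   couplings g_k of the target action
    j         : index of the extrapolated coupling (0-based, so g_4 is j=3)
    g_j_prime : new value g'_j of that coupling
    """
    others = np.delete(np.arange(len(g)), j)
    log_w = S - g_j_prime * A_terms[:, j] - A_terms[:, others] @ g[others]
    log_w = log_w - log_w.real.max()               # stabilize the exponentials
    w = np.exp(log_w)
    return (O * w).sum() / w.sum()

# e.g. <A> at g'_4 = -0.9:  reweight(A_vals, S_vals, A_terms, g, j=3, g_j_prime=-0.9)
```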

We consider $j=4$ and calculate the expectation values of the action $\mathcal{A}$ and the magnetization $m$ by extrapolating with reweighting in the range $g_{4}^{\prime}\in[-1.15,-0.85]$. The results, shown in Fig. 2, overlap within statistical uncertainty with independent calculations, verifying that observables corresponding to the probability distribution of the full action $\mathcal{A}$ can be accurately calculated based on reweighting from the inhomogeneous action $S$.

3.2 Learning with predefined data

A different class of machine learning applications considers the case where the form of the target probability distribution is unknown, but there exists a set of available data in which an empirical probability distribution is encoded. To explore this type of application we consider the following expression of the Kullback-Leibler divergence:

KL(q\,||\,p)=\int_{-\infty}^{\infty}q(\bm{\phi})\ln\frac{q(\bm{\phi})}{p(\bm{\phi};\theta)}\,d\bm{\phi}\geq 0.    (13)

By substituting and taking the derivative with respect to the variational parameters $\theta$ we obtain:

\frac{\partial\ln p(\phi;\theta)}{\partial\theta}=\Big\langle\frac{\partial S}{\partial\theta}\Big\rangle_{p(\phi;\theta)}-\frac{\partial S}{\partial\theta},    (14)

and the update rule of the parameters $\theta$ at each epoch $t$ is given by Eq. (10), where $\mathcal{L}=-\partial\ln p(\phi;\theta^{(t)})/\partial\theta^{(t)}$.
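A corresponding sketch for this data-driven setting (ours; draw_samples and grad_S are again hypothetical routines, and averaging the gradient of Eq. (14) over the dataset is our reading of the standard maximum-likelihood procedure):

```python
import numpy as np

def loss_gradient(data, model_samples, grad_S, theta):
    """L = -<d ln p(phi; theta)/d theta> from Eq. (14), averaged over the data:
    the data expectation of dS/dtheta minus the model expectation under p(phi; theta)."""
    dS_data  = np.mean([grad_S(phi, theta) for phi in data], axis=0)
    dS_model = np.mean([grad_S(phi, theta) for phi in model_samples], axis=0)
    return dS_data - dS_model

def train_on_data(theta, data, draw_samples, grad_S,
                  eta=1e-3, epochs=500, n_samples=1000):
    """Update rule of Eq. (10) with L = -d ln p / d theta."""
    for _ in range(epochs):
        model_samples = draw_samples(theta, n_samples)   # samples from p(phi; theta)
        theta = theta - eta * loss_gradient(data, model_samples, grad_S, theta)
    return theta
```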

As an example of a distribution to be learned by the machine learning algorithm, we consider the simple case of a Gaussian distribution with $\mu=-0.5$ and $\sigma=0.05$. Since the lattice action is invariant under the $Z_{2}$ symmetry, we expect the symmetric values of the dataset to be reproduced with equal probability. This invariance can be removed via the inclusion of a term $\sum_{i}r_{i}\phi_{i}$, which breaks the symmetry of the action explicitly. The results are shown in Fig. 3 (left), where the anticipated behaviour is observed.

Finally, we illustrate the approach on an image from the CIFAR-10 dataset. The thermalization of the trained Markov field is depicted in Fig. 3 (right), where the image emerges within the equilibrium probability distribution. Since the machine learning algorithm learns the values of the coupling constants in the action that solve the considered problem, extensions towards learning the appropriate coupling constants which describe renormalized systems can potentially be explored [7].

Figure 3: Probability density function (PDF) versus $\phi_{i}$ (left). The thermalization of the trained Markov field (right).

3.3 Machine learning with $\phi^{4}$ neural networks

To increase the expressivity of the machine learning algorithm, we will introduce a new set of latent or hidden variables $h_{j}$ within the graph structure. In addition, we will restrict the interactions to be exclusively between the visible $\phi_{i}$ and the hidden $h_{j}$ variables, giving rise to the lattice action

S(\phi,h;\theta)=-\sum_{i,j}w_{ij}\phi_{i}h_{j}+\sum_{i}r_{i}\phi_{i}+\sum_{i}a_{i}\phi_{i}^{2}+\sum_{i}b_{i}\phi_{i}^{4}+\sum_{j}s_{j}h_{j}+\sum_{j}m_{j}h_{j}^{2}+\sum_{j}n_{j}h_{j}^{4}.    (15)

Figure 4: Extracted dependencies from the trained $\phi^{4}$ neural network.

The $\phi^{4}$ neural network implemented above is a generalization of other neural network architectures: if $b_{i}=n_{j}=0$ the $\phi^{4}$ neural network reduces to a Gaussian-Gaussian restricted Boltzmann machine, and if $b_{i}=n_{j}=m_{j}=0$ and $h_{j}\in\{-1,1\}$ one obtains a Gaussian-Bernoulli restricted Boltzmann machine, see Refs. [8, 9]. Restricted Boltzmann machines were inspired by Ising models, and since the $\phi^{4}$ theory can become equivalent to an Ising model [10], novel physical insights into how these machine learning algorithms work might emerge from the perspective of quantum field theory. We now train the $\phi^{4}$ neural network on the Olivetti faces dataset. Some representative features, which resemble face structures, are shown in Fig. 4.
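For concreteness, a minimal sketch (ours; the dictionary layout and function name are illustrative assumptions) of the action of Eq. (15) as a function of the visible and hidden variables, with the reductions to restricted Boltzmann machines noted in comments:

```python
import numpy as np

def phi4_network_action(phi, h, theta):
    """Lattice action of Eq. (15) for visible variables phi and hidden variables h.
    theta is a dict holding the couplings {w, r, a, b, s, m, n}; w has shape
    (len(phi), len(h)), and all other couplings are site-wise vectors."""
    w, r, a, b, s, m, n = (theta[k] for k in ("w", "r", "a", "b", "s", "m", "n"))
    return (-phi @ w @ h + r @ phi + a @ phi**2 + b @ phi**4
            + s @ h + m @ h**2 + n @ h**4)

# With b = n = 0 the action is quadratic in both phi and h (Gaussian-Gaussian RBM);
# with b = n = m = 0 and h restricted to {-1, +1} one recovers a Gaussian-Bernoulli RBM.
```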

4 Conclusions

In this contribution we have shown that discretized Euclidean field theories are Markov fields, hence it is now possible to investigate machine learning directly within quantum field theory [3]. Based on the Hammersley-Clifford theorem, we demonstrated that the $\phi^{4}$ theory, formulated on a square lattice, satisfies the local Markov property. In addition, we have derived $\phi^{4}$ neural networks which generalize restricted Boltzmann machines (see also Refs. [11, 12]). Finally, we have presented applications pertinent to physics and computer science, opening up the opportunity for nonperturbative investigations of machine learning within lattice field theory.

Acknowledgements

The authors received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 813942. The work of GA and BL has been supported in part by the UKRI Science and Technology Facilities Council (STFC) Consolidated Grant ST/T000813/1. The work of BL is further supported in part by the Royal Society Wolfson Research Merit Award WM170010 and by the Leverhulme Foundation Research Fellowship RF-2020-461\9. Numerical simulations have been performed on the Swansea SUNBIRD system. This system is part of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.

References

  • [1] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques (Cambridge, MA: The MIT Press, 2009).
  • [2] E. Nelson, Construction of quantum fields from Markoff fields, Journal of Functional Analysis 12, 97 (1973).
  • [3] D. Bachtis, G. Aarts and B. Lucini, Quantum field-theoretic machine learning, Phys. Rev. D 103, 074510 (2021).
  • [4] D. Bachtis, G. Aarts and B. Lucini, Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration, Phys. Rev. Research 3, 013134 (2021).
  • [5] D. Bachtis, G. Aarts and B. Lucini, Mapping distinct phase transitions to a neural network, Phys. Rev. E 102, 053306 (2020).
  • [6] D. Bachtis, G. Aarts and B. Lucini, Extending machine learning classification capabilities with histogram reweighting, Phys. Rev. E 102, 033303 (2020).
  • [7] D. Bachtis, G. Aarts, F. Di Renzo and B. Lucini, Inverse renormalization group in quantum field theory, arXiv:2107.00466 [hep-lat] (2021).
  • [8] G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14, 1771 (2002).
  • [9] A. Fischer and C. Igel, Training restricted Boltzmann machines: An introduction, Pattern Recognition 47, 25 (2014).
  • [10] A. Milchev, D. W. Heermann and K. Binder, Finite-size scaling analysis of the $\phi^{4}$ field theory on the square lattice, J. Stat. Phys. 44, 749 (1986).
  • [11] K. Hashimoto, S. Sugishita, A. Tanaka and A. Tomiya, Deep learning and the AdS/CFT correspondence, Phys. Rev. D 98, 046019 (2018).
  • [12] K. Hashimoto, AdS/CFT as a deep Boltzmann machine, Phys. Rev. D 99, 106017 (2019).