
Interacting Dreaming Neural Networks

Pietro Zanin and Nestor Caticha Instituto de Fisica, Universidade de Sao Paulo
Sao Paulo, SP, Brasil
[email protected], [email protected]
Abstract

We study the interaction of agents, each one an associative memory neural network trained on the same memory patterns with possibly different reinforcement-unlearning dreaming periods. Using replica methods, we obtain the rich equilibrium phase diagram of the coupled agents. It shows phases such as a student-professor phase, where only one network benefits from the interaction while the other is unaffected; a mutualism phase, where both benefit; an indifferent phase and an insufficient phase, in which neither agent is benefited nor impaired; and an amensalism phase, where one is unchanged and the other is damaged. In addition to the paramagnetic and spin glass phases, there is also one we call the reinforced delusion phase, where the agents concur without having finite overlaps with the memory patterns. For zero coupling constant, the model reduces to the reinforcement and removal dreaming model, which without dreaming is the Hopfield model. For finite coupling and a single memory pattern, it becomes a Mattis version of the Ashkin-Teller model.

Keywords: Neural Networks, Associative Memories, Learning algorithms

The authors declare that they have no conflict of interest.

Partial Funding from: CNPq National Council for Scientific and Technological Development and Fapesp, the São Paulo Research Foundation.

This is the Accepted Manuscript version of an article accepted for publication in Journal of Statistical Mechanics: Theory and Experiment: J. Stat. Mech. (2023) 043401. Neither SISSA Medialab Srl nor IOP Publishing Ltd is responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at: http://dx.doi.org/10.1088/1742-5468/acc72b

1 Introduction

Neural networks are supposed to implement meaningful information processing based on data. Meaningful, of course, has to be defined. We are interested in the behavior of interacting neural networks, where the data available to each network is partially provided by other neural networks. Shared meaning emerges from this interaction. We obtain analytical results for a system of two interacting agents, each one modeled by an associative memory. Each agent receives the same memory patterns from the environment; however, they may undergo different levels of post-learning dynamics, in the form of dreaming, i.e. unlearning or reinforcing by relearning. Hopfield, Feinstein and Palmer [11], following ideas from Crick and Mitchison [6], introduced a learning algorithm based on unlearning that significantly improved the performance of the Hopfield [16, 12] network with pure Hebbian learning. This anti-Hebbian mechanism was inspired by the suggestion [6] that rapid eye movement (REM) dreaming permits mammals to forget spurious memories, increasing their capacity to retrieve relevant memories. Further analysis and simulations by van Hemmen et al. [10, 20] showed that iterating this algorithm initially increases the network's capacity; however, if iterated beyond a certain threshold, retrieval is destroyed. The improvement occurs in part because the synaptic matrix approaches the pseudo-inverse synaptic matrix [14], despite not converging to it. Dotsenko [7] used a similar algorithm to justify a new symmetric synaptic matrix that allowed reaching the theoretical upper bound in the thermodynamic limit, a number of patterns $p$ equal to the number of units $N$, but only at $T=0$.

Fachechi et al. [8] showed analytically that beneficial unlearning plus the enhancement of the pure states, which they called reinforcement, can increase the stability of the patterns, increasing the critical capacity. Their model has a much larger retrieval phase than the Hopfield model, not only saturating the maximal capacity at $T=0$ but also increasing the stability of the model to temperature changes. Most importantly, there is no catastrophic upper bound on the length of the dreaming process. This algorithm resembles developments by Plakhov, Semenov and Shuvalova [19, 18], but results in a more interesting and varied synaptic matrix. Also relevant for this discussion are [5], [3], [9], which effectively connect learning in Hopfield models with learning in Boltzmann machines.

Dreaming, a process internal to the neural network, represents an extra study period of the available information by the agent. We allow agents to interact through a coupling that extends this internal dreaming to an externally driven process that also acts on the minima of the Hamiltonian. We show that this process has the potential to make the networks perform better or worse, depending on the conditions of the system.

2 Model of interacting neural networks

2.1 Introduction of the model

We work on a statistical mechanics problem in the canonical formalism with temperature $T=\beta^{-1}$. The components of the quenched memory patterns $\bm{\xi}=\{\xi_{i}^{\mu}\}_{\mu=1\cdots p,\,i=1\cdots N}$ are drawn independently from a uniform Bernoulli distribution over $\pm 1$. The Hamiltonian of an isolated agent is [8]

\mathcal{H}_{1}(t,\bm{\xi},\bm{\sigma})=-\frac{1}{2N}\sum_{i,j=1}^{N}\sum_{\mu,\nu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\nu}\left(\frac{1+t}{\bm{1}+t\bm{C}}\right)_{\mu\nu}\sigma_{i}\sigma_{j}, \qquad (1)

where $t\geq 0$ quantifies the length of the dreaming process, $\bm{C}$ is the $p\times p$ matrix with elements $C_{\mu\nu}=\frac{1}{N}\sum_{i=1}^{N}\xi_{i}^{\mu}\xi_{i}^{\nu}$, and $\sigma_{i}=\pm 1$ is an Ising variable. For $t=0$ the Hopfield Hebbian Hamiltonian is recovered, and as $t\rightarrow\infty$, $t^{-1}$ acts as a Tikhonov regularizer, leading to the pseudo-inverse model [17, 14]. The two-member group Hamiltonian is

\mathcal{H}(t_{1},t_{2},\bm{\sigma},\bm{S},\epsilon)=\mathcal{H}_{1}(t_{1},\bm{\xi},\bm{\sigma})+\mathcal{H}_{1}(t_{2},\bm{\xi},\bm{S})-\frac{\epsilon}{N}\Big(\sum_{i}\sigma_{i}S_{i}\Big)^{2},

$\epsilon$ is the coupling constant of the interaction; the sums over $i,j$ run from $1$ to $N$, and the sums over $\mu,\nu$ from $1$ to $p$. We study this problem in equilibrium, or offline, with quenched disorder. A related problem, the inverse problem of learning a spin glass Hamiltonian online, was studied in [15].
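The two dreaming limits of the single-agent synaptic matrix can be checked numerically. The short numpy sketch below is our own illustration (all variable names are ours): it builds $J_{ij}=\frac{1}{N}\sum_{\mu\nu}\xi_{i}^{\mu}\xi_{j}^{\nu}[(1+t)(\bm{1}+t\bm{C})^{-1}]_{\mu\nu}$ and verifies that $t=0$ gives the Hebbian matrix while large $t$ approaches the pseudo-inverse (projector) rule, for which the stored patterns are exact fixed points.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 400, 10
xi = rng.choice([-1.0, 1.0], size=(p, N))   # quenched +/-1 patterns
C = xi @ xi.T / N                            # p x p overlap matrix C_{mu nu}

def dream_J(t):
    # J_ij = (1/N) sum_{mu,nu} xi_i^mu xi_j^nu [(1+t)(1+tC)^{-1}]_{mu nu}
    K = (1.0 + t) * np.linalg.inv(np.eye(p) + t * C)
    return xi.T @ K @ xi / N

# t = 0: the plain Hebbian (Hopfield) synaptic matrix.
assert np.allclose(dream_J(0.0), xi.T @ xi / N)

# t -> infinity: the pseudo-inverse (projector) rule J = xi^T C^{-1} xi / N,
# for which J xi^mu = xi^mu exactly.
J_proj = xi.T @ np.linalg.inv(C) @ xi / N
assert np.allclose(dream_J(1e8), J_proj, atol=1e-5)
print(np.max(np.abs(J_proj @ xi[0] - xi[0])))   # ~0: patterns are fixed points
```

At finite $t$ the matrix interpolates between the two rules, which is the sense in which $t^{-1}$ regularizes the inversion of $\bm{C}$.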

The first two terms of the Hamiltonian above represent two distinct neural network agents; the third term couples them.

This interaction is similar to dreaming, since it changes the couplings from $J^{(1)}_{ij}\rightarrow J_{ij}^{(1)}+\frac{2\epsilon}{N}S_{i}S_{j}$ and $J^{(2)}_{ij}\rightarrow J_{ij}^{(2)}+\frac{2\epsilon}{N}\sigma_{i}\sigma_{j}$, i.e. each agent picks up a Hebbian contribution from the other. We emphasize, however, that the two processes differ: the interaction involves all states, and not only the stable ones, as is done in the unlearning algorithm.

For $\epsilon=0$, this is the reinforcement and removal model [8]; with $\epsilon\neq 0$ and $p=1$, it is the infinite-range Ashkin-Teller model [13].

Without loss of generality, we take $t_{1}\leq t_{2}$ and refer to the neural network agents as NN1 and NN2, respectively.
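Although the analysis below is analytic, the model is straightforward to simulate at finite size. The following Metropolis sketch is our own illustration (parameter values and names are our choices, and we take the interaction term attractive for $\epsilon>0$, i.e. $-\frac{\epsilon}{N}(\sum_{i}\sigma_{i}S_{i})^{2}$, the sign consistent with the Hebbian coupling shift described above); it samples the coupled system at low load and measures the pattern overlaps of both agents and their mutual overlap.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 3                        # small sizes, low load alpha = p/N
beta, eps, t1, t2 = 2.0, 0.2, 0.0, 10.0
xi = rng.choice([-1.0, 1.0], size=(p, N))
C = xi @ xi.T / N

def dream_J(t):
    # dreamed synaptic matrix of the single-agent Hamiltonian
    return xi.T @ ((1 + t) * np.linalg.inv(np.eye(p) + t * C)) @ xi / N

J1, J2 = dream_J(t1), dream_J(t2)

def metropolis(sweeps=50):
    sig, S = xi[0].copy(), xi[0].copy()   # start both agents at the condensed pattern
    m = int(sig @ S)                      # running value of sum_i sigma_i S_i
    for _ in range(sweeps):
        for _ in range(N):
            # flip attempt on agent 1; dE = single-agent part + coupling part
            i = rng.integers(N)
            dE = 2*sig[i]*(J1[i] @ sig) - 2*J1[i, i] + 4*eps/N*(sig[i]*S[i]*m - 1)
            if dE <= 0 or rng.random() < np.exp(-beta*dE):
                m -= 2 * int(sig[i]*S[i]); sig[i] = -sig[i]
            # flip attempt on agent 2
            i = rng.integers(N)
            dE = 2*S[i]*(J2[i] @ S) - 2*J2[i, i] + 4*eps/N*(sig[i]*S[i]*m - 1)
            if dE <= 0 or rng.random() < np.exp(-beta*dE):
                m -= 2 * int(sig[i]*S[i]); S[i] = -S[i]
    return sig, S, m

sig, S, m = metropolis()
m1, m2, h = xi[0] @ sig / N, xi[0] @ S / N, m / N
print(m1, m2, h)   # large overlaps: the pattern is retrieved at this low load
```

The coupling contribution to $\Delta E$ follows from $-\frac{\epsilon}{N}m^{2}$ with $m\rightarrow m-2\sigma_{i}S_{i}$ under a single flip, giving $\frac{4\epsilon}{N}(\sigma_{i}S_{i}m-1)$.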

2.2 Order parameters

The free energy is obtained using the replica method, where the Hamiltonian is a sum of the individual replica Hamiltonians. The calculation roughly follows that of the single-agent case by Fachechi et al. [8], except for a few important steps where care is needed. The expression of the free energy and the self-consistent equations for the order parameters appear in [1]. Since we are interested in the memory retrieval phases, we have not looked in detail into the spin-glass phase; hence, in order to understand global features of the five-dimensional phase diagram $(t_{1},t_{2},\epsilon,\alpha,\beta)$, the important order parameters for replica $a$ are $h^{a}=\langle\langle\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{a}S_{i}^{a}\rangle\rangle$, $m_{1}^{a}=\langle\langle\frac{1}{N}\sum_{i=1}^{N}\xi^{1}_{i}\sigma_{i}^{a}\rangle\rangle$ and $m_{2}^{a}=\langle\langle\frac{1}{N}\sum_{i=1}^{N}\xi_{i}^{1}S_{i}^{a}\rangle\rangle$, where the double brackets represent the thermal and quenched averages over the patterns. For a more careful identification of the different phases, we will also consider the changes induced by the interaction, i.e. the differences between the order parameters for $\epsilon>0$ and $\epsilon=0$. Other order parameters of the form $\Delta^{\rho}$, $r^{\rho}$, $Q^{\rho}$ and $q^{\rho}$ are related to the overlap between the $\rho$ elements, where $\rho$ is an index that refers to either agent $\sigma$, or $S$, or to their overlap $\sigma S$.

We assume the following replica symmetry ansatz:

m_{1}^{a}=m_{1};\quad m_{2}^{a}=m_{2};\quad h^{a}=h\quad\forall a, \qquad (2)
q_{ab}^{\rho}=Q^{\rho}\delta_{ab}+q^{\rho}(1-\delta_{ab})\quad\forall a,b,\rho
r_{ab}^{\rho}=R^{\rho}\delta_{ab}+r^{\rho}(1-\delta_{ab})\quad\forall a,b,\rho.

where

q_{ab}^{\rho}=\langle\langle\frac{1}{N}\sum_{i}\Big(\sigma_{i}^{a,\rho}+i\sqrt{\tfrac{t}{\beta(1+t)}}\,\phi_{i}^{a,\rho}\Big)\Big(\sigma_{i}^{b,\rho}+i\sqrt{\tfrac{t}{\beta(1+t)}}\,\phi_{i}^{b,\rho}\Big)\rangle\rangle, \qquad (3)

and $r_{ab}$ is the auxiliary variable of $q_{ab}$, so it has a similar physical meaning.

The first line of (2) is not hard to justify, since there are no reasons to expect that any replica is privileged.

The Ansatz for $q_{ab}^{\rho}$ and $r_{ab}^{\rho}$ with $\rho=\sigma S$ deserves a comment. It represents the idea that the correlations between the agents differ within a replica and across replicas: the interaction acts within the same replica, not between different replicas, so we expect a greater intrareplica similarity of the agents than interreplica similarity.

A homogeneous Ansatz ($q^{\sigma S}_{ab}=q^{\sigma S},\,r^{\sigma S}_{ab}=r^{\sigma S}\;\forall a,b$) leads to the puzzling result that the free energy does not depend on $q^{\sigma S}$, which indicates that this is not the right Ansatz. This can be seen from the fact that the free energy depends on $Q^{\sigma S}$ and $q^{\sigma S}$ only through the differences $Q^{\sigma S}-q^{\sigma S}$ and $R^{\sigma S}-r^{\sigma S}$, so a homogeneous Ansatz would leave no dependence on $q^{\sigma S}$.

It is important to recognize that $q_{ab}$ does not have exactly the same meaning as the original Edwards-Anderson parameter. The additional imaginary part inside the average leads to different values; in particular, $q^{\sigma S}$ differs from $h$ for $t_{1},t_{2}\neq 0$, despite some similarity. This also explains why already in the original article [8] the usual choice $q_{aa}=1$ was not used. Still, this parameter gives a measure of the correlations in the system and can be used to determine the existence of the spin-glass state.

We emphasize that despite using the same Ansatz for different $\rho$'s, the justifications behind them are distinct. The replica symmetric free energy depends on 9 independent order parameters ($m_{1}$, $m_{2}$, $h$, $Q^{\sigma}$, $Q^{S}$, $Q^{\sigma S}$, $q^{\sigma}$, $q^{S}$, $q^{\sigma S}$). We also have 6 dependent variables ($r^{\sigma}$, $r^{S}$, $r^{\sigma S}$, $\Delta^{\sigma}$, $\Delta^{S}$, $\Delta^{\sigma S}$), which can be fully eliminated from the equations of state.

Table 1 shows the different phases, obtained from the possible regimes of $m_{1}$, $m_{2}$, $h$ and, in addition, a measure of the benefit or damage due to the interaction, given by

\Delta m_{1}=m_{1}(\epsilon,t_{1},t_{2},\alpha,\beta)-m_{1}(0,t_{1},t_{2},\alpha,\beta), \qquad (4)
\Delta m_{2}=m_{2}(\epsilon,t_{1},t_{2},\alpha,\beta)-m_{2}(0,t_{1},t_{2},\alpha,\beta),
\Delta h=h(\epsilon,t_{1},t_{2},\alpha,\beta)-h(0,t_{1},t_{2},\alpha,\beta).
Phases \ OP          | $m_1$       | $m_2$       | $h$         | $\Delta m_1$ | $\Delta m_2$ | $\Delta h$
Student-professor    | $>0$        | $>0$        | $>0$        | $>0$         | $\approx 0$  | $>0$
Mutualism            | $>0$        | $>0$        | $>0$        | $>0$         | $>0$         | $>0$
Disordered           | $\approx 0$ | $\approx 0$ | $\approx 0$ | $\approx 0$  | $\approx 0$  | $\approx 0$
Reinforced delusion  | $\approx 0$ | $\approx 0$ | $>0$        | $\approx 0$  | $\approx 0$  | $>0$
Insufficient         | $\approx 0$ | $>0$        | $\approx 0$ | $\approx 0$  | $\approx 0$  | $\approx 0$
Indifferent          | $>0$        | $>0$        | $>0$        | $\approx 0$  | $\approx 0$  | $\approx 0$
Amensalism           | $\approx 0$ | $\approx 0$ | $\approx 0$ | $\approx 0$  | $<0$         | $\approx 0$
Table 1: The different phases are characterized by the values of the order parameters (OP); we consider $t_{1}\leq t_{2}$ and $m_{1},m_{2},h\geq 0$.
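The signatures of Table 1 can be encoded as a mechanical test. The helper below is hypothetical and our own (the function name, the tolerance tol separating "approximately zero" from finite values, and the branch order are our choices, not prescribed by the table):

```python
TOL = 0.05   # our choice: values with |x| <= TOL count as "approximately zero"

def classify(m1, m2, h, dm1, dm2, dh, tol=TOL):
    """Map the six (order parameter, interaction-induced change) values
    to the phase names of Table 1."""
    pos = lambda x: x > tol          # finite and positive
    zero = lambda x: abs(x) <= tol   # approximately zero
    if pos(m1) and pos(m2) and pos(h):
        if pos(dm1) and pos(dm2) and pos(dh):
            return "mutualism"
        if pos(dm1) and zero(dm2) and pos(dh):
            return "student-professor"
        if zero(dm1) and zero(dm2) and zero(dh):
            return "indifferent"
    if zero(m1) and zero(m2):
        if pos(h) and pos(dh) and zero(dm1) and zero(dm2):
            return "reinforced delusion"
        if zero(h) and dm2 < -tol:
            return "amensalism"
        if zero(h) and zero(dm1) and zero(dm2) and zero(dh):
            return "disordered"
    if zero(m1) and pos(m2) and zero(h) and zero(dm1) and zero(dm2) and zero(dh):
        return "insufficient"
    return "unclassified"

print(classify(0.9, 0.9, 0.9, 0.2, 0.2, 0.2))   # mutualism
```

Such a classifier is convenient when scanning the five-dimensional parameter space numerically, since each solution of the equations of state yields exactly the six numbers it consumes.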

Before entering into details in section 2.3, we give a rough description of the types of phases where the system may be found. The insufficient phase occurs when NN2 is in the retrieval phase, NN1 is not, and $\epsilon$ is small enough that no relevant change in the learning occurs. As the interaction increases, it can transition to the student-professor phase, where only the performance of NN1 is enhanced significantly. Further increase of $\epsilon$ can result in the mutualism phase, where both neural networks have noticeable increases in their capacity. The disordered phase includes the paramagnetic and spin glass phases, where the interaction cannot lead to interesting results. It is not hard to detect the spin glass phase: since for $h=0$ the role of $\epsilon$ vanishes, the spin glass line is simply the one obtained in [2].

The indifferent phase occurs where, even without interaction, both neural networks could already process information adequately. The reinforced delusion phase indicates the creation of an alternate ordered state, where the agents, despite having no overlap with the memory patterns, are similar to each other. From their own perspective, the agents are in a retrieval ferromagnetic phase; for a third party, the individual agents cannot process information and seem to be in a disordered phase. The interaction still preserves the glassy state, so it is still a spin glass phase. The last phase is amensalism, where NN1 is not significantly affected by the interaction while NN2 ends up losing its retrieval ability, mirroring an amensalism interaction.

2.3 Phase diagrams and discussion

We analyze numerically the self-consistent equations obtained from the extremum conditions of the free energy. Three types of two-dimensional cuts are used to describe the five-dimensional parameter space, yielding phase diagrams in the (i) $T$-$\alpha_{c}$, (ii) $\epsilon$-$\alpha_{c}$ and (iii) $\epsilon$-$T_{c}$ planes, shown respectively in panels 1, 2 and 3. In all diagrams, since $t_{2}\geq t_{1}$, we have $m_{2}\geq m_{1}$.

2.3.1 $T$-$\alpha_{c}$ plane

For fixed $t_{1},t_{2}$ and $\epsilon$, $T$ is varied to find the maximum capacity such that $m_{1}\neq 0$ or $m_{2}\neq 0$. We compare them to the non-interacting neural networks. The main results can be seen in panel 1.

Figure 1: The metastable phase border critical capacities $\alpha_{c,i}(T;t_{1},t_{2},\epsilon)$. A) Symmetrical case with $t_{1}=t_{2}=0$ and $\epsilon=0,0.1,0.4$. The ferromagnetic (FM) phase increases with the interaction even with no dreaming. Note that for high $\epsilon$, at $\alpha=0$, the FM phase persists up to $T_{c}>1$. B) $t_{1}=0$, $t_{2}=10$ and $\epsilon=0,0.1,0.4$. For low temperatures, high interactions lead to a decrease of NN2's critical capacity, while the interaction is always beneficial for the critical capacity of NN1. C) $t_{1}=2$, $t_{2}=10$ and $\epsilon=0,0.1,0.4$. Similar to (B), but with $t_{1}$ higher and closer to $t_{2}$; even for small $\epsilon$ the two agents have the same behavior. The loss of capacity for NN2 is much smaller here, since NN1 is less harmful. D) $t_{1}=t_{2}=10$ for $\epsilon=0,0.1,0.4$. Interaction between well-trained peers is helpful.

The behavior can be understood by comparing the difference between the properties of an interacting agent with t1t_{1} or t2t_{2}, with the properties of the non-interacting agents with the same amount of dreaming.

Diagram 1.A shows the interacting agents with no dreaming, $t_{1}=t_{2}=0$. The interaction yields a higher critical capacity and both agents benefit equally from it. Diagram 1.B represents the typical situation of a high dreaming disparity, $t_{1}=0$, $t_{2}=10$. Agent 1 always benefits, since its capacity is always larger than without interaction. For small interaction, $\epsilon=0.1$, agent 2 slightly benefits at low $\alpha$ but is damaged by the interaction at higher values of $\alpha$. For high values of the interaction, say $\epsilon=0.4$, the two agents have the same critical capacity at all temperatures. Agent 2 benefits considerably from the interaction at low $\alpha$, but is again damaged at higher loads, because it is learning from the spurious minima of the low-dreaming agent 1, which can benefit from the interaction only up to a certain point.

Diagram 1.C represents the typical situation of an intermediate and a high dreaming load, $t_{1}=2$, $t_{2}=10$. Above a certain $\epsilon<0.1$, $\alpha_{c,1}=\alpha_{c,2}$, and in general the interaction is substantially beneficial for both neural networks, except at very low temperatures $T\lesssim 0.05$, where there is a slight decrease in the pattern retrieval critical capacity of agent 2.

Diagram 1.D presents the case where both agents have dreamt abundantly and equally, $t_{1}=t_{2}=10$. Again $m_{1}=m_{2}$ by symmetry, and the interaction significantly enlarges the pattern retrieval phase; differently from the previous situation, it is always beneficial. As in diagram 1.A, interacting with a peer brings an advantage to both agents, independently of dream load.

2.3.2 $\epsilon$-$\alpha_{c}$ plane

For $t_{1},t_{2}$ and $T$ fixed, we varied $\epsilon$ and $\alpha$ to obtain the value of $\alpha_{c,i}(\epsilon)$ such that for all $\alpha>\alpha_{c,i}(\epsilon)$ we have $m_{i}=0$. The behavior differs significantly between low temperatures ($T\lesssim 0.3$) and high temperatures ($T\gtrsim 0.3$). The main results can be seen in panel 2.

Figure 2: Critical capacities $\alpha_{c,i}(\epsilon;t_{1},t_{2},T)$. A) $\beta=0.95$ and $t_{1}=t_{2}=10$; there is only one critical line since $t_{1}=t_{2}$. We see a typical situation of mutualism, where the two agents manage to obtain non-zero magnetization under strong interaction. B) $\beta=2$, $t_{1}=0$, $t_{2}=10$; a typical student-professor situation for small interaction: the critical capacity of NN2 is constant up to $\epsilon\approx 0.12$, and then it turns to mutualism. C) $\beta=5$, $t_{1}=0$ and $t_{2}=10$; the student-professor phase turns to amensalism, with NN2 harmed by the interaction. D) $\beta=5$, $t_{1}=2$ and $t_{2}=4$; a student-professor situation at low $\epsilon$ turns to mutualism for larger $\epsilon$.

The first two diagrams, 2.A and 2.B, represent the high temperature region. While the second shows the mutual benefits of the interaction for networks with a high difference of $t$, the first shows that with enough interaction it is possible to extend the capacity to regions that were inaccessible before the interaction. This retrieval above $T=1$ occurs because the coupled system acts as a system with a higher number of neurons. An interesting detail shown in diagram 2.B is that for low values of the interaction only agent 1 has an increase in capacity, while agent 2 is unchanged; agent 2 only modifies its retrieval capacity when the capacities start matching. The former student reaches the level of the professor, and both can then profit from the interaction.

Panels 2.C and 2.D show the behavior in the low temperature region, $T=0.2$. Diagram 2.D shows that for small differences of $t$ we still have behavior qualitatively similar to diagram 2.B. However, for large differences in dreaming load, there is a discontinuous transition at high values of the interaction in which the capacity of agent 2 decreases substantially; this behavior is consistent with diagram 1.B. Of course, the critical capacity of agent 2, despite not benefiting from the interaction, is still larger than in the high temperature case shown in figure 2.B. For parameters where the two agents benefit each other, increasing their interaction is not detrimental; we see this in the rise of the upper boundary of the yellow region. But this improvement cannot continue forever, and the critical capacities (figures 2.A, 2.B and 2.D) tend to a limiting value.

2.3.3 $\epsilon$-$T_{c}$ plane

Here we fixed $t_{1},t_{2},\alpha$ and varied $\epsilon$ and $T$ to obtain the value of $T_{c}(\epsilon)$ such that for $T>T_{c}(\epsilon)$ we have $m_{1}=0$ or $m_{2}=0$. In these diagrams, differently from the previous ones, we incorporate the denomination of the different phases introduced in subsection 2.2, as they are easier to visualize. The main results can be seen in panel 3:

Figure 3: Phase diagrams in the $T,\epsilon$ plane for fixed $(t_{1},t_{2},\alpha)$. A) $\alpha=0.08$, $t_{1}=0$ and $t_{2}=10$. Since the difference between $t_{1}$ and $t_{2}$ is large, both the insufficient phase and the student-professor phase are present. B) With $\alpha=0.12$, $t_{1}=2$ and $t_{2}=3$, the student-professor and the insufficient phases have disappeared due to the small difference between $t_{1}$ and $t_{2}$. C) $\alpha=0.02$ and $t_{1}=t_{2}=10$. Equal dreaming loads lead to symmetry and therefore no student-professor nor insufficient phases. This also happens in D) $\alpha=0.08$ and $t_{1}=t_{2}=10$; note that the disordered and reinforced delusion phases grow and the mutualism phase shrinks significantly.

The indifferent phase is usually small and only appears where the difference between $t_{1}$ and $t_{2}$ is high; in our examples it can only be visualized in the first case. The mutualism phase only grows as the interaction increases, and tends to a plateau for high values, as expected. The reinforced delusion phase only appears for high values of the interaction and extends beyond the mutualism phase plateau. The behavior seen in diagram 3.A is not in contradiction with diagram 2.C: in diagram 3.A we have a low $\alpha$ value, so the phenomenon of decreasing magnetization with interaction does not occur. Additionally, we do not see the amensalism phase in panel 3 because we only consider relatively low capacities, where this phase is not present.

To get a clearer view of the phase changes in this model, panel 4 shows how $\Delta m_{1}$, $\Delta m_{2}$ and $\Delta h$ change with temperature for fixed $\epsilon$, $\alpha$, $t_{1}$ and $t_{2}$.

Figure 4: $\Delta m_{i}(T;t_{1},t_{2},\alpha,\epsilon)$ and $\Delta h(T;t_{1},t_{2},\alpha,\epsilon)$. The legend shows $\Delta m,\Delta h$ for a particular cut of $\epsilon$ given $(t_{1},t_{2},\alpha)$. A) A cut with $\alpha=0.12$, $\epsilon=0.05$, $t_{1}=0$ and $t_{2}=10$; it is interesting to compare it with diagram 3.A, as they are similar situations. The disordered phase here corresponds to the spin-glass phase. B) A cut with $\alpha=0.08$, $\epsilon=0.2$, $t_{1}=2$ and $t_{2}=3$; compare it to diagram 3.B. The disordered phase here corresponds to the paramagnetic phase. C) $\alpha=0.02$, $\epsilon=0.35$ and $t_{1}=t_{2}=10$; we write $m\equiv m_{1}=m_{2}$ as they are equal; compare it to diagrams 3.C and 3.D. The disordered phase here corresponds to the paramagnetic phase. D) With $\alpha=0.08$, $\epsilon=0.45$, $t_{1}=0$ and $t_{2}=10$; the small tail in $\Delta h$ at high values of $\epsilon$ indicates the reinforced delusion phase; compare this to diagram 3.A. The disordered phase here corresponds to the spin-glass phase.

3 Conclusions

Statistical Mechanics techniques were used to model and study the interaction between information processing machines. Simple exposure to information, without the benefit of post-processing in the form of dreaming, is not efficient. The agents received the same information in the form of $P=\alpha N$ patterns. The removal and reinforcement of minima can be thought of as a further exercise, of duration $t_{1}$ and $t_{2}$, that the agents undergo after being exposed to the information in the memory patterns; in a metaphorical sense, the agents ponder, meditate, think over, dream about the received information. In addition to the individual learning process, further changes in their properties are elicited by an interaction quantified by $\epsilon$, and a rich and non-trivial behavior ensues; interactions can be irrelevant, beneficial or harmful to information retrieval. We return to the question: what is meaningful information processing? It depends on who assigns meaning. If retrieval is used to gauge the success of the machines, as an independent third party would, the region of reinforced delusion shows no meaningful information processing. However, from the perspective of the agents, all seems fine: they agree on their version of what is correct, even though it does not reflect the objectively relevant information in the memory patterns. The ubiquitous use of machines that learn demands the study of their interactions, not only with other machines, but with humans too. Modifications, generalizations or simplifications of our approach are needed.

We thank Felippe Alves Pereira for useful discussions. This work was partially supported by FAPESP and CNPq. FAPESP project number 2021/07951-7.

References

  • [1] See Supplemental Material at [URL will be inserted by publisher] for the free energy and equations of state, 2022
  • [2] Elena Agliari, Francesco Alemanno, Adriano Barra and Alberto Fachechi “Dreaming neural networks: rigorous results” In Journal of Statistical Mechanics: Theory and Experiment 2019.8 IOP Publishing, 2019, pp. 083503 DOI: 10.1088/1742-5468/ab371d
  • [3] Francesco Alemanno et al. “Supervised Hebbian learning” In Europhysics Letters IOP Publishing, 2023
  • [4] D.J. Amit, Hanoch Gutfreund and H. Sompolinsky “Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks” In Physical Review Letters, 1985
  • [5] Adriano Barra, Alberto Bernacchia, Enrica Santucci and Pierluigi Contucci “On the equivalence of Hopfield Networks and Boltzmann Machines” In Neural networks : the official journal of the International Neural Network Society 34, 2012, pp. 1–9 DOI: 10.1016/j.neunet.2012.06.003
  • [6] F. Crick and G. Mitchison “The function of dream sleep” In Nature, 1983
  • [7] V.. Dotsenko “Statistical mechanics of Hopfield-like neural networks with modified interactions” In J. Phys. A: Math. Gen., 1991
  • [8] A. Fachechi, E. Agliari and A. Barra “Dreaming neural networks: Forgetting spurious memories and reinforcing pure ones” In Neural Networks, 2019
  • [9] Alberto Fachechi, Adriano Barra, Elena Agliari and Francesco Alemanno “Outperforming RBM Feature-Extraction Capabilities by “Dreaming” Mechanism” In IEEE Transactions on Neural Networks and Learning Systems 35, 2022, pp. 1–10 DOI: 10.1109/TNNLS.2022.3182882
  • [10] J. Leo van Hemmen and Nikolaus Klemmer “Unlearning and Its Relevance to REM Sleep: Decorrelating Correlated Data” In Neural Network Dynamics, 1992
  • [11] J. Hopfield, D.. Feinstein and R.. Palmer “Unlearning has a stabilizing effect in collective memories” In Nature, 1983
  • [12] John J Hopfield “Neural networks and physical systems with emergent collective computational abilities” In Proceedings of the national academy of sciences 79.8 National Acad Sciences, 1982, pp. 2554–2558
  • [13] Leo Kadanoff and Franz Wegner “Some critical properties of the eight-vertex model” In Physical review B, 1971
  • [14] I Kanter and H Sompolinsky “Associative recall of memory without errors” In Phys Rev A 35 American Physical Society, pp. 380–392 DOI: 10.1103/PhysRevA.35.380
  • [15] S. Kuva, O. Kinouchi and N. Caticha “Learning a spin glass: determining Hamiltonians from metastable states” In Journal of Physics A, 1997
  • [16] LA Pastur and AL Figotin “Theory of disordered spin systems” In Theor Math Phys 35, 1978, pp. 403–414 URL: https://doi.org/10.1007/BF01039111
  • [17] L. Personnaz and G. Dreyfus “Information storage and retrieval in spin-glass like neural networks” In Journal de Physique Lettres, 1985
  • [18] A. Plakhov and S. Semenov “Neural networks: iterative unlearning algorithm converging to the projector rule matrix” In Journal de Physique I, 1994
  • [19] S. Semenov and I. Shuvalova “Some results on convergent unlearning algorithm” In Advances in Neural Information Processing Systems 8 (NIPS 1995), 1995
  • [20] S. Wimbauer, N. Klemmer and J Leo van Hemmen “Universality of unlearning” In Neural Networks, 1994

Supplemental Material

Interacting Dreaming Neural Networks

Pietro Zanin and Nestor Caticha

Instituto de Fisica, Universidade de Sao Paulo

Appendix A Details of the solution

The average over the patterns of the replicated partition function is

\langle Z^{n}\rangle=\langle\sum_{\sigma^{n},S^{n}}\exp[\frac{\beta}{2N}\sum_{a}\sum_{i,j}\sum_{\mu,\nu}\xi_{i}^{\mu}\xi_{j}^{\nu}(\frac{1+t_{1}}{\mathbb{1}+t_{1}C})_{\mu\nu}\sigma_{i}^{a}\sigma_{j}^{a} \qquad (5)
+\frac{\beta}{2N}\sum_{a}\sum_{i,j}\sum_{\mu,\nu}\xi_{i}^{\mu}\xi_{j}^{\nu}(\frac{1+t_{2}}{\mathbb{1}+t_{2}C})_{\mu\nu}S_{i}^{a}S_{j}^{a}+\beta\epsilon N\sum_{a}(\frac{1}{N}\sum_{i}\sigma_{i}^{a}S_{i}^{a})^{2}]\rangle.

Introducing integrals to remove the inverse matrices, and considering that we are dealing with only one condensed pattern ($\bm{\xi}^{1}$):

\langle Z^{n}\rangle=\sum_{\sigma^{n},S^{n}}\int\prod_{a}Dx_{1}^{a}Dy_{1}^{a}\prod_{i,a}D\Phi_{i}^{a}D\phi_{i}^{a}\prod_{a}Dh^{a} \qquad (6)
×exp[β(1+t1)Ni,ax1aξi1(σia+it1β(t1+1)ϕia)\displaystyle\times\exp[\sqrt{\frac{\beta(1+t_{1})}{N}}\sum_{i,a}x_{1}^{a}\xi_{i}^{1}(\sigma_{i}^{a}+i\sqrt{\frac{t_{1}}{\beta(t_{1}+1)}}\phi_{i}^{a})
+β(1+t2)Ny1aξi1(Sia+it2β(t2+1)Φia)\displaystyle+\sqrt{\frac{\beta(1+t_{2})}{N}}y_{1}^{a}\xi_{i}^{1}(S_{i}^{a}+i\sqrt{\frac{t_{2}}{\beta(t_{2}+1)}}\Phi_{i}^{a})
+2βϵNaiσiaSiaha]aμ2DxμaDyμaexp{i,aμ2\displaystyle+\sqrt{\frac{2\beta\epsilon}{N}}\sum_{a}\sum_{i}\sigma_{i}^{a}S_{i}^{a}h^{a}]\langle\prod_{a}\prod_{\mu\geq 2}Dx_{\mu}^{a}Dy_{\mu}^{a}\exp\{\sum_{i,a}\sum_{\mu\geq 2}
×ξiμ[β(t1+1)Nxμa(σia+it1β(t1+1)ϕia)\displaystyle\times\xi_{i}^{\mu}[\sqrt{\frac{\beta(t_{1}+1)}{N}}x_{\mu}^{a}(\sigma_{i}^{a}+i\sqrt{\frac{t_{1}}{\beta(t_{1}+1)}}\phi_{i}^{a})
+β(t2+1)Nyμa(Sia+it2β(t2+1)Φia)]}.\displaystyle+\sqrt{\frac{\beta(t_{2}+1)}{N}}y_{\mu}^{a}(S_{i}^{a}+i\sqrt{\frac{t_{2}}{\beta(t_{2}+1)}}\Phi_{i}^{a})]\}\rangle.

The average over the patterns can be done explicitly when there is only one condensed pattern: since each ξ_i^μ with μ ≥ 2 takes the values ±1 with equal probability, the average of each exponential is a hyperbolic cosine:

exp{iaμ2ξiμ[β(t1+1)Nxμa(σia+t1ϕia)+β(t2+1)Nyμa(Sia+t2Φia)]}\displaystyle\langle\exp\{\sum_{ia}\sum_{\mu\geq 2}\xi_{i}^{\mu}[\sqrt{\frac{\beta(t_{1}+1)}{N}}x_{\mu}^{a}(\sigma_{i}^{a}+t_{1}^{\prime}\phi_{i}^{a})+\sqrt{\frac{\beta(t_{2}+1)}{N}}y_{\mu}^{a}(S_{i}^{a}+t_{2}^{\prime}\Phi_{i}^{a})]\}\rangle
\displaystyle=\prod_{i}\prod_{\mu\geq 2}\cosh[\sum_{a}(\sqrt{\frac{\beta(t_{1}+1)}{N}}x_{\mu}^{a}(\sigma^{a}_{i}+t_{1}^{\prime}\phi^{a}_{i})+\sqrt{\frac{\beta(t_{2}+1)}{N}}y_{\mu}^{a}(S^{a}_{i}+t_{2}^{\prime}\Phi^{a}_{i}))]
=exp{β2Niabμ2(1+t1)(σia+t1ϕia)(σib+t1ϕib)xμaxμb\displaystyle=\exp\{\frac{\beta}{2N}\sum_{iab}\sum_{\mu\geq 2}(1+t_{1})(\sigma^{a}_{i}+t_{1}^{\prime}\phi^{a}_{i})(\sigma^{b}_{i}+t_{1}^{\prime}\phi^{b}_{i})x_{\mu}^{a}x_{\mu}^{b} (7)
+(1+t2)yμayμb(Sia+t2Φia)(Sib+t2Φib)\displaystyle+(1+t_{2})y_{\mu}^{a}y_{\mu}^{b}(S^{a}_{i}+t_{2}^{\prime}\Phi^{a}_{i})(S^{b}_{i}+t_{2}^{\prime}\Phi^{b}_{i})
+2(1+t1)(1+t2)[(σia+t1ϕia)(Sib+t2Φib)xμayμb]}\displaystyle+2\sqrt{(1+t_{1})(1+t_{2})}[(\sigma^{a}_{i}+t_{1}^{\prime}\phi^{a}_{i})(S^{b}_{i}+t_{2}^{\prime}\Phi^{b}_{i})x_{\mu}^{a}y_{\mu}^{b}]\}
\displaystyle=\int\prod_{ab}dq_{ab}^{\sigma}dq_{ab}^{S}dq_{ab}^{\sigma S}dr_{ab}^{\sigma}dr_{ab}^{S}dr_{ab}^{\sigma S}
\displaystyle\times\exp\{N\sum_{ab}ir_{ab}^{\sigma}[q_{ab}^{\sigma}-(\sigma^{a}+t_{1}^{\prime}\phi^{a})(\sigma^{b}+t_{1}^{\prime}\phi^{b})]
+NabirabS[qabS(Sa+t2Φa)(Sb+t2Φb)]\displaystyle+N\sum_{ab}ir_{ab}^{S}[q_{ab}^{S}-(S^{a}+t_{2}^{\prime}\Phi^{a})(S^{b}+t_{2}^{\prime}\Phi^{b})]
+NabirabσS[qabσS(σa+t1ϕa)(Sb+t2Φb)]\displaystyle+N\sum_{ab}ir_{ab}^{\sigma S}[q_{ab}^{\sigma S}-(\sigma^{a}+t_{1}^{\prime}\phi^{a})(S^{b}+t_{2}^{\prime}\Phi^{b})]
+β2ab[μ2(1+t1)qabσxμaxμb+(1+t2)yμayμbqabS\displaystyle+\frac{\beta}{2}\sum_{ab}[\sum_{\mu\geq 2}(1+t_{1})q_{ab}^{\sigma}x_{\mu}^{a}x_{\mu}^{b}+(1+t_{2})y_{\mu}^{a}y_{\mu}^{b}q_{ab}^{S}
+1+t11+t2(qabσSxμayμb+xμbyμaqbaσS)]},\displaystyle+\sqrt{1+t_{1}}\sqrt{1+t_{2}}(q_{ab}^{\sigma S}x_{\mu}^{a}y_{\mu}^{b}+x_{\mu}^{b}y_{\mu}^{a}q_{ba}^{\sigma S})]\},

where in the third equality we introduced the Edwards-Anderson overlap variables and their conjugate variables via the integral representation of the delta function.
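The first two equalities rest on two elementary facts: the pattern average ⟨e^{ξh}⟩ over ξ = ±1 equals cosh h, and for arguments of order N^{-1/2} (as in the non-condensed sums) log cosh h = h²/2 + O(h⁴), which turns the product of hyperbolic cosines into the Gaussian exponential above. A quick numerical check (the values of h and N below are arbitrary):

```python
import math

# Pattern average: <exp(xi*h)> over xi = +1, -1 with equal probability is cosh(h)
h = 0.37  # arbitrary test value
avg = 0.5 * (math.exp(h) + math.exp(-h))
print(abs(avg - math.cosh(h)) < 1e-15)  # True

# For h of order N**-0.5 the correction to log(cosh(h)) = h**2/2 is O(h**4)
N = 10**6
h = 1.0 / math.sqrt(N)
err = abs(math.log(math.cosh(h)) - h**2 / 2)
print(err < h**4)  # True: the neglected terms vanish in the thermodynamic limit
```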

Now we look at the integrals in the non-condensed part:

\displaystyle\int\prod_{a}Dx^{a}Dy^{a}\exp\{\frac{\beta}{2}\sum_{ab}[(1+t_{1})q_{ab}^{\sigma}x^{a}x^{b}+(1+t_{2})y^{a}y^{b}q_{ab}^{S} (8)
\displaystyle+\sqrt{(1+t_{1})(1+t_{2})}(q_{ab}^{\sigma S}x^{a}y^{b}+x^{b}y^{a}q^{\sigma S}_{ba})]\}=\int\prod_{a}dx^{a}dy^{a}
\displaystyle\times\exp\{\frac{1}{2}\sum_{ab}[-\delta_{ab}x^{a}x^{b}-\delta_{ab}y^{a}y^{b}+\beta((1+t_{1})q_{ab}^{\sigma}x^{a}x^{b}+(1+t_{2})y^{a}y^{b}q_{ab}^{S}
\displaystyle+\sqrt{(1+t_{1})(1+t_{2})}(q_{ab}^{\sigma S}x^{a}y^{b}+x^{b}y^{a}q_{ba}^{\sigma S}))]\} (9)
\displaystyle=\int\prod_{a}dx^{a}dy^{a}\exp[-\frac{1}{2}z^{T}(\mathbb{1}-\beta\hat{q})z],

where \hat{q} is a square matrix of dimension 2n\times 2n and z a vector of dimension 2n:

q^((1+t1)qσ(1+t1)(1+t2)qσS(1+t1)(1+t2)(qσS)T(1+t2)qS)\displaystyle\hat{q}\equiv\begin{pmatrix}(1+t_{1})q^{\sigma}&\sqrt{(1+t_{1})(1+t_{2})}q^{\sigma S}\\ &\\ \sqrt{(1+t_{1})(1+t_{2})}(q^{\sigma S})^{T}&(1+t_{2})q^{S}\end{pmatrix}
z(x1,,xn,y1,,yn)T.\displaystyle z\equiv\begin{pmatrix}x_{1},...,x_{n},y_{1},...,y_{n}\end{pmatrix}^{T}. (10)

We have that, up to a constant prefactor which drops out in the replica limit,

\displaystyle\int\prod_{a}dz^{a}\exp[-\frac{1}{2}z^{T}(\mathbb{1}-\beta\hat{q})z]=[\det(\mathbb{1}-\beta\hat{q})]^{-\frac{1}{2}}. (11)
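The Gaussian identity can be checked numerically on a small matrix with the block structure of 𝟙−βq̂ (below a 2×2 example with arbitrary test entries; the (2π) normalization of the measure is written out explicitly):

```python
import numpy as np
from scipy import integrate

# A symmetric positive-definite test matrix standing in for 1 - beta*q_hat
A = np.array([[1.3, -0.4],
              [-0.4, 0.9]])

def integrand(y, x):
    # Gaussian weight exp(-z.A.z/2) with the (2*pi)^(-d/2) normalization, d = 2
    z = np.array([x, y])
    return np.exp(-0.5 * z @ A @ z) / (2 * np.pi)

val, abserr = integrate.dblquad(integrand, -np.inf, np.inf,
                                lambda x: -np.inf, lambda x: np.inf)
print(abs(val - np.linalg.det(A) ** -0.5) < 1e-6)  # True
```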

We rescale the auxiliary Edwards-Anderson variables and the magnetization variables:

x1βN1+t1m1a;y1βN1+t2m2a;\displaystyle x_{1}\rightarrow\sqrt{\frac{\beta N}{1+t_{1}}}m_{1}^{a};y_{1}\rightarrow\sqrt{\frac{\beta N}{1+t_{2}}}m_{2}^{a}; (12)
rabρiαβ22rabρρ;ha2ϵβNha.\displaystyle r_{ab}^{\rho}\rightarrow i\frac{\alpha\beta^{2}}{2}r_{ab}^{\rho}\forall\rho;h^{a}\rightarrow\sqrt{2\epsilon\beta N}h^{a}.

With these changes, we get the following expression for the average of the replicated partition function:

Zn=adm1adm2adhaabdqabσdqabSdqabσSdrabσdrabSdrabσS\displaystyle\langle Z^{n}\rangle=\int\prod_{a}dm_{1}^{a}dm_{2}^{a}dh^{a}\prod_{ab}dq_{ab}^{\sigma}dq_{ab}^{S}dq_{ab}^{\sigma S}dr_{ab}^{\sigma}dr_{ab}^{S}dr_{ab}^{\sigma S} (13)
×exp{βN2a[(m1a)21+t1+(m1a)21+t2+2ϵ(ha)2]log[det(𝟙βq^)]p2\displaystyle\times\exp\{-\frac{\beta N}{2}\sum_{a}[\frac{(m_{1}^{a})^{2}}{1+t_{1}}+\frac{(m_{1}^{a})^{2}}{1+t_{2}}+2\epsilon(h^{a})^{2}]-\log[\det(\mathbb{1}-\beta\hat{q})]^{\frac{p}{2}}
Nαβ22ab(rabσqabσ+rabSqabS+rabσSqabσS)\displaystyle-\frac{N\alpha\beta^{2}}{2}\sum_{ab}(r_{ab}^{\sigma}q_{ab}^{\sigma}+r_{ab}^{S}q_{ab}^{S}+r_{ab}^{\sigma S}q_{ab}^{\sigma S})
\displaystyle+\log[\sum_{\sigma^{n},S^{n}}\int\prod_{i}\prod_{a}D\Phi_{i}^{a}D\phi_{i}^{a}\exp(\frac{\alpha\beta^{2}}{2}\sum_{ab}(r_{ab}^{\sigma}(\sigma_{i}^{a}+t_{1}^{\prime}\phi_{i}^{a})(\sigma_{i}^{b}+t_{1}^{\prime}\phi_{i}^{b})
\displaystyle+r_{ab}^{S}(S_{i}^{a}+t_{2}^{\prime}\Phi_{i}^{a})(S_{i}^{b}+t_{2}^{\prime}\Phi_{i}^{b})+r_{ab}^{\sigma S}(\sigma_{i}^{a}+t_{1}^{\prime}\phi_{i}^{a})(S_{i}^{b}+t_{2}^{\prime}\Phi_{i}^{b}))
\displaystyle+\beta\sum_{i}\sum_{a}\xi_{i}^{1}(m_{1}^{a}(\sigma_{i}^{a}+t_{1}^{\prime}\phi_{i}^{a})+m_{2}^{a}(S_{i}^{a}+t_{2}^{\prime}\Phi_{i}^{a}))+2\beta\epsilon\sum_{a}S_{i}^{a}\sigma_{i}^{a}h^{a})]\}.

Now we apply the replica symmetry ansatz:

12na((m1a)21+t1+(m2a)21+t2+2ϵ(ha)2)=m122+2t1+m222+2t2+ϵh2,\displaystyle\frac{1}{2n}\sum_{a}(\frac{(m_{1}^{a})^{2}}{1+t_{1}}+\frac{(m_{2}^{a})^{2}}{1+t_{2}}+2\epsilon(h^{a})^{2})=\frac{m_{1}^{2}}{2+2t_{1}}+\frac{m_{2}^{2}}{2+2t_{2}}+\epsilon h^{2}, (14)
\displaystyle\frac{\alpha\beta}{2n}\sum_{ab}r_{ab}^{\sigma}q_{ab}^{\sigma}=\frac{(\Delta^{\sigma}-1)(1+t_{1})}{2t_{1}}Q^{\sigma}+\frac{\alpha\beta}{2}r^{\sigma}(Q^{\sigma}-q^{\sigma}),
\displaystyle\frac{\alpha\beta}{2n}\sum_{ab}r_{ab}^{S}q_{ab}^{S}=\frac{(\Delta^{S}-1)(1+t_{2})}{2t_{2}}Q^{S}+\frac{\alpha\beta}{2}r^{S}(Q^{S}-q^{S}),
αβ2nabrabσSqabσS=ΔσSQσS+αβrσS2(QσSqσS),\displaystyle\frac{\alpha\beta}{2n}\sum_{ab}r_{ab}^{\sigma S}q_{ab}^{\sigma S}=\Delta^{\sigma S}Q^{\sigma S}+\frac{\alpha\beta r^{\sigma S}}{2}(Q^{\sigma S}-q^{\sigma S}),

with

Δσ1+αβt11+t1(Rσrσ);ΔS1+αβt21+t2(RSrS);\displaystyle\Delta^{\sigma}\equiv 1+\alpha\beta\frac{t_{1}}{1+t_{1}}(R^{\sigma}-r^{\sigma});\Delta^{S}\equiv 1+\alpha\beta\frac{t_{2}}{1+t_{2}}(R^{S}-r^{S});
ΔσSαβ2(RσSrσS).\displaystyle\Delta^{\sigma S}\equiv\frac{\alpha\beta}{2}(R^{\sigma S}-r^{\sigma S}). (15)

Under the replica symmetric ansatz, the matrix \mathbb{1}-\beta\hat{q} has four distinct eigenvalues; their degeneracies and values are

g1=1,λ1=12{a+(n1)b+c+(n1)d\displaystyle g_{1}=1,\lambda_{1}=\frac{1}{2}\{a+(n-1)b+c+(n-1)d (16)
[(a+(n1)b+c+(n1)d)2\displaystyle-[(a+(n-1)b+c+(n-1)d)^{2}
4((a+(n1)b)(c+(n1)d)(e+(n1)f)2)]12},\displaystyle-4((a+(n-1)b)(c+(n-1)d)-(e+(n-1)f)^{2})]^{\frac{1}{2}}\},
g2=1,λ2=12{a+(n1)b+c+(n1)d\displaystyle g_{2}=1,\lambda_{2}=\frac{1}{2}\{a+(n-1)b+c+(n-1)d
+[(a+(n1)b+c+(n1)d)2\displaystyle+[(a+(n-1)b+c+(n-1)d)^{2}
4((a+(n1)b)(c+(n1)d)(e+(n1)f)2)]12},\displaystyle-4((a+(n-1)b)(c+(n-1)d)-(e+(n-1)f)^{2})]^{\frac{1}{2}}\},
g3=n1,λ3=12{ab+cd\displaystyle g_{3}=n-1,\lambda_{3}=\frac{1}{2}\{a-b+c-d
(ab+cd)2+4[(a+b)(cd)+(ef)2]},\displaystyle-\sqrt{(a-b+c-d)^{2}+4[(-a+b)(c-d)+(e-f)^{2}]}\},
g4=n1,λ4=12{ab+cd\displaystyle g_{4}=n-1,\lambda_{4}=\frac{1}{2}\{a-b+c-d
+(ab+cd)2+4[(a+b)(cd)+(ef)2]};\displaystyle+\sqrt{(a-b+c-d)^{2}+4[(-a+b)(c-d)+(e-f)^{2}]}\};

where

a=1βQσ(1+t1),b=βqσ(1+t1),\displaystyle a=1-\beta Q^{\sigma}(1+t_{1}),\,\,\,b=-\beta q^{\sigma}(1+t_{1}), (17)
c=1βQS(1+t2),d=βqS(1+t2)\displaystyle c=1-\beta Q^{S}(1+t_{2}),\,\,\,d=-\beta q^{S}(1+t_{2})
e=βQσS(1+t1)(1+t2),f=βqσS(1+t1)(1+t2).\displaystyle e=-\beta Q^{\sigma S}\sqrt{(1+t_{1})(1+t_{2})},\,\,\,f=-\beta q^{\sigma S}\sqrt{(1+t_{1})(1+t_{2})}.

The last step is to perform two Hubbard-Stratonovich transformations on the quadratic variables, but first we need to massage the terms into a useful form:

\displaystyle\sum_{ab}r^{\sigma}(\sigma^{a}+t_{1}^{\prime}\phi^{a})(\sigma^{b}+t_{1}^{\prime}\phi^{b})+\sum_{ab}r^{S}(S^{a}+t_{2}^{\prime}\Phi^{a})(S^{b}+t_{2}^{\prime}\Phi^{b}) (18)
\displaystyle+\sum_{ab}r^{\sigma S}(\sigma^{a}+t_{1}^{\prime}\phi^{a})(S^{b}+t_{2}^{\prime}\Phi^{b})=\frac{1}{2}[\sqrt{r^{\sigma}}\sum_{a}(\sigma^{a}+t_{1}^{\prime}\phi^{a})
+rSa(Sa+t2Φa)]2(1+rσS2rσrS)+12[rσa(σa+t1ϕa)\displaystyle+\sqrt{r^{S}}\sum_{a}(S^{a}+t_{2}^{\prime}\Phi^{a})]^{2}(1+\frac{r^{\sigma S}}{2\sqrt{r^{\sigma}r^{S}}})+\frac{1}{2}[\sqrt{r^{\sigma}}\sum_{a}(\sigma^{a}+t_{1}^{\prime}\phi^{a})
rSa(Sa+t2Φa)]2(1rσS2rσrS).\displaystyle-\sqrt{r^{S}}\sum_{a}(S^{a}+t_{2}^{\prime}\Phi^{a})]^{2}(1-\frac{r^{\sigma S}}{2\sqrt{r^{\sigma}r^{S}}}).

Only now do we perform the Hubbard-Stratonovich transformations on these quadratic terms. For an arbitrary number of agents \aleph we can always make an analogous transformation, as it is equivalent to a change of coordinate system \vec{x}\to\vec{x}^{\prime} such that \sum_{i}^{\aleph}x_{i}^{2}+\sum_{i<j}^{\aleph}a_{ij}x_{i}x_{j}=\sum_{i}^{\aleph}x_{i}^{\prime 2} with arbitrary coefficients a_{ij}, but it is not hard to see that this procedure leads to many additional terms, whose number grows faster than linearly in the number of agents. For comparison, the formula that should be used with 3 agents is the following:

x2+y2+z2+axy+bxz+cyz\displaystyle x^{2}+y^{2}+z^{2}+axy+bxz+cyz
=14{[(x+y)1+a2+z]2(1+c+b21+a2)\displaystyle=\frac{1}{4}\{[(x+y)\sqrt{1+\frac{a}{2}}+z]^{2}(1+\frac{c+b}{2\sqrt{1+\frac{a}{2}}})
+[(x+y)1+a2z]2(1c+b21+a2)\displaystyle+[(x+y)\sqrt{1+\frac{a}{2}}-z]^{2}(1-\frac{c+b}{2\sqrt{1+\frac{a}{2}}})
+[(yx)1a2+z]2(1+cb21a2)\displaystyle+[(y-x)\sqrt{1-\frac{a}{2}}+z]^{2}(1+\frac{c-b}{2\sqrt{1-\frac{a}{2}}})
+[(yx)1a2z]2(1cb21a2)}\displaystyle+[(y-x)\sqrt{1-\frac{a}{2}}-z]^{2}(1-\frac{c-b}{2\sqrt{1-\frac{a}{2}}})\} (19)

This is the main reason why we believe it is not easy to generalize our results to interactions among 3, 4, or more neural networks.
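The three-agent identity (19) can be verified symbolically; a short sketch with sympy, where x, y, z and the couplings a, b, c are free symbols:

```python
import sympy as sp

x, y, z, a, b, c = sp.symbols('x y z a b c')

lhs = x**2 + y**2 + z**2 + a*x*y + b*x*z + c*y*z

sp_ = sp.sqrt(1 + a/2)  # sqrt(1 + a/2)
sm_ = sp.sqrt(1 - a/2)  # sqrt(1 - a/2)
rhs = sp.Rational(1, 4) * (
    ((x + y)*sp_ + z)**2 * (1 + (c + b)/(2*sp_))
    + ((x + y)*sp_ - z)**2 * (1 - (c + b)/(2*sp_))
    + ((y - x)*sm_ + z)**2 * (1 + (c - b)/(2*sm_))
    + ((y - x)*sm_ - z)**2 * (1 - (c - b)/(2*sm_))
)

# The difference expands to zero, confirming the identity
print(sp.simplify(sp.expand(rhs - lhs)))  # 0
```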

After that manipulation, we can sum over the spin states and get the free energy.

Appendix B Free energy expression and equations of state

The free energy for two interacting agents is

f(m1,m2,h,Qσ,QS,QσS,qσ,qS,qσS,t1,t2,β,α,ϵ)=\displaystyle f(m_{1},m_{2},h,Q^{\sigma},Q^{S},Q^{\sigma S},q^{\sigma},q^{S},q^{\sigma S},t_{1},t_{2},\beta,\alpha,\epsilon)= (20)
m122+2t1+m222+2t2+ϵh2+(Δσ1)(1+t1)2t1Qσ+log[ΔσΔSt12t22β2(ΔσS)2]2β\displaystyle\frac{m_{1}^{2}}{2+2t_{1}}+\frac{m_{2}^{2}}{2+2t_{2}}+\epsilon h^{2}+\frac{(\Delta^{\sigma}-1)(1+t_{1})}{2t_{1}}Q^{\sigma}+\frac{\log[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]}{2\beta}
+αβ2rσ(Qσqσ)+(ΔS1)(1+t2)2t2QS+αβ2rS(QSqS)\displaystyle+\frac{\alpha\beta}{2}r^{\sigma}(Q^{\sigma}-q^{\sigma})+\frac{(\Delta^{S}-1)(1+t_{2})}{2t_{2}}Q^{S}+\frac{\alpha\beta}{2}r^{S}(Q^{S}-q^{S})
+ΔσSQσS+αβrσS2(QσSqσS)\displaystyle+\Delta^{\sigma S}Q^{\sigma S}+\frac{\alpha\beta r^{\sigma S}}{2}(Q^{\sigma S}-q^{\sigma S})
+α2βlog{12[2+β(1+t1)(qσQσ)+β(1+t2)(qSQS)\displaystyle+\frac{\alpha}{2\beta}\log\{\frac{1}{2}[2+\beta(1+t_{1})(q^{\sigma}-Q^{\sigma})+\beta(1+t_{2})(q^{S}-Q^{S})
β(((1+t1)(qσQσ)(1+t2)(qSQS))2\displaystyle-\beta(((1+t_{1})(q^{\sigma}-Q^{\sigma})-(1+t_{2})(q^{S}-Q^{S}))^{2}
+4(1+t1)(1+t2)(QσS+qσS)2)12]}\displaystyle+4(1+t_{1})(1+t_{2})(-Q^{\sigma S}+q^{\sigma S})^{2})^{\frac{1}{2}}]\}
+α2βlog{12[2+β(1+t1)(qσQσ)+β(1+t2)(qSQS)\displaystyle+\frac{\alpha}{2\beta}\log\{\frac{1}{2}[2+\beta(1+t_{1})(q^{\sigma}-Q^{\sigma})+\beta(1+t_{2})(q^{S}-Q^{S})
+β(((1+t1)(qσQσ)(1+t2)(qSQS))2\displaystyle+\beta(((1+t_{1})(q^{\sigma}-Q^{\sigma})-(1+t_{2})(q^{S}-Q^{S}))^{2}
+4(1+t1)(1+t2)(QσS+qσS)2)12]}\displaystyle+4(1+t_{1})(1+t_{2})(-Q^{\sigma S}+q^{\sigma S})^{2})^{\frac{1}{2}}]\}
α{qσ(1+t1)[1β(QSqS)(1+t2)]\displaystyle-\alpha\{q^{\sigma}(1+t_{1})[1-\beta(Q^{S}-q^{S})(1+t_{2})]
+qS(1+t2)[1β(Qσqσ)(1+t1)]+β(1+t1)(1+t2)qσS(qσS+QσS)}\displaystyle+q^{S}(1+t_{2})[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})]+\beta(1+t_{1})(1+t_{2})q^{\sigma S}(-q^{\sigma S}+Q^{\sigma S})\}
×{2[1β(1+t1)(Qσqσ)][1β(1+t2)(QSqS)]\displaystyle\times\{2[1-\beta(1+t_{1})(Q^{\sigma}-q^{\sigma})][1-\beta(1+t_{2})(Q^{S}-q^{S})]
2β2(1+t1)(1+t2)(QσS+qσS)2}1\displaystyle-2\beta^{2}(1+t_{1})(1+t_{2})(-Q^{\sigma S}+q^{\sigma S})^{2}\}^{-1}
1βDxDylog(L1)\displaystyle-\frac{1}{\beta}\int DxDy\log(L_{1})
[ΔSt12β(αrσ+m12+1β2t14)+Δσt22β(αrS+m22+1β2t24)\displaystyle-[\Delta^{S}t_{1}^{\prime 2}\beta(\alpha r^{\sigma}+m_{1}^{2}+\frac{1}{\beta^{2}t_{1}^{\prime 4}})+\Delta^{\sigma}t_{2}^{\prime 2}\beta(\alpha r^{S}+m_{2}^{2}+\frac{1}{\beta^{2}t_{2}^{\prime 4}})
+2t12t22β2ΔσS(rσSα2+m1m2)][2ΔσΔS2t12t22β2(ΔσS)2]1,\displaystyle+2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}(\frac{r^{\sigma S}\alpha}{2}+m_{1}m_{2})][2\Delta^{\sigma}\Delta^{S}-2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{-1},

where we use the auxiliary definitions

\displaystyle r_{\pm}=\sqrt{1\pm\frac{r^{\sigma S}}{2\sqrt{r^{\sigma}r^{S}}}},\quad t_{1}^{\prime}=i\sqrt{\frac{t_{1}}{\beta(t_{1}+1)}},\quad t_{2}^{\prime}=i\sqrt{\frac{t_{2}}{\beta(t_{2}+1)}}, (21)
η=2ϵh+ΔσSΔσΔSt12t22β2(ΔσS)2,\displaystyle\eta=2\epsilon h+\frac{\Delta^{\sigma S}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}},
υ={ΔS[m1+αrσ2(r+x+ry)]+t22βΔσS[m2+αrS2(r+xry)]}\displaystyle\upsilon=\{\Delta^{S}[m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y)]+t_{2}^{\prime 2}\beta\Delta^{\sigma S}[m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y)]\}
×[ΔσΔSt12t22β2(ΔσS)2]1,\displaystyle\times[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{-1},
Υ={Δσ[m2+αrS2(r+xry)]+t12βΔσS[m1+αrσ2(r+x+ry)]},\displaystyle\Upsilon=\{\Delta^{\sigma}[m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y)]+t_{1}^{\prime 2}\beta\Delta^{\sigma S}[m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y)]\},
×[ΔσΔSt12t22β2(ΔσS)2]1,\displaystyle\times[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{-1},
L1=cosh[β(υ+Υ)]exp(βη)+cosh[β(υΥ)]exp(βη),\displaystyle L_{1}=\cosh[\beta(\upsilon+\Upsilon)]\exp(\beta\eta)+\cosh[\beta(\upsilon-\Upsilon)]\exp(-\beta\eta),
L2=cosh[β(υ+Υ)]exp(βη)cosh[β(υΥ)]exp(βη),\displaystyle L_{2}=\cosh[\beta(\upsilon+\Upsilon)]\exp(\beta\eta)-\cosh[\beta(\upsilon-\Upsilon)]\exp(-\beta\eta),
L3=sinh[β(υ+Υ)]exp(βη)+sinh[β(υΥ)]exp(βη),\displaystyle L_{3}=\sinh[\beta(\upsilon+\Upsilon)]\exp(\beta\eta)+\sinh[\beta(\upsilon-\Upsilon)]\exp(-\beta\eta),
L4=sinh[β(υ+Υ)]exp(βη)sinh[β(υΥ)]exp(βη).\displaystyle L_{4}=\sinh[\beta(\upsilon+\Upsilon)]\exp(\beta\eta)-\sinh[\beta(\upsilon-\Upsilon)]\exp(-\beta\eta).

Using the variational principle that the partial derivatives of the free energy must vanish at the equilibrium point, we get 15 equations for the 15 variables. We can simplify them by writing 6 of the variables as functions of the other 9; we call the former dependent and the latter independent.
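In practice such systems are solved by damped fixed-point iteration: at each step the dependent variables are recomputed from the current independent ones, which are then updated toward their new values. The scheme is illustrated below on a toy stand-in of the m = tanh(β(m + εm′)) form, not the full equations above; the β, ε and damping values are arbitrary:

```python
import math

def solve(beta, eps, damping=0.5, tol=1e-10, max_iter=10_000):
    """Damped fixed-point iteration for m1 = tanh(beta*(m1 + eps*m2)) and 1 <-> 2."""
    m1, m2 = 0.1, 0.1  # small initial overlaps to break symmetry
    for _ in range(max_iter):
        new1 = math.tanh(beta * (m1 + eps * m2))
        new2 = math.tanh(beta * (m2 + eps * m1))
        if abs(new1 - m1) + abs(new2 - m2) < tol:
            break
        # partial updates stabilize the iteration
        m1 = (1 - damping) * m1 + damping * new1
        m2 = (1 - damping) * m2 + damping * new2
    return m1, m2

m1, m2 = solve(beta=2.0, eps=0.5)
print(round(m1, 4), round(m2, 4))
```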

The equations of the 6 dependent variables are

rσ={qσ(1+t1)2[1β(QSqS)(1+t2)]2\displaystyle r^{\sigma}=\{q^{\sigma}(1+t_{1})^{2}[1-\beta(Q^{S}-q^{S})(1+t_{2})]^{2} (22)
+β(QσSqσS)(1+t1)2(1+t2)[qσS(1β(QSqS)(1+t2))\displaystyle+\beta(Q^{\sigma S}-q^{\sigma S})(1+t_{1})^{2}(1+t_{2})[q^{\sigma S}(1-\beta(Q^{S}-q^{S})(1+t_{2}))
+qS(1+t2)β(QσSqσS)]}\displaystyle+q^{S}(1+t_{2})\beta(Q^{\sigma S}-q^{\sigma S})]\}
×{[1β(Qσqσ)(1+t1)][1β(QSqS)(1+t2)]\displaystyle\times\{[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})][1-\beta(Q^{S}-q^{S})(1+t_{2})]
(QσSqσS)2(1+t1)(1+t2)β2}2,\displaystyle-(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})\beta^{2}\}^{-2},
Δσ=1+αt1{(1β(QSqS)(1+t2))2(1β(Qσqσ)(1+t1))\displaystyle\Delta^{\sigma}=1+\alpha t_{1}\{(1-\beta(Q^{S}-q^{S})(1+t_{2}))^{2}(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})) (23)
(1β(QSqS)(1+t2))(QσSqσS)2β2(1+t1)(1+t2)}\displaystyle-(1-\beta(Q^{S}-q^{S})(1+t_{2}))(Q^{\sigma S}-q^{\sigma S})^{2}\beta^{2}(1+t_{1})(1+t_{2})\}
×{[1β(Qσqσ)(1+t1)][1β(QSqS)(1+t2)]\displaystyle\times\{[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})][1-\beta(Q^{S}-q^{S})(1+t_{2})]
(QσSqσS)2(1+t1)(1+t2)β2}2,\displaystyle-(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})\beta^{2}\}^{-2},
rS=(qS(1+t2)2(1β(Qσqσ)(1+t1))2\displaystyle r^{S}=(q^{S}(1+t_{2})^{2}(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1}))^{2} (24)
+β(QσSqσS)(1+t1)(1+t2)2(qσS(1β(Qσqσ)(1+t1))\displaystyle+\beta(Q^{\sigma S}-q^{\sigma S})(1+t_{1})(1+t_{2})^{2}(q^{\sigma S}(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1}))
+qσ(1+t1)β(QσSqσS)))\displaystyle+q^{\sigma}(1+t_{1})\beta(Q^{\sigma S}-q^{\sigma S})))
×{[1β(Qσqσ)(1+t1)][1β(QSqS)(1+t2)]\displaystyle\times\{[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})][1-\beta(Q^{S}-q^{S})(1+t_{2})]
(QσSqσS)2(1+t1)(1+t2)β2}2,\displaystyle-(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})\beta^{2}\}^{-2},
ΔS=1+αt2{[1β(Qσqσ)(1+t1)]2[1β(QSqS)(1+t2)]\displaystyle\Delta^{S}=1+\alpha t_{2}\{[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})]^{2}[1-\beta(Q^{S}-q^{S})(1+t_{2})] (25)
[1β(Qσqσ)(1+t1)](QσSqσS)2β2(1+t1)(1+t2)}\displaystyle-[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})](Q^{\sigma S}-q^{\sigma S})^{2}\beta^{2}(1+t_{1})(1+t_{2})\}
×{[1β(Qσqσ)(1+t1)][1β(QSqS)(1+t2)]\displaystyle\times\{[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})][1-\beta(Q^{S}-q^{S})(1+t_{2})]
(QσSqσS)2(1+t1)(1+t2)β2}2,\displaystyle-(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})\beta^{2}\}^{-2},
ΔσS=αβ(qσS+QσS)(1+t1)(1+t2)\displaystyle\Delta^{\sigma S}=\alpha\beta(-q^{\sigma S}+Q^{\sigma S})(1+t_{1})(1+t_{2}) (26)
×{2[1β(Qσqσ)(1+t1)][1β(QSqS)(1+t2)]\displaystyle\times\{2[1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1})][1-\beta(Q^{S}-q^{S})(1+t_{2})]
2β2(QσS+qσS)2(1+t1)(1+t2)}1,\displaystyle-2\beta^{2}(-Q^{\sigma S}+q^{\sigma S})^{2}(1+t_{1})(1+t_{2})\}^{-1},
rσS=(1+t1)(1+t2){qσS\displaystyle r^{\sigma S}=(1+t_{1})(1+t_{2})\{q^{\sigma S} (27)
×[(1β(Qσqσ)(1+t1))(1β(QSqS)(1+t2))\displaystyle\times[(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1}))(1-\beta(Q^{S}-q^{S})(1+t_{2}))
β2(QσSqσS)2(1+t1)(1+t2)]1\displaystyle-\beta^{2}(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})]^{-1}
+2[(qσS+QσS)β(qσ(1+t1)(1β(QSqS)(1+t2))\displaystyle+2[(-q^{\sigma S}+Q^{\sigma S})\beta(q^{\sigma}(1+t_{1})(1-\beta(Q^{S}-q^{S})(1+t_{2}))
+qS(1+t2)(1β(Qσqσ)(1+t1))+qσS(1+t1)(1+t2)β(qσS+QσS))]\displaystyle+q^{S}(1+t_{2})(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1}))+q^{\sigma S}(1+t_{1})(1+t_{2})\beta(-q^{\sigma S}+Q^{\sigma S}))]
×[(1β(Qσqσ)(1+t1))(1β(QSqS)(1+t2))\displaystyle\times[(1-\beta(Q^{\sigma}-q^{\sigma})(1+t_{1}))(1-\beta(Q^{S}-q^{S})(1+t_{2}))
β2(QσSqσS)2(1+t1)(1+t2)2]2}.\displaystyle-\beta^{2}(Q^{\sigma S}-q^{\sigma S})^{2}(1+t_{1})(1+t_{2})^{2}]^{-2}\}.

The equations of the 9 independent variables are

h=DxDyL2L1,\displaystyle h=\int DxDy\frac{L_{2}}{L_{1}}, (28)
m1(11+t1ΔSt12βΔσΔSt12t22β2(ΔσS)2)=m2t12t22β2ΔσSΔσΔSt12t22β2(ΔσS)2\displaystyle m_{1}(\frac{1}{1+t_{1}}-\frac{\Delta^{S}t_{1}^{\prime 2}\beta}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}})=\frac{m_{2}t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}
+DxDy1L1(ΔSL3ΔσΔSt12t22β2(ΔσS)2+t12βΔσSL4ΔσΔSt12t22β2(ΔσS)2),\displaystyle+\int DxDy\frac{1}{L_{1}}(\frac{\Delta^{S}L_{3}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}+\frac{t_{1}^{\prime 2}\beta\Delta^{\sigma S}L_{4}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}), (29)
m2(11+t2Δσt22βΔσΔSt12t22β2(ΔσS)2)=m1t12t22β2ΔσSΔσΔSt12t22β2(ΔσS)2\displaystyle m_{2}(\frac{1}{1+t_{2}}-\frac{\Delta^{\sigma}t_{2}^{\prime 2}\beta}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}})=\frac{m_{1}t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}
+DxDy1L1(ΔσSt22βL3ΔσΔSt12t22β2(ΔσS)2+ΔσL4ΔσΔSt12t22β2(ΔσS)2),\displaystyle+\int DxDy\frac{1}{L_{1}}(\frac{\Delta^{\sigma S}t_{2}^{\prime 2}\beta L_{3}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}+\frac{\Delta^{\sigma}L_{4}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}), (30)
Qσ(1+t1)2t1=\displaystyle\frac{Q^{\sigma}(1+t_{1})}{2t_{1}}= (31)
ΔS2βΔσΔS2t12t22β3(ΔσS)2[(ΔS)2t12β(αrσ+m12+1β2t14)\displaystyle-\frac{\Delta^{S}}{2\beta\Delta^{\sigma}\Delta^{S}-2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{3}(\Delta^{\sigma S})^{2}}-[(\Delta^{S})^{2}t_{1}^{\prime 2}\beta(\alpha r^{\sigma}+m_{1}^{2}+\frac{1}{\beta^{2}t_{1}^{\prime 4}})
+t12t24β3(ΔσS)2(αrS+m22+1β2t24)+2t12t22β2ΔσSΔS(rσSα2+m1m2)]\displaystyle+t_{1}^{\prime 2}t_{2}^{\prime 4}\beta^{3}(\Delta^{\sigma S})^{2}(\alpha r^{S}+m_{2}^{2}+\frac{1}{\beta^{2}t_{2}^{\prime 4}})+2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}\Delta^{S}(\frac{r^{\sigma S}\alpha}{2}+m_{1}m_{2})]
×{2[ΔσΔSt12t22β2(ΔσS)2]2}1\displaystyle\times\{2[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{2}\}^{-1}
+DxDy{L1[ΔσΔSt12t22β2(ΔσS)2]2}1\displaystyle+\int DxDy\{L_{1}[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{2}\}^{-1}
×{[(ΔS)2(m1+αrσ2(r+x+ry))\displaystyle\times\{[-(\Delta^{S})^{2}(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))
t22βΔSΔσS(m2+αrS2(r+xry))]L3\displaystyle-t_{2}^{\prime 2}\beta\Delta^{S}\Delta^{\sigma S}(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))]L_{3}
[t12t22β2(ΔσS)2(m2+αrS2(r+xry))\displaystyle-[t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))
+t12βΔSΔσS(m1+αrσ2(r+x+ry))]L4ΔSΔσSL2},\displaystyle+t_{1}^{\prime 2}\beta\Delta^{S}\Delta^{\sigma S}(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))]L_{4}-\Delta^{S}\Delta^{\sigma S}L_{2}\},
QS(1+t2)2t2=Δσ2βΔσΔS2t12t22β3(ΔσS)2\displaystyle\frac{Q^{S}(1+t_{2})}{2t_{2}}=-\frac{\Delta^{\sigma}}{2\beta\Delta^{\sigma}\Delta^{S}-2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{3}(\Delta^{\sigma S})^{2}} (32)
\displaystyle-[(\Delta^{\sigma})^{2}t_{2}^{\prime 2}\beta(\alpha r^{S}+m_{2}^{2}+\frac{1}{\beta^{2}t_{2}^{\prime 4}})+t_{2}^{\prime 2}t_{1}^{\prime 4}\beta^{3}(\Delta^{\sigma S})^{2}(\alpha r^{\sigma}+m_{1}^{2}+\frac{1}{\beta^{2}t_{1}^{\prime 4}})
+2t12t22β2ΔσSΔσ(rσSα2+m1m2)]{2[ΔσΔSt12t22β2(ΔσS)2]2}1\displaystyle+2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}\Delta^{\sigma}(\frac{r^{\sigma S}\alpha}{2}+m_{1}m_{2})]\{2[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{2}\}^{-1}
+DxDy{L1[ΔσΔSt12t22β2(ΔσS)2]2}1\displaystyle+\int DxDy\{L_{1}[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{2}\}^{-1}
×{[(Δσ)2(m2+αrS2(r+xry))\displaystyle\times\{[-(\Delta^{\sigma})^{2}(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))
t12βΔσΔσS(m1+αrσ2(r+x+ry))]L4\displaystyle-t_{1}^{\prime 2}\beta\Delta^{\sigma}\Delta^{\sigma S}(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))]L_{4}
[t12t22β2(ΔσS)2(m1+αrσ2(r+x+ry))\displaystyle-[t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))
+t22βΔσΔσS(m2+αrS2(r+xry))]L3ΔσΔσSL2},\displaystyle+t_{2}^{\prime 2}\beta\Delta^{\sigma}\Delta^{\sigma S}(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))]L_{3}-\Delta^{\sigma}\Delta^{\sigma S}L_{2}\},
\displaystyle Q^{\sigma S}=\frac{t_{1}^{\prime 2}t_{2}^{\prime 2}\beta\Delta^{\sigma S}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}+\frac{t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\frac{r^{\sigma S}\alpha}{2}+m_{1}m_{2})}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}} (33)
+{t12t22β2ΔσS[ΔSt12β(αrσ+m12+1β2t14)+Δσt22β(αrS+m22+1β2t24)\displaystyle+\{t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}[\Delta^{S}t_{1}^{\prime 2}\beta(\alpha r^{\sigma}+m_{1}^{2}+\frac{1}{\beta^{2}t_{1}^{\prime 4}})+\Delta^{\sigma}t_{2}^{\prime 2}\beta(\alpha r^{S}+m_{2}^{2}+\frac{1}{\beta^{2}t_{2}^{\prime 4}})
+2t12t22β2ΔσS(rσSα2+m1m2)]}[ΔσΔSt12t22β2(ΔσS)2]2\displaystyle+2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}(\frac{r^{\sigma S}\alpha}{2}+m_{1}m_{2})]\}[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{-2}
+DxDy{L1[ΔσΔSt12t22β2(ΔσS)2]2}1{L3\displaystyle+\int DxDy\{L_{1}[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]^{2}\}^{-1}\{L_{3}
×[2t12t22β2ΔσSΔS(m1+αrσ2(r+x+ry))+(t12t24β3(ΔσS)2+t22βΔσΔS)\displaystyle\times[2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}\Delta^{S}(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))+(t_{1}^{\prime 2}t_{2}^{\prime 4}\beta^{3}(\Delta^{\sigma S})^{2}+t_{2}^{\prime 2}\beta\Delta^{\sigma}\Delta^{S})
×(m2+αrS2(r+xry))]+[2t12t22β2ΔσSΔσ(m2+αrS2(r+xry))\displaystyle\times(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))]+[2t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}\Delta^{\sigma S}\Delta^{\sigma}(m_{2}+\sqrt{\frac{\alpha r^{S}}{2}}(r_{+}x-r_{-}y))
+(t12βΔσΔS+t14t22β3(ΔσS)2)(m1+αrσ2(r+x+ry))]L4\displaystyle+(t_{1}^{\prime 2}\beta\Delta^{\sigma}\Delta^{S}+t_{1}^{\prime 4}t_{2}^{\prime 2}\beta^{3}(\Delta^{\sigma S})^{2})(m_{1}+\sqrt{\frac{\alpha r^{\sigma}}{2}}(r_{+}x+r_{-}y))]L_{4}
+[ΔσΔS+t12t22β2(ΔσS)2]L2},\displaystyle+[\Delta^{\sigma}\Delta^{S}+t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]L_{2}\},
qσ=QσΔSt12ΔσΔSt12t22β2(ΔσS)21αβDxDy\displaystyle q^{\sigma}=Q^{\sigma}-\frac{\Delta^{S}t_{1}^{\prime 2}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}-\frac{1}{\alpha\beta}\int DxDy (34)
×{L1[4ΔσΔS4t12t22β2(ΔσS)2]}1{L3\displaystyle\times\{L_{1}[4\Delta^{\sigma}\Delta^{S}-4t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]\}^{-1}\{L_{3}
×[ΔS(4α2rσ(r+x+ry)+rσSrσα2rS(xr++yr))\displaystyle\times[\Delta^{S}(4\sqrt{\frac{\alpha}{2r^{\sigma}}}(r_{+}x+r_{-}y)+\frac{r^{\sigma S}}{r^{\sigma}}\sqrt{\frac{\alpha}{2r^{S}}}(-\frac{x}{r_{+}}+\frac{y}{r_{-}}))
rσSt22βΔσSα2(rσ)3(xr++yr)]\displaystyle-r^{\sigma S}t_{2}^{\prime 2}\beta\Delta^{\sigma S}\sqrt{\frac{\alpha}{2(r^{\sigma})^{3}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}})]
+L4[t12βΔσS(4α2rσ(r+x+ry)+rσSrσα2rS(xr++yr))\displaystyle+L_{4}[t_{1}^{\prime 2}\beta\Delta^{\sigma S}(4\sqrt{\frac{\alpha}{2r^{\sigma}}}(r_{+}x+r_{-}y)+\frac{r^{\sigma S}}{r^{\sigma}}\sqrt{\frac{\alpha}{2r^{S}}}(-\frac{x}{r_{+}}+\frac{y}{r_{-}}))
rσSΔσα2(rσ)3(xr++yr)]},\displaystyle-r^{\sigma S}\Delta^{\sigma}\sqrt{\frac{\alpha}{2(r^{\sigma})^{3}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}})]\},
qS=QSΔσt22ΔσΔSt12t22β2(ΔσS)21αβDxDy\displaystyle q^{S}=Q^{S}-\frac{\Delta^{\sigma}t_{2}^{\prime 2}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}}-\frac{1}{\alpha\beta}\int DxDy (35)
×{L1[4ΔσΔS4t12t22β2(ΔσS)2]}1{L4\displaystyle\times\{L_{1}[4\Delta^{\sigma}\Delta^{S}-4t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]\}^{-1}\{L_{4}
×[Δσ(4α2rS(r+xry)rσSrSα2rσ(xr++yr)]\displaystyle\times[\Delta^{\sigma}(4\sqrt{\frac{\alpha}{2r^{S}}}(r_{+}x-r_{-}y)-\frac{r^{\sigma S}}{r^{S}}\sqrt{\frac{\alpha}{2r^{\sigma}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}})]
+rσSt12βΔσSα2(rS)3(xr++yr)}+L3\displaystyle+r^{\sigma S}t_{1}^{\prime 2}\beta\Delta^{\sigma S}\sqrt{\frac{\alpha}{2(r^{S})^{3}}}(-\frac{x}{r_{+}}+\frac{y}{r_{-}})\}+L_{3}
×[t22βΔσS(4α2rS(r+xry)\displaystyle\times[t_{2}^{\prime 2}\beta\Delta^{\sigma S}(4\sqrt{\frac{\alpha}{2r^{S}}}(r_{+}x-r_{-}y)
rσSrSα2rσ(xr++yr))+rσSΔSα2(rS)3(xr++yr)]},\displaystyle-\frac{r^{\sigma S}}{r^{S}}\sqrt{\frac{\alpha}{2r^{\sigma}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}}))+r^{\sigma S}\Delta^{S}\sqrt{\frac{\alpha}{2(r^{S})^{3}}}(-\frac{x}{r_{+}}+\frac{y}{r_{-}})]\},
qσS=QσSt12t22βΔσSΔσΔSt12t22β2(ΔσS)2\displaystyle q^{\sigma S}=Q^{\sigma S}-\frac{t_{1}^{\prime 2}t_{2}^{\prime 2}\beta\Delta^{\sigma S}}{\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}} (36)
1αβDxDy{2L1[ΔσΔSt12t22β2(ΔσS)2]}1\displaystyle-\frac{1}{\alpha\beta}\int DxDy\{2L_{1}[\Delta^{\sigma}\Delta^{S}-t_{1}^{\prime 2}t_{2}^{\prime 2}\beta^{2}(\Delta^{\sigma S})^{2}]\}^{-1}
×{L3[ΔSα2rS(xr+yr)+t22βΔσSα2rσ(xr++yr)]\displaystyle\times\{L_{3}[\Delta^{S}\sqrt{\frac{\alpha}{2r^{S}}}(\frac{x}{r_{+}}-\frac{y}{r_{-}})+t_{2}^{\prime 2}\beta\Delta^{\sigma S}\sqrt{\frac{\alpha}{2r^{\sigma}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}})]
+L4[Δσα2rσ(xr++yr)+t12βΔσSα2rS(xr+yr)]}.\displaystyle+L_{4}[\Delta^{\sigma}\sqrt{\frac{\alpha}{2r^{\sigma}}}(\frac{x}{r_{+}}+\frac{y}{r_{-}})+t_{1}^{\prime 2}\beta\Delta^{\sigma S}\sqrt{\frac{\alpha}{2r^{S}}}(\frac{x}{r_{+}}-\frac{y}{r_{-}})]\}.

Appendix C Zero temperature equations

To obtain the zero temperature equations, it is necessary to deal with the following three combinations:

I1=L2L1,I2=L3L1,I3=L4L1.\displaystyle I_{1}=\frac{L_{2}}{L_{1}},\,\,\,I_{2}=\frac{L_{3}}{L_{1}},\,\,\,I_{3}=\frac{L_{4}}{L_{1}}. (37)

They become

I1=14{[sign(υ+Υ)+1][sign(υΥ)+1]sign(η+Υ)\displaystyle I_{1}=\frac{1}{4}\{[\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta+\Upsilon) (38)
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(η+υ)\displaystyle+[\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta+\upsilon)
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(ηυ)\displaystyle+[-\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta-\upsilon)
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(ηΥ)}.\displaystyle+[-\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta-\Upsilon)\}.
I2=14{[sign(υ+Υ)+1][sign(υΥ)+1]\displaystyle I_{2}=\frac{1}{4}\{[\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(η+υ)\displaystyle+[\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta+\upsilon)
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(υη)\displaystyle+[-\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\upsilon-\eta)
[sign(υ+Υ)+1][sign(υΥ)+1]}\displaystyle-[-\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]\}
I3=14{[sign(υ+Υ)+1][sign(υΥ)+1]sign(η+Υ)\displaystyle I_{3}=\frac{1}{4}\{[\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\eta+\Upsilon)
+[sign(υ+Υ)+1][sign(υΥ)+1]sign(Υη)\displaystyle+[-\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]\operatorname{sign}(\Upsilon-\eta)
+[sign(υ+Υ)+1][sign(υΥ)+1]\displaystyle+[\operatorname{sign}(\upsilon+\Upsilon)+1][-\operatorname{sign}(\upsilon-\Upsilon)+1]
[sign(υ+Υ)+1][sign(υΥ)+1]}.\displaystyle-[-\operatorname{sign}(\upsilon+\Upsilon)+1][\operatorname{sign}(\upsilon-\Upsilon)+1]\}.

It is possible to remove one of the integrals, as done in [4] and [8], but in this case the equations would become substantially larger with no visible simplification. Besides that change, it is necessary to substitute q^{\rho} by c^{\rho}\equiv\beta(Q^{\rho}-q^{\rho}), in the same way as done in [4] and [8]. We did not find any particularly interesting property in this region, so we did not focus on it.