Gibbs measures with multilinear forms
Abstract.
In this paper, we study a class of multilinear Gibbs measures with Hamiltonian given by a generalized U-statistic and with a general base measure. Expressing the asymptotic free energy as an optimization problem over a space of functions, we obtain necessary and sufficient conditions for replica-symmetry. Utilizing this, we obtain weak limits for a large class of statistics of interest, which includes the “local fields/magnetization”, the Hamiltonian, the global magnetization, etc. An interesting consequence is a universal weak law for contrasts under replica symmetry, namely, weakly, if . Our results yield a probabilistic interpretation for the optimizers arising out of the limiting free energy. We also prove the existence of a sharp phase transition point in terms of the temperature parameter, thereby generalizing existing results that were only known for quadratic Hamiltonians. As a by-product of our proof technique, we obtain exponential concentration bounds on local and global magnetizations, which are of independent interest.
Key words and phrases:
Graph limits, magnetization, phase transition, replica-symmetry, tensor Ising model
1991 Mathematics Subject Classification: 82B20, 05C80
1. Introduction
Suppose is a (non-degenerate) probability measure on . Let be a finite graph with vertices labeled , and maximum degree . Fixing , define a function
(1.1) |
where is a multilinear form, defined by
(1.2) |
Here is the set of all distinct tuples from (so that ), and is a symmetric matrix with on the diagonal. If is such that is finite, we can define a Gibbs probability measure on by setting
(1.3) |
Several Gibbs measures of interest can be expressed in the form (1.3) with various choices of . Below we give two examples of such Gibbs measures which have been well studied in Probability and Statistics.
• If is an edge, then is a Gibbs measure with a quadratic Hamiltonian (a hedged sketch of this case appears after this list). In particular, if is supported on , then is the celebrated Ising model on with coupling matrix (see [24, 3, 12, 1] for various examples). Popular examples of include the adjacency matrix of the complete graph, the line graph, random graphs such as Erdős-Rényi or random -regular graphs, and so on.
•
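For concreteness, the quadratic special case above takes the following standard Ising form; this is a minimal sketch in our own notation (β for the inverse temperature, A_n for the coupling matrix, Z_n for the partition function), which may differ from the paper's stripped displays:

\[
H_n(x) = \sum_{i \neq j} A_n(i,j)\, x_i x_j,
\qquad
d\mu_n(x) = \frac{1}{Z_n(\beta)} \exp\{\beta H_n(x)\} \prod_{i=1}^n d\mu(x_i),
\]

so that when \mu = \frac{1}{2}(\delta_{-1} + \delta_{+1}), the measure \mu_n is the Ising model with coupling matrix A_n.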
In this paper, we study the generalized model (1.3), when the sequence of matrices converges in weak cut metric (defined by (1.4)). Our main contributions are:
(a) We give an exact characterization for replica-symmetry for the asymptotic free energy/log partition function (see 1.2).
(b) We obtain weak limits for a large family of statistics which include the Hamiltonian, “local magnetizations”, global magnetization, and contrasts (see Theorems 1.3 and 1.6).
(c) We provide tail bounds for global and local magnetizations (see 1.4).
(d) We show the existence of a “phase transition” for multilinear Gibbs measures of the form (1.3) with compactly supported (see 1.9).
1.1. Main results
To establish our main results, we will assume throughout that the sequence of matrices converges in the weak cut distance (defined below). The cut metric was introduced in the combinatorics literature to study limits of graphs and matrices (see [21]), and has received significant attention in the recent literature ([8, 9, 10, 11]). For more details on the cut metric and its manifold applications, we refer the interested reader to [27]. Below we formally introduce the notions of strong and weak cut distance used in this paper.
Definition 1.1.
Suppose is the space of all symmetric real-valued functions in . Given two functions , define the strong cut distance between by setting
In the above display, the supremum is taken over all measurable subsets of . Define the weak cut distance by
where ranges over all measure-preserving bijections and .
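For the reader's convenience, the standard form of these definitions from the graph limit literature (see [21, 27]) is the following; the symbols d_\square and \delta_\square are our notational assumptions:

\[
d_{\square}(W_1, W_2) = \sup_{S, T \subseteq [0,1]} \left| \int_{S \times T} \big( W_1(x,y) - W_2(x,y) \big)\, dx\, dy \right|,
\qquad
\delta_{\square}(W_1, W_2) = \inf_{\sigma} d_{\square}\big(W_1, W_2^{\sigma}\big),
\]

where the infimum ranges over measure-preserving bijections \sigma of [0,1] and W^{\sigma}(x,y) := W(\sigma(x), \sigma(y)).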
Given a symmetric matrix , define a function by setting
We will assume throughout the paper that the sequence of matrices introduced in (1.2) converges in weak cut distance, i.e., for some ,
(1.4) |
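A plausible rendering of this setup, under our notational assumptions, uses the standard embedding of a symmetric matrix into a graphon (see [27, Chapter 7]):

\[
W_{A_n}(x, y) := A_n\big( \lceil n x \rceil, \lceil n y \rceil \big), \qquad (x,y) \in [0,1]^2,
\]

so that (1.4) asserts \delta_{\square}(W_{A_n}, W) \to 0 as n \to \infty for some limiting graphon W.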
We now introduce some notation that will be used throughout the rest of the paper.
Definition 1.2.
Let denote the set of probability measures on , equipped with the weak topology. Given a probability measure , let and denote its first and second marginals respectively. Also define for . Define as follows:
Also define as follows:
(1.5) |
Note that is a closed subset of (by Fatou’s Lemma), and is a closed subset of , in the weak topology. For two measures on , define
where the supremum is over the set of functions which are -Lipschitz.
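A metric of this Kantorovich-Rubinstein type is standard; a sketch consistent with the description above (in our notation) is

\[
d(\nu_1, \nu_2) = \sup_{f:\, \mathrm{Lip}(f) \le 1} \left| \int f \, d\nu_1 - \int f \, d\nu_2 \right|,
\]

which, by Kantorovich-Rubinstein duality, coincides with the Wasserstein-1 distance whenever both measures have finite first moments.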
We now introduce the exponential tilt of the base measure , and some related notations. This requires the following assumption, which we make throughout the paper: For all and some , we have
(1.6) |
where the case corresponds to assuming is compactly supported.
Definition 1.3.
Given (1.6), the function
is finite for all . Define the -exponential tilt of by setting
Then the function is infinitely differentiable, with
Consequently the function is strictly increasing on , and has an inverse , where is an open interval. Let denote the closure of a set in , and extend to a (possibly infinite valued) function on by setting
We write to denote the standard Kullback-Leibler divergence. Define a function by setting
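The objects in Definition 1.3 are standard exponential family quantities; their usual forms, in our notation (Z for the normalizer, \mu_\theta for the tilt, b = \log Z), are

\[
Z(\theta) := \int e^{\theta x}\, d\mu(x), \qquad
d\mu_{\theta}(x) := \frac{e^{\theta x}}{Z(\theta)}\, d\mu(x), \qquad
b(\theta) := \log Z(\theta),
\]

with b'(\theta) = \mathbb{E}_{\mu_\theta}[X] and b''(\theta) = \mathrm{Var}_{\mu_\theta}(X) > 0 for non-degenerate \mu (so that b' is strictly increasing), while the Kullback-Leibler divergence is D(\nu \,\|\, \mu) = \int \log \frac{d\nu}{d\mu}\, d\nu when \nu \ll \mu, and +\infty otherwise.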
Definition 1.4.
Let denote the space of all measurable functions such that .
Define a map as follows:
For any , if , then , and given , one has
Definition 1.5.
Fix and let be as defined above. Define the functional by setting
whenever (see 1.1 below for sufficient conditions).
Finally, let be a map from to defined by
(1.7) |
The following proposition characterizes the asymptotics of the log partition function/free energy in terms of an infinite dimensional optimization problem, and gives a characterization for the class of optimizers in terms of a fixed point equation.
Proposition 1.1.
Suppose that satisfies (1.6) for some and all . Let be a sequence of matrices such that (1.4) holds for some , and
(1.8) |
for some such that . Then the following conclusions hold.
(i) The function is well-defined on , i.e., for all .
(ii) With as in (1.1), we have and
(1.9) | |||||
The above proposition follows from [4, Theorems 1.1 and 1.6].
Remark 1.1.
1.1.1. Replica-symmetry
The above proposition shows that the infinite dimensional optimization problem in the second line of (1.9) is useful for understanding the Gibbs measure (see parts (iii) and (iv)). A natural question is when the set of optimizers of (1.9) consists only of constant functions. Equivalently, borrowing terminology from statistical physics, we want to understand the “replica-symmetry” phase of the Gibbs measure . Our first main result provides necessary and sufficient conditions for the optimizers to be constant functions. For this we need the following two definitions.
Definition 1.6.
Given a symmetric matrix , define a symmetric tensor
where denote the set of all permutations of . In a similar vein, given a symmetric function , define the symmetric function
As an example, if , equals
whereas if , equals
Let
(1.12) |
provided the integral exists and is finite.
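While the displays in Definition 1.6 are specific to this paper, the underlying symmetrization operation is standard: for a k-tensor T it reads, in our notation,

\[
T^{\mathrm{sym}}_{i_1, \dots, i_k} = \frac{1}{k!} \sum_{\sigma \in S_k} T_{i_{\sigma(1)}, \dots, i_{\sigma(k)}},
\]

where S_k denotes the set of all permutations of \{1, \dots, k\}.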
Definition 1.7.
Let be a measure in and be as in 1.3. We will say is stochastically non-negative if, for any , implies , and
We now state the first main result of this paper.
Theorem 1.2 (Replica-symmetry).
Suppose we are in the setting of 1.1. Then is finite a.s., and the following conclusions hold:
(i) Any maximizer of the optimization problem (1.9) satisfies
(1.13) |
(ii) If and is not constant a.s., none of the maximizers in (1.9) are non-zero constant functions.
(iii) If is constant a.s., and is strictly positive a.s., then all of the maximizers in (1.9) are constant functions, provided either is even or is stochastically non-negative.
(iv) is stochastically non-negative if one of the following conditions hold:
(a) is supported on the non-negative half line, or
(b) is a non-negative tilt of a symmetric measure, i.e. there exists and a symmetric measure , such that .
Remark 1.2.
It follows from the construction of the map in 1.4 that if is a constant function, then is a product measure. Thus under the conditions of 1.2 part (iii), any weak limit of the empirical measure (introduced in (1.7)) under is a product measure. Further these product measures have first marginal and second marginal of the form where satisfies the fixed point equation
by 1.2 part (i). In particular if and is supported on with for some , the above fixed point equation simplifies to
The solutions of this equation for , are well understood, see e.g., [14, Page 2] and [18, page 144, Section 1.1.3]. In 1.7 below we study a broader class of measures , which includes this as a special case.
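To illustrate the fixed point equation in the classical setting, here is a minimal numerical sketch, assuming the Curie-Weiss-type specialization m = tanh(pβm^{p−1} + h) for spins supported on {−1, +1}; this parameterization follows the p-spin Curie-Weiss literature (e.g. [28]) and is our assumption rather than the paper's exact display:

```python
import numpy as np

def fixed_points(beta, p, h=0.0, n_grid=2001, tol=1e-10, max_iter=10_000):
    """Locate solutions of m = tanh(p * beta * m**(p-1) + h) by damped
    fixed-point iteration started from a grid of initial points.

    A sketch only: the equation and its normalization are assumptions
    modeled on the p-spin Curie-Weiss model, not the paper's display.
    """
    roots = set()
    for m0 in np.linspace(-1.0, 1.0, n_grid):
        m = m0
        for _ in range(max_iter):
            # Damping stabilizes the iteration near the critical point.
            m_new = 0.5 * m + 0.5 * np.tanh(p * beta * m ** (p - 1) + h)
            if abs(m_new - m) < tol:
                break
            m = m_new
        roots.add(round(float(m), 6))
    return sorted(roots)

# High temperature: a single root at 0; low temperature: three roots.
print(fixed_points(beta=0.4, p=2))
print(fixed_points(beta=0.6, p=2))
```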
Example 1.1 (Necessity of conditions on ).
To demonstrate the necessity of stochastic non-negativity of in 1.2 part (iii), we provide an example of a measure and a graphon such that is constant a.s. and , but none of the maximizers are constant functions. Set and
In this case is the graphon corresponding to a complete tripartite graph. Let be a probability measure on with . Set and
Numerical computations show that
Thus all global optimizers must be non-constant functions when , and the measure is a negative tilt of a symmetric distribution. Note that the function in this counterexample is not strictly positive a.s., but this can be circumvented by a continuity argument to allow for small positive values in the diagonal blocks.
Example 1.2 (Necessity of ).
The requirement is indeed necessary in 1.2 part (iii). To see a counterexample, consider the case and
Let be a compactly supported probability measure which is symmetric about . If is large and negative, then one can show numerically that no optimizer of (1.9) is a constant function, and any optimizer is of the form
where and are of opposite signs. Similar to the previous remark, even though is not strictly positive a.s., this can be circumvented by a continuity argument to allow for small positive values in the diagonal blocks.
1.1.2. Weak laws and tail bounds
Having studied the optimizers of the limiting free energy under model (1.3) in 1.2, the next natural question is to obtain weak laws for various statistics of interest under (1.3). Some popular examples include the Hamiltonian (see (1.2)), the global magnetization, and other interesting linear combinations of ’s. In the sequel, we will obtain a family of such weak limits in a unified fashion. Along the way, we will provide a probabilistic interpretation of the optimizers of the limiting free energy. The key tool that will help us address these questions simultaneously is a sharp analysis of the vector of local fields, which we define below.
Definition 1.8 (Local fields).
The ’s defined in (1.14) are local magnetizations/fields which capture “how well” one can predict given all , . More precisely, under model (1.3), the conditional distribution of given , is completely determined by in the following manner:
(1.16) |
with as in 1.3. In particular, by (1.16) and 1.3,
(1.17) |
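A sketch of the structure that (1.16) and (1.17) describe, in standard notation for such multilinear Gibbs measures (the symbols β, m_i(x), and b' below are our assumptions): since the Hamiltonian is multilinear, it is linear in each coordinate, so

\[
\mu_n\big( x_i \in dz \mid x_j,\, j \neq i \big) \propto \exp\{ \beta\, m_i(x)\, z \}\, d\mu(z),
\qquad
\mathbb{E}_{\mu_n}\big[ x_i \mid x_j,\, j \neq i \big] = b'\big( \beta\, m_i(x) \big),
\]

where m_i(x) is the local field at coordinate i and b' is the mean map of the tilted family from 1.3.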
Consequently, understanding the behavior of the vector or
plays an important role in obtaining correlation bounds, tail decay estimates and fluctuations for such Gibbs measures (see [22, 13, 17, 16, 7] and the references therein). The major focus of the existing literature is on the special case of and supported on ; neither restriction is needed in our paper.
Our second main result gives a weak law for the vector , in terms of the empirical measure . For any measure (see (1.5)), define
(1.18) |
where , and is as in 1.6. In the Theorem below, we will provide sufficient conditions under which is well-defined. While the definition of may seem abstract at first, it simplifies nicely in the context of 1.2. To see this, assume that for some , where is as in 1.1 part (iv). By 1.4 we have for some . So,
By invoking (1.13), we then get for a.e. ,
(1.19) |
Therefore, there is a direct one-to-one correspondence between the two sets of functions and . This observation will be crucial in the proof of 1.5 below.
We are now in a position to state our second main result.
Theorem 1.3.
Suppose (1.6) holds for some . Assume that (1.4) and (1.8) hold with some satisfying . Then the following conclusions hold:
(ii) Set
Then we have:
(1.20) |
The above theorem gives a weak limit for the empirical measure of the local field vector . The weak law for the empirical measure of the conditional means (introduced in (1.17)) then follows from 1.3 by a continuous mapping type argument. The limit in that case will naturally be the set
(1.21) |
We stress here that no assumption of replica-symmetry is necessary for 1.3 to hold.
Remark 1.3.
Given , let denote the conditional distribution of the second coordinate given the first coordinate. The proof of (1.20) can be adapted to show the following stronger conclusion:
where
Since this version is not necessary for our applications, we do not prove it here.
In order to obtain weak laws for common statistics of interest using 1.3, we require appropriate tail estimates for the ’s, the ’s, and the ’s. In particular, we will derive exponential tail bounds for the said quantities below, which are of possible independent interest.
Theorem 1.4.
In order to interpret part (ii) of the above theorem, note that , , and converge weakly in probability to the sets of probability measures (by (1.10)), (by (1.20)), and (as discussed around (1.21)) respectively. 1.4 part (ii) shows that these limiting sets of measures have uniformly bounded moments of a suitable order.
In view of the above results, coupled with the observation made in (1.19), it seems intuitive to expect a correspondence between elements of and the map . This is made precise in the following corollary.
Remark 1.4.
Recall from (1.17) that . Therefore (1.26) shows that the functions in are “close” to the vector of conditional expectations of ’s given all the other coordinates. In particular, if is a singleton and is continuous on , then (1.26) and (1.17) together imply that
For the special case where , is supported on with , and a.s., consists of two constant functions of the form for some (see [18, Page 144]; also see 1.7 for general ). The symmetry of around , coupled with (1.26), implies
Remark 1.5.
Note that the above displays are not true with replacing . This shows ’s are “more concentrated” than ’s.
As applications of 1.3 and 1.4, we obtain weak limits for linear statistics as well as the Hamiltonian under replica-symmetry, both of which are of independent interest.
Theorem 1.6.
Suppose (defined via (1.3)) for some base measure which satisfies (1.6) for some . Assume that either is even or is stochastically non-negative. Further, suppose that satisfies (1.4) and (1.8) for some satisfying , and that the limiting is strictly positive and satisfies a.s. Then, setting
(1.27) |
we have is a finite set, and the following conclusions hold:
(i) Suppose is a real sequence satisfying , and for some such that . Then we have:
(ii) If we replace with , then
(iii) The Hamiltonian satisfies
Remark 1.6.
Part (i) of the above Theorem shows that for contrast vectors (i.e. ) which are delocalized (in the sense ), the corresponding linear statistic exhibits a universal behavior across general Gibbs measures with higher order multilinear interactions, which does not depend on the matrix sequence , as long as is constant, i.e., the symmetrized tensor is regular. In a similar manner, part (ii) gives a universal behavior for the global magnetization for regular tensors. Universality results for were earlier obtained for regular Ising models, which correspond to and supported on with (see [3, Theorem 2.1] and [17, Theorems 1.1–1.4]). In this special case, for we have for some (see Remark 1.4). In this case symmetry implies (see 1.8 part (ii) below for a more general result) that:
The more recent work of [25] demonstrates universality for quadratic interactions for log-concave (see [25, Theorem 1.1 and Corollary 1.4]). We note that the results of the current paper require neither quadratic interactions nor log-concave base measures. In the following subsection, we will apply 1.6 to analyze a broad class of Gibbs measures which are not necessarily quadratic, and cover cubic and higher order interactions (see 1.9 below).
1.2. Examples
We now apply our general results to analyze some Gibbs measures of interest. In 1.2, we proved that for regular tensors (i.e. when a.s.) the optimization problem (1.9) has only constant functions as optimizers, under mild assumptions on or . In this section, we focus on particular examples of the regular case, and provide a more explicit description of the optimizers.
1.2.1. Quadratic interaction models with symmetric base measure
Suppose is a probability measure on which is symmetric about the origin. Define a Gibbs measure on by setting
(1.28) |
where , . In particular, if is supported on , the above model reduces to the celebrated Ising model, which has attracted significant attention in probability and statistics (cf. [3, 18, 20, 30, 17] and references therein). The following result analyzes the optimization problem (1.9) in this particular setting (which corresponds to setting ).
Lemma 1.7.
Let be a probability measure symmetric about , which satisfies (1.6) with , and let , , and be as in 1.3. Assume that
(1.29) |
Then, setting
for , , the following conclusions hold:
(i) If and , then has a unique maximizer at .
(ii) If , then has a unique maximizer with the same sign as that of , which satisfies .
(iii) If and , then has two maximizers , where , and .
Remark 1.7.
A few comments about the extra condition (1.29) utilized in the above lemma are in order. First note that if any measure satisfies the celebrated GHS inequality of statistical physics (see [19]), then must satisfy (1.29). Indeed, taking the matrix in [19, (1.2)] to be the matrix, it follows on applying the GHS inequality ([19, (1.4)]) that for , which immediately implies (1.29). Sufficient conditions on for the GHS inequality (and hence (1.29)) can be found in [19, Theorem 1.2]. In [19, (1.5)] the authors give a counterexample where the GHS inequality fails. Using the same example, it is not hard to show that (1.29) fails in this case as well, and does not have a unique maximizer for and some .
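For reference, one standard formulation of the GHS inequality (stated here in our notation, for ferromagnetic systems with even single-site measures and non-negative external fields h) is

\[
\frac{\partial^2}{\partial h_j\, \partial h_k} \langle x_i \rangle \le 0
\qquad \text{for all } i, j, k \text{ and all } h \ge 0,
\]

i.e., the magnetization is a concave function of the non-negative external field.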
Proposition 1.8.
Suppose that the measure satisfies the assumptions of 1.7, and satisfy (1.4) and (1.8) with (i.e. ). Also assume that the limiting graphon is strictly positive a.s., and satisfies a.s. With as in 1.4, define the functional by setting
(1.30) |
Then is well defined by 1.1 part (i), and further, the following conclusions hold:
(i) For any , the optimization problem
(1.31) |
has only constant global maximizers, given by
Here is as in 1.7.
(ii) The following weak limits hold:
Remark 1.8.
We note here that, in contrast to 1.6 part (iii) which only allows us to identify possible limit points for the random variable , in this case we are able to identify the limit, even in the low temperature regime . This is because, even though has two roots which are both global optimizers (i.e. in defined in 1.6), the optimizers are symmetric, and the limit of the Hamiltonian is the same under both optimizers.
1.2.2. Gibbs measures with higher order interactions
We now focus on Gibbs measures with higher order interactions, which have gained significant attention in recent years (see [29, 2, 23, 26, 31, 32, 33, 6] and the references therein). Here, we analyze the optimization problem (1.27), under some conditions on and . We point the reader to [5, Section 2.1] for related results in the special case where is supported on .
Theorem 1.9.
Consider the optimization problem
(1.32) |
Then the following conclusions hold:
(i) Fixing , the maximum of the optimization problem is attained, and any maximizer satisfies the equation
(1.33) |
(ii) If is constant a.s., is strictly positive a.s., and , then all of the maximizers are constant functions, provided either is even or is stochastically non-negative. Further, any such constant maximizer satisfies
(1.34) |
(iii) Suppose further that is compactly supported on . Then the following hold:
(a) There exists such that if , the optimization problem has a unique maximizer.
(b) If and , there exists such that if , the optimization problem has the unique maximizer , whereas if , then is not a global maximizer.
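To illustrate the phase transition in part (iii), here is a minimal numerical sketch for the p-spin Curie-Weiss specialization with μ uniform on {−1, +1}, where the scalar problem reduces to maximizing ψ_β(m) = βm^p − I(m) over [−1, 1], with I the binary rate function; this reduction is an assumption modeled on [28], not the paper's display:

```python
import numpy as np

def binary_rate(m):
    """I(m) = ((1+m)/2)log(1+m) + ((1-m)/2)log(1-m), the large deviation
    rate function of Rademacher sample means (endpoints give log 2)."""
    m = np.clip(m, -1 + 1e-12, 1 - 1e-12)
    return 0.5 * (1 + m) * np.log1p(m) + 0.5 * (1 - m) * np.log1p(-m)

def has_nonzero_maximizer(beta, p, grid=np.linspace(-1, 1, 200001)):
    """Check whether psi(m) = beta*m**p - I(m) is maximized away from 0."""
    psi = beta * grid ** p - binary_rate(grid)
    return abs(grid[np.argmax(psi)]) > 1e-3

def critical_beta(p, lo=0.0, hi=10.0, iters=60):
    """Bisect for the smallest beta at which a nonzero global maximizer
    appears (the phase transition point of this toy specialization)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if has_nonzero_maximizer(mid, p):
            hi = mid
        else:
            lo = mid
    return hi

print(critical_beta(p=2))  # about 0.5 in this normalization
print(critical_beta(p=4))  # strictly larger: a first-order transition
```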
1.3. Proof overview and future scope
Let us discuss the proof techniques employed in the characterization of replica symmetry (see 1.2) and the weak laws (see Theorems 1.3 and 1.6). In 1.2 part (i), we establish the first-order conditions (in (1.13)) for the optimization problem (1.9). It is immediate from (1.13) that if all optimizers of (1.9) are constants, then is a constant function (1.2, part (ii)). For the other direction, if is a constant, the crucial observation is that is a (possibly un-normalized) probability density function on , with marginals. The conclusion in 1.2 part (iii) then follows from the equality conditions of Hölder’s inequality. We provide examples to demonstrate that the conditions we require for replica symmetry are essentially tight.
The weak limits we prove involve a number of technical steps. We distil some of the main ideas here in the context of the universal result that provided , under any multilinear Gibbs measure in the replica-symmetry phase (see 1.6 part (ii)). For simplicity, let us assume that the optimization problem (1.9) has a unique constant optimizer, say (note that the actual result does not require uniqueness of optimizers). Let us split the proof outline into a few steps.
Step (i). Recall the definition of from (1.14). We first show that
This is the subject of 2.10 part (i), and proceeds via a second moment argument after a suitable truncation. The above display now suggests that it is sufficient to show that .
Step (ii). Based on step (i), it is natural to focus on the vector of local fields . The advantage of working with rather than is that each is a -th order “weighted average”, and hence they are much more “concentrated” than ’s. We provide a formalization in 1.3 where (in the current setting) we show that
(1.35) |
which is a degenerate limit. In contrast, by 1.1 part (iii), which is a non-degenerate limit. The proof of 1.3 relies primarily on 2.5, a stability lemma whose proof in turn relies on a counting lemma for graphons. In fact, 2.5 can be viewed as a refinement of the counting lemma in [8, Proposition 2.19] for “star-like” graphs.
Step (iii). Based on (1.35) in step (ii), it is natural to consider the following approximation:
In the first approximation, we have essentially replaced the ’s by the corresponding weak limit from (1.35). To make this rigorous, we need some moment estimates which are immediate byproducts of the exponential tail bounds in 1.4. The final conclusion in the above display uses the condition .
More generally, the weak limit of in 1.3 has broad applications. We use it in 1.5 to provide a probabilistic interpretation of the optimizers of (1.9) (note that this does not require the optimizers to be constant functions). We also use 1.3 to derive other weak laws of interest in 1.6. We note in passing that many other statistics of interest, such as the maximum likelihood or the pseudo-maximum likelihood estimators for the inverse temperature parameter are also expressible (sometimes implicitly) as functions of . Consequently one can derive appropriate weak laws for these estimators using 1.3 as well.
Our work leads to several important future research directions. Our results apply, as a special case, to Ising models with quadratic Hamiltonians and a general base measure. A first question is to extend the techniques of this paper to cover more general Hamiltonians from statistical physics, such as Potts models. Another related question is to go beyond the setting of cut metric convergence, and allow the matrix to converge in other topologies (such as local weak convergence on bounded degree graphs). A third question is to study Gibbs measures under more general tensor Hamiltonians, which cannot be specified by a matrix . This would require significant development of cut metric theory for cubic and higher order functions. Finally, it remains to be seen whether we can answer more delicate questions about such Gibbs measures, including Central Limit Theorems/limiting distributions.
1.4. Outline of the paper
In Sections 2 and 3, we prove our main results from Sections 1.1 and 1.2 respectively. The proofs of the major technical lemmas (in the order in which they are presented in the paper) are provided in Section 4. In the Appendix (Section 5), we defer the proofs of some of our supporting lemmas, which deal with properties of the base measure , and general results on weak convergence.
2. Proof of Main Results
2.1. Proofs of 1.1 and 1.2
In order to prove 1.1, we need the following preparatory result.
Proposition 2.1.
Fix any , , such that and . Fix any probability measure supported on with first marginal and sample . Then the following conclusions hold:
(i) We have:
(ii) For any measurable we have
(iii) With as in 1.6, we have
Parts (i) and (ii) above follow from [10, Proposition 2.19] and [6, Lemma 2.2] respectively. However, part (iii) is new and a proof is provided in Section 5.2. While the proof of 1.1 only uses 2.1 part (ii), the other parts of 2.1 will be useful in the rest of the paper.
Remark 2.1.
When the RHS of the display in part (ii) of 2.1 is finite, we can define
Proof of 1.1.
(ii), (iii) These are restatements of parts (i) and (ii) of [6, Theorem 1.6]. The fact that follows from the proof of [6, Corollary 1.3]. To prove is compact, we invoke [6, Remark 2.1] to note that
where the function defined by
with has compact level sets
(by [6, Corollary 1.3, part (ii)]), and is a closed subset of probability measures.
∎
Remark 2.2.
We now claim that throughout the rest of the paper, without loss of generality we can assume that converges to in strong cut metric , instead of the weak cut metric (see 1.1). Indeed, by definition of the weak-cut convergence, there exists a sequence of permutations with such that
Then setting we have
Set to be the Gibbs probability measure given by
where is the corresponding normalizing constant. Now if , then so does , and so
Thus for any we have
In the above display, all quantities are finite and well defined using 1.1 part (ii). Thus the distribution of under is the same as the distribution of under . Since , by replacing by , without loss of generality we can assume , as claimed, which we do throughout the rest of the paper.
Next, we state an elementary property of that will be useful in proving 1.2 below. A short proof is provided in Section 5.
Lemma 2.2.
The function is a continuous (possibly extended) real-valued function.
Proof of 1.2.
(i) By switching the variables of integration, it is easy to check that the optimization problem (1.9) is equivalent to maximizing the function
Note that for all , and , the function , and so . This gives
which is equivalent to
(2.1) |
We will show that , where denotes the Lebesgue measure on . Let us assume the contrary. Without loss of generality, assume that . On this set, we have
yielding
This implies that there exists such that
Define a function by setting
Note that , as , and , contradicting (2.1). This shows that a.s., as desired.
(ii) We will prove the contrapositive. Suppose there exists an almost surely constant function , say for a.e. . Then by (1.13), we have for a.e. . This implies is constant almost surely, which is a contradiction.
(iii) Without loss of generality, assume that for a.e. . Then is a probability density function on with all marginals uniformly distributed on . By an application of Hölder’s inequality with respect to the probability measure induced by , we then have
Consequently, it holds that
(2.2) |
(a) If is even, then (2.2) gives
Equality holds in the above display by taking to be constant functions. To identify the maximizing , we need equality in Hölder’s inequality, and so must be a constant function a.s.
(b) If for all , the same inequality continues to hold for all by 2.2. Thus (2.2) gives
Again equality holds in the above display by taking the supremum over constant functions, and the maximizing is again constant a.s.
(iv) The result for (a) follows immediately on noting that for all . We thus focus on proving (b). In this case there exists a symmetric measure such that . Fixing such that , using symmetry of it follows that , and
Thus, with denoting the inverse of , we have for all , where is the natural parameter space of . This gives
(2.3) |
Since is symmetric about , and are even functions. The assumption along with (2.3) gives for , completing the proof. ∎
In the sequel, we will first prove 1.4 independently. Then we will prove 1.3 using 1.4. In order to prove 1.4, we need the following lemma whose proof we defer.
Lemma 2.3.
Proof of 1.4.
We next prove (1.23). Fix and use Hölder’s inequality to note that
(2.4) |
Raising both sides to the power and summing (2.1) over gives
(2.5) |
where the last inequality uses 2.1 part (iii), with . The conclusion then follows by (1.8) and (1.22).
Next, we will prove (1.24). By 2.3 part (i), there exists such that for all we have
(2.6) |
Now, note the following chain of equalities/inequalities with explanations to follow.
The first inequality follows directly from (2.6). The second inequality follows from (2.1). Raising both sides to the power and summing over , we get:
(2.7) |
The final inequality follows by noting that and then using 2.1 part (iii), with . The conclusion follows by (1.8) and (1.22).
(ii) The proof of (1.25) is very similar to the proof of part (i). Firstly, follows from [6, Eq 2.29]. This also implies:
(2.8) |
Next, in the same vein as (2.1), we get
by invoking (2.8) and (1.11), thereby proving the second conclusion. Finally, proceeding similarly to (2.1), we have
where we have used (2.8) and (1.11). This completes the proof of part (ii).
∎
2.2. Proof of 1.3
(i) The fact that follows directly from (1.25). By an application of Hölder’s inequality with 2.1 part (iii), we get:
which is finite on using (1.6) and (1.11). By Fubini’s Theorem, (see (1.18)) is well-defined a.s. on , as desired.
Remark 2.3.
Note that the above argument does not require , but only the weaker condition .
(ii) We begin the proof with the following definition.
Definition 2.1.
Let and be as in 1.1 and 1.2 respectively. Recall that (from 1.2) for . Define
Construct the following function (the space of probability measures on ) by setting:
(2.9) |
Here , and is as in (1.18). Note that is well-defined for , as the function is well defined a.s. by 1.3 part (i). Also for and a random variable , set . For any measure and , let denote the distribution of the truncated random variable .
Next we generate . We define a map from to given by
(2.11) |
In view of (2.11), note that , , , and denote the laws of , , , and conditioned on , respectively. Also, with as in 2.1, we have
(2.12) |
In order to prove the above, note that, given any bounded continuous real-valued function on , we have:
and so the first conclusion of (2.12) holds. The proof of the second conclusion is similar. By the definition of in 1.3, we have
(2.13) |
With as in (1.15), the triangle inequality gives
where the second equality uses (2.12) and (2.13). We now show that each of the terms on the right hand side converges to as we take limits with first, followed by . Towards this direction, we observe that:
(2.14) |
Based on the above two displays, it now suffices to prove the following:
(2.15) |
(2.16) |
as followed by , and
(2.17) |
as for every fixed , and
(2.18) |
as , followed by .
We now split the proof into four parts, proving the four preceding displays. We begin with the proof of (2.15), which requires the following lemma; it is a variant of [6, Lemma 2.7], and we omit the details of its proof for brevity.
Lemma 2.4.
Proof of (2.15).
With as in Definition 1.8, define
(2.19) |
It then suffices to prove the following:
(2.20) |
(2.21) |
for any , and for any fixed ,
(2.22) |
Proof of (2.20). Fix such that . For any we have
(2.23) |
as , followed by . Here the second line uses (2.19), the third line uses Hölder’s inequality, and the fourth inequality follows from 2.1 part (iii) along with the inequalities
The conclusion holds on noting that the RHS of (2.2) converges to on letting followed by , since and (which are direct consequences of (1.8) and (1.22) respectively). This proves (2.20).
Proof of (2.22). Using (2.22), observe that
The RHS above converges to as , using 2.4 with along with triangle inequality.
∎
In order to prove (2.16) and (2.17), we need the following additional lemma whose proof we defer to Section 4.
Lemma 2.5.
Fix a graph with vertices and maximum degree as before. Fix such that , . Then is well-defined on , and the following conclusions hold:
(i) Fix . Then
(ii) Suppose , such that as , and for . Fix and let denote the subset of for which the second marginal is compactly supported on . Then we have
(iii) Fix such that , and let such that . Then,
Proof of (2.16).
Proof of (2.17).
The final step is to establish (2.18) for which we need two results. The first one is an immediate corollary of 2.5 parts (i) and (iii) (and hence it’s proof is omitted), while the second one is a simple convergence lemma, whose proof is provided in Section 5.2.
Corollary 2.6.
Consider the same setting as in 2.5. For , define
(2.24) |
Suppose is such that . Then is continuous on in the weak topology.
Lemma 2.7.
Suppose and are two Polish spaces. Let be a sequence of -valued random variables such that for some closed set . Assume that there exists a compact set such that
(2.25) |
Finally consider a function such that is continuous on . Then we have
Proof of (2.18).
Applying 2.5 part (i), for every we have
It thus suffices to show that
(2.26) |
To this effect, use 1.1 part (iv) to note that
(2.27) |
where the set is compact in the weak topology. Also note that by (1.22), there exists such that
(2.28) |
We will now invoke 2.7 with and , both equipped with the weak topology, , , and . Once we verify the conditions of 2.7 with the above specifications, we will conclude (2.26), which in turn completes the proof.
To verify the conditions of 2.7, note that is compact, and is a subset of by (1.25). Further, (2.27) implies . The conclusion in (2.28) implies (2.25). The fact that is well-defined on follows from 1.3 part (i) (also see 2.1). Finally, the continuity of on follows from 2.6.
This finally completes the proof of 1.3. ∎
2.3. Proofs of 1.5 and 1.6
In order to prove 1.5, we need the following results. The first result is a lemma about a sequence of functions converging in measure. Its proof is deferred to Section 5.2.
Lemma 2.8.
Let and .
(i) Suppose is a sequence of measurable real-valued functions on such that
Then for any we have:
(2.29)
(ii) If for some such that and , then a.s.
For stating the second result, we recall the definitions of , , , , from 1.3, (1.21), (1.18), 1.1 part (iv), and 1.4, respectively. Also define
(2.30) |
We note an elementary observation here which will be used in the sequel. To wit, recall from 1.3 that . Consequently from the definition of it follows that:
(2.31) |
We now state the following lemma, which formalizes a key property of the sets and . Its proof is deferred to Section 4.
Lemma 2.9.
Consider the same setting as in 1.3. Then the set is a compact subset of in the weak topology, whereas is a compact subset of in the weak topology.
We are now in a position to prove 1.5.
Proof of 1.5.
By arguments similar to (2.2) we have
Consequently, by invoking 1.3 part (ii) we get:
(2.32) |
Further by (1.23), there exists such that
(2.33) |
With the above observations in mind, we invoke 2.7 with , equipped with the topology of weak convergence, , , and (with chosen as in (2.33)). Once we verify the assumptions of 2.7 with the above specifications, by (2.31), we obtain:
(2.34) |
To verify the conditions of 2.7, note that by (1.25). Further, (2.32) implies and 1.4 part (ii) along with Fatou’s lemma implies is a compact subset of . The conclusion in (2.33) implies (2.25). Finally, the continuity of on follows from the continuity of .
We now use (2.34) to complete the proof. The key tool will once again be 2.7. To set things up, fixing equip with the weak topology. Pick any . Then is distributed as , where and is measurable with . Consequently by 2.8 part (ii), the map , (for some ) given by is well-defined.
With this observation, we will invoke 2.7 with , , equipped with the topologies of weak convergence and respectively, and , , , and with chosen from (2.36). Once we verify the conditions of 2.7, an application of (2.35) will yield
which will complete the proof of (1.26).
To verify the conditions of 2.7, note that by (1.25). Further, (2.34) implies , and 1.4 part (ii) implies is a compact subset of . The conclusion in (2.36) implies (2.25). The fact that is well-defined on follows from 2.8 part (ii). Continuity of on follows from 2.8 part (i).
∎
Lemma 2.10.
Suppose is a sample from the model (1.3) ( need not be non-negative). Suppose , satisfy (1.6), and .
(i) Given any vector such that , we have
(ii) If , then
Proof of 1.6.
By 1.2 part (iii), all the optimizers of the problem in (1.9) are constant functions. Further, (1.25) shows that there exists (depending on ) such that all the optimizers of (1.9) have norm bounded by . Combining these two observations, we have that consists only of constant functions where the constants are given by
As a non-constant analytic function can have only finitely many maximizers in a compact set, it follows that is a finite set.
(i) Define and . We claim that the result follows from the following display:
(2.37) |
This is because given any and any (recall this implies ), the following inequalities hold:
Taking an infimum over gives the bound
The first and last terms above converge to in probability as first, followed by , by using (2.37) and the assumption for . The remaining terms converge to as for fixed , by using 2.10 part (i), (1.26), and , respectively. This completes the proof.
Next, we prove (2.37). Fix such that . By Hölder’s inequality,
This, along with (1.22) and the assumption , establishes (2.37).
(ii) As in part (i), we have
The first term converges to in probability by part (i), the second converges to by 2.10 part (i), and the third term converges to by (1.26).
This completes the proof.
(iii) We begin by observing that for any , we have:
both of which follow from standard properties of exponential families. Recall that for any , we have by 1.2 part (i). This gives
and so for all large (depending on ) we have
(2.38) |
The RHS above converges to in probability, as , followed by . This is because the first term in (2.3) converges to in probability as for every fixed , by using (1.26), while the second term converges to in probability by taking first, followed by , by using (1.23).
Next choose such that . Note that for any , (using the relation ). In the same vein as (2.3), by using (1.17), we also get for all large enough:
(2.39) |
The RHS above converges to in probability, as , followed by . This is because the first term converges to as by (2.3), and the second term converges to as followed by , by using (1.23) and (1.22). Finally, the conclusion in part (iii) follows by combining (2.3) with 2.10 part (ii).
∎
3. Proof of Results from Section 1.2
Proof of 1.8.
(i) Note that quadratic forms correspond to the choice and in (1.2). Let be the tilted probability measure on obtained from as in 1.3. Then a direct computation using (1.28) gives
(3.1) |
Using this along with 1.1 part (iii) we get
(3.2) |
Here and are as in 1.3, but for the tilted measure instead of , and the last equality uses (2.3). By invoking 1.2 part (iii), if is even, the set of optimizers in the above display consists of constant functions, where the constant is an optimizer of the following optimization problem:
(3.3) |
By 1.7 (parts (i) and (ii)), if either (a) , or (b) , , then the optimizer is .
On the other hand, when and , by 1.7 part (iii) the optimizers are . Using this, the desired conclusion of part (i) follows.
(ii) Recall the definition of from (1.27), and use part (i) to note that all functions in are constant functions, with constants belonging to the set . Since , we have
where we use 1.6 part (iii).
For the weak limit of we invoke 1.6 part (ii) with which implies . The conclusion follows by noting that when , the symmetry of about the origin implies that and have the same distribution. ∎
Proof of 1.9.
(i) Let be the tilted measure obtained from as in 1.3, and let be as in 1.3, but for the measure instead of . Using (2.3) we get
using which the optimization problem in (1.32) (ignoring the additive constant ) becomes
(3.4) |
Now, we invoke 1.2 part (i) to conclude that any maximizer of the above display satisfies the fixed point equation (1.33).
(ii) It suffices to show that all optimizers of (3.4) are constant functions. Invoking 1.2 part (iii), it suffices in turn to show that is stochastically non-negative (as per 1.7) if is stochastically non-negative. In this case we have for . Along with (2.3), this gives
where we use the fact that . This shows that is stochastically non-negative as well.
Hence, by the proof of 1.2 part (iii), the maximizers of (1.32) are constant functions provided either is even or is stochastically non-negative. Finally, (1.33) follows from (1.32), on setting to be a constant function.
(iii)(a) The optimization problem (1.32) reduces to maximizing
(3.5) |
over . Differentiating we get
(3.6) |
Since is supported on , we have , and so
as . Hence, there exists such that for we have . If is a global maximizer of , then we have
However, on the interval , using boundedness of support, we have
Thus is strictly concave on the interval , and so the global maximizer must be unique.
(iii)(b) We break the proof into the following steps:
• There exists such that is the unique global maximizer for .
Since is compactly supported on , we have . Thus for we have
and so is strictly concave. Since , is the unique global maximizer of .
• There exists such that for , is not a global maximizer of
We consider two separate cases:
– is stochastically non-negative.
In this case there exists such that . Then setting , for we have
and so cannot be a global optimizer of
– is even.
If there exists such that , we are through by the previous argument. Otherwise, since is not degenerate at , there exists such that . Again setting with even, the same proof works.
• For any , let be any non-negative global optimizer of . Then the map is non-decreasing.
Suppose, by way of contradiction, that there exists such that . By optimality of we have . Here the last implication uses the fact . But this contradicts the fact that is a global maximizer for .
Combining the last three claims, the conclusion of part (iii)(b) follows on setting
∎
4. Proof of Main Lemmas
Proof of 2.5.
As before, we also choose and such that . Note that is well-defined on (see 2.1) by using 2.1 (ii), with as is, , and replaced with .
(i) Recall the definition of from (1.18), and the connection between and from 2.1. We will prove the following stronger claim.
(4.1) |
Towards this end, fix and note that
For every non-empty fixed set , an application of 2.1 part (ii) with as is, and replaced by , on the above bound, gives
(4.2) |
as . This proves (4.1), and hence completes part (i).
(ii) Given , and any , define
(4.3) |
where . For , and , define
for . With this notation, by a truncation followed by a simple method of moments argument, the conclusion in part (ii) will follow if we can show the following:
(4.4) |
(4.5) |
(4.6) |
as , for every , and every .
Proof of (4.4). To begin, for any we have the bound
which gives
Therefore, (4.4) will follow if we can show that
(4.7) |
We now complete the proof based on the following claim, whose proof we defer.
(4.8) |
We will now deal with (4.7) term by term. First note that:
By the first claim in (4.8), the right hand side above converges to by taking followed by , thus proving the first claim in (4.7). For the second claim in (4.7), setting Hölder’s inequality gives
where the final quantity above converges to on taking followed by , using both claims in (4.8). This proves the second claim in (4.7), and hence completes the verification of (4.4), subject to proving (4.8).
Proof of (4.8). Note that
where the inequality follows from Lyapunov’s inequality (the function is non-decreasing on ). On integrating over we get
where the last inequality follows from 2.1, part (iii). By our assumption , the first conclusion in (4.8) follows. The second conclusion follows similarly.
Proof of (4.5). This follows the exact same line of argument as the proof of (4.4), and hence is omitted for brevity.
Proof of (4.6). Set , and use the definition in (1.18) to note that
We can similarly write out an expression for with replaced by . Accordingly, to establish (4.6), replacing each by sequentially, it suffices to show that:
(4.9) |
for every fixed and , where
In order to establish (4.9), let us further define
and note that
(4.10) |
Proceeding to show (4.9), integrating with respect to all the variables other than , we get
Observe that ’s are bounded by for , is bounded by definition, and further and are both bounded by (4.10). The conclusion in (4.9) then follows from [6, Proposition 3.1 part (ii)].
(iii) Note that there exists a sequence of bounded continuous functions such that as . The triangle inequality implies that given any , , we have:
(4.11) |
By part (ii), we have:
Further from the definition of weak convergence we have, for every fixed ,
Combining the two displays above with (4) establishes part (iii). ∎
Proof of 2.9.
Recall from (2.31) that , where with continuous (see 1.3 for definition of ). The facts that and follow directly from (1.25). It thus suffices to prove compactness of (which will imply compactness of ).
To this effect, invoking (1.25) there exists such that (see (2.24) for the definition of ). Also by 2.6 the function is continuous on with respect to weak topology. Since is compact in the weak topology (see 1.1 part (iv)), and (from (2.13)), compactness of follows.
∎
Proof of 2.10.
(i) For any under we have
(4.12) |
where and . The RHS of (4.12) converges to as followed by by using (1.22). Since , setting
we note that
Consequently,
(4.13) |
which converges to as followed by , by using (1.22) and the fact that . Combining (4.12) and (4.13), it suffices to show . Towards this direction, we further define, for ,
and observe that
For the random variable is measurable with respect to the sigma field generated by , and consequently,
for . Combining the last two displays gives
(4.14) |
It suffices to show that the second term in the RHS of (4.14) converges to for every fixed . To control this second term, define
(4.15) |
for , where denotes the set of all distinct tuples in such that none of the elements equals or . For any , by the triangle inequality we have the following for any ,
(4.16) |
It suffices to show that the RHS of (4) converges to as , followed by . Now let us complete this proof based on the following claim, whose proof we defer:
(4.17) |
By combining (4.17) with (1.23), we also have:
(4.18) |
By combining (4.18) with (1.23), it is immediate that the first term in the RHS of (4) converges to as , followed by . For the second term in the RHS of (4), let us define the function:
where is the exponential tilt of as introduced in 1.3. From standard properties of exponential families, has a continuous derivative on and therefore,
where depends on , , and . Hence,
(4.19) |
for every fixed , , and . This completes the proof that (4) converges to as followed by .
Proof of (4.17). The symmetry of implies
Using this, we bound the left hand side of (4) below.
where the second inequality follows from Hölder’s inequality, and the third inequality uses 2.1 part (iii). The above display, on taking expectation, gives
Here, we have used Lyapunov’s inequality coupled with the observation that . As by (1.8), an application of (1.22) in the last display above completes the proof of (4.17). ∎
Proof of 2.10.
(ii) Choose and such that . Fixing we have
where the limit is to be understood as followed by . Here we used (1.22) and (1.23). Now, from standard analysis we have the existence of a function such that for , , , and . This gives
Using this bound along with Hölder’s inequality, we get:
as followed by , on using (1.22) and (1.23). Combining the above displays we get
as followed by . A similar computation shows
in the same sense. Using the last two displays above, it suffices to show that
(4.20) |
as for fixed . Towards this direction, we will use the definitions of , , from the proof of 2.10 part (i). Observe that
By Markov’s inequality, in order to establish (4.20), it suffices to show that the above display converges to as for any fixed . As
for , it suffices to show that
The left hand display is what we bounded in (4). As , the right hand display above follows directly from (4.17). ∎
5. Appendix
In this section, we prove the auxiliary lemmas from earlier in the paper. Section 5.1 collects all results on the properties of the base measure , and Section 5.2 contains some general probabilistic convergence results.
5.1. Proofs of Lemmas 1.7, 2.2, and 2.3
Proof of 1.7.
With as in the statement of the lemma, differentiation gives . Using 2.3 part (ii) we get (since ), and so the continuous function attains its global maximizers on ; any maximizer (local or global) satisfies , which is equivalent to solving , where
(5.1) |
(i) Here , and symmetry of gives . To show that is the only root of (and hence the unique maximizer of ), using symmetry of it suffices to show that does not have any other roots on . To this effect, using (1.29) it follows that is non-increasing on , and so is convex using (5.1). Since , it follows that is also a global minimizer of , and so is non-positive. If there exists a positive root of , then by convexity (and symmetry) we have on . But this implies is linear on this domain, and so must be a quadratic, which is possible only if is Gaussian. This contradicts (1.6), and hence completes the proof of part (i).
(ii) By symmetry, it suffices to consider the case . Comparing with and using the symmetry of , it follows that all global maximizers lie in . Also in this case , which implies . As by 2.3 part (i), either has a unique positive root, or at least positive roots. If the latter holds, using (5.1) must have two positive roots , which on using (1.29) gives that on the interval . As in part (i), this implies that is Gaussian, a contradiction to (1.6). Thus has a unique positive maximizer .
(iii) In this case and . Therefore, either has a unique positive root or at least positive roots. From there we argue, similarly to part (ii) above, that has exactly one positive root . By symmetry, it follows that is the unique negative root of , and are the global maximizers of . ∎
Proof of 2.2.
The function is smooth () on , and the function is smooth on . Consequently, the function is smooth on . To verify continuity on , it suffices to cover the (possible) boundary cases:
• If , then .
• If , then .
We will only prove the first case, as the other case follows similarly. Note that,
where the second limit is in weak topology. Further,
(5.2) |
by the lower semi-continuity of the Kullback-Leibler divergence. If , then , and (5.2) yields the desired conclusion. If , then . Also, for any , we have
For all such that (which holds for all close to ), this gives
Combining the above display with (5.2) gives as desired. ∎
Proof of 2.3.
(i) We prove , noting that the proof of the other limit is similar. To this effect, we consider the following two cases separately:
• .
Fixing and , we have:
where we use the bound on the set . Letting we have , as . Since the numerator in the second term in the display above is finite invoking (1.6), the second term above converges to as , allowing us to conclude . Since is arbitrary, the desired limit follows.
• .
In this case, . Since is non-decreasing, exists as a finite (non-positive) number. Consequently we have .
(ii) We only study the case when . If , then the conclusion is immediate as the denominator converges to a finite number while the numerator diverges. Therefore, we only focus on the case . To this effect, fixing using part (i) gives that for all large enough (depending on ) we have
Taking the limit gives
Since is arbitrary, the desired conclusion follows. ∎
5.2. Proofs of 2.1 and Lemmas 2.7 and 2.8
Proof of 2.1.
Proof of 2.7.
By using (2.25), it follows that the sequence is tight. Passing to a subsequence, w.l.o.g. we can assume , where (as is closed). By the Portmanteau Theorem,
(5.3) |
Next we will show that . Towards this direction let be a closed set. We will write to denote the inverse image of the set under . Another application of the Portmanteau Theorem implies:
(5.4) |
The last line uses the fact that is closed, which in turn follows from the continuity of on . Finally, by (2.25) and (5.3), we have:
(5.5) |
(5.6) |
By combining (5.4), (5.5), and (5.6), it follows that
By the Portmanteau theorem, this yields . So for any we get
as a.s. Since is arbitrary, , as desired. ∎
Proof of 2.8.
(i) Since , it follows that the sequence is uniformly integrable, and so . By standard approximation results, given any , there exists (depending on ) such that is continuous on and . The continuous mapping theorem gives . Since is uniformly integrable, . As is arbitrary, this completes the proof of part (i).
(ii) The conclusion follows by applying part (i) to the sequence of measures alternating between and along odd and even subsequences. ∎
References
- Adamczak et al., [2019] Adamczak, R., Kotowski, M., Polaczyk, B., and Strzelecki, M. (2019). A note on concentration for polynomials in the Ising model. Electron. J. Probab., 24:Paper No. 42, 22.
- Barra, [2009] Barra, A. (2009). Notes on Ferromagnetic p-spin and REM. Mathematical methods in the applied sciences, 32(7):783–797.
- Basak and Mukherjee, [2017] Basak, A. and Mukherjee, S. (2017). Universality of the mean-field for the Potts model. Probab. Theory Related Fields, 168(3-4):557–600.
- [4] Bhattacharya, B. B., Fang, X., and Yan, H. (2022a). Normal approximation and fourth moment theorems for monochromatic triangles. Random Structures & Algorithms, 60(1):25–53.
- Bhattacharya et al., [2020] Bhattacharya, B. B., Mukherjee, S., and Mukherjee, S. (2020). The second-moment phenomenon for monochromatic subgraphs. SIAM Journal on Discrete Mathematics, 34(1):794–824.
- [6] Bhattacharya, S., Deb, N., and Mukherjee, S. (2022b). LDP for inhomogeneous U-Statistics. arXiv preprint arXiv:2212.03944.
- Bhattacharya et al., [2021] Bhattacharya, S., Mukherjee, R., and Ray, G. (2021). Sharp signal detection under Ferromagnetic Ising models. arXiv preprint arXiv:2110.02949.
- Borgs et al., [2019] Borgs, C., Chayes, J., Cohn, H., and Zhao, Y. (2019). An $L^p$ theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions. Transactions of the American Mathematical Society, 372(5):3019–3062.
- Borgs et al., [2018] Borgs, C., Chayes, J. T., Cohn, H., and Zhao, Y. (2018). An $L^p$ theory of sparse graph convergence II: LD convergence, quotients and right convergence. The Annals of Probability, 46(1):337–396.
- Borgs et al., [2008] Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T., and Vesztergombi, K. (2008). Convergent sequences of dense graphs. I. Subgraph frequencies, metric properties and testing. Adv. Math., 219(6):1801–1851.
- Borgs et al., [2012] Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T., and Vesztergombi, K. (2012). Convergent sequences of dense graphs II. Multiway cuts and statistical physics. Ann. of Math. (2), 176(1):151–219.
- Bresler and Nagaraj, [2019] Bresler, G. and Nagaraj, D. (2019). Stein’s method for stationary distributions of Markov chains and application to Ising models. Ann. Appl. Probab., 29(5):3230–3265.
- Chatterjee, [2007] Chatterjee, S. (2007). Estimation in spin glasses: a first step. Ann. Statist., 35(5):1931–1946.
- Comets and Gidas, [1991] Comets, F. and Gidas, B. (1991). Asymptotics of maximum likelihood estimators for the Curie-Weiss model. Ann. Statist., 19(2):557–578.
- Daskalakis et al., [2020] Daskalakis, C., Dikkala, N., and Panageas, I. (2020). Logistic regression with peer-group effects via inference in higher-order Ising models. In International Conference on Artificial Intelligence and Statistics, pages 3653–3663. PMLR.
- Deb et al., [2020] Deb, N., Mukherjee, R., Mukherjee, S., and Yuan, M. (2020). Detecting structured signals in Ising models. The Annals of Applied Probability (To Appear).
- Deb and Mukherjee, [2023] Deb, N. and Mukherjee, S. (2023). Fluctuations in mean-field Ising models. The Annals of Applied Probability, 33(3):1961 – 2003.
- Dembo and Montanari, [2010] Dembo, A. and Montanari, A. (2010). Gibbs measures and phase transitions on sparse random graphs. Brazilian Journal of Probability and Statistics, 24(2):137–211.
- Ellis et al., [1976] Ellis, R. S., Monroe, J. L., and Newman, C. M. (1976). The GHS and other correlation inequalities for a class of even Ferromagnets. Comm. Math. Phys., 46(2):167–182.
- Ellis and Newman, [1978] Ellis, R. S. and Newman, C. M. (1978). The statistics of Curie-Weiss models. J. Statist. Phys., 19(2):149–161.
- Frieze and Kannan, [1999] Frieze, A. and Kannan, R. (1999). Quick approximation to matrices and applications. Combinatorica, 19(2):175–220.
- Gheissari et al., [2019] Gheissari, R., Hongler, C., and Park, S. (2019). Ising model: Local spin correlations and conformal invariance. Communications in Mathematical Physics, 367(3):771–833.
- Heringa et al., [1989] Heringa, J., Blöte, H., and Hoogland, A. (1989). Phase transitions in self-dual Ising models with multispin interactions and a field. Physical review letters, 63(15):1546.
- Ising, [1925] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31(1):253–258.
- Lacker et al., [2022] Lacker, D., Mukherjee, S., and Yeung, L. C. (2022). Mean field approximations via log-concavity. arXiv preprint arXiv:2206.01260.
- Liu et al., [2019] Liu, J., Sinclair, A., and Srivastava, P. (2019). The Ising partition function: Zeros and deterministic approximation. Journal of Statistical Physics, 174(2):287–315.
- Lovász, [2012] Lovász, L. (2012). Large networks and graph limits, volume 60 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI.
- Mukherjee et al., [2021] Mukherjee, S., Son, J., and Bhattacharya, B. B. (2021). Fluctuations of the magnetization in the p-spin Curie–Weiss model. Communications in Mathematical Physics, 387(2):681–728.
- Mukherjee et al., [2022] Mukherjee, S., Son, J., and Bhattacharya, B. B. (2022). Estimation in tensor Ising models. Information and Inference: A Journal of the IMA, 11(4):1457–1500.
- Sly and Sun, [2014] Sly, A. and Sun, N. (2014). Counting in two-spin models on d-regular graphs. Ann. Probab., 42(6):2383–2416.
- Suzuki and Fisher, [1971] Suzuki, M. and Fisher, M. E. (1971). Zeros of the partition function for the Heisenberg, Ferroelectric, and general Ising models. Journal of Mathematical Physics, 12(2):235–246.
- Turban, [2016] Turban, L. (2016). One-dimensional Ising model with multispin interactions. Journal of Physics A: Mathematical and Theoretical, 49(35):355002.
- Yamashiro et al., [2019] Yamashiro, Y., Ohkuwa, M., Nishimori, H., and Lidar, D. A. (2019). Dynamics of reverse annealing for the fully connected p-spin model. Physical Review A, 100(5):052321.