Non-Asymptotic Concentration of Magnetization in the Curie-Weiss Model at Subcritical Temperatures

Yingdong Lu
IBM T.J. Watson Research Center
Abstract.

In this short paper, we obtain non-asymptotic concentration results for the magnetization of the Curie-Weiss model at subcritical temperatures, which lead to a diffusion limit theorem for the scaled and centered magnetization driven by a Metropolis-Hastings algorithm. These results complement the results at supercritical and critical temperatures in [Bierkens and Roberts(2017)].

Key words and phrases:
Curie-Weiss model, concentration, diffusion limit

1. Introduction

The Curie-Weiss model is among the simplest ferromagnetic models in statistical mechanics for studying phase transitions; see Chapter IV of [Ellis(2006)] for a detailed exposition. Mathematically, the Curie-Weiss model is defined by the following probability measure on ${\bf S}:=\{-1,1\}^{n}$,

$$\pi^{n}(x)=\exp[-\beta H^{n}(x)]/Z_{n},\quad x\in{\bf S}, \tag{1}$$

with

$$H^{n}(x)=-\frac{1}{2n}\sum_{i,j=1}^{n}x_{i}x_{j}-h\sum_{i=1}^{n}x_{i},$$

where $Z_{n}$ is the normalizing constant (also known as the partition function), $\beta>0$ is the reciprocal of the absolute temperature, and $h\in\mathbb{R}$ is the strength of the externally applied magnetic field. For each state $x\in{\bf S}$, a key quantity is the magnetization, denoted by $m^{n}(x)$ and defined as $m^{n}(x):=\frac{1}{n}\sum_{i=1}^{n}x_{i}$. The Hamiltonian $H^{n}(x)$ can then be expressed as

$$H^{n}(x)=-n\left[\frac{1}{2}(m^{n}(x))^{2}+hm^{n}(x)\right].$$
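The identity between the pairwise form and the magnetization form of the Hamiltonian can be checked by brute force for small $n$. The sketch below is only an illustration: it assumes the ferromagnetic sign convention $H^{n}(x)=-\frac{1}{2n}\sum_{i,j}x_{i}x_{j}-h\sum_{i}x_{i}$ and the normalized magnetization $m^{n}(x)=\frac{1}{n}\sum_{i}x_{i}$, and the function names are hypothetical.

```python
import itertools


def hamiltonian_pairwise(x, h):
    """Pairwise form: -(1/(2n)) * sum_{i,j} x_i x_j - h * sum_i x_i (assumed signs)."""
    n = len(x)
    pair_sum = sum(xi * xj for xi in x for xj in x)
    return -pair_sum / (2 * n) - h * sum(x)


def hamiltonian_magnetization(x, h):
    """Magnetization form: -n * [m^2 / 2 + h * m] with m = (1/n) * sum_i x_i."""
    n = len(x)
    m = sum(x) / n
    return -n * (0.5 * m * m + h * m)


def max_discrepancy(n, h):
    """Largest gap between the two forms over all 2^n spin configurations."""
    return max(
        abs(hamiltonian_pairwise(x, h) - hamiltonian_magnetization(x, h))
        for x in itertools.product((-1, 1), repeat=n)
    )


if __name__ == "__main__":
    print(max_discrepancy(8, 0.3))  # should be at floating-point rounding level
```

The enumeration costs $2^{n}$ evaluations, so it is only meant for very small $n$.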

A common practice in computing the partition function, and thus the entire probability distribution, is to use Markov chain Monte Carlo (MCMC) methods; the Metropolis-Hastings algorithm is one such method that has proved successful in many cases. In [Bierkens and Roberts(2017)], it is shown that, under a proper centering and scaling transformation, the magnetization of the Markov chain generated by the Metropolis-Hastings algorithm has a diffusion limit at the critical ($\beta=1$ and $h=0$) and supercritical ($\beta<1$) temperatures. The derivation depends on non-asymptotic concentration results derived in these two cases. In this paper, we obtain a similar non-asymptotic concentration result at subcritical temperatures ($\beta>1$ and $h\neq 0$), and thus a similar diffusion limit.

The rest of the paper is organized as follows. In Sec. 2, we present concepts and results related to diffusion approximations of the centered and scaled magnetization; in Sec. 3, we present the key results on non-asymptotic concentration.

2. Diffusion approximations to Magnetization

2.1. Concentration of Magnetization

It is known, see e.g. Section IV.4 in [Ellis(2006)], that the magnetization $m^{n}$ concentrates around $m_{0}$, the unique minimum point of the following quantity,

$$-\left(\frac{1}{2}\beta m^{2}+\beta hm\right)+\frac{1-m}{2}\log(1-m)+\frac{1+m}{2}\log(1+m),\quad m\in(-1,1). \tag{2}$$

In [Ellis(2006)], the concentration and the identification of $m_{0}$ are obtained through a large deviations principle; see also [Bovier(2006)] for a different and simplified derivation. Furthermore, $m_{0}$ satisfies the following Curie-Weiss equation

$$\beta m+\beta h=\frac{1}{2}\log\frac{1+m}{1-m}, \tag{3}$$

which can also be written as $m=\tanh(\beta(m+h))$. Quantitatively, the concentration is captured by the following key result from [Chatterjee(2007)].
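As a small numerical illustration, the fixed-point form $m=\tanh(\beta(m+h))$ can be iterated directly to locate $m_{0}$, and the result can be compared against the function in (2) on a grid. This is a sketch under stated assumptions, not part of the argument: the starting point $\pm 1$ (chosen with the sign of $h$) and the helper names `solve_m0` and `free_energy` are illustrative choices.

```python
import math


def free_energy(m, beta, h):
    """The function in (2), defined for -1 < m < 1."""
    entropy = 0.5 * (1 - m) * math.log(1 - m) + 0.5 * (1 + m) * math.log(1 + m)
    return -(0.5 * beta * m * m + beta * h * m) + entropy


def solve_m0(beta, h, tol=1e-13, max_iter=10_000):
    """Iterate m <- tanh(beta * (m + h)), starting from +1 or -1 (sign of h).

    Because the map is increasing, the iterates move monotonically from the
    boundary to the outermost root on the side favored by the field.
    """
    m = math.copysign(1.0, h)
    for _ in range(max_iter):
        m_next = math.tanh(beta * (m + h))
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    beta, h = 1.5, 0.1
    m0 = solve_m0(beta, h)
    print(m0, free_energy(m0, beta, h))
```

For $h>0$ the iteration started at $1$ decreases to the largest root, which is the root the text calls $m_{0}$.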

Proposition 1 ([Chatterjee(2007)]).

For all $\beta\geq 0$, $h\in\mathbb{R}$ and $t\geq 0$,

$$\pi^{n}\left(|m^{n}-\tanh(\beta(m^{n}+h))|\geq\frac{\beta}{n}+\frac{t}{\sqrt{n}}\right)\leq 2\exp\left(-\frac{t^{2}}{4(1+\beta)}\right). \tag{4}$$
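Since $\pi^{n}$ depends on $x$ only through the number of $+1$ spins, the left-hand side of (4) can be evaluated exactly for moderate $n$ and compared with the right-hand side. The following sketch assumes the normalized magnetization $m=(2k-n)/n$ for $k$ spins equal to $+1$ and the weight $\exp(\beta n[\frac{1}{2}m^{2}+hm])$; the function names are hypothetical.

```python
import math


def magnetization_law(n, beta, h):
    """Exact law of m = (2k - n)/n under pi^n, with k the number of +1 spins."""
    log_w = []
    for k in range(n + 1):
        m = (2 * k - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_w.append(log_binom + beta * n * (0.5 * m * m + h * m))
    top = max(log_w)  # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return [((2 * k - n) / n, w[k] / z) for k in range(n + 1)]


def chatterjee_check(n, beta, h, t):
    """Return (exact probability of the event in (4), right-hand side of (4))."""
    threshold = beta / n + t / math.sqrt(n)
    prob = sum(
        p
        for m, p in magnetization_law(n, beta, h)
        if abs(m - math.tanh(beta * (m + h))) >= threshold
    )
    return prob, 2 * math.exp(-t * t / (4 * (1 + beta)))


if __name__ == "__main__":
    for t in (1.0, 2.0, 4.0):
        print(t, chatterjee_check(200, 1.5, 0.1, t))
```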

2.2. Metropolis-Hastings Algorithm

A typical implementation of the Metropolis-Hastings algorithm for sampling from a distribution known up to the partition function uses a random walk to generate a proposal, and then accepts or rejects the proposal according to the ratio of the target density at the proposed and current states; this does not require computing the partition function.

The random walk on ${\bf S}$ under consideration picks one coordinate uniformly at random (with probability $1/n$ each) and flips its sign. Therefore, the transition probability takes the form

$$P(x,y)=\begin{cases}\frac{1}{n}, & y\in R(x),\\ 0, & \text{otherwise,}\end{cases}$$

where the set $R(x)=\{y\in{\bf S}: y_{k}=-x_{k}\text{ for some }k\in[n],\text{ and }y_{i}=x_{i}\text{ for all }i\neq k\}$. To ensure reversibility, and hence convergence, the Metropolis-Hastings step accepts a proposed move generated by the random walk with probability

$$\min\left\{1,\frac{\pi^{n}(y)P(y,x)}{\pi^{n}(x)P(x,y)}\right\}=\min\left\{1,\frac{\pi^{n}(y)}{\pi^{n}(x)}\right\}.$$

The equality is due to the fact that $P(y,x)=P(x,y)=\frac{1}{n}$ whenever $y$ is a proposed move from $x$.
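The single-spin-flip scheme described above can be sketched in a few lines. This is a minimal illustration rather than a tuned sampler: it assumes the Hamiltonian $H^{n}(x)=-n[\frac{1}{2}m^{2}+hm]$ with normalized magnetization $m$, so that flipping coordinate $k$ changes the energy by $2x_{k}(m+h-x_{k}/n)$, and the names `delta_hamiltonian`, `metropolis_step` and `run_chain` are hypothetical.

```python
import math
import random


def delta_hamiltonian(x, k, h):
    """H^n(y) - H^n(x) when y is x with coordinate k flipped."""
    n = len(x)
    m = sum(x) / n  # recomputed each call for clarity; O(n) per step
    return 2 * x[k] * (m + h - x[k] / n)


def metropolis_step(x, beta, h, rng):
    """One Metropolis-Hastings update of the spin list x, performed in place."""
    k = rng.randrange(len(x))  # proposal: pick a coordinate uniformly
    log_ratio = -beta * delta_hamiltonian(x, k, h)  # log(pi(y)/pi(x)); Z_n cancels
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        x[k] = -x[k]  # accept the flip
    return x


def run_chain(n, beta, h, steps, seed=0):
    """Run the chain from a random start and return the final magnetization."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(steps):
        metropolis_step(x, beta, h, rng)
    return sum(x) / n


if __name__ == "__main__":
    print(run_chain(n=200, beta=1.5, h=0.1, steps=20_000))
```

Tracking the magnetization incrementally instead of recomputing `sum(x)` would reduce the cost per step to constant time; the sketch favors readability.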

2.3. Centered and Scaled Magnetization

The key quantity that will be studied is the following centered and scaled magnetization,

$$\eta^{n}(x):=n^{1/2}[m^{n}(x)-m_{0}],\quad x\in{\bf S}.$$

It was termed the shifted and renormalized magnetization in [Bierkens and Roberts(2017)]. Following the Metropolis-Hastings algorithm, the scaled Markov chain, denoted by $X^{n}$, has the transition probability

$$P^{n}(\eta,\eta\pm 2n^{-\frac{1}{2}})=Q^{n}(\eta,\eta\pm 2n^{-\frac{1}{2}})\left(1\wedge\exp\{\beta[\Phi^{n}(\eta)-\Phi^{n}(\eta\pm 2n^{-\frac{1}{2}})]\}\right),$$

with

$$Q^{n}(\eta,\eta\pm 2n^{-\frac{1}{2}}):=\frac{1}{2}(1\mp(m_{0}+n^{-\frac{1}{2}}\eta)),\quad\Phi^{n}(\eta)=-\frac{1}{2}\eta^{2}-n^{\frac{1}{2}}(m_{0}+h)\eta.$$

Let $Y^{n}$ denote the stationary continuous-time Markov chain that jumps at rate $n$ with transition probability $P^{n}$, and whose stationary distribution is $\mu^{n}\propto\exp(-\beta\Phi^{n}(\eta))$.

2.4. Diffusion Approximation Theorem

In [Bierkens and Roberts(2017)], the diffusion limit of the centered and scaled magnetization as $n$ approaches infinity is analyzed at the critical ($\beta=1$ and $h=0$) and supercritical ($\beta<1$) temperatures. In this note, we extend their result to the subcritical phase $\beta>1$ and $h\neq 0$, and establish the following result.

Theorem 2.

Suppose $\beta>1$ and $h\neq 0$, and $Y^{n}$ jumps at rate $n$. Then $Y^{n}$ converges weakly in $D([0,\infty),\mathbb{R})$ to $Y$, where $Y$ is the stationary Ornstein-Uhlenbeck process defined by

$$dY(t)=-2\ell(h,\beta)Y(t)dt+\sigma(h,\beta)dB(t), \tag{5}$$

with

$$\sigma(h,\beta)=2\sqrt{1-|m_{0}(h,\beta)|},\quad\ell(h,\beta)=\frac{1}{1+|m_{0}(h,\beta)|}-\beta(1-|m_{0}(h,\beta)|).$$

The key to proving Theorem 2 is to establish non-asymptotic concentration results, which will be carried out in the next section.

3. Non-Asymptotic Concentration Inequality for the Subcritical Phase

When $\beta>1$ and $h>0$ (the case $\beta>1$, $h<0$ can be dealt with similarly by symmetry), it is known, see e.g. Chapter IV in [Ellis(2006)], that there are potentially three roots of the Curie-Weiss equation (3), with $m_{0}>0$ being the largest and the other two roots being negative. It was pointed out in both [Ellis(2006), Chapter VI] and [Bierkens and Roberts(2017)] that, in the case of $h\neq 0$, the other two roots are not global minima. The following lemma determines the order of the derivatives of the two sides of (3) at $m_{0}$.

Lemma 3.

$\beta<\frac{1}{1-m_{0}^{2}}$ when $\beta>1$ and $h>0$.

Proof.

Define $m_{1}:=\sqrt{(\beta-1)/\beta}$, so that $m_{1}\in[0,1)$ and $\beta=\frac{1}{1-m_{1}^{2}}$. Note that $\beta$ is the slope of the left-hand side of the Curie-Weiss equation (3), which is affine in $m$, while $\frac{1}{1-m^{2}}$ is the derivative of the right-hand side, which is increasing in $m$ on $[0,1)$. Thus $m_{1}$ is the point where the two derivatives coincide. For $m\geq m_{1}$, the derivative of the left-hand side remains $\beta$, whereas the derivative of the right-hand side increases in $m$. In the following, we show that $\beta m_{1}+\beta h>\frac{1}{2}\log\frac{1+m_{1}}{1-m_{1}}$. Therefore, the two sides must become equal after $m_{1}$, i.e., $m_{1}<m_{0}$, and the result follows.

At $m_{1}$, write both sides of the Curie-Weiss equation (3) as functions of $\beta$. The left-hand side takes the form $f_{L}(\beta)=\sqrt{(\beta-1)\beta}+h\beta$, and the right-hand side takes the form $f_{R}(\beta)=\log(\sqrt{\beta}+\sqrt{\beta-1})$. Hence $f^{\prime}_{L}(\beta)=\frac{2\beta-1}{2\sqrt{(\beta-1)\beta}}+h$ and $f^{\prime}_{R}(\beta)=\frac{1}{2\sqrt{(\beta-1)\beta}}$, so $f^{\prime}_{L}(\beta)>f^{\prime}_{R}(\beta)$ for $\beta>1$ and $h>0$. Furthermore, $f_{L}(1)=h>0=f_{R}(1)$. Thus we conclude that $f_{L}(\beta)>f_{R}(\beta)$, i.e., $\beta m_{1}+\beta h>\frac{1}{2}\log\frac{1+m_{1}}{1-m_{1}}$. ∎
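The quantities appearing in the proof of Lemma 3 are easy to evaluate numerically, which gives a quick sanity check of the ordering $m_{1}<m_{0}$ and of the inequality $\beta<\frac{1}{1-m_{0}^{2}}$. The sketch below assumes $h>0$ and finds $m_{0}$ by iterating $m\mapsto\tanh(\beta(m+h))$ from $m=1$, an illustrative choice of solver; the helper names are hypothetical.

```python
import math


def largest_root(beta, h, tol=1e-13, max_iter=100_000):
    """Largest solution of m = tanh(beta * (m + h)) for h > 0, iterating down from 1."""
    m = 1.0
    for _ in range(max_iter):
        m_next = math.tanh(beta * (m + h))
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


def lemma3_quantities(beta, h):
    """Return (m0, m1, f_L(beta), f_R(beta)) as defined in the proof of Lemma 3."""
    m0 = largest_root(beta, h)
    m1 = math.sqrt((beta - 1) / beta)
    f_left = math.sqrt((beta - 1) * beta) + h * beta  # beta * m1 + beta * h
    f_right = math.log(math.sqrt(beta) + math.sqrt(beta - 1))  # artanh(m1)
    return m0, m1, f_left, f_right


if __name__ == "__main__":
    for beta, h in ((1.1, 0.05), (1.5, 0.1), (3.0, 1.0)):
        m0, m1, f_left, f_right = lemma3_quantities(beta, h)
        print(beta, h, m1, m0, f_left > f_right, beta < 1 / (1 - m0 * m0))
```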

Lemma 4.

There exist $\iota_{0}>0$ and $M_{1},M_{2}\in[0,1]$ satisfying $0\leq M_{1}<m_{0}$ and $m_{0}<M_{2}\leq 1$, such that

$$\beta m+\beta h\geq\frac{1}{2}\log\frac{1+m^{\prime}}{1-m^{\prime}}, \tag{6}$$

holds for all $M_{1}\leq m\leq m_{0}$, with $m^{\prime}=m-\iota_{0}(m-m_{0})$. Meanwhile,

$$\beta m+\beta h\leq\frac{1}{2}\log\frac{1+m^{\prime}}{1-m^{\prime}}, \tag{7}$$

holds for all $m_{0}\leq m\leq M_{2}$.

Proof.

Consider the bivariate function $K(m,\iota)=\frac{1}{2}\log\frac{1+m-\iota(m-m_{0})}{1-m+\iota(m-m_{0})}-(\beta m+\beta h)$. It is easy to see that $K(m_{0},0)=0$. Meanwhile, $\frac{\partial}{\partial m}K(m_{0},0)=\frac{1}{1-m_{0}^{2}}-\beta>0$ by Lemma 3. Therefore, there exists a neighborhood of $(m_{0},0)$ within which $\frac{\partial}{\partial m}K(m,\iota)>0$. Pick $\iota_{0}>0$, as well as $M_{1},M_{2}\in[0,1]$ satisfying $0\leq M_{1}<m_{0}$ and $m_{0}<M_{2}\leq 1$, such that $\frac{\partial}{\partial m}K(m,\iota_{0})>0$ for $m\in[M_{1},M_{2}]$. Since $K(m_{0},\iota_{0})=0$, it follows that $K(m,\iota_{0})\leq 0$ when $M_{1}\leq m\leq m_{0}$ and $K(m,\iota_{0})\geq 0$ when $m_{0}\leq m\leq M_{2}$, which correspond to the two inequalities (6) and (7). ∎
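Inequalities (6) and (7) say that $|m-\tanh(\beta(m+h))|\geq\iota_{0}|m-m_{0}|$ on the window $[M_{1},M_{2}]$, so an admissible $\iota_{0}$ for a given window can be estimated as the smallest value of the ratio $|m-\tanh(\beta(m+h))|/|m-m_{0}|$ on a grid. The sketch below is a numerical illustration only: the window endpoints, the grid size and the helper names are assumptions, and $h>0$ is assumed when solving for $m_{0}$.

```python
import math


def largest_root(beta, h, tol=1e-13, max_iter=100_000):
    """Largest solution of m = tanh(beta * (m + h)) for h > 0, iterating down from 1."""
    m = 1.0
    for _ in range(max_iter):
        m_next = math.tanh(beta * (m + h))
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


def min_ratio(beta, h, m_low, m_high, points=2001):
    """Grid minimum of |m - tanh(beta (m + h))| / |m - m0| over [m_low, m_high]."""
    m0 = largest_root(beta, h)
    best = math.inf
    for i in range(points):
        m = m_low + (m_high - m_low) * i / (points - 1)
        gap = abs(m - m0)
        if gap < 1e-6:
            continue  # the ratio is 0/0 at m0 itself, so skip a tiny neighborhood
        best = min(best, abs(m - math.tanh(beta * (m + h))) / gap)
    return best


if __name__ == "__main__":
    # Hypothetical window [0.5, 0.99] around m0 for beta = 1.5, h = 0.1.
    print(min_ratio(1.5, 0.1, 0.5, 0.99))
```

Near $m_{0}$ the ratio approaches $1-\beta(1-m_{0}^{2})$, which is positive exactly because of Lemma 3.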

Denote $F^{n,\delta}:=\{\eta\in X^{n}:|\eta|\leq n^{\delta}\}$. The above lemmas then allow us to obtain the following result, which extends Lemma 5 in [Bierkens and Roberts(2017)] to subcritical temperatures.

Lemma 5.

Let $0<\delta<\frac{1}{2}$. For $\beta>1$, $h\neq 0$, and any $\alpha>0$, we have

$$\lim_{n\rightarrow\infty}n^{\alpha}\pi^{n}(\eta^{n}(x)\notin F^{n,\delta})=0.$$

Proof.

Lemma 4 indicates that there exists $\iota_{0}>0$ such that inequalities (6) and (7) are satisfied, or equivalently, $|m-\tanh(\beta(m+h))|\geq\iota_{0}|m-m_{0}|$ for $m\in[M_{1},M_{2}]$. Therefore, for sufficiently large $n$, we have

$$\begin{aligned}
\pi^{n}(|m-m_{0}|\geq n^{\delta-\frac{1}{2}}) &\leq \pi^{n}(|m-\tanh(\beta(m+h))|\geq\iota_{0}n^{\delta-\frac{1}{2}})\\
&= \pi^{n}\left(|m-\tanh(\beta(m+h))|\geq\frac{\beta}{n}+\frac{t_{n}}{\sqrt{n}}\right)\leq 2\exp\left(-\frac{t_{n}^{2}}{4(1+\beta)}\right),
\end{aligned}$$

with

$$t_{n}=\left(\iota_{0}n^{\delta-\frac{1}{2}}-\frac{\beta}{n}\right)n^{\frac{1}{2}}.$$

The result follows from the fact that $t_{n}\rightarrow\infty$ at the order of $n^{\delta}$ as $n\rightarrow\infty$. ∎
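To see how quickly the bound beats any polynomial factor, one can evaluate $\log\left(n^{\alpha}\cdot 2\exp(-t_{n}^{2}/(4(1+\beta)))\right)$ with $t_{n}=\iota_{0}n^{\delta}-\beta n^{-1/2}$ for growing $n$. The sketch below works on the log scale to avoid underflow; the value $\iota_{0}=0.5$ and the other parameters in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def log_tail_bound(n, alpha, beta, delta, iota0):
    """log of n^alpha * 2 * exp(-t_n^2 / (4 (1 + beta))), t_n = iota0 n^delta - beta / sqrt(n)."""
    t_n = iota0 * n ** delta - beta / math.sqrt(n)
    if t_n < 0:
        raise ValueError("n is too small for t_n to be nonnegative")
    return alpha * math.log(n) + math.log(2) - t_n ** 2 / (4 * (1 + beta))


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        # alpha = 2, beta = 1.5, delta = 0.4, iota0 = 0.5 are illustrative values
        print(n, log_tail_bound(n, alpha=2.0, beta=1.5, delta=0.4, iota0=0.5))
```

The polynomial term grows like $\alpha\log n$ while the exponent decays like $-n^{2\delta}$, so the logarithm eventually tends to $-\infty$ for every fixed $\alpha$.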

The following lemma is established in [Bierkens and Roberts(2017)].

Lemma 6.

Let $0<\delta<\frac{1}{2}$. For $h\neq 0$ and $\beta>0$, we have, for $r<\min(1,(k+1)(\frac{1}{2}-\delta))$,

$$\lim_{n\rightarrow\infty}\sup_{\eta\in F^{n,\delta}}|nE_{\eta}^{n}[Y-\eta]+2\ell(h,\beta)\eta|=0,$$
$$\lim_{n\rightarrow\infty}\sup_{\eta\in F^{n,\delta}}|nE_{\eta}^{n}[(Y-\eta)^{2}]-\sigma(h,\beta)^{2}|=0,$$

$$\lim_{n\rightarrow\infty}\sup_{\eta\in F^{n,\delta}}nE_{\eta}^{n}[|Y-\eta|^{p}]=0,$$

for any $p>2$.
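The first two scaled moments of one jump of $P^{n}$ can be computed in closed form from $Q^{n}$ and $\Phi^{n}$, which gives a numerical sanity check for large $n$. The sketch below compares them with a mean-reverting drift $-2\ell(h,\beta)\eta$ and with $\sigma(h,\beta)^{2}$; treating the first-moment limit as $-2\ell(h,\beta)\eta$ is an assumption made here, and the helper names and parameter values are illustrative, with $h>0$ assumed.

```python
import math


def largest_root(beta, h, tol=1e-13, max_iter=100_000):
    """Largest solution of m = tanh(beta * (m + h)) for h > 0, iterating down from 1."""
    m = 1.0
    for _ in range(max_iter):
        m_next = math.tanh(beta * (m + h))
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


def scaled_moments(eta, n, beta, h, m0):
    """Return (n E[Y - eta], n E[(Y - eta)^2]) for one jump of P^n started at eta."""
    root_n = math.sqrt(n)
    step = 2 / root_n
    m = m0 + eta / root_n  # magnetization corresponding to eta

    def phi_drop(s):
        # Phi^n(eta) - Phi^n(eta + s), expanded to avoid cancellation
        return eta * s + 0.5 * s * s + root_n * (m0 + h) * s

    p_up = 0.5 * (1 - m) * min(1.0, math.exp(beta * phi_drop(step)))
    p_down = 0.5 * (1 + m) * min(1.0, math.exp(beta * phi_drop(-step)))
    drift = n * step * (p_up - p_down)
    second = n * step * step * (p_up + p_down)
    return drift, second


if __name__ == "__main__":
    beta, h, eta = 1.5, 0.1, 1.0
    m0 = largest_root(beta, h)
    ell = 1 / (1 + m0) - beta * (1 - m0)
    sigma_sq = 4 * (1 - m0)
    print(scaled_moments(eta, 10**6, beta, h, m0), (-2 * ell * eta, sigma_sq))
```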

Proof of Theorem 2.

The proof follows the basic arguments of the proof of Theorem 1 in [Bierkens and Roberts(2017)], together with the non-asymptotic concentration result in Lemma 5; we reproduce the key steps for completeness. The basic approach is to compare the actions of the generators of $Y^{n}$ and of the diffusion process (5) on functions in the domain of the latter. Suppose that $G^{n}$ is the generator of $Y^{n}$ and $G$ is the generator of the diffusion. Then we know that

$$G^{n}\phi(\eta)=n[P^{n}\phi(\eta)-\phi(\eta)],$$

and

$$G\phi(\eta)=-2\ell(h,\beta)\eta\frac{d\phi}{d\eta}+\frac{1}{2}\sigma^{2}(h,\beta)\frac{d^{2}\phi}{d\eta^{2}},$$

for $\phi\in D(G):=\{\phi\in C_{0}^{2}(\mathbb{R}):\eta\mapsto\eta\phi^{\prime}(\eta)\in C_{0}(\mathbb{R})\}$. We have

$$\begin{aligned}
&\sup_{\eta\in F^{n,\delta}}|G^{n}\phi(\eta)-G\phi(\eta)|\\
&\quad\leq \sup_{\eta\in F^{n,\delta}}\Big|nE^{n}_{\eta}[\phi(Y^{n})-\phi(\eta)]-nE^{n}_{\eta}\Big[\phi^{\prime}(\eta)(Y^{n}-\eta)+\frac{1}{2}\phi^{\prime\prime}(\eta)(Y^{n}-\eta)^{2}\Big]\Big|\\
&\quad\leq \frac{1}{6}\|\phi^{(3)}\|_{\infty}\sup_{\eta\in F^{n,\delta}}nE^{n}_{\eta}[|Y^{n}-\eta|^{3}]\rightarrow 0,\quad\text{as }n\rightarrow\infty,
\end{aligned}$$

where the first inequality follows from the definitions, the second inequality follows from Taylor expansion, and the convergence is a consequence of the estimates of the first-, second-, and higher-order terms in Lemma 6. Moreover,

$$P^{n}(Y^{n}(t)\notin F^{n,\delta}\text{ for some }0\leq t\leq T)\leq n\pi^{n}(\eta^{n}(x)\notin F^{n,\delta})\rightarrow 0\quad\text{as }n\rightarrow\infty,$$

follows from Lemma 5. Then the desired result follows from Corollary 4.8.7 in [Ethier and Kurtz(2009)]. ∎

References

  • [Bierkens and Roberts(2017)] J. Bierkens and G. Roberts. A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. Ann. Appl. Probab., 27(2):846–882, 2017. doi: 10.1214/16-AAP1217. URL https://doi.org/10.1214/16-AAP1217.
  • [Bovier(2006)] A. Bovier. Statistical Mechanics of Disordered Systems: A Mathematical Perspective. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2006. doi: 10.1017/CBO9780511616808.
  • [Chatterjee(2007)] S. Chatterjee. Stein’s method for concentration inequalities. Probability Theory and Related Fields, 138(1):305–321, 2007. doi: 10.1007/s00440-006-0029-y. URL https://doi.org/10.1007/s00440-006-0029-y.
  • [Ellis(2006)] R. Ellis. Entropy, Large Deviations, and Statistical Mechanics. Classics in Mathematics. Springer, 2006. ISBN 9783540290599. URL https://books.google.com/books?id=jNxKUjihQRYC.
  • [Ethier and Kurtz(2009)] S. Ethier and T. Kurtz. Markov Processes: Characterization and Convergence. Wiley Series in Probability and Statistics. Wiley, 2009. ISBN 9780470317327. URL https://books.google.com/books?id=zvE9RFouKoMC.