
Optimal Mixing via Tensorization for Random Independent Sets on Arbitrary Trees

Charilaos Efthymiou Department of Computer Science, University of Warwick, UK. Email: [email protected]. Research supported by EPSRC NIA, grant EP/V050842/1, and Centre of Discrete Mathematics and Applications (DIMAP).    Thomas P. Hayes Dept. of Computer Science and Engineering, University at Buffalo. Email: [email protected].    Daniel Štefankovič Department of Computer Science, University of Rochester. Email: [email protected]. Research supported in part by NSF grant CCF-1563757.    Eric Vigoda Department of Computer Science, University of California, Santa Barbara. Email: [email protected]. Research supported in part by NSF grant CCF-2147094.
Abstract

We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model where independent sets are weighted by a parameter λ>0\lambda>0; the special case λ=1\lambda=1 corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete Δ\Delta-regular tree for all λ\lambda. However, Restrepo, Stefankovic, Vera, Vigoda, and Yang (2014) showed that for sufficiently large λ\lambda there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width.

We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of O(n)O(n) for the Glauber dynamics for unweighted independent sets on arbitrary trees. We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree Δ\Delta. Interestingly, our results extend (far) beyond the uniqueness threshold which is on the order λ=O(1/Δ)\lambda=O(1/\Delta). Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply for bounded degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance via a non-trivial inductive proof.

1 Introduction

This paper studies the mixing time of the Glauber dynamics for the hard-core model assuming that the underlying graph is an arbitrary tree. In the hard-core model, we are given a graph G=(V,E)G=(V,E) and an activity λ>0\lambda>0. The model is defined on the collection of all independent sets of GG (regardless of size), which we denote as Ω\Omega. Each independent set σΩ\sigma\in\Omega is assigned a weight w(σ)=λ|σ|w(\sigma)=\lambda^{|\sigma|} where |σ||\sigma| is the number of vertices contained in the independent set σ\sigma. The Gibbs distribution μ\mu is defined on Ω\Omega: for σΩ\sigma\in\Omega, let μ(σ)=w(σ)/Z\mu(\sigma)=w(\sigma)/Z where Z=τΩw(τ)Z=\sum_{\tau\in\Omega}w(\tau) is known as the partition function. When λ=1\lambda=1 then every independent set has weight one and hence the Gibbs distribution μ\mu is the uniform distribution over (unweighted) independent sets.
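As a concrete, brute-force illustration of these definitions (our own sketch, written for this presentation; the function name and graph encoding are not from the paper), the following Python code enumerates independent sets to compute ZZ. It runs in exponential time and is intended only as a sanity check on tiny graphs.

from itertools import combinations

def partition_function(vertices, edges, lam):
    # Brute-force Z = sum over all independent sets S of lam^{|S|}.
    # Exponential time; intended only as a sanity check on tiny graphs.
    Z = 0.0
    for r in range(len(vertices) + 1):
        for S in combinations(vertices, r):
            if all(not (u in S and v in S) for (u, v) in edges):
                Z += lam ** r
    return Z

# The path 1-2-3 has independent sets {}, {1}, {2}, {3}, {1,3},
# so Z = 1 + 3*lam + lam**2; at lam = 1 this evaluates to 5.
print(partition_function([1, 2, 3], [(1, 2), (2, 3)], 1.0))  # 5.0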

Our goal is to sample from μ\mu (or estimate ZZ) in time polynomial in n=|V|n=|V|. Our focus is on trees. These sampling and counting problems are computationally easy on trees using dynamic programming algorithms. Nevertheless, our interest is to understand the convergence properties of a simple Markov Chain Monte Carlo (MCMC) algorithm known as the Glauber dynamics for sampling from the Gibbs distribution.

The Glauber dynamics (also known as the Gibbs sampler) is the simple single-site update Markov chain for sampling from the Gibbs distribution of a graphical model. For the hard-core model with activity λ\lambda, the transitions XtXt+1X_{t}\rightarrow X_{t+1} of the Glauber dynamics are defined as follows: first, choose a random vertex vv. Then, with probability λ1+λ\frac{\lambda}{1+\lambda} set X=Xt{v}X^{\prime}=X_{t}\cup\{v\} and with the complementary probability set X=Xt{v}X^{\prime}=X_{t}\setminus\{v\}. If XX^{\prime} is an independent set, then set Xt+1=XX_{t+1}=X^{\prime} and otherwise set Xt+1=XtX_{t+1}=X_{t}.
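For concreteness, here is a minimal Python sketch of one transition of this chain; the adjacency-dictionary representation of the graph is our own choice for illustration and not part of the paper.

import random

def glauber_step(adj, ind_set, lam):
    # One transition X_t -> X_{t+1} of the hard-core Glauber dynamics:
    # choose a uniformly random vertex v, propose adding it with
    # probability lam/(1+lam) (otherwise remove it), and accept the
    # proposal only if the result is still an independent set.
    v = random.choice(list(adj))
    if random.random() < lam / (1.0 + lam):
        if all(u not in ind_set for u in adj[v]):
            ind_set.add(v)
    else:
        ind_set.discard(v)
    return ind_set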

We consider two standard notions of convergence to stationarity. The relaxation time is the inverse spectral gap, i.e., (1λ)1(1-\lambda^{*})^{-1} where λ=max{λ2,|λN|}\lambda^{*}=\max\{\lambda_{2},|\lambda_{N}|\} and 1=λ1>λ2λN>11=\lambda_{1}>\lambda_{2}\geq\dots\geq\lambda_{N}>-1 are the eigenvalues of the transition matrix PP for the Glauber dynamics. The relaxation time is a key quantity in the running time for approximate counting algorithms (see, e.g., Štefankovič, Vempala, and Vigoda [ŠVV09]). The mixing time is the number of steps, from the worst initial state, to reach within total variation distance 1/2e\leq 1/2e of the stationary distribution, which in our case is the Gibbs distribution μ\mu.

We say that O(n)O(n) is the optimal relaxation time and that O(nlogn)O(n\log{n}) is the optimal mixing time (see Hayes and Sinclair [HS07] for a matching lower bound for any constant degree graph). Here, nn denotes the size of the underlying graph. More generally, we say the Glauber dynamics is rapidly mixing when the mixing time is poly(n)\mathrm{poly}(n).

We establish bounds on the mixing time of the Glauber dynamics by means of approximate tensorization inequalities for the variance of the hard-core model. Interestingly, our analysis utilizes nothing further than the inductive nature of the tree, e.g., we do not make any assumptions about spatial mixing properties of the Gibbs distribution. As a consequence, the bounds we obtain have no dependence on the maximum degree of the graph.

To be more specific, we derive the following results: we establish approximate tensorization of variance for the hard-core model on any tree for all λ<1.1\lambda<1.1. This implies an optimal O(n)O(n) relaxation time for the Glauber dynamics. Notably, this also includes the uniform distribution over independent sets, i.e., λ=1\lambda=1.

We can now state our main results.

Theorem 1.1.

For any nn-vertex tree, for any λ<1.1\lambda<1.1 the Glauber dynamics for sampling λ\lambda-weighted independent sets in the hard-core model has an optimal relaxation time of O(n)O(n).

We believe the optimal mixing results of Theorem 1.1 are related to the reconstruction threshold, which we describe now. Consider the complete Δ\Delta-regular tree of height hh; this is the rooted tree where all nodes at distance ℓ<h\ell<h from the root have Δ1\Delta-1 children and all nodes at distance hh from the root are leaves. We are interested in how the configuration at the leaves affects the configuration at the root.

Consider fixing an assignment/configuration σ\sigma to the leaves (i.e., specifying which leaves are fixed to occupied and which are unoccupied), we refer to this fixed assignment to the leaves as a boundary condition σ\sigma. Let μσ\mu_{\sigma} denote the Gibbs distribution conditional on this fixed boundary condition σ\sigma, and let pσp_{\sigma} denote the marginal probability that the root is occupied in μσ\mu_{\sigma}.

The uniqueness threshold λc(Δ)\lambda_{c}(\Delta) measures the effect of the worst-case boundary condition on the root. For all λ<λc(Δ)\lambda<\lambda_{c}(\Delta), all σσ\sigma\neq\sigma^{\prime}, in the limit hh\rightarrow\infty, we have pσ=pσ′p_{\sigma}=p_{\sigma^{\prime}}; this is known as the (tree) uniqueness region. In contrast, for λ>λc(Δ)\lambda>\lambda_{c}(\Delta) there are pairs σσ\sigma\neq\sigma^{\prime} (namely, all even occupied vs. odd occupied) for which the limits are different; this is the non-uniqueness region. The uniqueness threshold is at λc(Δ)=(Δ1)Δ1/(Δ2)Δ=O(1/Δ)\lambda_{c}(\Delta)=(\Delta-1)^{\Delta-1}/(\Delta-2)^{\Delta}=O(1/\Delta).
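For example, for Δ=6\Delta=6 this gives λc(6)=55/46=3125/40960.763\lambda_{c}(6)=5^{5}/4^{6}=3125/4096\approx 0.763, so already for moderate degrees the uniform case λ=1\lambda=1 lies outside the uniqueness region.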

In contrast, the reconstruction threshold λr(Δ)\lambda_{r}(\Delta) measures the effect on the root of a random/typical boundary condition. In particular, we fix an assignment cc at the root and then generate the Gibbs distribution via an appropriately defined broadcasting process. Finally, we fix the boundary configuration σ\sigma and ask whether, in the conditional Gibbs distribution μσ\mu_{\sigma}, the root has a bias towards the initial assignment cc. The non-reconstruction region λ<λr(Δ)\lambda<\lambda_{r}(\Delta) corresponds to when we cannot infer the root’s initial value, in expectation over the choice of the boundary configuration σ\sigma and in the limit hh\rightarrow\infty; see Mossel [Mos04] for a more complete introduction to reconstruction.

The reconstruction threshold is not known exactly but close bounds were established by Bhatnagar, Sly, and Tetali [BST16] and Brightwell and Winkler [BW04]; they showed that for constants C1,C2>0C_{1},C_{2}>0 and sufficiently large Δ\Delta: C1log2Δ/loglogΔλr(Δ)C2log2ΔC_{1}\log^{2}{\Delta}/\log\log{\Delta}\leq\lambda_{r}(\Delta)\leq C_{2}\log^{2}{\Delta}, and hence λr(Δ)\lambda_{r}(\Delta) is “increasing asymptotically” with Δ\Delta whereas the uniqueness threshold is a decreasing function of Δ\Delta. Martin [Mar03] showed that λr(Δ)>e1\lambda_{r}(\Delta)>e-1 for all Δ\Delta. As a consequence, we conjecture that Theorem 1.1 holds for all trees for all λ<e1\lambda<e-1, which is close to the bound we obtain. A slowdown in the reconstruction region is known: Restrepo, Štefankovič, Vera, Vigoda, and Yang [RŠV+14] showed that there are trees for which there is a polynomial slow down for λ>C\lambda>C for a constant C>0C>0; an explicit constant CC is not stated in [RŠV+14] but using the Kesten-Stigum bound one can show C28C\approx 28 (by considering binary trees).

For general graphs the appropriate threshold is the uniqueness threshold, which is λc(Δ)=O(1/Δ)\lambda_{c}(\Delta)=O(1/\Delta). In particular, for bipartite random Δ\Delta-regular graphs the Glauber dynamics has optimal mixing in the uniqueness region by Chen, Liu and Vigoda [CLV21], and is exponentially slow in the non-uniqueness region by Mossel, Weitz, and Wormald [MWW07] (see also [GŠV16]). Moreover, for general graphs there is a computational phase transition at the uniqueness threshold: optimal mixing on all graphs of maximum degree Δ\Delta in the uniqueness region [CLV21, CFYZ22, CE22], and NP-hardness to approximately count/sample in the non-uniqueness region by Sly [Sly10] (see also, [GŠV16, SS14]).

There are a variety of mixing results for the special case on trees, which is the focus of this paper. In terms of establishing optimal mixing time bounds for the Glauber dynamics, previous results only applied to complete, Δ\Delta-regular trees. Seminal work of Martinelli, Sinclair, and Weitz [MSW03, MSW04] proved optimal mixing on complete Δ\Delta-regular trees for all λ\lambda. The intuitive reason this holds for all λ\lambda is that the complete tree corresponds to one of the two extremal phases (all even boundary or all odd boundary) and hence it does not exhibit the phase co-existence which causes slow mixing. As mentioned earlier, [RŠV+14] shows that there is a fixed assignment τ\tau for the leaves of the complete, Δ\Delta-regular tree so that the mixing time slows down in the reconstruction region; intuitively, this boundary condition τ\tau corresponds to the assignment obtained by the broadcasting process.

For more general trees the following results were known. A classical result of Berger, Kenyon, Mossel and Peres [BKMP05] proves polynomial mixing time of the Glauber dynamics for trees with constant maximum degree. A very recent result of Eppstein and Frishberg [EF23] proved polynomial mixing time nC(λ)n^{C(\lambda)} of the Glauber dynamics for graphs with bounded tree-width, which includes arbitrary trees; however, the polynomial exponent is C(λ)=O(1+|log(λ)|)C(\lambda)=O(1+|\log(\lambda)|) for trees; see more recent work of Chen [Che24] for further improvements. For other combinatorial models, rapid mixing for the Glauber dynamics on trees with bounded maximum degree was established for kk-colorings in [LMP09] and edge-colorings in [DHP20].

Spectral independence is a powerful notion in the analysis of the convergence rate of Markov Chain Monte Carlo (MCMC) algorithms. For independent sets on an nn-vertex graph G=(V,E)G=(V,E), spectral independence considers the n×nn\times n pairwise influence matrix G{\mathcal{I}}_{G} where G(v,w)=Probσμ(vσ|wσ)Probσμ(vσ|wσ){\mathcal{I}}_{G}(v,w)=\mathrm{Prob}_{\sigma\sim\mu}(v\in\sigma\ |\ w\in\sigma)-\mathrm{Prob}_{\sigma\sim\mu}(v\in\sigma\ |\ w\notin\sigma); this matrix is closely related to the vertex covariance matrix. We say that spectral independence holds if the maximum eigenvalue of G{\mathcal{I}}_{G^{\prime}}, over all vertex-induced subgraphs GG^{\prime} of GG, is bounded by a constant. Spectral independence was introduced by Anari, Liu, and Oveis Gharan [ALO21]. Chen, Liu, and Vigoda [CLV21] proved that spectral independence, together with a simple condition known as marginal boundedness which is a lower bound on the marginal probability of a valid vertex-spin pair, implies optimal mixing time of the Glauber dynamics for constant-degree graphs. This has led to a flurry of optimal mixing results, e.g., [CLV22, BCC+22, Liu21, CLMM23, CG24].

The limitation of the above spectral independence results is that they only hold for graphs with constant maximum degree Δ\Delta. There are several noteworthy results that achieve a stronger form of spectral independence which establishes optimal mixing for unbounded degree graphs [CFYZ22, CE22]; however all of these results for general graphs only achieve rapid mixing in the tree uniqueness region which corresponds to λ=O(1/Δ)\lambda=O(1/\Delta) whereas we are aiming for λ=Θ(1)\lambda=\Theta(1).

The inductive approach we use to establish approximate tensorization inequalities can also be utilized to establish spectral independence. In fact, we show that spectral independence holds for any tree when λ1.3\lambda\leq 1.3, including the case where λ=1\lambda=1; see Section 4. Applying the results of Anari, Liu, and Oveis Gharan [ALO21] we obtain a poly(n)\mathrm{poly}(n) bound on the mixing time, but with a large constant in the exponent of nn. For constant degree trees we obtain the following optimal mixing result by applying the results of Chen, Liu, and Vigoda [CLV21] (see also [BCC+22, CFYZ22, CE22]).

Theorem 1.2.

For all constant Δ\Delta, all λ1.3\lambda\leq 1.3, for any tree TT with maximum degree Δ\Delta, the Glauber dynamics for sampling λ\lambda-weighted independent sets has an optimal mixing time of O(nlogn)O(n\log{n}).

Interestingly, combining the spectral independence results from the proof of Theorem 1.2 with results of Chen, Feng, Yin, and Zhang [CFYZ24, Theorem 1.9], we are able to strengthen Theorem 1.1 by allowing larger fugacities, i.e., λ1.3\lambda\leq 1.3.

Corollary 1.3.

For any nn-vertex tree, for any λ1.3\lambda\leq 1.3 the Glauber dynamics for sampling λ\lambda-weighted independent sets in the hard-core model has an optimal relaxation time of O(n)O(n).

In the next section we recall the key functional definitions and basic properties of variance that will be useful later in the proofs. In Section 3 we prove approximate tensorization of variance which establishes Theorem 1.1. We establish spectral independence and prove Theorem 1.2 in Section 4.

Remark 1.4.

An earlier version of this paper [EHŠV23] claimed to prove O(nlogn)O(n\log{n}) mixing time for λ0.44\lambda\leq 0.44 for any tree (without any constraint on the maximum degree). There was a mistake in that proof; in particular, inequality (54) of that version is false. Moreover, Zongchen Chen pointed out a simple test function ff which shows that entropy tensorization and the modified log-Sobolev inequality (MLSI) do not hold for the star graph with constants independent of the degree.

A recent paper of Chen, Yang, Yin, and Zhang [CYYZ24] improves Corollary 1.3 to obtain optimal relaxation time on trees for λ<e2\lambda<e^{2}.

2 Preliminaries

2.1 Standard Definitions

Let PP be the transition matrix of a Markov chain {Xt}\{X_{t}\} with a finite state space Ω\Omega and equilibrium distribution μ\mu. For t0t\geq 0 and σΩ\sigma\in\Omega, let Pt(σ,)P^{t}(\sigma,\cdot) denote the distribution of XtX_{t} when the initial state of the chain satisfies X0=σX_{0}=\sigma. The mixing time of the Markov chain {Xt}t0\{X_{t}\}_{t\geq 0} is defined by

Tmix\displaystyle T_{\rm mix} =maxσΩmin{t>0Pt(σ,)μTV12e}.\displaystyle=\max_{\sigma\in\Omega}\min{\left\{t>0\mid\|P^{t}(\sigma,\cdot)-\mu\|_{\rm TV}\leq\frac{1}{2\mathrm{e}}\right\}}. (1)

The transition matrix PP with stationary distribution μ\mu is called time reversible if it satisfies the so-called detailed balance relation, i.e., for any x,yΩx,y\in\Omega we have μ(x)P(x,y)=P(y,x)μ(y)\mu(x)P(x,y)=P(y,x)\mu(y). When PP is time reversible, its eigenvalues are real, and we denote them by 1=λ1λ2λ|Ω|11=\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{|\Omega|}\geq-1. Let λ=max{|λ2|,|λ|Ω||}\lambda^{*}=\max\{|\lambda_{2}|,|\lambda_{|\Omega|}|\}; then we define the relaxation time TrelaxT_{\text{relax}} by

Trelax(P)=11λ.\displaystyle T_{\text{relax}}(P)=\frac{1}{1-\lambda^{*}}. (2)

The quantity 1λ1-\lambda^{*} is also known as the spectral gap of PP. We use TrelaxT_{\text{relax}} to bound TmixT_{\text{mix}} via the following standard inequality:

Tmix(P)Trelax(P)log(2eminxΩμ(x)).\displaystyle T_{\text{mix}}(P)\leq T_{\text{relax}}(P)\cdot\log\left(\frac{2e}{\min_{x\in\Omega}\mu(x)}\right). (3)
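To illustrate how (3) is applied for the hard-core model (a standard observation, included here for the reader's convenience): every σΩ\sigma\in\Omega satisfies \mu(\sigma)=\lambda^{|\sigma|}/Z\geq\left(\min\{1,\lambda\}/(1+\lambda)\right)^{n}, since ZσV2Vλ|σ|(1+λ)nZ\leq(1+\lambda)^{n}. Hence log(2e/minxΩμ(x))=O(n)\log(2e/\min_{x\in\Omega}\mu(x))=O(n) for any constant λ\lambda, and an O(n)O(n) relaxation time as in Theorem 1.1 translates via (3) into an O(n2)O(n^{2}) bound on the mixing time.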

2.2 Gibbs Distributions and Functional Analytic Definitions

For a graph G=(V,E)G=(V,E) and λ>0\lambda>0, let μ=μG,λ\mu=\mu_{G,\lambda} be the hard-core model on this graph with activity λ\lambda, and let Ω2V\Omega\subseteq 2^{V} be the support of μ\mu, i.e., Ω\Omega is the collection of independent sets of GG.

For any function f:Ω0f:\Omega\to\mathbb{R}_{\geq 0}, we let μ(f)\mu(f) denote the expected value of ff with respect to μ\mu, i.e.,

μ(f)\displaystyle\mu(f) =σΩμ(σ)f(σ).\displaystyle=\sum_{\sigma\in\Omega}\mu(\sigma)f(\sigma).

Analogously, we define the variance of ff with respect to μ\mu by

Var(f)=μ(f2)(μ(f))2.\mathrm{Var}(f)=\textstyle\mu(f^{2})-\left(\mu(f)\right)^{2}.

We also use the following equivalent expression for Var(f)\mathrm{Var}(f):

Var(f)=12σ,τΩμ(σ)μ(τ)(f(σ)f(τ))2.\mathrm{Var}(f)=\textstyle\frac{1}{2}\sum_{\sigma,\tau\in\Omega}\mu(\sigma)\mu(\tau)\left(f(\sigma)-f(\tau)\right)^{2}.

For any subset SVS\subseteq V, let ΩS\Omega_{S} denote the set of independent sets on SS. Then, let μS\mu_{S} denote the marginal of μ\mu on SS; that is, for any σΩS\sigma\in\Omega_{S}, we have that

μS(σ)=ηΩ𝟏{ηS=σ}μ(η).\mu_{S}(\sigma)=\sum_{\eta\in\Omega}\mathbf{1}\{\eta\cap S=\sigma\}\mu(\eta).

For any SVS\subset V, any τΩVS\tau\in\Omega_{V\setminus S}, we let μSτ\mu^{\tau}_{S} be the distribution μ\mu conditional on the configuration τ\tau on VSV\setminus S, and we let ΩSτ\Omega_{S}^{\tau} be the support of μSτ\mu^{\tau}_{S}.

For any SVS\subseteq V, for any τΩVS\tau\in\Omega_{V\setminus S}, we define the function fτ:ΩSτ0f_{\tau}:\Omega^{\tau}_{S}\to\mathbb{R}_{\geq 0} such that fτ(σ)=f(τσ)f_{\tau}(\sigma)=f(\tau\cup\sigma) for all σΩSτ\sigma\in\Omega^{\tau}_{S}.

Let

μSτ(f)=σΩSτμSτ(σ)fτ(σ).\mu_{S}^{\tau}(f)=\sum_{\sigma\in\Omega^{\tau}_{S}}\mu^{\tau}_{S}(\sigma)f_{\tau}(\sigma).

Let VarSτ(f)\mathrm{Var}_{S}^{\tau}(f) denote the variance of fτf_{\tau} with respect to the conditional distribution μSτ\mu^{\tau}_{S}:

VarSτ(f)\displaystyle\mathrm{Var}_{S}^{\tau}(f) =μSτ(f2)(μSτ(f))2\displaystyle=\textstyle\mu^{\tau}_{S}(f^{2})-\left(\mu^{\tau}_{S}(f)\right)^{2} =12σ,ηΩ𝟏{σS=τ,ηS=τ}μ(σ)μ(η)(σ^Ω𝟏{σ^S=τ}μ(σ^))2(f(σ)f(η))2.\displaystyle=\frac{1}{2}\sum_{\sigma,\eta\in\Omega}\frac{\mathbf{1}\{\sigma\setminus S=\tau,\ \eta\setminus S=\tau\}\mu(\sigma)\mu(\eta)}{\left(\sum_{\hat{\sigma}\in\Omega}\mathbf{1}\{\hat{\sigma}\setminus S=\tau\}\mu(\hat{\sigma})\right)^{2}}\left(f(\sigma)-f(\eta)\right)^{2}. (4)

Furthermore, we let

μ(VarS(f))=τΩVSμVS(τ)VarSτ(f),\mu(\mathrm{Var}_{S}(f))=\sum_{\tau\in\Omega_{V\setminus S}}\mu_{V\setminus S}(\tau)\cdot\mathrm{Var}_{S}^{\tau}(f), (5)

i.e., μ(VarS(f))\mu(\mathrm{Var}_{S}(f)) is the average of VarSτ(f)\mathrm{Var}_{S}^{\tau}(f) with respect to τ\tau being distributed as in μVS()\mu_{V\setminus S}(\cdot). For the sake of brevity, when S={v}S=\{v\}, i.e., the set SS is a singleton, we use μ(Varv(f))\mu(\mathrm{Var}_{v}(f)).

Finally, let

Var(μS(f))=μ((μS(f))2)(μ(μS(f)))2=𝔼τμVS[(μSτ(f))2](𝔼τμVS[μSτ(f)])2,\mathrm{Var}(\mu_{S}(f))=\mu\left(\left(\mu_{S}(f)\right)^{2}\right)-\left(\mu\left(\mu_{S}(f)\right)\right)^{2}=\mathbb{E}_{\tau\sim\mu_{V\setminus S}}\left[\left(\mu^{\tau}_{S}(f)\right)^{2}\right]-\left(\mathbb{E}_{\tau\sim\mu_{V\setminus S}}\left[\mu^{\tau}_{S}(f)\right]\right)^{2}, (6)

i.e., Var(μS(f))\mathrm{Var}(\mu_{S}(f)) is the variance of μSτ(f)\mu_{S}^{\tau}(f) with respect to τ\tau being distributed as in μVS()\mu_{V\setminus S}(\cdot).

When XX is the following two-valued random variable:

X={A with probability pB with probability 1p,X=\begin{cases}A&\mbox{ with probability $p$}\\ B&\mbox{ with probability $1-p$},\end{cases}

then one formulation for the variance that will be convenient for us is

Var(X)=p(1p)(AB)2.\mathrm{Var}(X)=p(1-p)(A-B)^{2}. (7)
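For completeness, (7) follows by expanding the definition of variance:

\mathrm{Var}(X)=pA^{2}+(1-p)B^{2}-\big{(}pA+(1-p)B\big{)}^{2}=p(1-p)(A-B)^{2}.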

2.3 Approximate Tensorization of Variance

To bound the convergence rate of the Glauber dynamics we consider the approximate tensorization of variance as introduced in [CMT15].

We begin with the definition of approximate tensorization of variance.

Definition 2.1 (Variance Tensorization).

A distribution μ\mu with support Ω{±1}V\Omega\subseteq\{\pm 1\}^{V} satisfies approximate tensorization of variance with constant C>0C>0, denoted using the predicate VT(C)VT(C), if for all f:Ω0f:\Omega\to\mathbb{R}_{\geq 0} we have that

Var(f)CvVμ(Varv(f)).\displaystyle\mathrm{Var}(f)\leq C\cdot\sum_{v\in V}\mu\left(\mathrm{Var}_{v}(f)\right).

For a vertex xx, recall that Varx[f]=σμV{x}(σ)Varxσ[fσ]\mathrm{Var}_{x}[f]=\sum_{\sigma}\mu_{V\setminus\{x\}}(\sigma)\mathrm{Var}^{\sigma}_{x}[f_{\sigma}]. Variance tensorization VT(C)VT(C) yields the following bound on the relaxation time of the Glauber dynamics [CMT15, Cap23]:

TrelaxCn.T_{\text{relax}}\leq Cn. (8)

2.4 Decomposition of Variance

We use the following basic decomposition properties for variance. The following lemma follows from a decomposition of relative entropy, see [CP21, Lemma 3.1] (see also [BCC+22, Lemma 2.3]).

Lemma 2.2.

For any SVS\subset V, for any f0f\geq 0:

Var(f)\displaystyle\mathrm{Var}(f) =μ[VarS(f)]+Var(μS(f)),\displaystyle=\mu[\mathrm{Var}_{S}(f)]+\mathrm{Var}(\mu_{S}(f)), (9)

where Var(μS(f))\mathrm{Var}(\mu_{S}(f)) is defined in eq. 6.

We utilize the following variance factorisation for product measures, see [Cap23, Eqn (4.7)].

Lemma 2.3.

Consider U,WVU,W\subset V where dist(U,W)2\mathrm{dist}(U,W)\geq 2. Then for all f0f\geq 0 we have:

μ[VarU(μW(f))]\displaystyle\mu[\mathrm{Var}_{U}(\mu_{W}(f))] μ[VarU(f)].\displaystyle\leq\mu[\mathrm{Var}_{U}(f)]. (10)

At first reading, the reader might find it challenging to parse the expression μ[VarU(μW(f))]\mu[\mathrm{Var}_{U}(\mu_{W}(f))]; in that respect, (5) and (6) might be useful. Specifically, μ[VarU(μW(f))]\mu[\mathrm{Var}_{U}(\mu_{W}(f))] is the expectation of VarUτ(μW(f))\mathrm{Var}^{\tau}_{U}(\mu_{W}(f)) with respect to τ\tau being distributed as in μU¯W¯\mu_{\bar{U}\cap\bar{W}}. Furthermore, VarUτ(μW(f))\mathrm{Var}^{\tau}_{U}(\mu_{W}(f)) is the variance of μWτ,σ(f)\mu_{W}^{\tau,\sigma}(f) viewed as a function of σ\sigma, where τ\tau, the configuration at V(UW)V\setminus(U\cup W), is fixed, and σ\sigma, the configuration at UU, is distributed as in μUτ()\mu^{\tau}_{U}(\cdot).

Proof.

We apply [Cap23, Eqn (4.7)], which reaches the same conclusion under the assumptions that UW=U\cap W=\emptyset and μUμW=μWμU\mu_{U}\mu_{W}=\mu_{W}\mu_{U}. In the current context, the reason these conditional expectation operators commute here is that, because dist(U,W)2\mathrm{dist}(U,W)\geq 2, if we let SS be an independent set sampled according to distribution μ\mu, then the random variables SUS\cap U and SWS\cap W are conditionally independent given S(UW)S\setminus(U\cup W). ∎

3 Variance Factorization

In this section we prove Theorem 1.1, establishing an optimal bound on the relaxation time for the Glauber dynamics on any tree for λ1.1\lambda\leq 1.1. We will prove this by establishing variance tensorization, see Definition 2.1.

Consider a graph G=(V,E)G=(V,E) and a collection of fugacities λi\lambda_{i} for each iVi\in V. Throughout this section we will assume that all the fugacities are bounded by 1.11.1. Consider the following more general definition of the Gibbs distribution μ\mu for the hard-core model, where for an independent set SS,

μ(S)iSλi.\mu(S)\propto\prod_{i\in S}\lambda_{i}. (11)

Let T=(V,E)T^{\prime}=(V^{\prime},E^{\prime}) be a tree, let {λw}wV\{\lambda^{\prime}_{w}\}_{w\in V^{\prime}} be a collection of fugacities and let μ\mu^{\prime} be the corresponding Gibbs distribution. We will establish the following variance tensorization inequality: for all f:2Vf^{\prime}:2^{V^{\prime}}\rightarrow{\mathbb{R}}

Var(f)xVF(λx)μ(Varx(f)),\mathrm{Var}(f^{\prime})\leq\sum_{x\in V^{\prime}}F(\lambda^{\prime}_{x})\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})), (12)

where F:00F:\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0} is a function to be determined later (in Lemma 3.1). We refer to Var(f)\mathrm{Var}(f^{\prime}) as the “global” variance and we refer to μ(Varx(f))\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})) as the “local” variance (of ff^{\prime} at xx).

We will establish (12) using induction. Let vv be a vertex of degree 11 in TT^{\prime} and let uu be the unique neighbor of vv. Let T=(V,E)T=(V,E) be the tree obtained by removing vv from TT^{\prime}, i.e., TT is the induced subgraph of TT^{\prime} on V=V{v}V=V^{\prime}\setminus\{v\}. Let {λw}wV\{\lambda_{w}\}_{w\in V} be a collection of fugacities where λw=λw\lambda_{w}=\lambda^{\prime}_{w} for wuw\neq u and λu=λu/(1+λv)\lambda_{u}=\lambda^{\prime}_{u}/(1+\lambda^{\prime}_{v}). Let μ\mu be the hard-core measure on TT with fugacities {λw}wV\{\lambda_{w}\}_{w\in V}.

Note that for SVS\subseteq V we have

μ(S)=μ(S)+μ(S{v})=μV(S).\mu(S)=\mu^{\prime}(S)+\mu^{\prime}(S\cup\{v\})=\mu^{\prime}_{V}(S). (13)
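To verify eq. 13, compare unnormalized weights: for an independent set SVS\subseteq V,

\mu^{\prime}(S)+\mu^{\prime}(S\cup\{v\})\ \propto\ \prod_{i\in S}\lambda^{\prime}_{i}\cdot(1+\lambda^{\prime}_{v})^{\mathbf{1}\{u\notin S\}}\ =\ (1+\lambda^{\prime}_{v})\prod_{i\in S}\lambda_{i},

since λu=λu/(1+λv)\lambda_{u}=\lambda^{\prime}_{u}/(1+\lambda^{\prime}_{v}) and λw=λw\lambda_{w}=\lambda^{\prime}_{w} otherwise; the constant factor (1+λv)(1+\lambda^{\prime}_{v}) cancels upon normalization.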

Fix a function f:2Vf^{\prime}:2^{V^{\prime}}\rightarrow\mathbb{R}. Let f:2Vf:2^{V}\rightarrow\mathbb{R} be defined by

f(S)=μ(S)f(S)+μ(S{v})f(S{v})μ(S)+μ(S{v})=𝔼Zμ[f(Z)ZV=S]=μv(f)(S).f(S)=\frac{\mu^{\prime}(S)f^{\prime}(S)+\mu^{\prime}(S\cup\{v\})f^{\prime}(S\cup\{v\})}{\mu^{\prime}(S)+\mu^{\prime}(S\cup\{v\})}=\mathbb{E}_{Z\sim\mu^{\prime}}[f^{\prime}(Z)\mid Z\cap V=S]=\mu^{\prime}_{v}(f^{\prime})(S). (14)

Note that ff^{\prime} is defined on independent sets of the tree TT^{\prime} and ff is the natural projection of ff^{\prime} to the tree TT. Since f=μv(f)f=\mu^{\prime}_{v}(f^{\prime}), Lemma 2.2 implies that:

Var(f)=μ(Varv(f))+Var(f).\mathrm{Var}(f^{\prime})=\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+\mathrm{Var}(f). (15)

For measure μ\mu^{\prime}, when we condition on the configuration at uu, the configuration at V{u}V\setminus\{u\} is independent of that at {v}\{v\}. Hence, from eq. 10 for any x{u,v}x\not\in\{u,v\} (by setting U={x}U=\{x\} and W={v}W=\{v\}) we have:

μ(Varx(f))μ(Varx(f)).\mu^{\prime}(\mathrm{Var}_{x}(f))\leq\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})).

Since by (13) we have μ(Varx(f))=μ(Varx(f))\mu(\mathrm{Var}_{x}(f))=\mu^{\prime}(\mathrm{Var}_{x}(f)), the above implies that

μ(Varx(f))μ(Varx(f)).\mu(\mathrm{Var}_{x}(f))\leq\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})). (16)

The following lemma is the main technical ingredient. It bounds the local variance at uu for the smaller graph TT in terms of the local variance at uu and vv in the original graph TT^{\prime}.

Lemma 3.1.

For F(x)=1000(1+x)2exp(1.3x)F(x)=1000(1+x)^{2}\exp(1.3x) and any λv,λu(0,1.1]\lambda_{v},\lambda_{u}\in(0,1.1] we have:

F(λu)μ(Varu(f))(F(λv)1)μ(Varv(f))+F(λu)μ(Varu(f)).F(\lambda_{u})\mu(\mathrm{Var}_{u}(f))\leq(F(\lambda^{\prime}_{v})-1)\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda^{\prime}_{u})\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})). (17)

We now utilize the above lemma to prove the main theorem for the relaxation time. We then go back to prove Lemma 3.1.

Proof of Theorem 1.1.

Note eq. 17 is equivalent to:

μ(Varv(f))+F(λu)μ(Varu(f))F(λv)μ(Varv(f))+F(λu)μ(Varu(f)).\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda_{u})\mu(\mathrm{Var}_{u}(f))\leq F(\lambda^{\prime}_{v})\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda^{\prime}_{u})\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})). (18)

We can prove variance tensorization by induction as follows:

Var(f)\displaystyle\mathrm{Var}(f^{\prime}) =μ(Varv(f))+Var(f)\displaystyle=\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+\mathrm{Var}(f)
μ(Varv(f))+xVF(λx)μ(Varx(f))\displaystyle\leq\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+\sum_{x\in V}F(\lambda_{x})\mu(\mathrm{Var}_{x}(f))
μ(Varv(f))+F(λu)μ(Varu(f))+xV{u}F(λx)μ(Varx(f))\displaystyle\leq\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda_{u})\mu(\mathrm{Var}_{u}(f))+\sum_{x\in V\setminus\{u\}}F(\lambda^{\prime}_{x})\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime}))
F(λv)μ(Varv(f))+F(λu)μ(Varu(f))+xV{u}F(λx)μ(Varx(f))\displaystyle\leq F(\lambda^{\prime}_{v})\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda^{\prime}_{u})\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime}))+\sum_{x\in V\setminus\{u\}}F(\lambda^{\prime}_{x})\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime}))
=xVF(λx)μ(Varx(f)).\displaystyle=\sum_{x\in V^{\prime}}F(\lambda^{\prime}_{x})\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})).

For the first line, we use eq. 15. The second line follows from the inductive hypothesis. For the third line, we use eq. 16 and the fact that F(λx)F(λx)F(\lambda_{x})\leq F(\lambda^{\prime}_{x}), since FF is increasing and λxλx\lambda_{x}\leq\lambda^{\prime}_{x}. The fourth line follows by using eq. 18. ∎

Our task now is to prove Lemma 3.1. The following technical inequality will be useful.

Lemma 3.2.

Let p[0,1]p\in[0,1]. Suppose s1,s2>0s_{1},s_{2}>0 satisfy s1s21s_{1}s_{2}\geq 1. Then for all A,B,CA,B,C\in\mathbb{R} we have

(CpA(1p)B)2(1+s1)(CA)2+(1p)2(1+s2)(BA)2.(C-pA-(1-p)B)^{2}\leq(1+s_{1})(C-A)^{2}+(1-p)^{2}(1+s_{2})(B-A)^{2}. (19)
Proof.

Equation (19) is equivalent to

2(1p)(CA)(AB)s1(CA)2+s2(1p)2(BA)2.2(1-p)(C-A)(A-B)\leq s_{1}(C-A)^{2}+s_{2}(1-p)^{2}(B-A)^{2}. (20)

We derive the above by subtracting (CA)2(C-A)^{2} and (1p)2(BA)2(1-p)^{2}(B-A)^{2} from both sides of equation (19) and rearranging.

A simple application of the AM-GM inequality implies that

2s1s2(1p)(CA)(AB)s1(CA)2+s2(1p)2(BA)2.2\sqrt{s_{1}s_{2}}(1-p)(C-A)(A-B)\leq s_{1}(C-A)^{2}+s_{2}(1-p)^{2}(B-A)^{2}.

Then, equation (20) follows from the observation that, whenever (CA)(AB)0(C-A)(A-B)\geq 0, the left-hand side of the above inequality is at least 2(1p)(CA)(AB)2(1-p)(C-A)(A-B), since s1s21s_{1}s_{2}\geq 1; when (CA)(AB)<0(C-A)(A-B)<0, inequality (20) holds trivially, as its right-hand side is nonnegative. ∎

We can now prove the main lemma.

Proof of Lemma 3.1.

Our goal is to prove eq. 17; let us recall its statement:

F(λu)μ(Varu(f))(F(λv)1)μ(Varv(f))+F(λu)μ(Varu(f)).F(\lambda_{u})\mu(\mathrm{Var}_{u}(f))\leq(F(\lambda^{\prime}_{v})-1)\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime}))+F(\lambda^{\prime}_{u})\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})). (17)

We will consider each of these local variances μ(Varu(f))\mu(\mathrm{Var}_{u}(f)), μ(Varv(f))\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime})), and μ(Varu(f))\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})). We will express each of them as a sum over independent sets SS of VV^{\prime}. We can then establish eq. 17 in a pointwise manner by considering the corresponding inequality for each independent set SS.

Let us begin by looking at the general definition of the expected local variance μ(Varx(f))\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime})) for any xVx\in V^{\prime}. Applying the definition in eq. 5 and then simplifying we obtain the following (a reader familiar with the notation can apply eq. 7 to skip directly to the last line):

μ(Varx(f))\displaystyle\mu^{\prime}(\mathrm{Var}_{x}(f^{\prime}))
=SV{x}μV{x}(S)VarxS(fS)\displaystyle=\sum_{S\subseteq V^{\prime}\setminus\{x\}}\mu^{\prime}_{V^{\prime}\setminus\{x\}}(S)\cdot\mathrm{Var}_{x}^{S}(f_{S})
=SV{x}(T{x}μ(ST))(12T,U{x},TUμxS(T)μxS(U)(f(ST)f(SU))2)\displaystyle=\sum_{S\subseteq V^{\prime}\setminus\{x\}}\left(\sum_{T\subseteq\{x\}}\mu^{\prime}(S\cup T)\right)\left(\frac{1}{2}\sum_{T,U\subseteq\{x\},T\neq U}\mu_{x}^{{}^{\prime}S}(T)\mu_{x}^{{}^{\prime}S}(U)(f^{\prime}(S\cup T)-f^{\prime}(S\cup U))^{2}\right)
=SV{x}(T{x}μ(ST))(μxS(x)μxS()(f(S)f(S{x}))2)\displaystyle=\sum_{S\subseteq V^{\prime}\setminus\{x\}}\left(\sum_{T\subseteq\{x\}}\mu^{\prime}(S\cup T)\right)\left(\mu_{x}^{{}^{\prime}S}(x)\mu_{x}^{{}^{\prime}S}(\emptyset)(f^{\prime}(S)-f^{\prime}(S\cup\{x\}))^{2}\right)
=SV{x}(μ(S)+μ(S{x}))μ(S)μ(S{x})(μ(S)+μ(S{x}))2(f(S)f(S{x}))2.\displaystyle=\sum_{S\subseteq V^{\prime}\setminus\{x\}}\Big{(}\mu^{\prime}(S)+\mu^{\prime}(S\cup\{x\})\Big{)}\frac{\mu^{\prime}(S)\mu^{\prime}(S\cup\{x\})}{(\mu^{\prime}(S)+\mu^{\prime}(S\cup\{x\}))^{2}}\Big{(}f^{\prime}(S)-f^{\prime}(S\cup\{x\})\Big{)}^{2}. (21)

Notice in eq. 21 that the only SV{x}S\subset V^{\prime}\setminus\{x\} which contribute are those where xx is unblocked (i.e., no neighbor of xx is included in the independent set SS) because we need that SS and S{x}S\cup\{x\} are both independent sets and hence have positive measure in μ\mu^{\prime}.

Let us now consider each of the local variances appearing in eq. 17 (expressed using carefully chosen summations that will allow us to prove (17) term-by-term in terms of SS).

Let Q1:=μ(Varu(f))Q_{1}:=\mu(\mathrm{Var}_{u}(f)) denote the expected local variance of ff at uu. We will use (21) (applied to TT instead of TT^{\prime}); note that only SS where uu is unblocked (that is, when no neighbor of uu is occupied) contribute to the local variance. Moreover, we can express the expected local variance of ff at uu in terms of only those SS where uSu\in S. In particular, consider an independent set SS^{\prime} where uSu\notin S^{\prime}. Note that if uu is blocked (i.e., N(u)SN(u)\cap S^{\prime}\neq\emptyset) then the local variance at uu is zero for this term. And for those SS^{\prime} where uSu\notin S^{\prime} and uu is unblocked then the local variance has the same contribution as S=S{u}S=S^{\prime}\cup\{u\} times 1/λu1/\lambda_{u} (since μ(S{u})=λuμ(S)\mu(S^{\prime}\cup\{u\})=\lambda_{u}\mu(S^{\prime})). Hence the expected local variance of ff at uu is given by

Q1:=μ(Varu(f))=SV;uSμ(S)(1+1λu)λu1+λu11+λu(f(S{u})f(S))2.\begin{split}Q_{1}:=\mu(\mathrm{Var}_{u}(f))=\sum_{S\subseteq V;u\in S}\mu(S)\left(1+\frac{1}{\lambda_{u}}\right)\frac{\lambda_{u}}{1+\lambda_{u}}\frac{1}{1+\lambda_{u}}\left(f(S\setminus\{u\})-f(S)\right)^{2}.\end{split}

We have f(S)=f(S)f(S)=f^{\prime}(S) (since uSu\in S) and, by eq. 14, f(S{u})=11+λvf(S{u})+λv1+λvf(S{u}{v})f(S\setminus\{u\})=\frac{1}{1+\lambda^{\prime}_{v}}f^{\prime}(S\setminus\{u\})+\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}f^{\prime}(S\setminus\{u\}\cup\{v\}). Plugging these in and simplifying we obtain the following:

Q1=1+λv1+λu+λvSV;uSμ(S)(f(S)11+λvf(Su)λv1+λvf(Su+v))2.\begin{split}Q_{1}=\frac{1+\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}}\sum_{S\subseteq V;u\in S}\mu(S)\left(f^{\prime}(S)-\frac{1}{1+\lambda^{\prime}_{v}}f^{\prime}(S-u)-\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}f^{\prime}(S-u+v)\right)^{2}.\end{split} (22)
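As a sanity check on (22): when TT^{\prime} is the single edge {u,v}\{u,v\}, TT is the single vertex uu with fugacity λu=λu/(1+λv)\lambda_{u}=\lambda^{\prime}_{u}/(1+\lambda^{\prime}_{v}), and computing directly gives \mu(\mathrm{Var}_{u}(f))=\frac{\lambda_{u}}{(1+\lambda_{u})^{2}}\left(f(\{u\})-f(\emptyset)\right)^{2}=\frac{\lambda^{\prime}_{u}(1+\lambda^{\prime}_{v})}{(1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v})^{2}}\left(f(\{u\})-f(\emptyset)\right)^{2}, which agrees with the right-hand side of (22) since μ({u})=λu/(1+λu+λv)\mu(\{u\})=\lambda^{\prime}_{u}/(1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}).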

We now consider Q2:=μ(Varu(f))Q_{2}:=\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})). As we did for Q1Q_{1}, we can express Q2Q_{2} as a sum over independent sets SS where uSu\in S. In this case, for an independent set SS where uSu\in S, consider S=S{u}S^{\prime}=S\setminus\{u\} and note that the following hold

μ(S)=μ(S)11+λv=μ(S)1λu11+λv,\mu^{\prime}(S^{\prime})=\mu(S^{\prime})\frac{1}{1+\lambda^{\prime}_{v}}=\mu(S)\frac{1}{\lambda^{\prime}_{u}}\frac{1}{1+\lambda^{\prime}_{v}},

and

μ(S)=μ(S).\mu^{\prime}(S)=\mu(S).

Hence, we have the following:

Q2:=μ(Varu(f))\displaystyle Q_{2}:=\mu^{\prime}(\mathrm{Var}_{u}(f^{\prime})) =SV;uSμ(S)(1+1λu11+λv)λu1+λu11+λu(f(Su)f(S))2\displaystyle=\sum_{S\subseteq V;u\in S}\mu(S)\left(1+\frac{1}{\lambda^{\prime}_{u}}\frac{1}{1+\lambda^{\prime}_{v}}\right)\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{u}}\frac{1}{1+\lambda^{\prime}_{u}}\left(f^{\prime}(S-u)-f^{\prime}(S)\right)^{2}
=(λu+11+λv)1(1+λu)2SV;uSμ(S)(f(S)f(Su))2.\displaystyle=\left(\lambda^{\prime}_{u}+\frac{1}{1+\lambda^{\prime}_{v}}\right)\frac{1}{(1+\lambda^{\prime}_{u})^{2}}\sum_{S\subseteq V;u\in S}\mu(S)\left(f^{\prime}(S)-f^{\prime}(S-u)\right)^{2}. (23)

Finally, we consider μ(Varv(f))\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime})), the expected local variance of ff^{\prime} at vv. We will establish a lower bound which we will denote by Q3Q_{3} (note, Q1Q_{1} and Q2Q_{2} were identities but here we will have an inequality).

To compute μ(Varv(f))\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime})), the expected local variance of ff^{\prime} at vv, we need to generate an independent set SS^{\prime} from μ\mu^{\prime}. Only those SS^{\prime} where vv is unblocked (that is where uu is not in SS^{\prime}) contribute to the local variance. We can generate SS^{\prime} by generating SS from μ\mu (whether we add or do not add vv does not change the contribution to the local variance). As in eq. 21, we obtain the following:

μ(Varv(f))\displaystyle\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime})) =SV;uSμ(S)11+λvλv1+λv(f(S{v})f(S))2\displaystyle=\sum_{S^{\prime}\subseteq V;u\not\in S^{\prime}}\mu(S^{\prime})\frac{1}{1+\lambda^{\prime}_{v}}\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}\left(f^{\prime}(S^{\prime}\cup\{v\})-f^{\prime}(S^{\prime})\right)^{2}
SV;uSu is unblockedμ(S)11+λvλv1+λv(f(S{v})f(S))2\displaystyle\geq\sum_{\begin{subarray}{c}S^{\prime}\subseteq V;u\not\in S^{\prime}\\ \textrm{$u$ is unblocked}\end{subarray}}\mu(S^{\prime})\frac{1}{1+\lambda^{\prime}_{v}}\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}\left(f^{\prime}(S^{\prime}\cup\{v\})-f^{\prime}(S^{\prime})\right)^{2}
=SV;uSμ(S)1λu11+λvλv1+λv(f(S{v}{u})f(S{u}))2.\displaystyle=\sum_{S\subseteq V;u\in S}\mu(S)\frac{1}{\lambda^{\prime}_{u}}\frac{1}{1+\lambda^{\prime}_{v}}\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}\left(f^{\prime}(S\cup\{v\}\setminus\{u\})-f^{\prime}(S\setminus\{u\})\right)^{2}.

Let Q3Q_{3} denote the lower bound we obtained above:

Q3:=1λu11+λvλv1+λvSV;uSμ(S)(f(S{v}{u})f(S{u}))2μ(Varv(f)).Q_{3}:=\frac{1}{\lambda^{\prime}_{u}}\frac{1}{1+\lambda^{\prime}_{v}}\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}\sum_{S\subseteq V;u\in S}\mu(S)\left(f^{\prime}(S\cup\{v\}\setminus\{u\})-f^{\prime}(S\setminus\{u\})\right)^{2}\leq\mu^{\prime}(\mathrm{Var}_{v}(f^{\prime})). (24)

Plugging in (22), (23), (24) we obtain that eq. 17 follows from the following inequality:

F(λu)Q1(F(λv)1)Q3+F(λu)Q2.F(\lambda_{u})Q_{1}\leq(F(\lambda^{\prime}_{v})-1)Q_{3}+F(\lambda^{\prime}_{u})Q_{2}. (25)

We will establish (25) term-by-term, that is, for each SS in the sums of (22), (23), (24). Fix SVS\subseteq V such that uSu\in S and let A=f(Su)A=f^{\prime}(S-u), B=f(Su+v)B=f^{\prime}(S-u+v), and C=f(S)C=f^{\prime}(S). We need to show

1+λv1+λu+λv(C11+λvAλv1+λvB)2F(λu1+λv)1+λv+λu1+λv1(1+λu)2(CA)2F(λu)+1λu(1+λv)2(BA)2(F(λv)1).\begin{split}\frac{1+\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}}\left(C-\frac{1}{1+\lambda^{\prime}_{v}}A-\frac{\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{v}}B\right)^{2}F\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right)\hskip 108.405pt\\ \leq\frac{1+\lambda^{\prime}_{v}+\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\frac{1}{(1+\lambda^{\prime}_{u})^{2}}\left(C-A\right)^{2}F(\lambda^{\prime}_{u})+\frac{1}{\lambda^{\prime}_{u}(1+\lambda^{\prime}_{v})^{2}}(B-A)^{2}\left(F(\lambda^{\prime}_{v})-1\right).\end{split} (26)

Let p=1/(1+λv)p=1/(1+\lambda^{\prime}_{v}). We will match (19) to (26), by first dividing both sides of (26) by 1+λv1+λu+λvF(λu1+λv)\frac{1+\lambda^{\prime}_{v}}{1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}}F\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right) and then choosing

1+s1=(1+λu+λv(1+λv)(1+λu))2F(λu)F(λu1+λv)and1+s2=1+λu+λvλu(1+λv)λv2F(λv)1F(λu1+λv).1+s_{1}=\left(\frac{1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}}{(1+\lambda^{\prime}_{v})(1+\lambda^{\prime}_{u})}\right)^{2}\cdot\frac{F(\lambda^{\prime}_{u})}{F\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right)}\quad\mbox{and}\quad 1+s_{2}=\frac{1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v}}{\lambda^{\prime}_{u}(1+\lambda^{\prime}_{v}){\lambda^{\prime}_{v}}^{2}}\cdot\frac{F(\lambda^{\prime}_{v})-1}{F\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right)}.

Note that with this choice of s1s_{1} and s2s_{2} equations (19) and (26) are equivalent, and hence to prove (26) it is enough to show s1s21s_{1}s_{2}\geq 1.

Claim 3.3.

s1s21.s_{1}s_{2}\geq 1.

This completes the proof of the lemma. ∎

We use the following lemma to prove Claim 3.3.

Lemma 3.4.

Let α=1.1\alpha=1.1 and β=1.3\beta=1.3. Suppose x,y(0,α]x,y\in(0,\alpha] are such that (1+x)y[0,α](1+x)y\in[0,\alpha]. Then

(exp(βxy)1)((1+x)(1+y)yx2exp(β(xy))1)1.1.\left(\exp(\beta xy)-1\right)\left(\frac{(1+x)}{(1+y)yx^{2}}\exp(\beta(x-y))-1\right)\geq 1.1. (27)
Proof.

We will show that for x,y,(1+x)y(0,α]x,y,(1+x)y\in(0,\alpha] we have

(1+x)(1+y)xexp(β(xy))1.15,\frac{(1+x)}{(1+y)x}\exp(\beta(x-y))\geq 1.15, (28)

and

(exp(βxy)1)(1.15xy1)1.1.\left(\exp(\beta xy)-1\right)\left(\frac{1.15}{xy}-1\right)\geq 1.1. (29)

To see that (28) and (29) imply (27) note

(exp(βxy)1)((1+x)(1+y)yx2exp(β(xy))1)(exp(βxy)1)(1.15yx1)1.1,\left(\exp(\beta xy)-1\right)\left(\frac{(1+x)}{(1+y)yx^{2}}\exp(\beta(x-y))-1\right)\geq\left(\exp(\beta xy)-1\right)\left(\frac{1.15}{yx}-1\right)\geq 1.1,

where the first inequality follows from (28) and the second from (29).

Note that the constraints on xx and yy imply that y(1+x)αy(1+x)\leq\alpha and xαx\leq\alpha; hence xyαx/(1+x)α2/(1+α)=α/(1+1/α)xy\leq\alpha x/(1+x)\leq\alpha^{2}/(1+\alpha)=\alpha/(1+1/\alpha). To prove (29) it is sufficient to show for z[0,α/(1+1/α)]z\in[0,\alpha/(1+1/\alpha)]

(exp(βz)1)(1.15z)1.1z0.\left(\exp(\beta z)-1\right)\left(1.15-z\right)-1.1z\geq 0. (30)

We have

2z2((exp(βz)1)(1.15z)1.1z)=exp(βz)β(1.15ββz2)<0.\frac{\partial^{2}}{\partial z^{2}}\Big{(}\left(\exp(\beta z)-1\right)\left(1.15-z\right)-1.1z\Big{)}=\exp(\beta z)\beta\left(1.15\beta-\beta z-2\right)<0.

Hence (exp(βz)1)(1.15z)1.1z\left(\exp(\beta z)-1\right)\left(1.15-z\right)-1.1z is concave and we only need to check (30) for the endpoints of the interval; for z=0z=0 LHS of (30) is zero, for z=α/(1+1/α)z=\alpha/(1+1/\alpha) the LHS of (30) has value larger than 0.0050.005. This concludes the proof of (29).

To prove (28) note that

y[(1+x)exp(β(xy))1.15(1+y)x]=(1+x)βexp(β(xy))1.15x<0.\frac{\partial}{\partial y}\Big{[}(1+x)\exp(\beta(x-y))-1.15(1+y)x\Big{]}=-(1+x)\beta\exp(\beta(x-y))-1.15x<0. (31)

Hence we only need to prove (28) for y=α/(1+x)y=\alpha/(1+x); this simplifies to showing

exp(1.3(x1.1/(1+x)))(1.15(2.1+x)x(1+x)2).\exp(1.3(x-1.1/(1+x)))\geq\left(\frac{1.15(2.1+x)x}{(1+x)^{2}}\right). (32)

For x=0x=0 and x=1.1x=1.1 we have that (32) is satisfied (using interval arithmetic). Let

Q(x):=1.3(x1.1/(1+x))ln(1.15(2.1+x)x(1+x)2).Q(x):=1.3(x-1.1/(1+x))-\ln\left(\frac{1.15(2.1+x)x}{(1+x)^{2}}\right).

The critical points of Q(x)Q(x) are roots of

1330x4+5330x3+8290x2+3733x2100.1330x^{4}+5330x^{3}+8290x^{2}+3733x-2100.

Since this polynomial is increasing when xx is non-negative, it has exactly one positive real root, which is in the interval [0.30765554,0.30765555][0.30765554,0.30765555]. The value of Q(x)Q(x) on both endpoints of the interval is at least 0.00320.0032. The derivative of Q(x)Q(x) (a rational function) is bounded in absolute value by 11 on the interval and hence Q(x)>0Q(x)>0 on the entire interval (in particular at the critical point). This proves (32). ∎
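As an illustrative (non-rigorous) numerical spot-check of (27), the following Python snippet, written for this presentation, evaluates the left-hand side of (27) on a grid over the constrained region:

import math
from itertools import product

beta, alpha = 1.3, 1.1
N = 200
worst = float("inf")
for i, j in product(range(1, N + 1), repeat=2):
    x, y = alpha * i / N, alpha * j / N
    if (1 + x) * y > alpha:
        continue  # outside the region of Lemma 3.4
    lhs = (math.exp(beta * x * y) - 1) * (
        (1 + x) / ((1 + y) * y * x * x) * math.exp(beta * (x - y)) - 1
    )
    worst = min(worst, lhs)
print(worst)  # Lemma 3.4 asserts this is at least 1.1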

We can now complete the proof of Claim 3.3.

Proof of Claim 3.3.

We will use the substitution F(x)=(1+x)2H(x)F(x)=(1+x)^{2}H(x) to simplify the expressions. Note that H(x)=1000exp(1.3x)=1000exp(βx)H(x)=1000\exp(1.3x)=1000\exp(\beta x). In terms of H(x)H(x) we have

1+s1=H(λu)H(λu1+λv)and1+s2=1+λv(1+λu+λv)λuλv2(1+λv)2H(λv)1H(λu1+λv).1+s_{1}=\frac{H(\lambda^{\prime}_{u})}{H\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right)}\quad\mbox{and}\quad 1+s_{2}=\frac{1+\lambda^{\prime}_{v}}{(1+\lambda^{\prime}_{u}+\lambda^{\prime}_{v})\lambda^{\prime}_{u}{\lambda^{\prime}_{v}}^{2}}\cdot\frac{(1+\lambda^{\prime}_{v})^{2}H(\lambda^{\prime}_{v})-1}{H\left(\frac{\lambda^{\prime}_{u}}{1+\lambda^{\prime}_{v}}\right)}.

Let λv=x\lambda^{\prime}_{v}=x and λu=y(1+x)\lambda^{\prime}_{u}=y(1+x). We have

1+s1=H(y(1+x))H(y)and1+s2=1+xy(1+y)x2H(x)1(1+x)2H(y).1+s_{1}=\frac{H(y(1+x))}{H(y)}\quad\mbox{and}\quad 1+s_{2}=\frac{1+x}{y(1+y)x^{2}}\cdot\frac{H(x)-\frac{1}{(1+x)^{2}}}{H(y)}.

Recall, our goal is to show s1s21s_{1}s_{2}\geq 1. First we show that for x,y(0,1.1]x,y\in(0,1.1] such that (1+x)y1.1(1+x)y\leq 1.1 we have

s2=1+xy(1+y)x2H(x)1(1+x)2H(y)19991000(1+xy(1+y)x2H(x)H(y)1).s_{2}=\frac{1+x}{y(1+y)x^{2}}\cdot\frac{H(x)-\frac{1}{(1+x)^{2}}}{H(y)}-1\geq\frac{999}{1000}\left(\frac{1+x}{y(1+y)x^{2}}\cdot\frac{H(x)}{H(y)}-1\right). (33)

Note that (33) is equivalent to

1+xx2exp(1.3x)1x2(1+x)y(1+y)exp(1.3y).\frac{1+x}{x^{2}}\exp(1.3x)-\frac{1}{x^{2}(1+x)}\geq y(1+y)\exp(1.3y). (34)

Note that the RHS of (34) is increasing in yy and hence it is sufficient to show (34) for the maximal value of y=1.1/(1+x)y=1.1/(1+x); this is equivalent to

(1+x)3exp(1.3x)(1+x)1.1x2(2.1+x)exp(1.43/(1+x)),(1+x)^{3}\exp(1.3x)-(1+x)\geq 1.1x^{2}(2.1+x)\exp(1.43/(1+x)),

the last inequality follows from

(1+x)3(1+1.3x+1.322x2+1.336x3+1.3424x4)(1+x)1.1x2(2.1+x)exp(1.43),(1+x)^{3}\left(1+1.3x+\frac{1.3^{2}}{2}x^{2}+\frac{1.3^{3}}{6}x^{3}+\frac{1.3^{4}}{24}x^{4}\right)-(1+x)\geq 1.1x^{2}(2.1+x)\exp(1.43),

checked using Sturm sequences. This concludes the proof of (33).

Note that Lemma 3.4 is equivalent to the following

s1((1+x)y(1+y)x2H(x)H(y)1)=(exp(βxy)1)((1+x)(1+y)yx2exp(β(xy))1)1.1,s_{1}\left(\frac{(1+x)}{y(1+y)x^{2}}\frac{H(x)}{H(y)}-1\right)=\left(\exp(\beta xy)-1\right)\left(\frac{(1+x)}{(1+y)yx^{2}}\exp(\beta(x-y))-1\right)\geq 1.1,

and hence

s11.1((1+x)y(1+y)x2H(x)H(y)1)1,s_{1}\geq 1.1\left(\frac{(1+x)}{y(1+y)x^{2}}\frac{H(x)}{H(y)}-1\right)^{-1},

which combined with (33) yields s1s21.1(999/1000)>1s_{1}s_{2}\geq 1.1(999/1000)>1, concluding the proof. ∎

4 Spectral Independence for Trees

Let G=(V,E)G=(V,E) be a graph. Suppose we have a set of fugacities λi\lambda_{i} for each iVi\in V, and consider the corresponding Gibbs distribution. Recall that the influence matrix G{\mathcal{I}}_{G} has u,vu,v entry Inf(uv){\mbox{Inf}}(u\rightarrow v), where

Inf(uv)=μ(vI|uI)μ(vI|uI).{\mbox{Inf}}(u\rightarrow v)=\mu(v\in I|u\in I)-\mu(v\in I|u\not\in I).

Let u,vu,v be any two vertices in a graph. We have

|Inf(vu)|max{μ(uI|vI),μ(uI|vI)}λu1+λu.\big{|}{\mbox{Inf}}(v\rightarrow u)\big{|}\leq\max\{\mu(u\in I|v\in I),\mu(u\in I|v\not\in I)\}\leq\frac{\lambda_{u}}{1+\lambda_{u}}. (35)

We also have

Inf(vu)μ(vI)μ(vI)=Inf(uv)μ(uI)μ(uI).{\mbox{Inf}}(v\rightarrow u)\mu(v\in I)\mu(v\not\in I)={\mbox{Inf}}(u\rightarrow v)\mu(u\in I)\mu(u\not\in I). (36)

Let DD be the diagonal matrix with entries D(v,v)=μ(vI)μ(vI)D(v,v)=\mu(v\in I)\mu(v\not\in I). Equation (36) means that DGD{\mathcal{I}}_{G} is symmetric. That also means that D1/2GD1/2D^{1/2}{\mathcal{I}}_{G}D^{-1/2} is symmetric (since it is obtained from DGD{\mathcal{I}}_{G} by multiplying by the same diagonal matrix on the left and on the right). The u,vu,v entry in D1/2GD1/2D^{1/2}{\mathcal{I}}_{G}D^{-1/2} is

Inf(uv)(μ(vI)μ(vI))1/2(μ(uI)μ(uI))1/2=Inf(vu)(μ(uI)μ(uI))1/2(μ(vI)μ(vI))1/2,\begin{split}{\mbox{Inf}}(u\rightarrow v)\left(\mu(v\in I)\mu(v\not\in I)\right)^{-1/2}\left(\mu(u\in I)\mu(u\not\in I)\right)^{1/2}=\\ {\mbox{Inf}}(v\rightarrow u)\left(\mu(u\in I)\mu(u\not\in I)\right)^{-1/2}\left(\mu(v\in I)\mu(v\not\in I)\right)^{1/2},\end{split} (37)

which is equal to ±Inf(uv)Inf(vu)\pm\sqrt{{\mbox{Inf}}(u\rightarrow v){\mbox{Inf}}(v\rightarrow u)} (take geometric mean of the sides of (37)). We will call M=D1/2GD1/2M=D^{1/2}{\mathcal{I}}_{G}D^{-1/2} the symmetrized influence matrix (since it is similar to the influence matrix, it has the same spectral radius).
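The following self-contained Python sketch (our own exponential-time brute force, for illustration only) computes the influence matrix on a small graph and confirms that the symmetrized matrix is indeed symmetric with the same spectral radius:

import numpy as np
from itertools import combinations

def hardcore_influence(adj, lam):
    # Exhaustively computes Inf(u -> v) = P(v in I | u in I) - P(v in I | u not in I)
    # for the hard-core model on a small graph, together with the vector of
    # marginals P(v in I). Exponential time; for illustration only.
    V = sorted(adj)
    sets, wts = [], []
    for r in range(len(V) + 1):
        for S in combinations(V, r):
            if all(nbr not in S for u in S for nbr in adj[u]):
                sets.append(set(S))
                wts.append(lam ** r)
    w = np.array(wts)
    w = w / w.sum()
    occ = np.array([[v in S for v in V] for S in sets], dtype=float)
    n = len(V)
    Inf = np.zeros((n, n))
    for i in range(n):
        p = w @ occ[:, i]
        win = (w * occ[:, i]) / p                   # conditional law given u occupied
        wout = (w * (1.0 - occ[:, i])) / (1.0 - p)  # conditional law given u unoccupied
        for j in range(n):
            if j != i:
                Inf[i, j] = win @ occ[:, j] - wout @ occ[:, j]
    return Inf, w @ occ

# Example: the path 0-1-2-3 at lam = 1. The symmetrized matrix
# M = D^{1/2} Inf D^{-1/2} is symmetric and has the same spectral radius.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
Inf, marg = hardcore_influence(adj, 1.0)
d = np.sqrt(marg * (1.0 - marg))
M = (d[:, None] * Inf) / d[None, :]
print(np.allclose(M, M.T), max(abs(np.linalg.eigvals(Inf))))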

We will prove the following result.

Lemma 4.1.

For any forest TT with fugacities in [0,1.3][0,1.3] the spectral radius of the influence matrix of the hard-core model on TT is bounded by 10000.

We will prove Lemma 4.1 by induction; we will prove the following strengthened statement.

Lemma 4.2.

For any forest TT with fugacities in [0,1.3][0,1.3] the symmetrized influence matrix MM of the hard-core model on TT satisfies

Mdiag(4001.3λv0.78,vV).M\preceq{\mathrm{diag}}\Big{(}\frac{400}{1.3-\lambda_{v}^{0.78}},\ v\in V\Big{)}.
Proof.

Let T=(V,E)T=(V,E) be a forest. Let vv be a vertex of degree 11 in TT. Let uu be the neighbor of vv in TT. Let T=(V,E)T^{\prime}=(V^{\prime},E^{\prime}) be TT with vv removed. Let λi=λi\lambda^{\prime}_{i}=\lambda_{i} for iV{u}i\in V^{\prime}\setminus\{u\}. Let λu=λu/(1+λv)\lambda^{\prime}_{u}=\lambda_{u}/(1+\lambda_{v}). Let μ\mu^{\prime} be the hard-core model on TT^{\prime} with fugacities {λi}iV\{\lambda^{\prime}_{i}\}_{i\in V^{\prime}}. Note that μ\mu^{\prime} is the same as the marginalization of μ\mu to VV^{\prime}, that is, for an independent set II of TT^{\prime} we have

μ(I)=JIμ(J),\mu^{\prime}(I)=\sum_{J\supseteq I}\mu(J), (38)

where JJ ranges over independent sets of TT that contain II.

Let MM be the symmetrized influence matrix for μ\mu and let MM^{\prime} be the symmetrized influence matrix for μ\mu^{\prime}. Note that (38) implies that MM^{\prime} is a submatrix of MM (removing the column and row of MM corresponding to vertex vv yields MM^{\prime}).

It is standard to show, e.g., see [ALO21, Lemma B.3], that

Inf(vw)=Inf(vu)Inf(uw),{\mbox{Inf}}(v\rightarrow w)={\mbox{Inf}}(v\rightarrow u){\mbox{Inf}}(u\rightarrow w), (39)

and

Inf(wv)=Inf(wu)Inf(uv).{\mbox{Inf}}(w\rightarrow v)={\mbox{Inf}}(w\rightarrow u){\mbox{Inf}}(u\rightarrow v). (40)

Let τ2=Inf(vu)Inf(uv)\tau^{2}={\mbox{Inf}}(v\rightarrow u){\mbox{Inf}}(u\rightarrow v). Note that, using (35), we have

τ2=Inf(uv)Inf(vu)λv1+λvλu1+λu.\begin{split}\tau^{2}={\mbox{Inf}}(u\rightarrow v){\mbox{Inf}}(v\rightarrow u)\leq\frac{\lambda_{v}}{1+\lambda_{v}}\frac{\lambda_{u}}{1+\lambda_{u}}.\end{split} (41)

From (39) and (40) we have that MM and MM^{\prime} have the following form (QQ is a matrix and zz is a vector):

M=(QzzT0)andM=(QzτzzT0ττzTτ0).M^{\prime}=\left(\begin{array}[]{cc}Q&z\\ z^{T}&0\\ \end{array}\right)\quad\mbox{and}\quad M=\left(\begin{array}[]{ccc}Q&z&\tau z\\ z^{T}&0&\tau\\ \tau z^{T}&\tau&0\\ \end{array}\right).
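Continuing the illustrative brute-force sketch from earlier in this section (assuming the hypothetical hardcore_influence function defined there), one can confirm (39)–(41) numerically on the path 0–1–2–3, where v=3v=3 is a leaf with unique neighbor u=2u=2:

Inf, marg = hardcore_influence({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 1.0)
u, v, w = 2, 3, 0
print(np.isclose(Inf[v, w], Inf[v, u] * Inf[u, w]))   # eq. (39) with v a leaf
tau2 = Inf[v, u] * Inf[u, v]
print(0.0 <= tau2 <= 0.25)                            # eq. (41): each factor is at most 1/2 at lam = 1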

Let FF be an increasing function on [0,1.3][0,1.3]. We will take F(x)=1/H(x)F(x)=1/H(x) where H(x)=(axb)/400H(x)=(a-x^{b})/400 and a=1.3a=1.3, b=0.78b=0.78. (We will keep aa and bb as variables to elucidate the choice of aa and bb in the proof below.) Suppose that we know Mdiag(F(λi))M^{\prime}\preceq{\mathrm{diag}}(F(\lambda^{\prime}_{i})) and we want to conclude Mdiag(F(λi))M\preceq{\mathrm{diag}}(F(\lambda_{i})). Let WW be a diagonal matrix with entries F(λi)F(\lambda_{i}) for iV{u,v}i\in V\setminus\{u,v\}. We can restate our goal as follows.

KNOW:(WQzzTF(λu1+λv))0WANT:(WQzτzzTF(λu)ττzTτF(λv))0.\mbox{KNOW:}\left(\begin{array}[]{cc}W-Q&-z\\ -z^{T}&F\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}\\ \end{array}\right)\succeq 0\quad\quad\mbox{WANT:}\left(\begin{array}[]{ccc}W-Q&-z&-\tau z\\ -z^{T}&F(\lambda_{u})&-\tau\\ -\tau z^{T}&-\tau&F(\lambda_{v})\\ \end{array}\right)\succeq 0.

Since FF will be an increasing function, we have F(λu/(1+λv))<F(λu)F(\lambda_{u}/(1+\lambda_{v}))<F(\lambda_{u}).

The condition we want is equivalent (by applying row and column operations, specifically subtracting uu-th row/column times τ\tau from the vv-th row/column) to

(WQz0zTF(λu)τ(1+F(λu))0τ(1+F(λu))F(λv)+2τ2+τ2F(λu))0.\left(\begin{array}[]{ccc}W-Q&-z&0\\ -z^{T}&F(\lambda_{u})&-\tau(1+F(\lambda_{u}))\\ 0&-\tau(1+F(\lambda_{u}))&F(\lambda_{v})+2\tau^{2}+\tau^{2}F(\lambda_{u})\\ \end{array}\right)\succeq 0.

If we show

(F(λu)F(λu/(1+λv))τ(1+F(λu))τ(1+F(λu))F(λv)+2τ2+τ2F(λu))0,\left(\begin{array}[]{cc}F(\lambda_{u})-F(\lambda_{u}/(1+\lambda_{v}))&-\tau(1+F(\lambda_{u}))\\ -\tau(1+F(\lambda_{u}))&F(\lambda_{v})+2\tau^{2}+\tau^{2}F(\lambda_{u})\\ \end{array}\right)\succeq 0, (42)

then the conclusion will follow (adding the “know” positive semidefinite matrix and (42) (expanded with zeros to match dimensions) yields that the “want” matrix is positive semidefinite). Equation (42) is equivalent to checking if the determinant is positive (since the entry in the first row/column is positive), that is, we need to check

(F(λu)F(λu/(1+λv)))F(λv)>τ2(1+F(λu/(1+λv))(2+F(λu))).\left(F(\lambda_{u})-F(\lambda_{u}/(1+\lambda_{v}))\right)F(\lambda_{v})>\tau^{2}\left(1+F(\lambda_{u}/(1+\lambda_{v}))(2+F(\lambda_{u}))\right).

Hence it is sufficient (using (41)) to show

(F(λu)F(λu1+λv))F(λv)>λv1+λvλu1+λu(1+F(λu1+λv)(2+F(λu))).\left(F(\lambda_{u})-F\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}\right)F(\lambda_{v})>\frac{\lambda_{v}}{1+\lambda_{v}}\frac{\lambda_{u}}{1+\lambda_{u}}\left(1+F\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}(2+F(\lambda_{u}))\right).

Letting H(x)=1/F(x)H(x)=1/F(x) the condition becomes the following

(H(λu1+λv)H(λu))>H(λv)λv1+λvλu1+λu(H(λu1+λv)H(λu)+2H(λu)+1).\left(H\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}-H(\lambda_{u})\right)>H(\lambda_{v})\frac{\lambda_{v}}{1+\lambda_{v}}\frac{\lambda_{u}}{1+\lambda_{u}}\left(H\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}H(\lambda_{u})+2H(\lambda_{u})+1\right). (43)

We are going to search for HH that is 1) bounded from above by 1/3001/300, 2) bounded away from 0 on [0,1.3][0,1.3] and 3) that satisfies

(H(λu1+λv)H(λu))>H(λv)λv1+λvλu1+λu(1+1/100).\left(H\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}-H(\lambda_{u})\right)>H(\lambda_{v})\frac{\lambda_{v}}{1+\lambda_{v}}\frac{\lambda_{u}}{1+\lambda_{u}}(1+1/100). (44)

Note that such HH will also satisfy (43) since H(λu1+λv)H(λu)+2H(λu)1/100H\Big{(}\frac{\lambda_{u}}{1+\lambda_{v}}\Big{)}H(\lambda_{u})+2H(\lambda_{u})\leq 1/100 (here the choice of 1/1001/100 is arbitrary; we just need something sufficiently small).

We will search for HH of the following form: H(x)axbH(x)\propto a-x^{b}. Note that (44) is invariant under scaling of HH and hence satisfying the upper bound of 1/3001/300 can be achieved by scaling of HH. Ultimately the price we will pay for HH being small is that FF is big and hence we get a weaker upper bound on the spectral radius of MM; we do not optimize the constants at this point. Consider the following function LL (obtained as LHS of (44) divided by RHS of (44), excluding the constant term 1+1/1001+1/100):

L(λu,λv):=λub(1+λu)λu(1+λv)b1(aλvb)λv(1+λv)b1.L(\lambda_{u},\lambda_{v}):=\frac{\lambda_{u}^{b}(1+\lambda_{u})}{\lambda_{u}}\frac{(1+\lambda_{v})^{b}-1}{(a-\lambda_{v}^{b})\lambda_{v}(1+\lambda_{v})^{b-1}}. (45)

The minimum of the first term in (45) is attained for λu=(1b)/b\lambda_{u}=(1-b)/b and hence setting b=.78b=.78 we have that:

λub(1+λu)λu(1b)b1bb1.32.78.\frac{\lambda_{u}^{b}(1+\lambda_{u})}{\lambda_{u}}\geq\frac{(1-b)^{b-1}}{b^{b}}\geq\frac{1.32}{.78}. (46)

For the second term in (45) by setting a=1.3a=1.3 we have the following:

(1+λv)b1(aλvb)λv(1+λv)b1(1+λv)b1aλv(1+λv)b1ba=.781.3.\frac{(1+\lambda_{v})^{b}-1}{(a-\lambda_{v}^{b})\lambda_{v}(1+\lambda_{v})^{b-1}}\geq\frac{(1+\lambda_{v})^{b}-1}{a\lambda_{v}(1+\lambda_{v})^{b-1}}\geq\frac{b}{a}=\frac{.78}{1.3}. (47)

The second inequality in (47) is obtained from the following inequality which is valid for b[0,1]b\in[0,1] and x0x\geq 0:

(1+x)b1bx(1+x)b1.(1+x)^{b}-1\geq bx(1+x)^{b-1}. (48)

For the above, it suffices to prove that

(1+x)bbx(1+x)b11.\displaystyle(1+x)^{b}-bx(1+x)^{b-1}\geq 1. (49)

We have that for b1b\leq 1:

(1+x)bbx(1+x)b1\displaystyle(1+x)^{b}-bx(1+x)^{b-1} =(1+x)b1(1(b1)x)\displaystyle=(1+x)^{b-1}(1-(b-1)x)
(1+x)b1(1+x)(b1)\displaystyle\geq(1+x)^{b-1}(1+x)^{-(b-1)} by Bernoulli’s inequality
1.\displaystyle\geq 1.

Finally, plugging in (46) and (47) into (45) we obtain:

L(λu,λv)1.321.3>1.01.L(\lambda_{u},\lambda_{v})\geq\frac{1.32}{1.3}>1.01. (50)

Equation (50) implies that (44) is satisfied. Recall that the statement of the lemma assumed that the fugacities are in the interval [0,1.3][0,1.3]. For λ[0,1.3]\lambda\in[0,1.3] we have

\frac{1}{10000}\leq H(\lambda)=\frac{a-\lambda^{b}}{400}\leq\frac{1}{300}. (51)
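
A minimal numerical check of (51) (ours) over a grid of λ∈[0,1.3]:

    a, b = 1.3, 0.78
    for i in range(13001):
        lam = i / 10000.0             # grid over [0, 1.3]
        h = (a - lam**b) / 400
        assert 1 / 10000 <= h <= 1 / 300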

This implies that H satisfies (43), and hence for

F(\lambda)=\frac{1}{H(\lambda)}\leq 10000,

we have (42), which completes the proof of Lemma 4.2 by induction. ∎

We can now complete the proof of spectral independence.

Proof of Theorem 1.2.

We apply Theorem 1.12 of [CLV21], which says that if η-spectral independence and b-marginal boundedness hold for all pinnings, then the mixing time of the Glauber dynamics is C(η,Δ,b) n log n. A pinning refers to a fixed configuration τ on a subset S of vertices; for the hard-core model, a pinning of a tree T corresponds to the hard-core model on a forest which is an induced subgraph. Hence, Lemma 4.1 implies that η-spectral independence holds for all pinnings with η=10000. The condition of b-marginal boundedness (see Definition 1.9 in [CLV21]) says that for every pinning τ on a subset S⊂V, every vertex v∉S, and every assignment σ(v) to v that has positive probability under the conditional Gibbs distribution μ_τ, the marginal probability is bounded below as μ_τ(σ(v)) ≥ b. This holds with b = min{1,λ}/[λ+(1+λ)^Δ]. Hence, [CLV21, Theorem 1.12] implies Theorem 1.2. ∎
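
For intuition, the following brute-force enumeration (our own sanity check, not from [CLV21]) confirms the marginal bound for the center of the star K_{1,Δ} with free leaves, on which the occupied-case bound is tight when λ≤1:

    from itertools import product

    def center_marginals(Delta, lam):
        # Enumerate all independent sets of the star K_{1,Delta}
        # (vertex 0 is the center, vertices 1..Delta are leaves).
        Z = w_in = 0.0
        for conf in product([0, 1], repeat=Delta + 1):
            center, leaves = conf[0], conf[1:]
            if center == 1 and any(leaves):
                continue              # the center is adjacent to every leaf
            w = lam ** sum(conf)
            Z += w
            if center == 1:
                w_in += w
        return w_in / Z, 1 - w_in / Z  # P(center occupied), P(center unoccupied)

    for Delta in range(1, 8):
        for lam in [0.5, 1.0, 1.3]:
            p_in, p_out = center_marginals(Delta, lam)
            b = min(1, lam) / (lam + (1 + lam) ** Delta)
            assert p_in >= b and p_out >= b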

5 Proof of Corollary 1.3

We prove Corollary 1.3 by combining the spectral independence result in Lemma 4.1 with Theorem 1.1, utilizing [CFYZ24, Theorem 1.9].

In [CFYZ24], the authors introduce the notion of “complete η-spectral independence”. Let μ be the hard-core model on a graph G=(V,E) with fugacity λ. For η>0, complete η-spectral independence for μ corresponds to the following condition: for any induced subgraph G′ of G, and for the hard-core model μ′ on G′ in which each vertex v of G′ has fugacity λ′_v ≤ λ, the corresponding influence matrix ℐ_{G′} has spectral radius at most η.
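
To make the definition concrete, the following brute-force sketch (our own illustration; the path length n=6 and fugacity λ=1.3 are arbitrary choices) computes the influence matrix ℐ(u,v)=μ(v occupied | u occupied)−μ(v occupied | u unoccupied) on a small path and reports its spectral radius:

    from itertools import product
    import numpy as np

    n, lam = 6, 1.3
    edges = [(i, i + 1) for i in range(n - 1)]

    # Enumerate the weighted independent sets of the path.
    configs = [c for c in product([0, 1], repeat=n)
               if not any(c[u] and c[v] for u, v in edges)]
    weights = [lam ** sum(c) for c in configs]

    def marginal(v, u, s):
        # mu(v occupied | vertex u has spin s), by brute force
        num = sum(w for c, w in zip(configs, weights) if c[u] == s and c[v])
        den = sum(w for c, w in zip(configs, weights) if c[u] == s)
        return num / den

    infl = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            if u != v:
                infl[u, v] = marginal(v, u, 1) - marginal(v, u, 0)

    print(max(abs(np.linalg.eigvals(infl))))  # far below eta = 10000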

Lemma 4.1 is equivalent to complete η-spectral independence for all λ≤1.3 with η≤10000. We can now prove Corollary 1.3.

Proof of Corollary 1.3.

For the forest T, let μ be the hard-core model with fugacity λ≤1.3. Also, let γ be the spectral gap of the Glauber dynamics on μ. Corollary 1.3 follows by showing that γ=Ω(n^{-1}).

First, note that if λ<1.1, then Theorem 1.1 already implies γ=Ω(n^{-1}). We now focus on the case where λ∈[1.1,1.3].

For the same forest T, let μ̂ be the hard-core model on T with fugacity λ̂=1. Let γ̂ be the spectral gap of the Glauber dynamics on μ̂. From Theorem 1.1, we have that

\widehat{\gamma}=\Omega(n^{-1}). (52)

Lemma 4.1 and [CFYZ24, Theorem 1.9] imply the following relation between γ and γ̂: for θ=λ and η≤10000, we have

\gamma\geq\left(\frac{\theta}{2}\right)^{2\eta+7}\widehat{\gamma}>\left(\frac{1}{2}\right)^{20007}\widehat{\gamma}. (53)
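
Both sides of (53) underflow double precision, so a quick log-domain comparison (a sanity check of ours) confirms the second inequality at the worst case θ=1.1 and η=10000:

    import math

    eta, theta = 10000, 1.1           # worst case for eta <= 10000 and theta in [1.1, 1.3]
    log_lhs = (2 * eta + 7) * math.log(theta / 2)
    log_rhs = 20007 * math.log(1 / 2)
    assert log_lhs > log_rhs          # approx. -11961 > -13868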

From (52) and (53), we get that γ=Ω(n^{-1}) for any λ∈[1.1,1.3], which establishes Corollary 1.3. ∎

Acknowledgements

We thank Zongchen Chen for pointing out the counterexample to entropy tensorization as discussed in Remark 1.4, and the anonymous referee for pointing out Corollary 1.3.

References

  • [ALO21] Nima Anari, Kuikui Liu, and Shayan Oveis Gharan. Spectral independence in high-dimensional expanders and applications to the hardcore model. SIAM Journal on Computing, 2021.
  • [BCC+22] Antonio Blanca, Pietro Caputo, Zongchen Chen, Daniel Parisi, Daniel Štefankovič, and Eric Vigoda. On Mixing of Markov Chains: Coupling, Spectral Independence, and Entropy Factorization. In Proceedings of the 33rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3670–3692, 2022.
  • [BKMP05] Noam Berger, Claire Kenyon, Elchanan Mossel, and Yuval Peres. Glauber dynamics on trees and hyperbolic graphs. Probability Theory and Related Fields, 131(3):311–340, 2005.
  • [BST16] Nayantara Bhatnagar, Allan Sly, and Prasad Tetali. Decay of correlations for the hardcore model on the d-regular random graph. Electronic Journal of Probability, 21:1–42, 2016.
  • [BW04] Graham Brightwell and Peter Winkler. A second threshold for the hard-core model on a Bethe lattice. Random Structures & Algorithms, 24:303–314, 2004.
  • [Cap23] Pietro Caputo. Lecture notes on Entropy and Markov Chains. Preprint, available from: http://www.mat.uniroma3.it/users/caputo/entropy.pdf, 2023.
  • [CE22] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 110–122, 2022.
  • [CFYZ22] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Optimal mixing for two-state anti-ferromagnetic spin systems. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 588–599, 2022.
  • [CFYZ24] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Rapid mixing of Glauber dynamics via spectral independence for all degrees. SIAM Journal on Computing, 2024.
  • [CG24] Zongchen Chen and Yuzhou Gu. Fast sampling of b-matchings and b-edge covers. In Proceedings of the 35th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4972–4987, 2024.
  • [Che24] Zongchen Chen. Combinatorial approach for factorization of variance and entropy in spin systems. In Proceedings of the 35th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4988–5012, 2024.
  • [CLMM23] Zongchen Chen, Kuikui Liu, Nitya Mani, and Ankur Moitra. Strong spatial mixing for colorings on trees and its algorithmic applications. In Proceedings of the 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 810–845, 2023.
  • [CLV21] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Optimal mixing of Glauber dynamics: Entropy factorization via high-dimensional expansion. In Proceedings of the 53rd Annual ACM Symposium on Theory of Computing (STOC), pages 1537–1550, 2021.
  • [CLV22] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Spectral independence via stability and applications to Holant-type problems. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 149–160, 2022.
  • [CMT15] Pietro Caputo, Georg Menz, and Prasad Tetali. Approximate tensorization of entropy at high temperature. Annales de la Faculté des sciences de Toulouse: Mathématiques, 24(4):691–716, 2015.
  • [CP21] Pietro Caputo and Daniel Parisi. Block factorization of the relative entropy via spatial mixing. Communications in Mathematical Physics, 388:793–818, 2021.
  • [CYYZ24] Xiaoyu Chen, Xiongxin Yang, Yitong Yin, and Xinyuan Zhang. Spectral independence beyond total influence on trees and related graphs. arXiv preprint arXiv:2404.04668, 2024.
  • [DHP20] Michelle Delcourt, Marc Heinrich, and Guillem Perarnau. The Glauber dynamics for edge-colorings of trees. Random Structures & Algorithms, 57(4):1050–1076, 2020.
  • [EF23] David Eppstein and Daniel Frishberg. Rapid mixing of the hardcore Glauber dynamics and other Markov chains in bounded-treewidth graphs. In Proceedings of the 34th International Symposium on Algorithms and Computation (ISAAC), pages 30:1–30:13, 2023.
  • [EHŠV23] Charilaos Efthymiou, Thomas P. Hayes, Daniel Štefankovič, and Eric Vigoda. Optimal mixing via tensorization for random independent sets on arbitrary trees. arXiv preprint arXiv:2307.07727, version 2, 2023.
  • [GŠV16] Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Combinatorics, Probability and Computing, 25(4):500–559, 2016.
  • [HS07] Thomas P. Hayes and Alistair Sinclair. A general lower bound for mixing of single-site dynamics on graphs. The Annals of Applied Probability, 17(3):931–952, 2007.
  • [Liu21] Kuikui Liu. From coupling to spectral independence and blackbox comparison with the down-up walk. In APPROX-RANDOM, pages 32:1–32:21, 2021.
  • [LMP09] Brendan Lucier, Michael Molloy, and Yuval Peres. The Glauber dynamics for colourings of bounded degree trees. In Randomization and Approximation Techniques in Computer Science (RANDOM), pages 631–645, 2009.
  • [Mar03] James B. Martin. Reconstruction thresholds on regular trees. In Discrete Random Walks, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 191–204, 2003.
  • [Mos04] Elchanan Mossel. Survey: Information flow on trees. In Graphs, Morphisms and Statistical Physics, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 155–170, 2004.
  • [MSW03] Fabio Martinelli, Alistair Sinclair, and Dror Weitz. The Ising model on trees: boundary conditions and mixing time. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 628–639, 2003.
  • [MSW04] Fabio Martinelli, Alistair Sinclair, and Dror Weitz. Fast mixing for independent sets, colorings and other models on trees. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 456–465, 2004.
  • [MWW07] Elchanan Mossel, Dror Weitz, and Nicholas C. Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143:401–439, 2007.
  • [RŠV+14] Ricardo Restrepo, Daniel Štefankovič, Juan C. Vera, Eric Vigoda, and Linji Yang. Phase transition for Glauber dynamics for independent sets on regular trees. SIAM Journal on Discrete Mathematics, 28:835–861, 2014.
  • [Sly10] Allan Sly. Computational transition at the uniqueness threshold. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 287–296, 2010.
  • [SS14] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on d-regular graphs. The Annals of Probability, 42(6):2383–2416, 2014.
  • [ŠVV09] Daniel Štefankovič, Santosh Vempala, and Eric Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. Journal of the ACM, 56(3):1–36, 2009.