
Combinatorial Approach for Factorization of Variance and Entropy in Spin Systems

Zongchen Chen Department of Mathematics, Massachusetts Institute of Technology. Email: [email protected].
Abstract

We present a simple combinatorial framework for establishing approximate tensorization of variance and entropy in the setting of spin systems (a.k.a. undirected graphical models) based on balanced separators of the underlying graph. Such approximate tensorization results immediately imply as corollaries many important structural properties of the associated Gibbs distribution, in particular rapid mixing of the Glauber dynamics for sampling. We prove approximate tensorization by recursively establishing block factorization of variance and entropy with a small balanced separator of the graph. Our approach goes beyond the classical canonical path method for variance and the recent spectral independence approach, and allows us to obtain new rapid mixing results. As applications of our approach, we show that:

  1. On graphs of treewidth $t$, the mixing time of the Glauber dynamics is $n^{O(t)}$, which recovers the recent results of Eppstein and Frishberg [EF21] with improved exponents and simpler proofs;

  2. On bounded-degree planar graphs, strong spatial mixing implies $\widetilde{O}(n)$ mixing time of the Glauber dynamics, which gives a faster algorithm than the previous deterministic counting algorithm by Yin and Zhang [YZ13].

1 Introduction

Spin systems, also known as undirected graphical models, are important models for describing the joint distribution of interacting random variables. Spin systems were first studied in statistical physics but have been widely used in many other areas including computer science, social networks, and biology.

Consider a general spin system defined on a graph $G=(V,E)$. The model describes a distribution, called the Gibbs distribution, over all spin configurations where each vertex $v$ is assigned a spin $\sigma_v$ from a finite set $\mathcal{Q}$. Given functions $\phi:\mathcal{Q}\times\mathcal{Q}\to\mathbb{R}_{\geq 0}$ characterizing pairwise interactions and $\psi:\mathcal{Q}\to\mathbb{R}_{+}$ measuring external bias, the Gibbs distribution associated with a spin system is defined as

$$\mu(\sigma)=\frac{1}{Z}\prod_{e=uv\in E}\phi(\sigma_{u},\sigma_{v})\prod_{v\in V}\psi(\sigma_{v}),\quad\forall\sigma:V\to\mathcal{Q} \quad (1)$$

where $Z$ is a normalizing constant known as the partition function.

We mention two classical examples of spin systems which have been extensively studied. The first is the hardcore model of independent sets. The Gibbs distribution is supported on the collection $\mathcal{I}_G$ of all independent sets of $G$, where each $I\in\mathcal{I}_G$ has density $\mu(I)=\lambda^{|I|}/Z$ with $Z=\sum_{I\in\mathcal{I}_G}\lambda^{|I|}$. Observe that this corresponds to $\mathcal{Q}=\{0,1\}$, $\phi(\sigma_u,\sigma_v)=1-\sigma_u\sigma_v$, and $\psi(\sigma_v)=1+(\lambda-1)\sigma_v$ in Eq. 1, with $\sigma$ being the indicator vector of $I$. Another example is random vertex colorings, where the Gibbs distribution is the uniform distribution over all proper $q$-colorings of $G$. This corresponds to $\mathcal{Q}=[q]=\{1,\dots,q\}$, $\phi(\sigma_u,\sigma_v)=\mathbbm{1}\{\sigma_u\neq\sigma_v\}$, and $\psi\equiv 1$ in Eq. 1.
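To make the hardcore example concrete, the Gibbs distribution on a tiny graph can be computed by brute-force enumeration. The following Python sketch is illustrative only (the function name `hardcore_gibbs` is ours, and the enumeration is exponential in $n$): it lists all independent sets, weights each $I$ by $\lambda^{|I|}$, and normalizes by the partition function $Z$.

```python
import itertools

def hardcore_gibbs(n, edges, lam):
    """Brute-force Gibbs distribution of the hardcore model on an n-vertex graph.

    Enumerates all 0/1 indicator vectors, keeps those of independent sets,
    and weights each by lam^{|I|}.  For illustration only (exponential in n).
    """
    weights = {}
    for sigma in itertools.product([0, 1], repeat=n):
        # sigma is the indicator vector of a vertex subset; keep it iff
        # no edge has both endpoints occupied (i.e., the subset is independent)
        if all(not (sigma[u] and sigma[v]) for u, v in edges):
            weights[sigma] = lam ** sum(sigma)
    Z = sum(weights.values())  # partition function
    return {sigma: w / Z for sigma, w in weights.items()}, Z

# Example: the path 0-1-2 with fugacity 1 has 5 independent sets, so Z = 5.
mu, Z = hardcore_gibbs(3, [(0, 1), (1, 2)], 1.0)
print(Z)  # 5.0
```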

We study the problem of sampling from the Gibbs distribution of a spin system. In particular, we consider the single-site Glauber dynamics (also called Gibbs sampling), which is perhaps the simplest and most popular Markov chain Monte Carlo (MCMC) algorithm for sampling from a high-dimensional distribution. The Glauber dynamics is an ergodic Markov chain where in each iteration we choose a vertex $v$ uniformly at random and update the spin $\sigma_v$ conditional on the spin values of all other vertices. The mixing time of the Glauber dynamics is the smallest $t$ such that, starting from any initial configuration $\sigma^{(0)}$, the distribution of $\sigma^{(t)}$ after $t$ steps is $1/4$-close to the target distribution $\mu$ in total variation distance.
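For the hardcore model the single-site update has a simple closed form: if some neighbor of the chosen vertex $v$ is occupied then $\sigma_v$ must be $0$; otherwise $\sigma_v=1$ with probability $\lambda/(1+\lambda)$. A minimal Python sketch of one Glauber step (the helper `glauber_step` and the toy instance are ours):

```python
import random

def glauber_step(sigma, neighbors, lam, rng):
    """One Glauber update for the hardcore model: pick a uniformly random
    vertex v and resample sigma[v] from its conditional distribution given
    the spins of all other vertices."""
    v = rng.randrange(len(sigma))
    if any(sigma[u] for u in neighbors[v]):
        sigma[v] = 0                      # an occupied neighbor forces v = 0
    else:
        # conditional marginal: occupied with probability lam / (1 + lam)
        sigma[v] = 1 if rng.random() < lam / (1 + lam) else 0
    return sigma

# Toy usage on the path 0-1-2: starting from the empty set (an independent
# set), the chain stays inside the independent sets forever.
rng = random.Random(0)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
sigma = [0, 0, 0]
for _ in range(1000):
    glauber_step(sigma, neighbors, 1.0, rng)
print(sigma)  # an (approximate) sample; always an independent set
```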

Bounding the mixing time of the Glauber dynamics is challenging even for the simplest spin systems like the hardcore model or random colorings. One common method for establishing mixing time bounds of Markov chains is to prove associated functional inequalities such as the Poincaré inequality or the standard/modified log-Sobolev inequality. For a function $f:\mathcal{Q}^V\to\mathbb{R}$, the expectation of $f$ with respect to a Gibbs distribution $\mu$ is defined as $\mathbb{E}f=\sum_{\sigma:V\to\mathcal{Q}}\mu(\sigma)f(\sigma)$. The variance and entropy functionals are defined as $\mathrm{Var}f=\mathbb{E}[f^2]-(\mathbb{E}f)^2$ and, for non-negative $f$, $\mathrm{Ent}f=\mathbb{E}[f\log f]-(\mathbb{E}f)\log(\mathbb{E}f)$. For the Glauber dynamics specifically, the Poincaré inequality can be expressed equivalently in the form

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v(f)],\quad\forall f:\mathcal{Q}^V\to\mathbb{R} \quad (2)$$

where on the right-hand side $\mathrm{Var}^\eta_v f$ is the variance of $f$ under the conditional distribution $\mu^\eta_v$ of $\sigma_v$ when all other vertices are fixed to some $\eta:V\setminus\{v\}\to\mathcal{Q}$, and $\mathbb{E}[\mathrm{Var}_v f]$ takes the expectation over $\eta$. The inequality Eq. 2 is called Approximate Tensorization (AT) of variance. The Poincaré inequality, or equivalently AT of variance, implies that the Glauber dynamics mixes in $O(Cn^2)$ steps.

One can also consider the entropy analog of Eq. 2, known as Approximate Tensorization (AT) of entropy:

$$\mathrm{Ent}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_v(f)],\quad\forall f:\mathcal{Q}^V\to\mathbb{R}_{\geq 0}. \quad (3)$$

AT of entropy is a much stronger property. It implies the modified log-Sobolev inequality for the Glauber dynamics, and gives a sharper mixing time bound $O(Cn\log n)$ (see Lemma 2.7), which is optimal if $C$ is constant.

Both Eqs. 2 and 3 were explicitly mentioned and carefully studied in [CMT15], though they were implicitly used in even earlier works, see e.g. [Mar99, GZ03, Ces01]. On a high level, both Eqs. 2 and 3 say that the global fluctuation of a function, quantified as variance or entropy, is always controlled by the sum of local fluctuations at each single variable, which is intuitively true when all variables are sufficiently independent of each other. Indeed, one has $C\geq 1$ in Eqs. 2 and 3, with equality if and only if $\mu$ is a product distribution. See also [BCŠV22, KHR22] for recent applications of AT in learning and testing.

Establishing AT of variance and entropy is a challenging task even for simple distributions. For variance, the canonical path approach is a common way of proving the Poincaré inequality (i.e., bounding the spectral gap) in the setting of spin systems. The high-level idea is to construct a family of canonical paths, or more generally a multi-commodity flow, between each pair of configurations and then use the congestion of the flow to establish Eq. 2. Canonical paths have found many successful applications, such as matchings [JS89], the ferromagnetic Ising model [JS93], and bipartite perfect matchings [JSV04]. However, constructing canonical paths is far from easy and usually involves problem-specific technical complications.

Meanwhile, for entropy it is much more difficult to establish AT or other related functional inequalities like the standard/modified log-Sobolev inequality. In many cases they are proved analytically, relying on the underlying topology being the lattice [Mar99, GZ03, Ces01]. It is also known that for high-temperature models, such as under the Dobrushin uniqueness condition, AT of entropy holds with $C=O(1)$ [CMT15, Mar19].

Recently, the spectral/entropic independence approach was introduced [ALO20, AJK+22] and has become a powerful tool for establishing AT of both variance and entropy. For many families of spin systems it achieves $C=O(1)$ in Eqs. 2 and 3 and thus shows optimal $O(n\log n)$ mixing of the Glauber dynamics. For example, for the hardcore model one obtains AT of variance and entropy with $C=O(1)$ when $\lambda<\lambda_c(\Delta)$ by a sequence of recent works [ALO20, CLV20, CLV21, CFYZ21, AJK+22, CFYZ22, CE22], where $\Delta$ denotes the maximum degree and $\lambda_c(\Delta)$ is the tree uniqueness threshold [Wei06]. It is known that the Glauber dynamics can be exponentially slow when $\lambda>\lambda_c(\Delta)$ [MWW09], and hence $C=e^{\Omega(n)}$ for some graphs. The critical value $\lambda_c(\Delta)$ in fact pinpoints a computational phase transition; see [Wei06, Sly10, SS14, GŠV16] for more discussion. Though the spectral independence approach works well on general graphs for proving optimal mixing time, it does not apply when the mixing time is a larger polynomial instead of nearly linear. For example, for the hardcore model on trees the Glauber dynamics mixes in polynomial time for all $\lambda>0$ [JSTV04], but (constant) spectral independence fails for large $\lambda$.

In this paper, we ask if there is a natural and direct way of proving AT beyond canonical paths or spectral independence, especially when we have extra knowledge of the underlying graph structure. We present a simple combinatorial approach for proving AT of variance and entropy based on the existence of balanced separators of the graph; see Propositions 3.1 and 4.6. Using this approach we are able to obtain new rapid mixing results as immediate corollaries for certain classes of graphs such as bounded-treewidth graphs or planar graphs. We are also able to derive many previous results in a simple and straightforward way, in contrast to the detailed technical proofs that were known previously. For example, we can easily deduce within one page that the Glauber dynamics for $q$-colorings on complete $d$-ary trees is rapidly mixing for all $d\geq 2$, $q\geq 3$; see Proposition 3.7.

Our proof approach is best illustrated on graphs of bounded treewidth. The treewidth of a graph characterizes how close the graph is to a tree, and is an important parameter for obtaining fixed-parameter tractable algorithms for many graph problems. For the hardcore model and random colorings on bounded-treewidth graphs, we obtain rapid mixing of the Glauber dynamics as immediate consequences of Propositions 3.1 and 4.6, which improves the results in [EF21] with better exponents. We remark that we can also obtain similar results for more general spin systems, but for simplicity we only state them for the hardcore model and vertex colorings, which are the most commonly studied examples.

Theorem 1.1.

Let $G=(V,E)$ be an $n$-vertex graph of treewidth $t\geq 1$. The mixing time of the Glauber dynamics for sampling from the hardcore model on $G$ with fugacity $\lambda>0$ is $n^{O(1+t\log(1+\lambda))}$.

Theorem 1.2.

Let $G=(V,E)$ be an $n$-vertex graph of maximum degree $\Delta\geq 3$ and treewidth $t\geq 1$. For any $q\geq\Delta+2$, the mixing time of the Glauber dynamics for sampling uniformly random $q$-colorings of $G$ is $n^{O(t\Delta)}$.

Previously, [EF21] presented mixing time bounds of $n^{O(1+t\log\hat{\lambda})}$ for the hardcore model, where $\hat{\lambda}=\max\{\lambda,1/\lambda\}$, and $n^{O(t\Delta\log q)}$ for random $q$-colorings. Our mixing time bounds have better exponents, and our proof is much simpler, avoiding the technical construction of multi-commodity flows in [EF21]. We note that for bounded-treewidth graphs one can exactly compute the partition function and thus sample in time $e^{O(t)}\cdot\mathrm{poly}(n)$, see e.g. [YZ13, WTZL18] and the references therein. However, the performance of the Glauber dynamics on such graphs was still unclear, and this is the focus of this paper.

We establish Theorems 1.1 and 1.2 by factorizing variance recursively using a balanced separator of the graph; this is inspired by previous works on trees [MSW04] and lattices [Ces01, CP21]. Since $G$ has treewidth $t$, it has a balanced separator of size at most $t$. More specifically, there exists a partition $V=A\cup B\cup S$ such that $|S|\leq t$, $|A|\leq 2n/3$, $|B|\leq 2n/3$, and there is no edge between $A$ and $B$. Given such a partition, we are able to establish a block factorization of variance

$$\mathrm{Var}f\leq C_0\left(\mathbb{E}[\mathrm{Var}_A f]+\mathbb{E}[\mathrm{Var}_B f]+\mathbb{E}[\mathrm{Var}_S f]\right), \quad (4)$$

where $C_0$ is a constant independent of $n$. Since the size of $S$ is bounded, we can factorize the variance on $S$ into single vertices. Then, by recursively applying Eq. 4 to $A$ and $B$ respectively, we prove AT of variance with multiplier $C=C_0^{O(\log n)}=n^{O(1)}$. We present our universal proof approach for AT in Propositions 3.1 and 4.6, and summarize basic tools for establishing the block factorization Eq. 4 in Sections 3.2 and 4.2.2.

We note that [EF21] also used balanced separators to construct multi-commodity flows and obtained similar results. In comparison, our approach establishes AT in a more direct way and is arguably simpler in nature. We remark that we could also establish AT of entropy, but since we are not aiming for an optimal exponent in the mixing time, we choose to work with variance, which is much easier to calculate with.

As another application of our approach, we consider planar graphs and show that the Strong Spatial Mixing (SSM) property (Definition 5.2) implies nearly optimal mixing time of the Glauber dynamics. SSM is an important structural property characterizing the exponential decay of correlations between two subsets of variables as their graph distance grows. There is a rich literature on establishing rapid mixing of the Glauber dynamics from SSM, but mostly for special classes of graphs such as lattices, see e.g. [Mar99, GZ03, Ces01, DSVW04, GMP05]. Here we consider general bounded-degree planar graphs with no restriction on the underlying topology. Previously, [YZ13] presented a deterministic counting algorithm under the same assumptions with a large polynomial running time. Our result can be viewed as a faster sampling algorithm with nearly linear running time. We state our result for the hardcore model, but it extends easily to general spin systems.

Theorem 1.3.

Let $G=(V,E)$ be an $n$-vertex planar graph of maximum degree $\Delta\geq 3$. Consider the hardcore model on $G$ with fugacity $\lambda>0$. If SSM holds, then the mixing time of the Glauber dynamics is $O(n\log^4 n)$.

In fact, we prove Theorem 1.3 more generally for all graphs of polynomially bounded local treewidth; see Theorem 5.3. This is the class of graphs in which every local ball of radius $r$ has treewidth $\mathrm{poly}(r)$. It includes, for example, bounded-treewidth graphs, planar graphs, and graphs with polynomial neighborhood growth such as regions in $\mathbb{Z}^d$ (see Theorem 5.7). We remark that the result in [YZ13] holds only for graphs of linearly bounded local treewidth, and thus our result holds for a larger class of graphs. We establish AT of entropy by combining SSM with the local structure of the underlying graph. In particular, we show a new low-diameter decomposition of graphs (see Lemma 5.4) based on a classical result of Linial and Saks [LS93], which allows us to focus on subgraphs of $\mathrm{poly}(\log n)$ diameter assuming SSM. Note that here we have to work with entropy in order to get an $\widetilde{O}(n)$ mixing time.

Paper organization.

After giving preliminaries in Section 2, we present our new framework for establishing approximate tensorization in Section 3. We prove Theorems 1.1 and 1.2 in Section 4 for Glauber dynamics on bounded-treewidth graphs. We prove Theorem 1.3 more generally for graphs of bounded local treewidth in Section 5. We present missing proofs of basic tools for approximate tensorization and block factorization in Section 6.

Acknowledgments.

The author thanks Kuikui Liu, Eric Vigoda, and Thuy-Duong Vuong for stimulating discussions. The author thanks Eric Vigoda for helpful comments on the manuscript.

2 Preliminaries

2.1 Spin systems

We consider general families of spin systems, also known as undirected graphical models. Let $G=(V,E)$ be a graph. Every vertex $v\in V$ is associated with a finite set $\mathcal{Q}_v$ of spins (colors, labels). Let $\mathcal{X}=\prod_{v\in V}\mathcal{Q}_v$ denote the product space of all spin assignments, called configurations. Let $\Phi=\{\phi_e:e\in E\}$ be a collection of edge interactions such that for every edge $e=uv\in E$, $\phi_e:\mathcal{Q}_u\times\mathcal{Q}_v\to\mathbb{R}_{\geq 0}$ is a function mapping the spins of the two endpoints to a non-negative weight, characterizing the interaction between neighboring vertices. Let $\Psi=\{\psi_v:v\in V\}$ be a collection of external fields such that for every vertex $v\in V$, $\psi_v:\mathcal{Q}_v\to\mathbb{R}_{\geq 0}$ assigns a weight to each color, representing the bias towards each color. The induced Gibbs distribution $\mu=\mu_{G,\Phi,\Psi}$ is given by

$$\mu(\sigma)=\frac{1}{Z}\prod_{e\in E}\phi_e(\sigma_e)\prod_{v\in V}\psi_v(\sigma_v),\quad\forall\sigma\in\mathcal{X}$$

where $\sigma_v$ denotes the color of a vertex $v\in V$, $\sigma_e$ denotes the (partial) spin assignment of the vertices in $e$, and $Z=Z_{G,\Phi,\Psi}$ is the partition function defined as

$$Z=\sum_{\sigma\in\mathcal{X}}\prod_{e\in E}\phi_e(\sigma_e)\prod_{v\in V}\psi_v(\sigma_v).$$

We assume $Z>0$ so that the Gibbs distribution is well-defined.

For a subset $W\subseteq V$, let $\mu_W$ denote the marginal distribution projected on $W$. We say a spin configuration $\eta\in\prod_{v\in\Lambda}\mathcal{Q}_v$ on some subset $\Lambda\subseteq V$ is a (feasible) pinning if $\mu_\Lambda(\eta)>0$. For any pinning $\eta$ on $\Lambda\subseteq V$, the conditional Gibbs distribution $\mu^\eta$, where the configuration on $\Lambda$ is fixed to be $\eta$, can be viewed as another spin system on the graph $G\setminus\Lambda$. For any $W\subseteq V\setminus\Lambda$, we further define $\mu^\eta_W$ to be the marginal on $W$ under the conditional distribution $\mu^\eta$.

The Glauber dynamics is one of the simplest and most popular Markov chain Monte Carlo algorithms for sampling from the Gibbs distribution of a spin system. In each step of the Glauber dynamics, we pick a vertex $v\in V$ uniformly at random and update the spin value $\sigma_v$ conditional on the configuration $\sigma_{V\setminus\{v\}}$ of all other vertices, i.e., from the conditional marginal $\mu_v^{\sigma_{V\setminus\{v\}}}$. Two distinct configurations $\sigma,\tau\in\mathcal{X}$ are said to be adjacent if and only if they differ in the spin value at a single vertex. Hence, the Glauber dynamics only moves between adjacent configurations. We assume that under any pinning $\eta$, the Glauber dynamics for the conditional distribution $\mu^\eta$ is irreducible; namely, any feasible configuration can reach any other through a chain of feasible configurations in which consecutive pairs are adjacent. This is necessary for the Glauber dynamics to be ergodic, and is naturally true for many spin systems of interest, including the hardcore model, random $q$-colorings with $q\geq\Delta+2$, etc.

2.2 Block factorization of variance and entropy

Consider the Gibbs distribution $\mu$ of a spin system $(G,\Phi,\Psi)$ defined on a graph $G=(V,E)$ with state space $\mathcal{X}=\prod_{v\in V}\mathcal{Q}_v$.

Definition 2.1.

Let $f:\mathcal{X}\to\mathbb{R}$ be a function.

  • The expectation of $f$ with respect to $\mu$ is defined as $\mathbb{E}_\mu f=\sum_{\sigma\in\mathcal{X}}\mu(\sigma)f(\sigma)$.

  • The variance of $f$ with respect to $\mu$ is defined as $\mathrm{Var}_\mu f=\mathbb{E}[(f-\mathbb{E}f)^2]$.

  • For non-negative $f$, the entropy of $f$ with respect to $\mu$ is defined as $\mathrm{Ent}_\mu f=\mathbb{E}[f\log f]-(\mathbb{E}f)\log(\mathbb{E}f)$, with the convention that $0\log 0=0$.

We often omit the subscript and write $\mathbb{E}f$, $\mathrm{Var}f$, $\mathrm{Ent}f$ when the underlying distribution $\mu$ is clear from context.
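These three functionals are direct to compute for a small explicit distribution. A minimal Python sketch (the helper names are ours) implements them verbatim from the definitions above:

```python
import math

def E(mu, f):
    """Expectation of f under mu; both are dicts keyed by configurations."""
    return sum(mu[s] * f[s] for s in mu)

def Var(mu, f):
    """Variance: E[(f - E f)^2]."""
    m = E(mu, f)
    return E(mu, {s: (f[s] - m) ** 2 for s in mu})

def Ent(mu, f):
    """Entropy: E[f log f] - (E f) log(E f), with the convention 0 log 0 = 0."""
    xlogx = lambda x: x * math.log(x) if x > 0 else 0.0
    return E(mu, {s: xlogx(f[s]) for s in mu}) - xlogx(E(mu, f))

mu = {"a": 0.5, "b": 0.25, "c": 0.25}
f = {"a": 1.0, "b": 2.0, "c": 4.0}
print(Var(mu, f))  # E[f^2] - (E f)^2 = 5.5 - 4.0 = 1.5
```

Note that $\mathrm{Ent}f\geq 0$ by Jensen's inequality, with equality for constant $f$, matching the convention above.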

Let $B\subseteq V$ be a subset of vertices. For any pinning $\eta$ on $V\setminus B$, the expectation, variance, and entropy of a function $f$ with respect to the conditional Gibbs distribution $\mu^\eta_B$ are denoted by $\mathbb{E}^\eta_B f$, $\mathrm{Var}^\eta_B f$, and $\mathrm{Ent}^\eta_B f$; in particular, we view all of them as functions mapping a full configuration $\sigma\in\mathcal{X}$ to a real number, depending only on the pinning $\eta=\sigma_{V\setminus B}$.

The following lemma summarizes several basic and important properties of variance and entropy in spin systems. The proof can be found in [MSW04] and the references therein.

Lemma 2.2 ([MSW04, Eq. (3), (4), (5)]).

Let $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$ be a function.

  (1) For any subsets $B\subseteq A\subseteq V$, it holds that $\mathbb{E}[\mathrm{Ent}_A f]=\mathbb{E}[\mathrm{Ent}_B f]+\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_B f)]$;

  (2) For $B=\bigcup_{i=1}^m B_i$, where $B_1,\dots,B_m\subseteq V$ are pairwise disjoint subsets such that every distinct pair $B_i,B_j$ is disconnected in $G[B]$, it holds that $\mathbb{E}[\mathrm{Ent}_B f]\leq\sum_{i=1}^m\mathbb{E}[\mathrm{Ent}_{B_i}f]$;

  (3) For any subsets $A,B\subseteq V$ such that there are no edges between $A$ and $B\setminus A$, it holds that $\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_B f)]\leq\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_{A\cap B}f)]$.

All three properties (1), (2), (3) hold for variance as well.

Definition 2.3.

Let $\mathcal{B}$ be a collection (possibly a multiset) of subsets of $V$. We say that $\mu$ satisfies $\mathcal{B}$-factorization of variance (resp., entropy) with multiplier $C$ if for every function $f:\mathcal{X}\to\mathbb{R}$ (resp., $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$) it holds that

$$\mathrm{Var}f\leq C\sum_{B\in\mathcal{B}}\mathbb{E}[\mathrm{Var}_B f]\qquad\Big(\text{resp., }\ \mathrm{Ent}f\leq C\sum_{B\in\mathcal{B}}\mathbb{E}[\mathrm{Ent}_B f]\Big). \quad (5)$$
Remark 2.4.

Following [BGGŠ22], we call $C$ the “multiplier” instead of “constant” since it may depend on $n$.

Remark 2.5.

The block factorization of variance/entropy can be defined more generally for weighted blocks; we refer to [CP21] for more details.

The $\mathcal{B}$-factorization corresponds to the heat-bath block dynamics for sampling from $\mu$: in each iteration, we pick $B\in\mathcal{B}$ uniformly at random and update $\sigma_B$ from the conditional distribution $\mu_B^{\sigma_{V\setminus B}}$. More specifically, the $\mathcal{B}$-factorization of variance is equivalent to the Poincaré inequality of this block dynamics, and the $\mathcal{B}$-factorization of entropy implies its modified log-Sobolev inequality (but the converse is not true). See [CMT15, CP21].

Definition 2.6.

We say that $\mu$ satisfies approximate tensorization (AT) of variance (resp., entropy) with multiplier $C$ if for every function $f:\mathcal{X}\to\mathbb{R}$ (resp., $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$) it holds that

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f]\qquad\Big(\text{resp., }\ \mathrm{Ent}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_v f]\Big). \quad (6)$$

Observe that AT is exactly $\mathcal{B}$-factorization with $\mathcal{B}=\{\{v\}:v\in V\}$. Establishing AT with a small $C$ allows us to conclude rapid mixing of the Glauber dynamics; see [CLV21] and the references therein.

Lemma 2.7.

Suppose that $\min_{\sigma\in\mathcal{X}:\,\mu(\sigma)>0}\mu(\sigma)=e^{-O(n)}$. If $\mu$ satisfies AT of variance with multiplier $C$, then the mixing time of the Glauber dynamics is $O(Cn^2)$. If $\mu$ satisfies AT of entropy with multiplier $C$, then the mixing time of the Glauber dynamics is $O(Cn\log n)$.

2.3 Separator decomposition

We prove AT via a divide-and-conquer argument. To accomplish this we need the following separator decomposition, slightly modified from [YZ13]. Such ideas also appeared in many previous works to obtain fixed-parameter tractable algorithms in graphs of bounded treewidth.

Definition 2.8 (Separator Decomposition, [YZ13]).

For a graph $G=(V,E)$, a separator decomposition tree $T_{\mathsf{SD}}$ of $G$ is a rooted tree satisfying the following conditions:

  • Every node of $T_{\mathsf{SD}}$ is a pair $(U,S)$ where $U$ is a subset of vertices and $S$ is a separator of $G[U]$;

  • The root node of $T_{\mathsf{SD}}$ is a pair $(V,S_V)$;

  • For every non-leaf node $(U,S)$, the children of $(U,S)$ are the connected components of $G[U\setminus S]$, each paired with a separator of it;

  • Every leaf of $T_{\mathsf{SD}}$ is a pair $(U,U)$.

A separator decomposition tree is said to be balanced if for every internal node $(U,S)$ and every child $(U',S')$ of $(U,S)$, it holds that $|U'|\leq 2|U|/3$.
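For intuition, on a path the middle vertex is always a balanced separator, so a balanced separator decomposition tree can be built greedily. A minimal Python sketch (all names are ours; each node is represented as a triple `(U, S, children)` rather than a pair, to carry the subtree):

```python
def path_decomposition(lo, hi):
    """Balanced separator decomposition of the path on vertices lo..hi-1.
    The middle vertex is a balanced separator; each node is (U, S, children),
    and leaves are (U, U, [])."""
    U = list(range(lo, hi))
    if len(U) <= 1:
        return (U, U, [])              # leaf (U, U)
    mid = (lo + hi) // 2
    S = [mid]                          # removing mid disconnects the path
    children = []
    if lo < mid:
        children.append(path_decomposition(lo, mid))
    if mid + 1 < hi:
        children.append(path_decomposition(mid + 1, hi))
    return (U, S, children)

def height(node):
    _, _, children = node
    return 0 if not children else 1 + max(height(c) for c in children)

tree = path_decomposition(0, 15)
print(height(tree))  # 3
```

One can check that the separators in this tree form a partition of the vertex set, as noted in Remark 2.9, and that every child satisfies the $|U'|\leq 2|U|/3$ balance condition.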

Remark 2.9.

Observe that all separators $S$ appearing in $T_{\mathsf{SD}}$ form a partition of $V$.

It is easy to see that the height of a balanced separator decomposition tree is $O(\log n)$.

Lemma 2.10.

Suppose $G=(V,E)$ is an $n$-vertex graph with $n\geq 2$. If $T_{\mathsf{SD}}$ is a balanced separator decomposition tree of $G$, then the height of $T_{\mathsf{SD}}$ is less than $3\log n$.

Proof.

Suppose for contradiction that the height satisfies $h\geq 3\log n$. Let $(U,U)$ be a leaf of $T_{\mathsf{SD}}$ at distance $h$ from the root node $(V,S_V)$. Since all separators are balanced, we have $1\leq|U|\leq(2/3)^h n\leq n^{1-3\log(3/2)}<1$, a contradiction. ∎
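A quick numeric sanity check of this bound: even in the worst case, where every child retains the maximal allowed $\lfloor 2n/3\rfloor$ vertices, the recursion depth stays below $3\log_2 n$. A small Python check (the helper `worst_depth` is ours):

```python
import math

def worst_depth(n):
    """Recursion depth when each child retains the maximal allowed
    floor(2n/3) vertices -- the worst case for a balanced decomposition."""
    d = 0
    while n > 1:
        n = (2 * n) // 3
        d += 1
    return d

for n in [2, 10, 100, 10**6]:
    assert worst_depth(n) < 3 * math.log2(n)
print(worst_depth(10**6))
```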

3 Combinatorial Approach for Approximate Tensorization

3.1 Approximate tensorization via separator decomposition

Our main step for establishing approximate tensorization of variance and entropy is given by the following proposition. Roughly speaking, if we can find a (small) separator $S\subseteq V$ of the underlying graph $G$ whose removal disconnects $G$, then we can factorize variance/entropy into the block $S$ and all connected components of $V\setminus S$, whose sizes are significantly smaller if the separator $S$ is balanced. Given a (balanced) separator decomposition tree, we can continue this process for each smaller block and, in the end, tensorize into single vertices.

Proposition 3.1.

Let $(G,\Phi,\Psi)$ be a spin system defined on a graph $G=(V,E)$ with associated Gibbs distribution $\mu$. Suppose that $T_{\mathsf{SD}}$ is a separator decomposition tree of $G$ satisfying:

  1. (Block Factorization for Decomposition) For every node $(U,S)$, there exists $C_{U,S}\geq 1$ such that for any function $f:\mathcal{X}\to\mathbb{R}$ we have

    $$\mathbb{E}[\mathrm{Var}_U f]\leq C_{U,S}\left(\mathbb{E}[\mathrm{Var}_S f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). \quad (7)$$

    For all leaves $(U,U)$ we take $C_{U,U}=1$.

  2. (Approximate Tensorization for Separators) For every node $(U,S)$, there exists $C_S\geq 1$ such that for any function $f:\mathcal{X}\to\mathbb{R}$ we have

    $$\mathbb{E}[\mathrm{Var}_S f]\leq C_S\sum_{v\in S}\mathbb{E}[\mathrm{Var}_v f]. \quad (8)$$

Then the Gibbs distribution $\mu$ satisfies approximate tensorization of variance with multiplier $C$ given by

$$C=\max_{(U,S)}\Big\{C_S\prod_{(U',S')}C_{U',S'}\Big\},$$

where the maximum is taken over all nodes of $T_{\mathsf{SD}}$, and the product is over all nodes $(U',S')$ on the unique path from the root $(V,S_V)$ to $(U,S)$. Namely, for every function $f:\mathcal{X}\to\mathbb{R}$ we have

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f].$$
Remark 3.2.

The entropy version of Proposition 3.1 is also true, and the proof is exactly the same with $\mathrm{Ent}(\cdot)$ replacing $\mathrm{Var}(\cdot)$.

The proof of Proposition 3.1 follows by straightforwardly applying the properties of variance and entropy given in Lemma 2.2. We use it as our basic strategy for obtaining meaningful AT bounds in many applications.

Proof of Proposition 3.1.

The proposition follows by decomposing the variance level by level on the separator decomposition tree $T_{\mathsf{SD}}$, using Lemma 2.2 and Eqs. 7 and 8. More specifically, for the root $(V,S_V)$ we have

$$\mathrm{Var}f\leq C_{V,S_V}\left(\mathbb{E}[\mathrm{Var}_{S_V}f]+\mathbb{E}[\mathrm{Var}_{V\setminus S_V}f]\right)\leq C_{V,S_V}C_{S_V}\sum_{v\in S_V}\mathbb{E}[\mathrm{Var}_v f]+C_{V,S_V}\sum_{U:\text{ c.c. of }G[V\setminus S_V]}\mathbb{E}[\mathrm{Var}_U f],$$

where every $U$ is a connected component of $G[V\setminus S_V]$, and we can factorize $\mathbb{E}[\mathrm{Var}_{V\setminus S_V}f]$ without loss since, conditional on the spins of $S_V$, the distribution is a product distribution over the components (Lemma 2.2). In particular, $U$ (together with its separator) is a child of $(V,S_V)$ in $T_{\mathsf{SD}}$. Continuing the process for each child $(U,S_U)$, we obtain

$$\mathbb{E}[\mathrm{Var}_U f]\leq C_{U,S_U}\left(\mathbb{E}[\mathrm{Var}_{S_U}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S_U}f]\right)\leq C_{U,S_U}C_{S_U}\sum_{v\in S_U}\mathbb{E}[\mathrm{Var}_v f]+C_{U,S_U}\sum_{W:\text{ c.c. of }G[U\setminus S_U]}\mathbb{E}[\mathrm{Var}_W f],$$

where every $W$ is a connected component of $G[U\setminus S_U]$. Iterating this down the tree, in the end we obtain

$$\mathrm{Var}f\leq\sum_{v\in V}C_v\,\mathbb{E}[\mathrm{Var}_v f]\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f],$$

where for each $v$,

$$C_v=C_S\prod_{(U',S')}C_{U',S'}\leq C,$$

where $(U,S)$ is the unique node such that $v\in S$ (see Remark 2.9) and the product runs over all $(U',S')$ on the unique path from $(V,S_V)$ to $(U,S)$. This proves the proposition. ∎

3.2 Tools for factorization of variance and entropy

To apply Proposition 3.1, one needs to establish the block factorization of variance/entropy for decomposition, Eq. 7, and approximate tensorization for separators, Eq. 8. In this subsection, we summarize known results and give new ones for factorization of variance and entropy in a very general setting, which are useful for showing Eqs. 7 and 8. The lemmas in this subsection are suitable for establishing AT or block factorization for disjoint blocks; for overlapping blocks we also need Lemma 4.10 from Section 4.2.2.

Two-variable factorization with weak correlation

We first consider AT for two variables. Let $X$ and $Y$ be two random variables with joint distribution $\pi=\pi_{XY}$, fully supported on finite state spaces $\mathcal{X}$ and $\mathcal{Y}$ respectively. For applications such as proving Eq. 7, $X$ and $Y$ each represent a block of vertices, namely $X=\sigma_S$ and $Y=\sigma_{U\setminus S}$, and we consider the joint distribution of $(X,Y)=\sigma_U$ under an arbitrary pinning $\eta$ outside $U$.

Denote the marginal distribution of XX by πX\pi_{X}, and for y𝒴y\in\mathcal{Y} let πXy\pi_{X}^{y} be the distribution of XX conditioned on Y=yY=y. We define the marginal distribution πY\pi_{Y} and for x𝒳x\in\mathcal{X} the conditional distribution πYx\pi_{Y}^{x} in the same way.

We use 𝔼f=𝔼πf\mathbb{E}f=\mathbb{E}_{\pi}f, Varf=Varπf\mathrm{Var}f=\mathrm{Var}_{\pi}f, and Entf=Entπf\mathrm{Ent}f=\mathrm{Ent}_{\pi}f to denote the expectation, variance, and entropy of some function f:𝒳×𝒴f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} or 0\mathbb{R}_{\geq 0} under the distribution π\pi, and use 𝔼Xyf\mathbb{E}_{X}^{y}f, VarXyf\mathrm{Var}_{X}^{y}f, and EntXyf\mathrm{Ent}_{X}^{y}f to denote the expectation, variance, and entropy under the conditional distribution πXy\pi_{X}^{y}. As before, 𝔼[VarX(f)]\mathbb{E}[\mathrm{Var}_{X}(f)] and 𝔼[EntX(f)]\mathbb{E}[\mathrm{Ent}_{X}(f)] represent the expectation of VarXYf\mathrm{Var}_{X}^{Y}f and EntXYf\mathrm{Ent}_{X}^{Y}f where YY is chosen from πY\pi_{Y}.

We first show AT for π\pi when the correlation between XX and YY is bounded pointwise. The following lemma is implicitly given in [Ces01, DPPP02]; here we give a self-contained, simplified proof with an improved constant. The lemma below can also be applied to recover the main results of [Ces01, DPPP02].

Lemma 3.3 ([Ces01, Proposition 2.1] and [DPPP02, Lemma 5.1 & 5.2]).

Suppose there exists a real ε[0,1/2)\varepsilon\in[0,1/2) such that for all y𝒴y\in\mathcal{Y},

|πXy(x)πX(x)1|ε,x𝒳.\left|\frac{\pi_{X}^{y}(x)}{\pi_{X}(x)}-1\right|\leq\varepsilon,\quad\forall x\in\mathcal{X}. (9)

Then we have

Varf\displaystyle\mathrm{Var}f (1+ε12ε)(𝔼[VarXf]+𝔼[VarYf]),f:𝒳×𝒴\displaystyle\leq\left(1+\frac{\varepsilon}{1-2\varepsilon}\right)\left(\mathbb{E}[\mathrm{Var}_{X}f]+\mathbb{E}[\mathrm{Var}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} (10)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f (1+ε12ε)(𝔼[EntXf]+𝔼[EntYf]),f:𝒳×𝒴0.\displaystyle\leq\left(1+\frac{\varepsilon}{1-2\varepsilon}\right)\left(\mathbb{E}[\mathrm{Ent}_{X}f]+\mathbb{E}[\mathrm{Ent}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}. (11)

The proof of Lemma 3.3 can be found in Section 6.1.
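As a quick numerical sanity check of the variance bound Eq. 10 (an illustration only; the toy 2-by-2 distribution and the test function below are arbitrary choices, not from the paper), one can verify the inequality directly:

```python
# Numerical sanity check of Eq. 10 (illustration only): a toy weakly
# correlated 2x2 distribution and an arbitrary test function f.

pi = {(0, 0): 0.26, (0, 1): 0.24, (1, 0): 0.24, (1, 1): 0.26}
f  = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}

def var(dist, g):
    mean = sum(dist[z] * g[z] for z in dist)
    return sum(dist[z] * (g[z] - mean) ** 2 for z in dist)

piX = {x: pi[(x, 0)] + pi[(x, 1)] for x in (0, 1)}
piY = {y: pi[(0, y)] + pi[(1, y)] for y in (0, 1)}

# eps from the pointwise ratio condition (9): pi_X^y(x) / pi_X(x)
eps = max(abs(pi[(x, y)] / piY[y] / piX[x] - 1) for (x, y) in pi)

# E[Var_X f] and E[Var_Y f]: averaged conditional variances
e_var_x = sum(piY[y] * var({x: pi[(x, y)] / piY[y] for x in (0, 1)},
                           {x: f[(x, y)] for x in (0, 1)})
              for y in (0, 1))
e_var_y = sum(piX[x] * var({y: pi[(x, y)] / piX[x] for y in (0, 1)},
                           {y: f[(x, y)] for y in (0, 1)})
              for x in (0, 1))

lhs = var(pi, f)
rhs = (1 + eps / (1 - 2 * eps)) * (e_var_x + e_var_y)
assert eps < 0.5 and lhs <= rhs
```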

Two-variable factorization with strong correlation

Our next result allows stronger correlations between XX and YY, at a cost of larger constants for AT.

Lemma 3.4.

Suppose there exist reals εX,εY[0,1]\varepsilon_{X},\varepsilon_{Y}\in[0,1] with εXεY>0\varepsilon_{X}\varepsilon_{Y}>0 such that

dTV(πXy,πXy)\displaystyle d_{\mathrm{TV}}(\pi_{X}^{y},\pi_{X}^{y^{\prime}}) 1εX,y,y𝒴\displaystyle\leq 1-\varepsilon_{X},\quad\forall y,y^{\prime}\in\mathcal{Y} (12)
anddTV(πYx,πYx)\displaystyle\text{and}\quad d_{\mathrm{TV}}(\pi_{Y}^{x},\pi_{Y}^{x^{\prime}}) 1εY,x,x𝒳.\displaystyle\leq 1-\varepsilon_{Y},\quad\forall x,x^{\prime}\in\mathcal{X}. (13)

Then we have

Varf\displaystyle\mathrm{Var}f 2εX+εY(𝔼[VarXf]+𝔼[VarYf]),f:𝒳×𝒴\displaystyle\leq\frac{2}{\varepsilon_{X}+\varepsilon_{Y}}\left(\mathbb{E}[\mathrm{Var}_{X}f]+\mathbb{E}[\mathrm{Var}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} (14)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 4+2log(1/πmin)εX+εY(𝔼[EntXf]+𝔼[EntYf]),f:𝒳×𝒴0\displaystyle\leq\frac{4+2\log(1/\pi_{\min})}{\varepsilon_{X}+\varepsilon_{Y}}\left(\mathbb{E}[\mathrm{Ent}_{X}f]+\mathbb{E}[\mathrm{Ent}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0} (15)

where πmin=min(x,y)𝒳×𝒴:π(x,y)>0π(x,y)\pi_{\min}=\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}:\,\pi(x,y)>0}\pi(x,y).

Remark 3.5.

We remark that the constant for entropy factorization Eq. 15 is not optimal since log(1/πmin)\log(1/\pi_{\min}) depends logarithmically on the size of state space, |𝒳×𝒴|=|𝒳||𝒴||\mathcal{X}\times\mathcal{Y}|=|\mathcal{X}|\cdot|\mathcal{Y}|. Getting rid of this is an interesting open question.
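A numerical sanity check of the variance bound Eq. 14 (illustration only, with an arbitrary strongly correlated toy distribution; here εX\varepsilon_{X} and εY\varepsilon_{Y} are read off from the total-variation conditions Eqs. 12 and 13):

```python
# Numerical sanity check of Eq. 14 (illustration only): a strongly
# correlated 2x2 distribution; eps_X, eps_Y come from conditions (12), (13).

pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
f  = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}

def var(dist, g):
    mean = sum(dist[z] * g[z] for z in dist)
    return sum(dist[z] * (g[z] - mean) ** 2 for z in dist)

def tv(p, q):
    return 0.5 * sum(abs(p[z] - q[z]) for z in p)

piX = {x: pi[(x, 0)] + pi[(x, 1)] for x in (0, 1)}
piY = {y: pi[(0, y)] + pi[(1, y)] for y in (0, 1)}
condX = {y: {x: pi[(x, y)] / piY[y] for x in (0, 1)} for y in (0, 1)}
condY = {x: {y: pi[(x, y)] / piX[x] for y in (0, 1)} for x in (0, 1)}

eps_x = 1 - max(tv(condX[y], condX[yp]) for y in (0, 1) for yp in (0, 1))
eps_y = 1 - max(tv(condY[x], condY[xp]) for x in (0, 1) for xp in (0, 1))

e_var_x = sum(piY[y] * var(condX[y], {x: f[(x, y)] for x in (0, 1)})
              for y in (0, 1))
e_var_y = sum(piX[x] * var(condY[x], {y: f[(x, y)] for y in (0, 1)})
              for x in (0, 1))

lhs = var(pi, f)
rhs = 2 / (eps_x + eps_y) * (e_var_x + e_var_y)
assert lhs <= rhs
```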

Multi-variable factorization

Finally, we consider approximate tensorization for multiple variables, which is helpful for establishing Eq. 8 when the size of the separator is bounded.

Let X1,,XnX_{1},\dots,X_{n} be nn random variables where XiX_{i} is fully supported on a finite set 𝒳i\mathcal{X}_{i} for each ii. The joint distribution of (X1,,Xn)(X_{1},\dots,X_{n}) over the product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i} is denoted by π\pi. For two disjoint subsets A,B[n]A,B\subseteq[n] and a partial assignment xAiA𝒳ix_{A}\in\prod_{i\in A}\mathcal{X}_{i} with π(XA=xA)>0\pi(X_{A}=x_{A})>0, let πBxA=π(XB=XA=xA)\pi_{B}^{x_{A}}=\pi(X_{B}=\cdot\mid X_{A}=x_{A}) denote the conditional distribution of XBX_{B} given that the variables in AA are assigned the values xAx_{A}. In particular, πB\pi_{B} denotes the marginal on BB. For simplicity, we write πi\pi_{i} and xix_{i} when the underlying set is {i}\{i\}.

As before, we write Entixi¯f=EntXixi¯f\mathrm{Ent}_{i}^{x_{\bar{i}}}f=\mathrm{Ent}_{X_{i}}^{x_{\bar{i}}}f for the conditional entropy of ff on XiX_{i} given xi¯=(x1,,xi1,xi+1,,xn)x_{\bar{i}}=(x_{1},\dots,x_{i-1},x_{i+1},\dots,x_{n}), and let 𝔼[Entif]\mathbb{E}[\mathrm{Ent}_{i}f] be its expectation where xi¯x_{\bar{i}} is drawn from π[n]i\pi_{[n]\setminus i}; the variance versions are defined in the same way.

Lemma 3.6.

Suppose there exists a real ε(0,1]\varepsilon\in(0,1] such that for every Λ[n]\Lambda\subseteq[n] with |Λ|n2|\Lambda|\leq n-2, every xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0, every i,j[n]Λi,j\in[n]\setminus\Lambda with iji\neq j, and every xi,xi𝒳ix_{i},x^{\prime}_{i}\in\mathcal{X}_{i} with πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x_{i})>0 and πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x^{\prime}_{i})>0, it holds

dTV(πjxΛ,xi,πjxΛ,xi)1ε.d_{\mathrm{TV}}(\pi_{j}^{x_{\Lambda},x_{i}},\pi_{j}^{x_{\Lambda},x^{\prime}_{i}})\leq 1-\varepsilon.

Then we have

Varf\displaystyle\mathrm{Var}f 1εn1i=1n𝔼[Varif],f:𝒳\displaystyle\leq\frac{1}{\varepsilon^{n-1}}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Var}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R} (16)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 2+log(1/πmin)εn1i=1n𝔼[Entif],f:𝒳0\displaystyle\leq\frac{2+\log(1/\pi_{\min})}{\varepsilon^{n-1}}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Ent}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}_{\geq 0} (17)

where πmin=minx𝒳:π(x)>0π(x)\pi_{\min}=\min_{x\in\mathcal{X}:\,\pi(x)>0}\pi(x).

Observe that for n=2n=2, Lemma 3.6 recovers Lemma 3.4 in the special case εX=εY=ε\varepsilon_{X}=\varepsilon_{Y}=\varepsilon. The proofs of Lemmas 3.4 and 3.6, which are simple applications of the spectral independence approach based on [AL20, ALO20, FGYZ21], can be found in Section 6.2.

3.3 Example

Here we give a simple example as an application of Proposition 3.1 and tools from Section 3.2. We show polynomial mixing time of the Glauber dynamics for sampling qq-colorings on a complete dd-ary tree for all d2d\geq 2 and q3q\geq 3. This was known previously; see [GJK10, LM11, LMP09, TVVY12, SZ17] for even sharper results. However, Proposition 3.1 allows us to establish this fact in a more straightforward manner, avoiding technical complications such as constructing canonical paths or Markov chain decompositions.

Proposition 3.7 ([GJK10], see [LM11, LMP09, TVVY12, SZ17] for sharper results).

Let d2d\geq 2 and q3q\geq 3 be integers. Suppose TT is a complete dd-ary tree of height hh, and denote the number of vertices by n=i=0hdin=\sum_{i=0}^{h}d^{i}. The mixing time of the Glauber dynamics for sampling uniformly random qq-colorings of TT is nO(1+dqlogd)n^{O\big{(}1+\frac{d}{q\log d}\big{)}}.

Proof.

We apply Proposition 3.1. We may assume that q3dq\leq 3d since otherwise rapid mixing follows by standard path coupling arguments [Jer03]. The separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} can be obtained from the original tree TT: every node of T𝖲𝖣T_{\mathsf{SD}} is of the form (Tv,{v})(T_{v},\{v\}) where TvT_{v} is the subtree rooted at vv and the single-vertex set {v}\{v\} is a separator of TvT_{v}. In particular, we have C{v}=1C_{\{v\}}=1 in Eq. 8 for each separator {v}\{v\}. We claim that CTv,{v}=eO(d/q)C_{T_{v},\{v\}}=e^{O(d/q)} in Eq. 7 for each node (Tv,{v})(T_{v},\{v\}). To see this, consider two pinnings where vv receives colors c1c_{1} and c2c_{2}. With probability (11q1)d=eO(d/q)(1-\frac{1}{q-1})^{d}=e^{-O(d/q)} we can couple all children of vv so that they receive identical colors, none of which is c1c_{1} or c2c_{2}. Hence, we can couple the whole subtree Tv{v}T_{v}\setminus\{v\} with the same probability, implying

dTV(μTv{v}vc1,μTv{v}vc2)1eO(d/q).d_{\mathrm{TV}}\left(\mu_{T_{v}\setminus\{v\}}^{v\leftarrow c_{1}},\mu_{T_{v}\setminus\{v\}}^{v\leftarrow c_{2}}\right)\leq 1-e^{-O(d/q)}.

Thus, we deduce from Lemma 3.4 that CTv,{v}=eO(d/q)C_{T_{v},\{v\}}=e^{O(d/q)}. By Proposition 3.1 we obtain that AT of variance holds with multiplier

C=(eO(d/q))h=nO(dqlogd).C=\left(e^{O(d/q)}\right)^{h}=n^{O\left(\frac{d}{q\log d}\right)}.

The mixing time then follows from Lemma 2.7. ∎
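For concreteness, the Glauber dynamics being analyzed can be implemented in a few lines. The sketch below (an illustration only; it does not certify mixing, and the tree size, q, and step count are arbitrary choices) resamples a uniformly random vertex with a uniformly random color not used by its neighbors:

```python
# A minimal implementation of the Glauber dynamics for proper q-colorings
# (illustration only; the proposition concerns its mixing time, which this
# sketch does not certify).
import random

def glauber_colorings(adj, q, steps, seed=0):
    rng = random.Random(seed)
    # greedy initial proper coloring (valid since q >= max degree + 1)
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(q) if c not in used)
    verts = list(adj)
    for _ in range(steps):
        v = rng.choice(verts)                         # uniform random vertex
        blocked = {color[u] for u in adj[v]}
        color[v] = rng.choice([c for c in range(q) if c not in blocked])
    return color

# complete binary tree (d = 2) of height 2: vertex i has children 2i+1, 2i+2
adj = {i: [] for i in range(7)}
for i in range(3):
    for ch in (2 * i + 1, 2 * i + 2):
        adj[i].append(ch)
        adj[ch].append(i)

col = glauber_colorings(adj, q=5, steps=500)
```

Note that q=5 equals the maximum degree plus two here, the standard ergodicity condition for the Glauber dynamics on colorings.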

4 Rapid Mixing for Graphs of Bounded Treewidth

In this section we consider graphs of bounded treewidth. It is well-known that all bounded-treewidth graphs have a balanced separator decomposition tree with separators of bounded size.

Lemma 4.1 ([RS86, Gru12]).

If GG is a graph of treewidth tt, then there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} for GG such that for every node (U,S)(U,S) in T𝖲𝖣T_{\mathsf{SD}} it holds |S|t|S|\leq t.

See also [Bod98, Ree03, HW17] for surveys on treewidth.
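For the special case of trees (treewidth 1), a balanced separator decomposition tree can be computed explicitly by recursive centroid removal; the sketch below (an illustration using the standard centroid property, not the construction from [RS86, Gru12]) produces nodes (U,{v}) where every separator is a single vertex and every child component contains at most |U|/2 vertices:

```python
# Sketch: for trees (treewidth 1), a balanced separator decomposition tree
# via recursive centroid removal; every separator is a single vertex and
# every remaining component has at most half the vertices.

def components(adj, U):
    seen, comps = set(), []
    for s in U:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u in U and u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def centroid_decomposition(adj, U):
    """Return a node (U, S, children) with S = [centroid of U]."""
    if len(U) == 1:
        return (sorted(U), sorted(U), [])      # leaf node (U, U)
    # centroid: vertex minimizing the largest component of U \ {v}
    v = min(U, key=lambda w: max(len(c)
            for c in components(adj, set(U) - {w})))
    kids = [centroid_decomposition(adj, set(c))
            for c in components(adj, set(U) - {v})]
    return (sorted(U), [v], kids)

# path 0-1-...-14 (a tree); its centroid is the midpoint 7
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 15] for i in range(15)}
root = centroid_decomposition(adj, set(range(15)))
```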

4.1 Hardcore model

In this subsection we prove Theorem 1.1 for the hardcore model via Proposition 3.1. As a byproduct, we also show eO(n)e^{O(\sqrt{n})} mixing time of the Glauber dynamics on arbitrary planar graphs with no restriction on the maximum degree, see Theorem 4.4.

To apply Proposition 3.1, we need to establish block factorization for every decomposition Eq. 7 and approximate tensorization for all separators Eq. 8, which are given by the following two lemmas.

Lemma 4.2.

Consider the hardcore model on a graph G=(V,E)G=(V,E) with fugacity λ>0\lambda>0. Let SUVS\subseteq U\subseteq V be subsets with |S|t|S|\leq t. For any pinning η\eta on VUV\setminus U, the conditional hardcore Gibbs distribution μUη\mu^{\eta}_{U} satisfies {S,US}\{S,U\setminus S\}-factorization of variance with constant C=2(1+λ)tC=2(1+\lambda)^{t}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarUf]C(𝔼[VarSf]+𝔼[VarUSf]).\mathbb{E}[\mathrm{Var}_{U}f]\leq C\left(\mathbb{E}[\mathrm{Var}_{S}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). (18)
Proof.

For any pinning τ\tau on USU\setminus S, we have

μSη,τ(σS=0)1(1+λ)|S|1(1+λ)t.\mu^{\eta,\tau}_{S}(\sigma_{S}=\vec{0})\geq\frac{1}{(1+\lambda)^{|S|}}\geq\frac{1}{(1+\lambda)^{t}}.

Hence, for any two pinnings τ,ξ\tau,\xi on USU\setminus S we have dTV(μSη,τ,μSη,ξ)1(1+λ)td_{\mathrm{TV}}(\mu^{\eta,\tau}_{S},\mu^{\eta,\xi}_{S})\leq 1-(1+\lambda)^{-t}. Therefore, by Lemma 3.4 we have that μUη\mu^{\eta}_{U} satisfies {S,US}\{S,U\setminus S\}-factorization of variance with constant C=2(1+λ)tC=2(1+\lambda)^{t}. Taking expectation over η\eta, we also obtain Eq. 18. ∎
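The lower bound used in this proof can be verified by brute force on a small instance (illustration only; the 5-cycle, the choice S={0,1}, and λ=1.5 are arbitrary):

```python
# Brute-force check (illustration) of the bound above: under any pinning
# tau on U \ S, the all-unoccupied configuration on S has conditional
# probability at least (1 + lambda)^{-|S|}.  Toy instance: 5-cycle,
# S = {0, 1}, lambda = 1.5.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
lam = 1.5

def is_independent(sigma):
    return all(not (sigma[u] and sigma[v]) for u, v in edges)

ratios = []
for tau in product((0, 1), repeat=3):            # pinning on U \ S = {2,3,4}
    weights = {}
    for s in product((0, 1), repeat=2):          # configuration on S
        sigma = (s[0], s[1], tau[0], tau[1], tau[2])
        if is_independent(sigma):
            weights[s] = lam ** sum(sigma)
    if not weights:                              # infeasible pinning, skip
        continue
    Z = sum(weights.values())
    ratios.append(weights[(0, 0)] / Z)           # sigma_S = all-unoccupied

assert min(ratios) >= (1 + lam) ** (-2) - 1e-12
```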

Lemma 4.3.

Consider the hardcore model on a graph G=(V,E)G=(V,E) with fugacity λ>0\lambda>0. Let SVS\subseteq V be a subset with |S|t|S|\leq t. For any pinning η\eta on VSV\setminus S, the conditional hardcore Gibbs distribution μSη\mu^{\eta}_{S} satisfies approximate tensorization of variance with constant C=(1+λ)t1C=(1+\lambda)^{t-1}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarSf]CvS𝔼[Varvf].\mathbb{E}[\mathrm{Var}_{S}f]\leq C\sum_{v\in S}\mathbb{E}[\mathrm{Var}_{v}f]. (19)
Proof.

For any subset ΛS\Lambda\subseteq S with |Λ||S|2|\Lambda|\leq|S|-2 and any pinning τ\tau on Λ\Lambda, let u,vSΛu,v\in S\setminus\Lambda be two distinct vertices and we have

μSΛη,τ(σv=0σu=0)11+λandμSΛη,τ(σv=0σu=1)11+λ.\mu^{\eta,\tau}_{S\setminus\Lambda}(\sigma_{v}=0\mid\sigma_{u}=0)\geq\frac{1}{1+\lambda}\quad\text{and}\quad\mu^{\eta,\tau}_{S\setminus\Lambda}(\sigma_{v}=0\mid\sigma_{u}=1)\geq\frac{1}{1+\lambda}.

Hence, dTV(μvη,τ,u0,μvη,τ,u1)1(1+λ)1d_{\mathrm{TV}}(\mu^{\eta,\tau,u\leftarrow 0}_{v},\mu^{\eta,\tau,u\leftarrow 1}_{v})\leq 1-(1+\lambda)^{-1}. It follows from Lemma 3.6 that μSη\mu^{\eta}_{S} satisfies approximate tensorization of variance with constant

C=(1+λ)|S|1(1+λ)t1.C=(1+\lambda)^{|S|-1}\leq(1+\lambda)^{t-1}.

Taking expectation over η\eta, we also obtain Eq. 19. ∎

We are now ready to prove Theorem 1.1.

Proof of Theorem 1.1.

We deduce the theorem from Proposition 3.1. Since the graph has treewidth tt, there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} by Lemma 4.1 where all separators have size at most tt. The height of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. Block factorization for decomposition Eq. 7 is shown by Lemma 4.2 with CU,S=2(1+λ)tC_{U,S}=2(1+\lambda)^{t} for each node (U,S)(U,S). Approximate tensorization for separators Eq. 8 is shown by Lemma 4.3 with CS=(1+λ)t1C_{S}=(1+\lambda)^{t-1} for each separator SS. Thus, we conclude from Proposition 3.1 that the hardcore Gibbs distribution μ\mu satisfies approximate tensorization of variance with multiplier

C=(1+λ)t1(2(1+λ)t)3logn=nO(1+tlog(1+λ)).C=(1+\lambda)^{t-1}\cdot\left(2(1+\lambda)^{t}\right)^{3\log n}=n^{O(1+t\log(1+\lambda))}.

The mixing time then follows from Lemma 2.7. ∎

As a byproduct, we also show eO(n)e^{O(\sqrt{n})} mixing of the Glauber dynamics for the hardcore model on any planar graph; see [BKMP05, Hei20, EF21] for works that obtain similar results.

Theorem 4.4.

Suppose GG is an nn-vertex planar graph. The mixing time of the Glauber dynamics for sampling from the hardcore model on GG with fugacity λ>0\lambda>0 is (1+λ)O(n)(1+\lambda)^{O(\sqrt{n})}.

Proof.

We apply Proposition 3.1. It is well-known that every planar graph has a balanced separator SVS\subseteq V of size O(|V|)O(\sqrt{|V|}), such that each connected component of G[VS]G[V\setminus S] has size at most 2|V|/32|V|/3; see [LT79]. We can further find balanced separators for each component, and construct a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} recursively. By Lemma 4.2, for every node (U,S)(U,S) we have block factorization for decomposition Eq. 7 with constant

CU,S=2(1+λ)|S|=(1+λ)O(|U|).C_{U,S}=2(1+\lambda)^{|S|}=(1+\lambda)^{O\big{(}\sqrt{|U|}\big{)}}.

By Lemma 4.3, for every separator SS we have approximate tensorization for separators Eq. 8 with constant

CS=(1+λ)|S|1=(1+λ)O(n).C_{S}=(1+\lambda)^{|S|-1}=(1+\lambda)^{O(\sqrt{n})}.

Therefore, we obtain from Proposition 3.1 that AT of variance holds with multiplier

C(1+λ)O(n)i=0(1+λ)O((23)in)=(1+λ)O(n).\displaystyle C\leq(1+\lambda)^{O(\sqrt{n})}\cdot\prod_{i=0}^{\infty}(1+\lambda)^{O\left(\sqrt{\left(\frac{2}{3}\right)^{i}n}\right)}=(1+\lambda)^{O(\sqrt{n})}.

The mixing time then follows from Lemma 2.7. ∎
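The last equality in the displayed bound uses the geometric decay of block sizes down the decomposition tree; spelled out,

```latex
\sum_{i=0}^{\infty} O\Big(\sqrt{(2/3)^{i}\,n}\Big)
  \;=\; O(\sqrt{n}) \sum_{i=0}^{\infty} \Big(\sqrt{2/3}\Big)^{i}
  \;=\; \frac{O(\sqrt{n})}{1-\sqrt{2/3}}
  \;=\; O(\sqrt{n}),
```

so the product of the block-factorization constants along any root-to-leaf path is indeed (1+λ)O(n)(1+\lambda)^{O(\sqrt{n})}.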

4.2 List colorings

In this subsection we prove Theorem 1.2 from the introduction for colorings. We consider the more general setting of list colorings, where each vertex vv is associated with a list LvL_{v} of available colors, and every list coloring assigns to each vertex a color from its list such that adjacent vertices receive distinct colors. The Glauber dynamics is ergodic for list colorings if |Lv|degG(v)+2|L_{v}|\geq\deg_{G}(v)+2 for all vv. One handy feature of list colorings is that any pinning η\eta on a subset ΛV\Lambda\subseteq V of vertices induces a new list coloring instance on the induced subgraph G[VΛ]G[V\setminus\Lambda], for which |Lvη|degG[VΛ](v)+2|L^{\eta}_{v}|\geq\deg_{G[V\setminus\Lambda]}(v)+2 still holds for all unpinned vv where LvηL^{\eta}_{v} is the new list of available colors conditioned on η\eta; see, e.g., [GKM15].

Theorem 4.5.

Let G=(V,E)G=(V,E) be a graph of maximum degree Δ3\Delta\geq 3 and treewidth t1t\geq 1. Suppose each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. The mixing time of the Glauber dynamics for sampling uniformly random list colorings of GG is nO(t(Δ+logq))n^{O(t(\Delta+\log q))}.

4.2.1 A generalized version of Proposition 3.1

For list colorings we are not able to directly apply Proposition 3.1 to prove Theorem 4.5. Unlike the hardcore model, where a vertex can always be unoccupied under any pinning of its neighbors, the lack of such a “universal” color makes it hard to establish block factorization for decomposition Eq. 7 using tools from Section 3.2. In this subsection we present a stronger version of Proposition 3.1 which allows us to establish block factorization more easily for list colorings. Our applications in Section 5 also require this more general version.

For subsets SUVS\subseteq U\subseteq V and an integer r0r\geq 0, define

𝖡(S,r)={vV:distG(v,S)r}\mathsf{B}(S,r)=\{v\in V:\mathrm{dist}_{G}(v,S)\leq r\}

to be the ball around SS of radius rr in GG, and define

𝖡U(S,r)=U𝖡(S,r)={vU:distG(v,S)r}\mathsf{B}_{U}(S,r)=U\cap\mathsf{B}(S,r)=\{v\in U:\mathrm{dist}_{G}(v,S)\leq r\}

to be the portion of the ball contained in UU. In Proposition 4.6 below, we replace SS by 𝖡U(S,r)\mathsf{B}_{U}(S,r) in all suitable places in Proposition 3.1, which makes it easier for us to establish block factorization for decomposition Eq. 20 for hard-constraint models like list colorings.

Proposition 4.6.

Let (G,Φ,Ψ)(G,\Phi,\Psi) be a spin system defined on a graph G=(V,E)G=(V,E) with associated Gibbs distribution μ\mu. Let r0r\geq 0 be an integer. Suppose that T𝖲𝖣T_{\mathsf{SD}} is a separator decomposition tree of GG satisfying

  1. 1.

    (Block Factorization for Decomposition) For every node (U,S)(U,S), there exists CU,S1C_{U,S}\geq 1, such that for any function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

    𝔼[EntUf]CU,S(𝔼[Ent𝖡U(S,r)f]+𝔼[EntUSf]).\mathbb{E}[\mathrm{Ent}_{U}f]\leq C_{U,S}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U\setminus S}f\right]\right). (20)

    For all leaves (U,U)(U,U) we take CU,U=1C_{U,U}=1.

  2. 2.

    (Approximate Tensorization for Separators) For every node (U,S)(U,S), there exists CS1C_{S}\geq 1, such that for any function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

    𝔼[Ent𝖡U(S,r)f]CSv𝖡U(S,r)𝔼[Entvf].\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]\leq C_{S}\sum_{v\in\mathsf{B}_{U}(S,r)}\mathbb{E}[\mathrm{Ent}_{v}f]. (21)
  3. 3.

    (Bounded Coverage) There exists A1A\geq 1 such that for any vertex vVv\in V,

    |{(U,S):v𝖡U(S,r)}|A.|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}|\leq A. (22)

Then the Gibbs distribution μ\mu satisfies approximate tensorization of entropy with multiplier CC given by

C=Amax(U,S){CS(U,S)CU,S},C=A\cdot\max_{(U,S)}\left\{C_{S}\prod_{(U^{\prime},S^{\prime})}C_{U^{\prime},S^{\prime}}\right\},

where the maximum is taken over all nodes of T𝖲𝖣T_{\mathsf{SD}}, and the product is over all nodes (U,S)(U^{\prime},S^{\prime}) in the unique path from the root (V,SV)(V,S_{V}) to (U,S)(U,S). Namely, for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

Entf\displaystyle\mathrm{Ent}f CvV𝔼[Entvf].\displaystyle\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_{v}f].
Remark 4.7.

The variance version of Proposition 4.6 is also true and the proof is exactly the same.

Remark 4.8.

Observe that if r=0r=0 then 𝖡U(S,r)=S\mathsf{B}_{U}(S,r)=S and we have A=1A=1 in Eq. 22, so we recover exactly Proposition 3.1.

Proof of Proposition 4.6.

The proof is the same as for Proposition 3.1 by decomposing the entropy level by level on the separator decomposition tree T𝖲𝖣T_{\mathsf{SD}}. We similarly obtain

EntfvVCv𝔼[Entvf],\mathrm{Ent}f\leq\sum_{v\in V}C_{v}\mathbb{E}\left[\mathrm{Ent}_{v}f\right],

where for each vv,

Cv=(U,S):v𝖡U(S,r)CS(U,S)CU,SCC_{v}=\sum_{(U,S):\,v\in\mathsf{B}_{U}(S,r)}C_{S}\prod_{(U^{\prime},S^{\prime})}C_{U^{\prime},S^{\prime}}\leq C

where the product runs through all (U,S)(U^{\prime},S^{\prime}) on the unique path from (V,SV)(V,S_{V}) to (U,S)(U,S). This establishes the proposition. ∎

The following lemma is helpful for establishing bounded coverage Eq. 22 when applying Proposition 4.6.

Lemma 4.9.

Under the assumptions of Proposition 4.6:

  • If GG has maximum degree Δ3\Delta\geq 3, then A1+Δi=0r1(Δ1)i=O(Δr)A\leq 1+\Delta\sum_{i=0}^{r-1}(\Delta-1)^{i}=O(\Delta^{r});

  • If n3n\geq 3 and T𝖲𝖣T_{\mathsf{SD}} is a balanced separator decomposition tree, then A4lognA\leq 4\log n.

Proof.

If GG has maximum degree Δ3\Delta\geq 3, we have for any vVv\in V that

|{(U,S):v𝖡U(S,r)}|\displaystyle|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}| |{(U,S):distG(v,S)r}|\displaystyle\leq|\{(U,S):\mathrm{dist}_{G}(v,S)\leq r\}|
|{uV:distG(v,u)r}|\displaystyle\leq|\{u\in V:\mathrm{dist}_{G}(v,u)\leq r\}|
1+Δi=0r1(Δ1)i=O(Δr),\displaystyle\leq 1+\Delta\sum_{i=0}^{r-1}(\Delta-1)^{i}=O(\Delta^{r}),

where the second inequality is because all separators form a partition of VV (see Remark 2.9).

If T𝖲𝖣T_{\mathsf{SD}} is balanced, then the height hh of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. Hence, for any vVv\in V we have

|{(U,S):v𝖡U(S,r)}||{(U,S):vU}|h+14logn,|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}|\leq|\{(U,S):v\in U\}|\leq h+1\leq 4\log n,

as claimed. ∎

4.2.2 Block factorization via marginal distributions

Note that to apply Proposition 4.6, we need to establish the block factorization for decomposition Eq. 20 where the two blocks 𝖡U(S,r)\mathsf{B}_{U}(S,r) and USU\setminus S overlap with each other. In particular, we can no longer apply Lemmas 3.3 and 3.4 directly since fixing the spin assignment in one block greatly changes the conditional distribution on the other block because of the overlap. Instead, we show block factorization for the marginal distribution μS(U𝖡U(S,r))\mu_{S\cup(U\setminus\mathsf{B}_{U}(S,r))} for the two blocks SS and U𝖡U(S,r)U\setminus\mathsf{B}_{U}(S,r), where we essentially exclude the overlap part. It turns out that these two notions of block factorization are equivalent to each other, and for the latter we are able to apply tools from Section 3.2 since the blocks are now disjoint.

We show this equivalence in a general setting. Let X,Y,ZX,Y,Z be three random variables taking values from finite state spaces 𝒳,𝒴,𝒵\mathcal{X},\mathcal{Y},\mathcal{Z} respectively. Their joint distribution is denoted by π=πXYZ\pi=\pi_{XYZ}. Denote the marginal distribution for (X,Y)(X,Y) by πXY\pi_{XY} and similarly for other choices of subsets of variables. We establish block factorization for π\pi into two blocks {X,Z}\{X,Z\} and {Y,Z}\{Y,Z\} from approximate tensorization for the marginal distribution πXY\pi_{XY}. More precisely, we say π\pi satisfies {{X,Z},{Y,Z}}\{\{X,Z\},\{Y,Z\}\}-factorization of entropy with constant CC if for every function f:𝒳×𝒴×𝒵0f:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}, it holds

EntfC(𝔼[EntXZf]+𝔼[EntYZf]).\mathrm{Ent}f\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right). (23)

We say πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy with constant CC if for every function g:𝒳×𝒴0g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}, it holds

EntXYgC(𝔼XY[EntXg]+𝔼XY[EntYg]),\mathrm{Ent}_{XY}g\leq C\left(\mathbb{E}_{XY}\left[\mathrm{Ent}_{X}g\right]+\mathbb{E}_{XY}\left[\mathrm{Ent}_{Y}g\right]\right), (24)

where the reference distribution is the marginal distribution πXY\pi_{XY} over the state space 𝒳×𝒴\mathcal{X}\times\mathcal{Y} and in particular EntXg\mathrm{Ent}_{X}g will be viewed as a function of YY (instead of Y,ZY,Z). The variance versions of Eqs. 23 and 24 are defined in the same way with Var()\mathrm{Var}(\cdot) replacing Ent()\mathrm{Ent}(\cdot).

We show that these two notions of block factorization of entropy are equivalent to each other.

Lemma 4.10.

The joint distribution π\pi satisfies {{X,Z},{Y,Z}}\{\{X,Z\},\{Y,Z\}\}-factorization of entropy (resp., variance) with constant CC if and only if the marginal distribution πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy (resp., variance) with constant CC.

The proof of Lemma 4.10 can be found in Section 6.3.

4.2.3 Proof of Theorem 4.5

We apply Proposition 4.6 with r=1r=1, where we take A=Δ+1A=\Delta+1 in Eq. 22 by Lemma 4.9. We establish block factorization for decomposition Eq. 20 and approximate tensorization for separators Eq. 21 in the following two lemmas.

Lemma 4.11.

Consider list colorings on a graph G=(V,E)G=(V,E) of maximum degree Δ3\Delta\geq 3 where each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. Let SUVS\subseteq U\subseteq V be subsets with |S|t|S|\leq t. For any pinning η\eta on VUV\setminus U, the uniform distribution μUη\mu^{\eta}_{U} over list colorings conditioned on η\eta satisfies {𝖡U(S,1),US}\{\mathsf{B}_{U}(S,1),U\setminus S\}-factorization of variance with constant C=2(2Δq)tC=2(2^{\Delta}q)^{t}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarUf]C(𝔼[Var𝖡U(S,1)f]+𝔼[VarUSf]).\mathbb{E}[\mathrm{Var}_{U}f]\leq C\left(\mathbb{E}[\mathrm{Var}_{\mathsf{B}_{U}(S,1)}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). (25)
Proof.

Let T=U𝖡U(S,1)T=U\setminus\mathsf{B}_{U}(S,1). We first prove {S,T}\{S,T\}-factorization for the marginal distribution μSTη\mu^{\eta}_{S\cup T} on STS\cup T via Lemma 3.4. Let τ\tau be an arbitrary pinning on TT that is feasible under η\eta. For any partial list coloring σ\sigma on SS that is feasible under η\eta, we claim that

μSη,τ(σ)1(2Δq)|S|1(2Δq)t.\mu^{\eta,\tau}_{S}(\sigma)\geq\frac{1}{(2^{\Delta}q)^{|S|}}\geq\frac{1}{(2^{\Delta}q)^{t}}. (26)

Note that σ\sigma must also be feasible under τ\tau since SS and TT are not adjacent, and we can extend στη\sigma\cup\tau\cup\eta to a full list coloring by the assumption |Lv|degG(v)+2|L_{v}|\geq\deg_{G}(v)+2 for all vv. By the chain rule, it suffices to show that for any ΛS\Lambda\subseteq S and vSΛv\in S\setminus\Lambda we have

μvη,τ,σΛ(σv)12Δq.\mu^{\eta,\tau,\sigma_{\Lambda}}_{v}(\sigma_{v})\geq\frac{1}{2^{\Delta}q}. (27)

Consider a resampling procedure: starting from a random full list coloring generated from μη,τ,σΛ\mu^{\eta,\tau,\sigma_{\Lambda}}, we first resample all (free) neighbors of vv and then vv. When resampling neighbors, the probability of avoiding color σv\sigma_{v} is at least 1/2Δ1/2^{\Delta} since there are at least two available colors for each neighbor by assumption, and when resampling vv the probability of getting σv\sigma_{v} is at least 1/|Lv|1/q1/|L_{v}|\geq 1/q, thus establishing Eq. 27 and consequently Eq. 26.

We deduce from Eq. 26 that for any two pinnings τ,ξ\tau,\xi on TT, it holds

dTV(μSη,τ,μSη,ξ)11(2Δq)t.d_{\mathrm{TV}}(\mu^{\eta,\tau}_{S},\mu^{\eta,\xi}_{S})\leq 1-\frac{1}{(2^{\Delta}q)^{t}}.

Therefore, by Lemma 3.4 we have that μSTη\mu^{\eta}_{S\cup T} satisfies {S,T}\{S,T\}-factorization of variance with constant C=2(2Δq)tC=2(2^{\Delta}q)^{t}. Finally, by Lemma 4.10 we conclude that μUη\mu^{\eta}_{U} satisfies {𝖡U(S,1),US}\{\mathsf{B}_{U}(S,1),U\setminus S\}-factorization of variance with the same constant, and Eq. 25 follows by taking expectation over η\eta. ∎
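The marginal lower bound Eq. 27 can be verified by brute force on a small instance (illustration only; the star and the color lists below are arbitrary choices satisfying |Lv|≥deg(v)+2, with Δ=3 and q=5, so the claimed bound is 1/(2^Δ q)=1/40):

```python
# Brute-force check (illustration) of the marginal lower bound (27) on a
# tiny instance: a star with center 0 and leaves 1, 2, 3, with lists
# satisfying |L_v| >= deg(v) + 2.  Here Delta = 3 and q = 5, so the bound
# is 1 / (2^Delta * q) = 1/40.
from itertools import product

adj = {0: (1, 2, 3), 1: (0,), 2: (0,), 3: (0,)}
lists = {0: range(5), 1: range(3), 2: range(3), 3: range(3)}

colorings = [c for c in product(*(lists[v] for v in range(4)))
             if all(c[v] != c[u] for v in adj for u in adj[v])]

bound = 1 / (2 ** 3 * 5)
margs = {(v, col): sum(c[v] == col for c in colorings) / len(colorings)
         for v in range(4) for col in lists[v]}
assert min(margs.values()) >= bound
```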

Lemma 4.12.

Consider list colorings on a graph G=(V,E)G=(V,E) of maximum degree Δ3\Delta\geq 3 where each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. Let BVB\subseteq V be a subset with |B|k|B|\leq k. For any pinning η\eta on VBV\setminus B, the uniform distribution μBη\mu^{\eta}_{B} over list colorings conditioned on η\eta satisfies approximate tensorization of variance with constant C=(2Δq)k1C=(2^{\Delta}q)^{k-1}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarBf]CvB𝔼[Varvf].\mathbb{E}[\mathrm{Var}_{B}f]\leq C\sum_{v\in B}\mathbb{E}[\mathrm{Var}_{v}f]. (28)
Proof.

Fix a pinning τ\tau on some subset ΛB\Lambda\subseteq B and consider two distinct vertices u,vBΛu,v\in B\setminus\Lambda. For any two colors c1,c2c_{1},c_{2} that are available to uu, there exists a color cc that is available to vv regardless of whether σu=c1\sigma_{u}=c_{1} or σu=c2\sigma_{u}=c_{2}. This is because either all neighbors of vv are pinned, in which case we can choose any color cc available to vv; or vv has at least one free neighbor and thus at least three available colors, in which case we choose one of them that is neither c1c_{1} nor c2c_{2}. For this color cc, we have μvη,τ,ucj(c)(2Δq)1\mu^{\eta,\tau,u\leftarrow c_{j}}_{v}(c)\geq(2^{\Delta}q)^{-1} for j=1,2j=1,2, which was already shown in the proof of Lemma 4.11 (see Eq. 27). This implies that

dTV(μvη,τ,uc1,μvη,τ,uc2)1(2Δq)1.d_{\mathrm{TV}}\left(\mu^{\eta,\tau,u\leftarrow c_{1}}_{v},\mu^{\eta,\tau,u\leftarrow c_{2}}_{v}\right)\leq 1-(2^{\Delta}q)^{-1}.

The lemma then follows from an application of Lemma 3.6, and taking expectation over η\eta we obtain Eq. 28 as claimed. ∎

We are now ready to prove Theorem 4.5 and Theorem 1.2 from the introduction.

Proof of Theorem 4.5.

We deduce the theorem from Proposition 4.6. Since the graph has treewidth tt, there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} by Lemma 4.1 such that each separator has size at most tt. The height of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. We pick r=1r=1 and hence we can take A=Δ+1A=\Delta+1 in Eq. 22 by Lemma 4.9. Block factorization for decomposition Eq. 20 is shown by Lemma 4.11 with CU,S=2(2Δq)tC_{U,S}=2(2^{\Delta}q)^{t} for each node (U,S)(U,S). Approximate tensorization for separators Eq. 21 follows from Lemma 4.12 with CS=(2Δq)(Δ+1)t1C_{S}=(2^{\Delta}q)^{(\Delta+1)t-1} for each separator SS, noting that |𝖡U(S,1)|(Δ+1)t|\mathsf{B}_{U}(S,1)|\leq(\Delta+1)t. Thus, we conclude from Proposition 4.6 that the uniform distribution μ\mu over list colorings satisfies approximate tensorization of variance with multiplier

C=(Δ+1)(2Δq)(Δ+1)t1(2(2Δq)t)3logn=nO(t(Δ+logq)).C=(\Delta+1)\cdot(2^{\Delta}q)^{(\Delta+1)t-1}\cdot\left(2(2^{\Delta}q)^{t}\right)^{3\log n}=n^{O(t(\Delta+\log q))}.

The mixing time then follows from Lemma 2.7. ∎

Proof of Theorem 1.2.

If q>2Δq>2\Delta then optimal mixing of Glauber dynamics follows from standard path coupling arguments [Jer03]. Otherwise, the theorem follows from Theorem 4.5. ∎

5 Rapid Mixing via SSM for Graphs of Bounded Local Treewidth

In this section we prove Theorem 1.3 from the introduction. We consider families of graphs of bounded local treewidth, which include planar graphs as a special case. We establish rapid mixing of Glauber dynamics under SSM for all such families.

Graphs of bounded local treewidth are defined as follows.

Definition 5.1 (Bounded Local Treewidth).

Let G=(V,E)G=(V,E) be a graph and a,d>0a,d>0 be reals. We say GG has bounded local treewidth with parameters aa and dd if it satisfies the following diameter-treewidth property: for any subgraph HH of GG, it holds

tw(H)a(diam(H))d.\mathrm{tw}(H)\leq a\cdot(\mathrm{diam}(H))^{d}.

Examples of families of graphs that have bounded local treewidth include:

  • Graphs of bounded treewidth;

  • Planar graphs, see [Epp99, DH04, YZ13];

  • Graphs of bounded growth, see Section 5.2.

The strong spatial mixing property characterizes the decay of correlations in a quantitative way. Roughly speaking, it says that in a spin system the correlation between the spin on a vertex vv and the configuration on a subset WW decays exponentially fast with their graph distance, and such decay holds uniformly under any pinning on any subset of vertices.

Definition 5.2 (Strong Spatial Mixing (SSM)).

Consider a spin system on a graph G=(V,E)G=(V,E) with Gibbs distribution μ\mu. Let C>0C>0 and δ(0,1)\delta\in(0,1) be reals. We say the spin system satisfies the strong spatial mixing property with exponential decay rate if for every pinning η\eta on ΛV\Lambda\subseteq V, every vertex vVΛv\in V\setminus\Lambda and feasible spin c𝒬vc\in\mathcal{Q}_{v}, and every subset WVΛ{v}W\subseteq V\setminus\Lambda\setminus\{v\} and two feasible configurations τ,ξ\tau,\xi on WW, it holds

|μvη(cτ)μvη(cξ)1|C(1δ)distG(v,W).\left|\frac{\mu^{\eta}_{v}(c\mid\tau)}{\mu^{\eta}_{v}(c\mid\xi)}-1\right|\leq C(1-\delta)^{\mathrm{dist}_{G}(v,W)}.
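For intuition, this decay can be observed numerically (illustration only; the hardcore model with λ=1 on a path is an arbitrary small example): pinning the far endpoint of a path influences the occupation probability of vertex 0 less and less as the distance grows.

```python
# Illustration of the decay in Definition 5.2 for the hardcore model with
# lambda = 1 on a path: the effect of pinning the far endpoint on the
# occupation probability of vertex 0 shrinks as the distance d grows.
from itertools import product

def occ_prob(d, pin):
    """P(sigma_0 = 1 | sigma_d = pin) for the hardcore model (lambda = 1)
    on the path 0-1-...-d, i.e. uniform over independent sets."""
    total = good = 0
    for sigma in product((0, 1), repeat=d + 1):
        if sigma[d] != pin:
            continue
        if any(sigma[i] and sigma[i + 1] for i in range(d)):
            continue
        total += 1
        good += sigma[0]
    return good / total

diffs = [abs(occ_prob(d, 1) - occ_prob(d, 0)) for d in range(2, 8)]
assert all(b < a for a, b in zip(diffs, diffs[1:]))   # strictly shrinking
```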

Our main result in this section is stated as follows. We state it only for the hardcore model, but it extends naturally to other spin systems for which the Glauber dynamics is ergodic under any pinning.

Theorem 5.3.

Let G=(V,E)G=(V,E) be an nn-vertex graph of maximum degree Δ3\Delta\geq 3. Suppose that GG has bounded local treewidth with constant parameters a,d>0a,d>0, and suppose that the hardcore model on GG with fugacity λ>0\lambda>0 satisfies SSM with constant parameters C,δ>0C,\delta>0. Then the mixing time of the Glauber dynamics for the hardcore Gibbs distribution on GG is O(nlog4n)O(n\log^{4}n).

5.1 Proof of Theorem 5.3

For graphs of bounded local treewidth, if the diameter is mildly bounded (say, poly-logarithmic), then the treewidth is also bounded. In general, however, the diameter of GG can be arbitrarily large. We therefore need the following low-diameter decomposition of graphs, which allows us to focus on subgraphs of small diameter and thus of small treewidth. We remark that Lemma 5.4 holds for arbitrary graphs, with no restriction on the local treewidth or maximum degree.

Lemma 5.4.

Let G=(V,E)G=(V,E) be an nn-vertex graph where n10n\geq 10. For any integer r+r\in\mathbb{N}^{+}, there exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set satisfying the following conditions:

  1. 1.

    For each i[m]i\in[m], we have diam(G[𝖡(Vi,r)])6rlogn+2r\mathrm{diam}(G[\mathsf{B}(V_{i},r)])\leq 6r\log n+2r;

  2. 2.

    For each vertex vVv\in V, we have |{i[m]:v𝖡(Vi,r)}|2logn\left|\left\{i\in[m]:v\in\mathsf{B}(V_{i},r)\right\}\right|\leq 2\log n.

The proof of Lemma 5.4, which is based on a classical low-diameter decomposition result of Linial and Saks [LS93], is postponed to Section 5.3.

To obtain block factorization of entropy from Lemma 5.4, we need the strong spatial mixing property, which is given by the following lemma.

Lemma 5.5.

Suppose SSM holds with constant parameters C>0C>0 and δ(0,1)\delta\in(0,1). There exists a constant ρ>0\rho>0 such that for any subsets SUVS\subseteq U\subseteq V and any γ10\gamma\geq 10, if rρ(log|S|+logγ)r\geq\rho(\log|S|+\log\gamma) then it holds for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that,

𝔼[EntUf]e1/γ(𝔼[Ent𝖡U(S,r)f]+𝔼[EntUSf]).\mathbb{E}[\mathrm{Ent}_{U}f]\leq e^{1/\gamma}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U\setminus S}f\right]\right). (29)
Proof.

First observe that it suffices to establish the inequality under an arbitrary pinning η\eta on VUV\setminus U and then Eq. 29 follows by taking expectation over η\eta. By Lemma 4.10, it then suffices to prove block factorization of entropy for the marginal distribution μSTη\mu^{\eta}_{S\cup T} for the two blocks SS and T=U𝖡U(S,r)T=U\setminus\mathsf{B}_{U}(S,r). Suppose S={v1,,v}S=\{v_{1},\dots,v_{\ell}\} where =|S|\ell=|S| and let σ\sigma be any feasible configuration on SS. For each ii define Si={v1,,vi}S_{i}=\{v_{1},\dots,v_{i}\} and let σSi\sigma_{S_{i}} be the configuration σ\sigma restricted to SiS_{i}. By SSM we have that for any two feasible configurations τ,ξ\tau,\xi on TT,

μSη(στ)μSη(σξ)=i=1μviη,σSi1(σviτ)μviη,σSi1(σviξ)(1+Ceδr)(1+14γ)1+12γ,\frac{\mu^{\eta}_{S}(\sigma\mid\tau)}{\mu^{\eta}_{S}(\sigma\mid\xi)}=\prod_{i=1}^{\ell}\frac{\mu^{\eta,\sigma_{S_{i-1}}}_{v_{i}}(\sigma_{v_{i}}\mid\tau)}{\mu^{\eta,\sigma_{S_{i-1}}}_{v_{i}}(\sigma_{v_{i}}\mid\xi)}\leq\left(1+Ce^{-\delta r}\right)^{\ell}\leq\left(1+\frac{1}{4\gamma\ell}\right)^{\ell}\leq 1+\frac{1}{2\gamma},

where we use rρ(log+logγ)r\geq\rho(\log\ell+\log\gamma) for a large enough constant ρ>0\rho>0 together with (1δ)reδr(1-\delta)^{r}\leq e^{-\delta r}, and we assume γ10\gamma\geq 10 so that the last two inequalities hold. Similarly, we also have

μSη(στ)μSη(σξ)112γ.\frac{\mu^{\eta}_{S}(\sigma\mid\tau)}{\mu^{\eta}_{S}(\sigma\mid\xi)}\geq 1-\frac{1}{2\gamma}.

Then, we deduce from Lemma 3.3 that the marginal distribution μSTη\mu^{\eta}_{S\cup T} satisfies {S,T}\{S,T\}-factorization of entropy with constant C1+1/γe1/γC\leq 1+1/\gamma\leq e^{1/\gamma} (again we assume γ10\gamma\geq 10 so that the first inequality holds). Eq. 29 then follows from Lemma 4.10 and averaging over η\eta. ∎

We now present the proof of Theorem 5.3.

Proof of Theorem 5.3.

By Lemma 5.5, there exists a constant ρ>0\rho>0 such that Eq. 29 holds for all subsets SUVS\subseteq U\subseteq V and for γ=n\gamma=n whenever rρlognr\geq\rho\log n. We apply Lemma 5.4 to GG for r=ρlognr=\left\lceil\rho\log n\right\rceil, and suppose the resulting partition is V=i=1mViV=\bigcup_{i=1}^{m}V_{i}. For each i[m]i\in[m] let Ui=j=imVjU_{i}=\bigcup_{j=i}^{m}V_{j}, and so U1=VU_{1}=V, Um=VmU_{m}=V_{m} and Ui+1=UiViU_{i+1}=U_{i}\setminus V_{i}. Then we deduce from Eq. 29 that for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0},

Entf\displaystyle\mathrm{Ent}f e1/n(𝔼[Ent𝖡U1(V1,r)f]+𝔼[EntU2f])\displaystyle\leq e^{1/n}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{1}}(V_{1},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U_{2}}f\right]\right)
e2/n(𝔼[Ent𝖡U1(V1,r)f]+𝔼[Ent𝖡U2(V2,r)f]+𝔼[EntU3f])\displaystyle\leq e^{2/n}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{1}}(V_{1},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{2}}(V_{2},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U_{3}}f\right]\right)
\displaystyle\hskip 5.0pt\vdots
em/ni=1m𝔼[Ent𝖡Ui(Vi,r)f]\displaystyle\leq e^{m/n}\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{i}}(V_{i},r)}f\right]
3i=1m𝔼[Ent𝖡(Vi,r)f],\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right], (30)

where the last inequality is due to mnm\leq n and that 𝔼[EntBf]𝔼[EntAf]\mathbb{E}[\mathrm{Ent}_{B}f]\leq\mathbb{E}[\mathrm{Ent}_{A}f] for any BAVB\subseteq A\subseteq V (Lemma 2.2). The good news is that each ball 𝖡(Vi,r)\mathsf{B}(V_{i},r) has diameter O(log2n)O(\log^{2}n) and thus treewidth O(log2dn)O(\log^{2d}n), and every vertex is contained in O(logn)O(\log n) balls; both properties are guaranteed by Lemma 5.4. Thus, to prove approximate tensorization of entropy for μ\mu, it suffices to prove it for each ball 𝖡(Vi,r)\mathsf{B}(V_{i},r).

We apply Proposition 4.6 to show AT for each Bi=𝖡(Vi,r)B_{i}=\mathsf{B}(V_{i},r) (to be more precise, by Proposition 4.6 we get AT uniformly under any pinning η\eta outside BiB_{i} and then we take expectation over η\eta). The balanced separator decomposition tree is given by Lemma 4.1. Observe that the size of each separator is O(log2dn)O(\log^{2d}n) since the treewidth of BiB_{i} is O(log2dn)O(\log^{2d}n), and the height hh of the decomposition tree satisfies h=O(log|Bi|)=O(logn)h=O(\log|B_{i}|)=O(\log n) by Lemma 2.10 since all separators are balanced. In Proposition 4.6 we take the radius r=cloglognr=\left\lceil c\log\log n\right\rceil for some sufficiently large constant c>0c>0 (note that this radius is much smaller than the radius ρlogn\left\lceil\rho\log n\right\rceil used to construct the partition), such that block factorization for decomposition Eq. 20 holds with CU,S=e1/hC_{U,S}=e^{1/h} for each node (U,S)(U,S); this again follows from Lemma 5.5 where we have |S|=O(log2dn)|S|=O(\log^{2d}n) and γ=h=O(logn)\gamma=h=O(\log n), and Eq. 29 holds whenever rcloglognr\geq c\log\log n. Also A4lognA\leq 4\log n in Eq. 22 by Lemma 4.9. Therefore, we conclude from Proposition 4.6 that every Bi=𝖡(Vi,r)B_{i}=\mathsf{B}(V_{i},r) satisfies approximate tensorization of entropy with multiplier

C(𝖡(Vi,r))=4logn(e1/h)hmaxSCS12lognmaxSCS,C(\mathsf{B}(V_{i},r))=4\log n\cdot\big{(}e^{1/h}\big{)}^{h}\cdot\max_{S}C_{S}\leq 12\log n\cdot\max_{S}C_{S}, (31)

where CSC_{S} is the AT multiplier in Eq. 21 for the ball 𝖡U(S,r)\mathsf{B}_{U}(S,r) and we take maximum over all separators.

Observe that, for each node (U,S)(U,S) in the decomposition tree we have

|𝖡U(S,r)|=O(|S|Δr)=O(log2dnΔcloglogn)logtn,|\mathsf{B}_{U}(S,r)|=O\left(|S|\cdot\Delta^{r}\right)=O\left(\log^{2d}n\cdot\Delta^{c\log\log n}\right)\leq\log^{t}n,

for some constant t>0t>0 when nn is sufficiently large. Define φ(k)\varphi(k) to be the maximum of optimal AT multipliers over all subsets of vertices of size at most kk; namely, for all WVW\subseteq V with |W|k|W|\leq k, it holds for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that

𝔼[EntWf]φ(k)vW𝔼[Entvf].\mathbb{E}[\mathrm{Ent}_{W}f]\leq\varphi(k)\sum_{v\in W}\mathbb{E}[\mathrm{Ent}_{v}f].

Note that φ(k)\varphi(k) is monotone increasing. Thus, we obtain from Eq. 31 that

C(𝖡(Vi,r))12lognφ(logtn).C(\mathsf{B}(V_{i},r))\leq 12\log n\cdot\varphi(\log^{t}n). (32)

Combining Eqs. 30 and 32 and Lemma 5.4, we obtain

Entf\displaystyle\mathrm{Ent}f 3i=1m𝔼[Ent𝖡(Vi,r)f]\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right]
312lognφ(logtn)i=1mv𝖡(Vi,r)𝔼[Entvf]\displaystyle\leq 3\cdot 12\log n\cdot\varphi(\log^{t}n)\sum_{i=1}^{m}\sum_{v\in\mathsf{B}(V_{i},r)}\mathbb{E}\left[\mathrm{Ent}_{v}f\right]
100log2nφ(logtn)vV𝔼[Entvf].\displaystyle\leq 100\log^{2}n\cdot\varphi(\log^{t}n)\sum_{v\in V}\mathbb{E}\left[\mathrm{Ent}_{v}f\right].

More generally, for any subset WVW\subseteq V of size k0|W|kk_{0}\leq|W|\leq k where k0>0k_{0}>0 is some fixed constant, we have for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that

𝔼[EntWf]100log2kφ(logtk)vW𝔼[Entvf].\mathbb{E}[\mathrm{Ent}_{W}f]\leq 100\log^{2}k\cdot\varphi(\log^{t}k)\sum_{v\in W}\mathbb{E}\left[\mathrm{Ent}_{v}f\right].

This is shown by the same arguments applied to μWη\mu^{\eta}_{W} under an arbitrary pinning η\eta outside WW and then taking expectation. Hence, we have established the following recursive bound: for all kk0k\geq k_{0},

φ(k)max{100log2kφ(logtk),φ(k0)}.\varphi(k)\leq\max\left\{100\log^{2}k\cdot\varphi(\log^{t}k),\,\varphi(k_{0})\right\}. (33)

We now solve Eq. 33. By choosing k0k_{0} large enough, we may assume that for all kk0k\geq k_{0},

logtk<kand100t3(loglogk)3logk.\log^{t}k<k\quad\text{and}\quad 100t^{3}(\log\log k)^{3}\leq\log k.

We prove by induction that

φ(k)φ(k0)log3k.\varphi(k)\leq\varphi(k_{0})\cdot\log^{3}k. (34)

Eq. 34 is trivial for k<k0k<k_{0}. Suppose Eq. 34 is true for all k<kk^{\prime}<k. Then we have

100log2kφ(logtk)100log2kφ(k0)t3(loglogk)3φ(k0)log3k.100\log^{2}k\cdot\varphi(\log^{t}k)\leq 100\log^{2}k\cdot\varphi(k_{0})\cdot t^{3}(\log\log k)^{3}\leq\varphi(k_{0})\cdot\log^{3}k.

Thus, we deduce from the recursive bound Eq. 33 that

φ(k)max{φ(k0)log3k,φ(k0)}=φ(k0)log3k,\varphi(k)\leq\max\left\{\varphi(k_{0})\cdot\log^{3}k,\,\varphi(k_{0})\right\}=\varphi(k_{0})\cdot\log^{3}k,

establishing Eq. 34.

Therefore, we have φ(n)=O(log3n)\varphi(n)=O(\log^{3}n). In particular, the hardcore Gibbs distribution on GG satisfies AT with multiplier O(log3n)O(\log^{3}n), and the mixing time follows from Lemma 2.7. ∎
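The recursion Eq. 33 bottoms out extremely quickly, since the map klogtkk\mapsto\log^{t}k collapses even astronomically large kk to a constant within a few iterations. The following sketch (with illustrative parameters t=2t=2, k0=100k_{0}=100, and natural logarithms, not values from the paper) unfolds the recursion and makes the bounded depth explicit; it only illustrates that the solution is a product of a constant number of polylogarithmic factors, since the sharper log3k\log^{3}k bound of Eq. 34 needs kk so large that the condition 100t3(loglogk)3logk100t^{3}(\log\log k)^{3}\leq\log k holds.

```python
import math

def unfold_recursion(k, t=2.0, k0=100.0, phi0=1.0):
    """Unfold phi(k) <= 100 log^2(k) * phi(log^t k) down to the base case phi(k0) = phi0,
    returning the resulting bound and the recursion depth."""
    value, depth = phi0, 0
    while k > k0:
        value *= 100 * math.log(k) ** 2  # one application of Eq. 33
        k = math.log(k) ** t             # recurse on subsets of size log^t k
        depth += 1
    return value, depth

# Even for k = 10^100 the map k -> log^t k reaches the base case in 3 steps,
# so the bound is a product of only a constant number of polylog factors.
value, depth = unfold_recursion(10.0 ** 100)
assert depth == 3 and value > 0
```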

5.2 Rapid mixing via SSM for graphs of bounded growth

In this subsection we consider graphs with polynomially bounded neighborhood growth.

Definition 5.6 (Bounded Growth).

Let G=(V,E)G=(V,E) be a graph and a,d>0a,d>0 be reals. We say GG has polynomially bounded growth if for any vertex vv and any integer r1r\geq 1 it holds

|𝖡(v,r)|ard.|\mathsf{B}(v,r)|\leq a\cdot r^{d}.

Observe that any family of graphs of bounded growth also has bounded maximum degree and bounded local treewidth. To see the latter, for any nontrivial connected subgraph HH of a graph GG of bounded growth, let vv be any vertex of HH and r=diam(H)1r=\mathrm{diam}(H)\geq 1, and we have

tw(H)|H||𝖡(v,r)|ard,\displaystyle\mathrm{tw}(H)\leq|H|\leq|\mathsf{B}(v,r)|\leq a\cdot r^{d}, (35)

showing that GG has bounded local treewidth. Thus, we deduce from Theorem 5.3 that for graphs of bounded growth, SSM implies O(nlog4n)O(n\log^{4}n) mixing of the Glauber dynamics. We can actually improve this mixing time bound slightly.

Theorem 5.7.

Let G=(V,E)G=(V,E) be an nn-vertex graph. Suppose that GG has bounded growth with constant parameters a,d>0a,d>0, and suppose that the hardcore model on GG with fugacity λ>0\lambda>0 satisfies SSM with constant parameters C,δ>0C,\delta>0. Then the mixing time of the Glauber dynamics for the hardcore Gibbs distribution on GG is O(nlog3n)O(n\log^{3}n).

Proof.

Following the proof of Theorem 5.3, we have that for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0},

Entf3i=1m𝔼[Ent𝖡(Vi,r)f],\mathrm{Ent}f\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right],

which is Eq. 30. For bounded-growth graphs, we have similarly as Eq. 35 that

|𝖡(Vi,r)|a(diam(G[𝖡(Vi,r)]))dtlog2dn,|\mathsf{B}(V_{i},r)|\leq a\cdot\big{(}\mathrm{diam}(G[\mathsf{B}(V_{i},r)])\big{)}^{d}\leq t\log^{2d}n,

where t>0t>0 is some fixed constant, and the last inequality is due to Lemma 5.4. With the same definition of φ()\varphi(\cdot), we deduce that

Entf\displaystyle\mathrm{Ent}f 3i=1m𝔼[Ent𝖡(Vi,r)f]\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right]
3φ(tlog2dn)i=1mv𝖡(Vi,r)𝔼[Entvf]\displaystyle\leq 3\cdot\varphi(t\log^{2d}n)\sum_{i=1}^{m}\sum_{v\in\mathsf{B}(V_{i},r)}\mathbb{E}\left[\mathrm{Ent}_{v}f\right]
6lognφ(tlog2dn)vV𝔼[Entvf],\displaystyle\leq 6\log n\cdot\varphi(t\log^{2d}n)\sum_{v\in V}\mathbb{E}\left[\mathrm{Ent}_{v}f\right],

where the last inequality follows from Lemma 5.4. This allows us to obtain the following recursive bound: for all kk0k\geq k_{0} where k0>0k_{0}>0 is some fixed constant,

φ(k)max{6logkφ(tlog2dk),φ(k0)}.\varphi(k)\leq\max\left\{6\log k\cdot\varphi(t\log^{2d}k),\,\varphi(k_{0})\right\}. (36)

Solving Eq. 36 as in the proof of Theorem 5.3, we get φ(n)=O(log2n)\varphi(n)=O(\log^{2}n). Thus, AT of entropy holds with multiplier O(log2n)O(\log^{2}n), and the mixing time bound follows from Lemma 2.7. ∎

5.3 Low-diameter decomposition

In this subsection we present the proof of Lemma 5.4.

Given a graph G=(V,E)G=(V,E) and a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set into mm clusters, define the quotient graph (G;V1,,Vm)\mathcal{H}(G;V_{1},\dots,V_{m}) to be the graph with vertex set {Vi:i[m]}\{V_{i}:i\in[m]\} where two clusters Vi,VjV_{i},V_{j} are adjacent iff there exist uVi,vVju\in V_{i},v\in V_{j} such that uvEuv\in E. Namely, ℋ\mathcal{H} is the graph obtained from GG by contracting every cluster into a single vertex, with two vertices adjacent iff the corresponding clusters are adjacent in GG. We need the following classical low-diameter decomposition result due to Linial and Saks [LS93].
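As a concrete rendering of this definition, the following sketch builds the quotient graph from an adjacency dictionary and a partition into clusters (the function name and representation are illustrative choices, not from the paper).

```python
def quotient_graph(adj, clusters):
    """Contract each cluster to a single vertex; clusters i and j are adjacent in
    the quotient iff some edge of the original graph crosses between them.

    adj: dict vertex -> set of neighbors; clusters: list of disjoint vertex sets."""
    owner = {v: i for i, cl in enumerate(clusters) for v in cl}
    quotient = {i: set() for i in range(len(clusters))}
    for u, nbrs in adj.items():
        for v in nbrs:
            if owner[u] != owner[v]:  # a crossing edge of G
                quotient[owner[u]].add(owner[v])
                quotient[owner[v]].add(owner[u])
    return quotient

# A 4-cycle 0-1-2-3-0 with clusters {0,1} and {2,3} contracts to a single edge.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
H = quotient_graph(adj, [{0, 1}, {2, 3}])
assert H == {0: {1}, 1: {0}}
```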

Lemma 5.8 ([LS93, Theorem 2.1]).

Let G=(V,E)G=(V,E) be an nn-vertex graph where n10n\geq 10. There exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set into clusters such that the following conditions hold:

  1. 1.

    For each i[m]i\in[m], we have diam(G[Vi])3logn\mathrm{diam}(G[V_{i}])\leq 3\log n;

  2. 2.

    The quotient graph =(G;V1,,Vm)\mathcal{H}=\mathcal{H}(G;V_{1},\dots,V_{m}) has chromatic number at most 2logn2\log n.

Remark 5.9.

Lemma 5.8 is obtained by taking p=1/2p=1/2 in Theorem 2.1 of [LS93]. Also note that the partition in Lemma 5.8 is presented differently from [LS93]: a cluster in [LS93] corresponds to the union of all clusters of the same color in Lemma 5.8.

Proof of Lemma 5.4.

Let G2rG^{\leq 2r} be the graph with vertex set VV where two vertices are adjacent iff their graph distance is at most 2r2r. By Lemma 5.8, there exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} for the graph G2rG^{\leq 2r} such that

  1. 1.

    For each i[m]i\in[m], we have diam(G2r[Vi])3logn\mathrm{diam}(G^{\leq 2r}[V_{i}])\leq 3\log n;

  2. 2.

    The quotient graph (G2r;V1,,Vm)\mathcal{H}(G^{\leq 2r};V_{1},\dots,V_{m}) has chromatic number at most 2logn2\log n.

We claim that such a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} satisfies our requirements.

Fix i[m]i\in[m]. For u,vViu,v\in V_{i}, since the diameter of G2r[Vi]G^{\leq 2r}[V_{i}] is at most 3logn3\log n, there exists a path PP in G2r[Vi]G^{\leq 2r}[V_{i}] connecting uu and vv of length at most 3logn3\log n. By replacing every edge of PP with a path in GG of length at most 2r2r, we obtain a path PP^{\prime} in GG of length at most 3logn2r=6rlogn3\log n\cdot 2r=6r\log n. Moreover, this new path PP^{\prime} is contained in the induced subgraph G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)], since every vertex on a replacing path of length at most 2r2r is within distance rr of one of its endpoints, both of which lie in ViV_{i}. Hence, the distance between every pair of vertices u,vViu,v\in V_{i} in G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)] is at most 6rlogn6r\log n. Also, note that every vertex in 𝖡(Vi,r)\mathsf{B}(V_{i},r) is at distance at most rr from ViV_{i} in G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)]. Thus, the diameter of G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)] is at most 6rlogn+2r6r\log n+2r, verifying the first condition.

For the second condition, take an arbitrary vertex vVv\in V and consider all clusters at distance at most rr from vv. Then all these clusters have pairwise distance at most 2r2r, meaning that they form a clique in the quotient graph (G2r;V1,,Vm)\mathcal{H}(G^{\leq 2r};V_{1},\dots,V_{m}). Since the quotient graph is (2logn)(2\log n)-colorable, the size of this clique is at most 2logn2\log n, implying the second condition. ∎
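The verification in this proof can be sketched as follows: compute balls via BFS and check the two conditions of Lemma 5.4 against a given partition. The partition below is hand-chosen for a small path (standing in for the partition the proof obtains by applying Linial-Saks to the power graph G2rG^{\leq 2r}); all names are illustrative.

```python
import math
from collections import deque

def bfs_dist(adj, sources):
    """BFS distances from a set of source vertices (adjacency dict of sets)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ball(adj, S, r):
    """B(S, r): all vertices within distance r of the set S."""
    return {v for v, d in bfs_dist(adj, S).items() if d <= r}

def induced(adj, W):
    """Induced subgraph G[W] as an adjacency dict."""
    return {u: adj[u] & W for u in W}

def diameter(adj):
    """Diameter of a connected graph via BFS from every vertex."""
    return max(max(bfs_dist(adj, {u}).values()) for u in adj)

# Toy instance: a path 0-1-...-11 with r = 1 and three contiguous clusters.
n, r = 12, 1
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
clusters = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]

# Condition 1 of Lemma 5.4: diam(G[B(V_i, r)]) <= 6 r log n + 2 r.
bound = 6 * r * math.log2(n) + 2 * r
assert all(diameter(induced(adj, ball(adj, C, r))) <= bound for C in clusters)
# Condition 2 of Lemma 5.4: every vertex lies in at most 2 log n balls B(V_i, r).
assert all(sum(v in ball(adj, C, r) for C in clusters) <= 2 * math.log2(n)
           for v in range(n))
```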

6 Proofs for Variance and Entropy Factorization

In this section, we give missing proofs from Sections 3.2 and 4.2.2.

6.1 Proof of Lemma 3.3

In this subsection we present the proof of Lemma 3.3. Our proof avoids some technical difficulties that appeared in [Ces01, Proposition 2.1] and [DPPP02, Lemma 5.2], and allows us to obtain a slightly better constant for AT.

Proof of Lemma 3.3.

We notice that factorization of entropy Eq. 11 implies factorization of variance Eq. 10 with the same constant by a standard linearization argument (plugging f=1+θgf=1+\theta g into Eq. 11 and then taking θ0\theta\to 0), see [CMT15]. Thus, it suffices to consider only entropy and prove Eq. 11.
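For completeness, here is the standard expansion behind this linearization, included only as a sketch: after recentering we may take 𝔼g=0\mathbb{E}g=0, and expanding to second order in θ\theta gives

```latex
\mathrm{Ent}(1+\theta g)
  = \mathbb{E}\big[(1+\theta g)\log(1+\theta g)\big]
  = \frac{\theta^{2}}{2}\,\mathbb{E}[g^{2}] + o(\theta^{2})
  = \frac{\theta^{2}}{2}\,\mathrm{Var}\,g + o(\theta^{2}).
```

Applying the same expansion to each conditional entropy on the right-hand side of Eq. 11, dividing by θ2/2\theta^{2}/2, and letting θ0\theta\to 0 turns the entropy inequality into the corresponding variance inequality with the same constant.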

By the law of total entropy (Lemma 2.2) we have

Entf=𝔼[EntXf]+Ent[𝔼Xf]=𝔼[EntYf]+Ent[𝔼Yf].\mathrm{Ent}f=\mathbb{E}[\mathrm{Ent}_{X}f]+\mathrm{Ent}[\mathbb{E}_{X}f]=\mathbb{E}[\mathrm{Ent}_{Y}f]+\mathrm{Ent}[\mathbb{E}_{Y}f].

Hence, Eq. 11 is equivalent to

Entf(1ε)(Ent[𝔼Xf]+Ent[𝔼Yf]).\mathrm{Ent}f\geq(1-\varepsilon)\left(\mathrm{Ent}[\mathbb{E}_{X}f]+\mathrm{Ent}[\mathbb{E}_{Y}f]\right). (37)

It suffices to show Eq. 37.

Without loss of generality we may assume that 𝔼f=1\mathbb{E}f=1. Thus, 𝔼[𝔼Xf]=𝔼[𝔼Yf]=𝔼f=1\mathbb{E}[\mathbb{E}_{X}f]=\mathbb{E}[\mathbb{E}_{Y}f]=\mathbb{E}f=1. Let us define

D=EntfEnt[𝔼Xf]Ent[𝔼Yf].D=\mathrm{Ent}f-\mathrm{Ent}[\mathbb{E}_{X}f]-\mathrm{Ent}[\mathbb{E}_{Y}f]. (38)

Then by definition we have

D\displaystyle D =𝔼[flogf]𝔼[(𝔼Xf)log(𝔼Xf)]𝔼[(𝔼Yf)log(𝔼Yf)]\displaystyle=\mathbb{E}[f\log f]-\mathbb{E}[(\mathbb{E}_{X}f)\log(\mathbb{E}_{X}f)]-\mathbb{E}[(\mathbb{E}_{Y}f)\log(\mathbb{E}_{Y}f)]
=𝔼[flogf]𝔼[flog(𝔼Xf)]𝔼[flog(𝔼Yf)]\displaystyle=\mathbb{E}[f\log f]-\mathbb{E}[f\log(\mathbb{E}_{X}f)]-\mathbb{E}[f\log(\mathbb{E}_{Y}f)]
=𝔼[flogfflog((𝔼Xf)(𝔼Yf))].\displaystyle=\mathbb{E}\left[f\log f-f\log\left((\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right)\right].

Let (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y} such that π(x,y)>0\pi(x,y)>0. If (𝔼Xyf)(𝔼Yxf)>0(\mathbb{E}_{X}^{y}f)(\mathbb{E}_{Y}^{x}f)>0, then by the inequality alog(a/b)aba\log(a/b)\geq a-b for all a0a\geq 0 and b>0b>0, we deduce that at the point (x,y)(x,y)

flogfflog((𝔼Xf)(𝔼Yf))f(𝔼Xf)(𝔼Yf).\displaystyle f\log f-f\log\left((\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right)\geq f-(\mathbb{E}_{X}f)(\mathbb{E}_{Y}f). (39)

Meanwhile, if (𝔼Xyf)(𝔼Yxf)=0(\mathbb{E}_{X}^{y}f)(\mathbb{E}_{Y}^{x}f)=0 then it must hold f(x,y)=0f(x,y)=0 since ff is non-negative, and hence Eq. 39 still holds at (x,y)(x,y) with both sides equal to 0 (recall 0log0=00\log 0=0). Thus, we obtain that

D𝔼[f(𝔼Xf)(𝔼Yf)]=𝔼[(𝔼Xf1)(𝔼Yf1)],D\geq\mathbb{E}\left[f-(\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right]=-\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right], (40)

where we use 𝔼[𝔼Xf]=𝔼[𝔼Yf]=𝔼f=1\mathbb{E}[\mathbb{E}_{X}f]=\mathbb{E}[\mathbb{E}_{Y}f]=\mathbb{E}f=1. Note that 𝔼[(𝔼Xf1)(𝔼Yf1)]\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right] is the covariance of the two functions 𝔼Xf\mathbb{E}_{X}f and 𝔼Yf\mathbb{E}_{Y}f.

Let us now define a probability distribution over 𝒳×𝒴\mathcal{X}\times\mathcal{Y} by ν=fπ\nu=f\pi, i.e., ν(x,y)=π(x,y)f(x,y)\nu(x,y)=\pi(x,y)f(x,y) for all (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y}. Note that ν\nu is indeed a distribution since 𝔼πf=1\mathbb{E}_{\pi}f=1. We also define the marginal distributions νX,νY\nu_{X},\nu_{Y} and the conditional distributions νXy,νYx\nu_{X}^{y},\nu_{Y}^{x} similarly as for π\pi. Observe that for each y𝒴y\in\mathcal{Y}, we have

𝔼Xyf=x𝒳πXy(x)f(x,y)=x𝒳π(x,y)πY(y)f(x,y)=1πY(y)x𝒳ν(x,y)=νY(y)πY(y).\mathbb{E}_{X}^{y}f=\sum_{x\in\mathcal{X}}\pi_{X}^{y}(x)f(x,y)=\sum_{x\in\mathcal{X}}\frac{\pi(x,y)}{\pi_{Y}(y)}f(x,y)=\frac{1}{\pi_{Y}(y)}\sum_{x\in\mathcal{X}}\nu(x,y)=\frac{\nu_{Y}(y)}{\pi_{Y}(y)}.

Then, we deduce that

𝔼[(𝔼Xf1)(𝔼Yf1)]=(x,y)𝒳×𝒴π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1).\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right]=\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right). (41)

Let 𝒳+={x𝒳:νX(x)πX(x)}\mathcal{X}^{+}=\{x\in\mathcal{X}:\nu_{X}(x)\geq\pi_{X}(x)\} and 𝒳={x𝒳:νX(x)<πX(x)}\mathcal{X}^{-}=\{x\in\mathcal{X}:\nu_{X}(x)<\pi_{X}(x)\}, so (𝒳+,𝒳)(\mathcal{X}^{+},\mathcal{X}^{-}) is a partition of 𝒳\mathcal{X}. Define 𝒴+\mathcal{Y}^{+} and 𝒴\mathcal{Y}^{-} in the same way with respect to YY. Recall that the condition in Eq. 9 is equivalent to the statement that for all (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y},

(1ε)πX(x)πY(y)π(x,y)(1+ε)πX(x)πY(y).(1-\varepsilon)\pi_{X}(x)\pi_{Y}(y)\leq\pi(x,y)\leq(1+\varepsilon)\pi_{X}(x)\pi_{Y}(y).

Hence, we obtain that

(x,y)𝒳+×𝒴+π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{+}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
\displaystyle\leq{} (1+ε)(x,y)𝒳+×𝒴+πX(x)πY(y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle(1+\varepsilon)\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{+}}\pi_{X}(x)\pi_{Y}(y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
=\displaystyle={} (1+ε)(x𝒳+νX(x)πX(x))(y𝒴+νY(y)πY(y))\displaystyle(1+\varepsilon)\left(\sum_{x\in\mathcal{X}^{+}}\nu_{X}(x)-\pi_{X}(x)\right)\left(\sum_{y\in\mathcal{Y}^{+}}\nu_{Y}(y)-\pi_{Y}(y)\right)
=\displaystyle={} (1+ε)dTV(νX,πX)dTV(νY,πY).\displaystyle(1+\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}).

The same upper bound holds for 𝒳×𝒴\mathcal{X}^{-}\times\mathcal{Y}^{-} as well. Meanwhile,

(x,y)𝒳+×𝒴π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{-}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
\displaystyle\leq{} (1ε)(x,y)𝒳+×𝒴πX(x)πY(y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle(1-\varepsilon)\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{-}}\pi_{X}(x)\pi_{Y}(y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
=\displaystyle={} (1ε)(x𝒳+νX(x)πX(x))(y𝒴νY(y)πY(y))\displaystyle(1-\varepsilon)\left(\sum_{x\in\mathcal{X}^{+}}\nu_{X}(x)-\pi_{X}(x)\right)\left(\sum_{y\in\mathcal{Y}^{-}}\nu_{Y}(y)-\pi_{Y}(y)\right)
=\displaystyle={} (1ε)dTV(νX,πX)dTV(νY,πY),\displaystyle-(1-\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}),

and the same bound also holds for (x,y)𝒳×𝒴+(x,y)\in\mathcal{X}^{-}\times\mathcal{Y}^{+}. Therefore, plugging into Eq. 41, we obtain that

𝔼[(𝔼Xf1)(𝔼Yf1)]\displaystyle\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right] 2(1+ε)dTV(νX,πX)dTV(νY,πY)2(1ε)dTV(νX,πX)dTV(νY,πY)\displaystyle\leq 2(1+\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y})-2(1-\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y})
=4εdTV(νX,πX)dTV(νY,πY).\displaystyle=4\varepsilon d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}). (42)

By Pinsker’s inequality, we have

4dTV(νX,πX)dTV(νY,πY)\displaystyle 4d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}) 2DKL(νXπX)DKL(νYπY)\displaystyle\leq 2\sqrt{D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})}
DKL(νXπX)+DKL(νYπY)\displaystyle\leq D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})+D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})
=Ent[𝔼Yf]+Ent[𝔼Xf]\displaystyle=\mathrm{Ent}[\mathbb{E}_{Y}f]+\mathrm{Ent}[\mathbb{E}_{X}f] (43)

where the last equality follows from the observation that DKL(νXπX)=Ent[νX/πX]=Ent[𝔼Yf]D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})=\mathrm{Ent}[\nu_{X}/\pi_{X}]=\mathrm{Ent}[\mathbb{E}_{Y}f] and similarly DKL(νYπY)=Ent[𝔼Xf]D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})=\mathrm{Ent}[\mathbb{E}_{X}f].

Finally, combining Eqs. 40, 42 and 43, we obtain that

Dε(Ent[𝔼Xf]+Ent[𝔼Yf]).D\geq-\varepsilon\left(\mathrm{Ent}[\mathbb{E}_{X}f]+\mathrm{Ent}[\mathbb{E}_{Y}f]\right).

Recalling Eq. 38, we obtain Eq. 37 as claimed and the lemma then follows. ∎
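As a numerical sanity check of Eq. 37 (not a substitute for the proof), the sketch below constructs random distributions π\pi on a small product space satisfying the hypothesis |π/(πXπY)1|ε|\pi/(\pi_{X}\otimes\pi_{Y})-1|\leq\varepsilon and random nonnegative functions ff, and verifies the inequality. The construction of the perturbation via centered vectors is an illustrative device of ours, chosen so that the marginals of π\pi are exactly πX,πY\pi_{X},\pi_{Y}.

```python
import math, random

def ent(prob, vals):
    """Ent f = E[f log f] - (E f) log(E f) with respect to the distribution prob."""
    m = sum(p * v for p, v in zip(prob, vals))
    return sum(p * v * math.log(v) for p, v in zip(prob, vals)) - m * math.log(m)

def centered_unit(prob, m):
    """A random vector s with E[s] = 0 under prob and max |s_i| <= 1."""
    s = [random.uniform(-1, 1) for _ in range(m)]
    mean = sum(p * v for p, v in zip(prob, s))
    s = [v - mean for v in s]
    scale = max(max(abs(v) for v in s), 1.0)
    return [v / scale for v in s]

random.seed(1)
nx, ny, eps = 3, 4, 0.2
for _ in range(100):
    px = [random.uniform(0.1, 1) for _ in range(nx)]
    px = [p / sum(px) for p in px]
    py = [random.uniform(0.1, 1) for _ in range(ny)]
    py = [p / sum(py) for p in py]
    # pi(x,y) = px(x) py(y) (1 + eps s(x) t(y)) has marginals exactly px, py
    # and satisfies |pi/(px py) - 1| <= eps, i.e., the hypothesis Eq. 9.
    s, t = centered_unit(px, nx), centered_unit(py, ny)
    pi = [[px[i] * py[j] * (1 + eps * s[i] * t[j]) for j in range(ny)]
          for i in range(nx)]
    f = [[random.uniform(0.1, 2) for _ in range(ny)] for _ in range(nx)]

    flat_p = [pi[i][j] for i in range(nx) for j in range(ny)]
    flat_f = [f[i][j] for i in range(nx) for j in range(ny)]
    ex_f = [sum(pi[i][j] * f[i][j] for i in range(nx)) / py[j] for j in range(ny)]
    ey_f = [sum(pi[i][j] * f[i][j] for j in range(ny)) / px[i] for i in range(nx)]
    # Eq. 37: Ent f >= (1 - eps) * (Ent[E_X f] + Ent[E_Y f]).
    assert ent(flat_p, flat_f) >= (1 - eps) * (ent(py, ex_f) + ent(px, ey_f)) - 1e-12
```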

6.2 Proofs of Lemmas 3.4 and 3.6

Before presenting the proofs of these two lemmas, we first give some definitions and lemmas that are needed.

Our proof of approximate tensorization is based on the spectral independence approach. The following definitions and theorem are taken from [FGYZ21], which builds upon [AL20, ALO20] (see also [CGŠV21]).

Definition 6.1 (Influence Matrix, [FGYZ21]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Fix a subset Λ[n]\Lambda\subseteq[n] and a feasible partial assignment xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0. For any i,j[n]Λi,j\in[n]\setminus\Lambda with iji\neq j, we define the (pairwise) influence of XiX_{i} on XjX_{j} conditioned on xΛx_{\Lambda} by

ΨπxΛ(i,j)=maxxi,xidTV(πjxΛ,xi,πjxΛ,xi),\Psi_{\pi}^{x_{\Lambda}}(i,j)=\max_{x_{i},x^{\prime}_{i}}d_{\mathrm{TV}}(\pi_{j}^{x_{\Lambda},x_{i}},\pi_{j}^{x_{\Lambda},x^{\prime}_{i}}),

where xi,xix_{i},x^{\prime}_{i} are chosen from 𝒳i\mathcal{X}_{i} such that πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x_{i})>0 and πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x^{\prime}_{i})>0.

The (pairwise) influence matrix ΨπxΛ\Psi_{\pi}^{x_{\Lambda}} has off-diagonal entries given as above, and diagonal entries ΨπxΛ(i,i)=0\Psi_{\pi}^{x_{\Lambda}}(i,i)=0 for all i[n]Λi\in[n]\setminus\Lambda.

Definition 6.2 (Spectral Independence, [FGYZ21]).

We say a distribution π\pi over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i} is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, if for every 0kn20\leq k\leq n-2, every subset Λ[n]\Lambda\subseteq[n] of size |Λ|=k|\Lambda|=k, and every feasible partial assignment xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0, the spectral radius ρ(ΨπxΛ)\rho(\Psi_{\pi}^{x_{\Lambda}}) of the influence matrix ΨπxΛ\Psi_{\pi}^{x_{\Lambda}} satisfies

ρ(ΨπxΛ)ηk.\rho(\Psi_{\pi}^{x_{\Lambda}})\leq\eta_{k}.
Theorem 6.3 ([FGYZ21, Theorem 3.2]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Let η0,η1,,ηn2\eta_{0},\eta_{1},\dots,\eta_{n-2} be a sequence of reals such that 0ηk<nk10\leq\eta_{k}<n-k-1 for each kk. If π\pi is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, then the spectral gap of the Glauber dynamics PP for sampling from π\pi satisfies

λ(P)1nk=0n2(1ηknk1).\lambda(P)\geq\frac{1}{n}\prod_{k=0}^{n-2}\left(1-\frac{\eta_{k}}{n-k-1}\right).
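To make Definitions 6.1 and 6.2 and the bound of Theorem 6.3 concrete, the sketch below brute-forces a tiny fully supported binary distribution: it bounds the spectral radius of each influence matrix by its maximum absolute row sum (valid since the matrix is nonnegative), takes the maximum over all pinnings of kk coordinates to get ηk\eta_{k}, and evaluates the spectral-gap bound. All function names are illustrative; this enumeration is only feasible for very small nn.

```python
import itertools

def cond_marginal(pi, j, cond):
    """P(X_j = 1 | X agrees with cond) under pi (a dict: spin tuple -> probability)."""
    num = sum(p for x, p in pi.items()
              if x[j] == 1 and all(x[i] == v for i, v in cond.items()))
    den = sum(p for x, p in pi.items()
              if all(x[i] == v for i, v in cond.items()))
    return num / den

def max_row_sum(pi, n, cond):
    """Max absolute row sum of the influence matrix under the pinning cond;
    this upper-bounds the spectral radius of the nonnegative matrix Psi."""
    free = [i for i in range(n) if i not in cond]
    best = 0.0
    for i in free:
        row = 0.0
        for j in free:
            if j != i:
                # for binary spins, d_TV of the two conditionals at j is |p1 - p0|
                p1 = cond_marginal(pi, j, {**cond, i: 1})
                p0 = cond_marginal(pi, j, {**cond, i: 0})
                row += abs(p1 - p0)
        best = max(best, row)
    return best

# A generic fully supported distribution on {0,1}^3.
n = 3
weights = [1, 2, 3, 4, 5, 6, 7, 8]
pi = {x: w / sum(weights)
      for x, w in zip(itertools.product([0, 1], repeat=n), weights)}

# eta_k: bound the spectral radius over all pinnings of k coordinates, k = 0,...,n-2.
eta = [max(max_row_sum(pi, n, dict(zip(S, a)))
           for S in itertools.combinations(range(n), k)
           for a in itertools.product([0, 1], repeat=k))
       for k in range(n - 1)]

# Theorem 6.3: spectral gap >= (1/n) * prod_k (1 - eta_k / (n - k - 1)).
gap_bound = 1 / n
for k, e in enumerate(eta):
    gap_bound *= 1 - e / (n - k - 1)
assert gap_bound > 0
```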

We also need the following well-known comparison between the spectral gap and the standard log-Sobolev constant (which holds more generally for any Markov chain).

Lemma 6.4 ([DSC96, Corollary A.4]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Suppose λ\lambda is the spectral gap of the Glauber dynamics for sampling from π\pi, and ρ\rho is the standard log-Sobolev constant of it. Then we have

ρ12πminlog(1/πmin1)λ,\rho\geq\frac{1-2\pi_{\min}}{\log(1/\pi_{\min}-1)}\lambda,

where πmin=minx𝒳:π(x)>0π(x)\pi_{\min}=\min_{x\in\mathcal{X}:\,\pi(x)>0}\pi(x). In particular, if π\pi has positive density on at least two assignments in 𝒳\mathcal{X} then πmin1/2\pi_{\min}\leq 1/2 and we have

ρλ2+log(1/πmin).\rho\geq\frac{\lambda}{2+\log(1/\pi_{\min})}.

Finally, we need the following lemma for deriving AT from the spectral gap and the standard log-Sobolev constant.

Lemma 6.5 ([CMT15, Proposition 1.1]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Suppose λ\lambda is the spectral gap of the Glauber dynamics for sampling from π\pi, and ρ\rho is the standard log-Sobolev constant of it. Then we have

Varf\displaystyle\mathrm{Var}f 1λni=1n𝔼[Varif],f:𝒳\displaystyle\leq\frac{1}{\lambda n}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Var}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 1ρni=1n𝔼[Entif],f:𝒳0.\displaystyle\leq\frac{1}{\rho n}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Ent}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}_{\geq 0}.

We are now ready to prove Lemmas 3.4 and 3.6.

Proof of Lemma 3.4.

The spectral radius of the influence matrix of π\pi, as defined in Definition 6.1, is upper bounded by

ρ(Ψπ)ρ([01εY1εX0])=(1εX)(1εY)1εX+εY2.\rho(\Psi_{\pi})\leq\rho\left(\begin{bmatrix}0&1-\varepsilon_{Y}\\ 1-\varepsilon_{X}&0\\ \end{bmatrix}\right)=\sqrt{(1-\varepsilon_{X})(1-\varepsilon_{Y})}\leq 1-\frac{\varepsilon_{X}+\varepsilon_{Y}}{2}.

Therefore, by Theorem 6.3 the spectral gap of the Glauber dynamics is at least (εX+εY)/4(\varepsilon_{X}+\varepsilon_{Y})/4, and by Lemma 6.4 the standard log-Sobolev constant is at least (εX+εY)/(8+4log(1/πmin))(\varepsilon_{X}+\varepsilon_{Y})/(8+4\log(1/\pi_{\min})). We then deduce the lemma from Lemma 6.5. ∎

Remark 6.6.

Another way of proving Lemma 3.4 is by viewing the (pairwise) influence matrix Ψπ\Psi_{\pi} as the Dobrushin dependency/influence matrix, and showing that the Glauber dynamics is contractive with respect to some weighted Hamming distance with the weight vector given by the principal eigenvector of Ψπ\Psi_{\pi}; the lower bound on the spectral gap then follows from [LP17, Theorem 13.1] or [Che98].

Proof of Lemma 3.6.

By definition we see that π\pi is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, where for each kk we have

ηk(nk1)(1ε)\eta_{k}\leq(n-k-1)(1-\varepsilon)

by considering the \ell_{\infty} norm (absolute row sum) of the influence matrices. Hence, we deduce from Theorem 6.3 that the spectral gap of the Glauber dynamics is lower bounded by

λ1nk=0n2(1ηknk1)εn1n.\lambda\geq\frac{1}{n}\prod_{k=0}^{n-2}\left(1-\frac{\eta_{k}}{n-k-1}\right)\geq\frac{\varepsilon^{n-1}}{n}.

Furthermore, by Lemma 6.4 the standard log-Sobolev constant is lower bounded by

ρλ2+log(1/πmin)εn1(2+log(1/πmin))n.\rho\geq\frac{\lambda}{2+\log(1/\pi_{\min})}\geq\frac{\varepsilon^{n-1}}{(2+\log(1/\pi_{\min}))n}.

Finally, the lemma follows from an application of Lemma 6.5. ∎

6.3 Proof of Lemma 4.10

Proof of Lemma 4.10.

Recall that the marginal distribution πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy with constant CC if

EntXYgC(𝔼XY[EntXg]+𝔼XY[EntYg]),g:𝒳×𝒴0.\mathrm{Ent}_{XY}g\leq C\left(\mathbb{E}_{XY}\left[\mathrm{Ent}_{X}g\right]+\mathbb{E}_{XY}\left[\mathrm{Ent}_{Y}g\right]\right),\quad\forall g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}. (44)

We claim that Eq. 44 is equivalent to

Entg¯C(𝔼[EntXZg¯]+𝔼[EntYZg¯]),g¯:𝒳×𝒴×𝒵0 depending only on X and Y\mathrm{Ent}\bar{g}\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}\bar{g}\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}\bar{g}\right]\right),\quad\forall\bar{g}:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}\text{~{}depending \emph{only} on $X$ and $Y$} (45)

where the underlying distribution is π=πXYZ\pi=\pi_{XYZ}. To see this, observe that every function g:𝒳×𝒴0g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0} is in one-to-one correspondence with a function g¯:𝒳×𝒴×𝒵0\bar{g}:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0} depending only on X,YX,Y via the relationship g¯(x,y,z)=g(x,y)\bar{g}(x,y,z)=g(x,y). By definition, we have EntXYg=Entg¯\mathrm{Ent}_{XY}g=\mathrm{Ent}\bar{g}, EntπXyg=EntπXZyg¯\mathrm{Ent}_{\pi^{y}_{X}}g=\mathrm{Ent}_{\pi^{y}_{XZ}}\bar{g} for all y𝒴y\in\mathcal{Y}, and EntπYxg=EntπYZxg¯\mathrm{Ent}_{\pi^{x}_{Y}}g=\mathrm{Ent}_{\pi^{x}_{YZ}}\bar{g} for all x𝒳x\in\mathcal{X}, implying that Eqs. 44 and 45 are equivalent.

Therefore, it suffices to show that Eq. 45 is equivalent to $\pi$ satisfying $\{\{X,Z\},\{Y,Z\}\}$-factorization of entropy with constant $C$, i.e.,

\[
\mathrm{Ent}\,f \leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}. \tag{46}
\]

It is trivial that Eq. 46 implies Eq. 45. For the other direction, suppose Eq. 45 is true. Since $\mathbb{E}_{Z}f$ is a function depending only on $X$ and $Y$, we have from Eq. 45 that

\[
\mathrm{Ent}(\mathbb{E}_{Z}f) \leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}(\mathbb{E}_{Z}f)\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}(\mathbb{E}_{Z}f)\right]\right).
\]

Then, we deduce from Lemma 2.2 that

\begin{align*}
\mathrm{Ent}\,f &= \mathbb{E}[\mathrm{Ent}_{Z}f]+\mathrm{Ent}(\mathbb{E}_{Z}f)\\
&\leq \mathbb{E}[\mathrm{Ent}_{Z}f]+C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}(\mathbb{E}_{Z}f)\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}(\mathbb{E}_{Z}f)\right]\right)\\
&= \mathbb{E}[\mathrm{Ent}_{Z}f]+C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]-\mathbb{E}\left[\mathrm{Ent}_{Z}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]-\mathbb{E}\left[\mathrm{Ent}_{Z}f\right]\right)\\
&= C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right)-(2C-1)\,\mathbb{E}[\mathrm{Ent}_{Z}f]\\
&\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right),
\end{align*}

where the last inequality is due to $C\geq 1$ (this can be seen by considering functions depending only on $X$ in Eq. 45). ∎
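The first step of the derivation above is the decomposition of entropy from Lemma 2.2, $\mathrm{Ent}\,f=\mathbb{E}[\mathrm{Ent}_{Z}f]+\mathrm{Ent}(\mathbb{E}_{Z}f)$. This identity can be sanity-checked numerically; the sketch below assumes nothing beyond the definition of $\mathrm{Ent}$, with a random joint distribution standing in for $\pi_{XYZ}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def ent(pi, f):
    """Ent_pi f = E[f log f] - E[f] log E[f], with the convention 0 log 0 = 0."""
    ef = float(np.sum(pi * f))
    safe = np.where(f > 0, f, 1.0)  # avoid log(0); those terms contribute 0 anyway
    return float(np.sum(pi * np.where(f > 0, f * np.log(safe), 0.0))) - ef * np.log(ef)

nx, ny, nz = 3, 4, 5
pi = rng.random((nx, ny, nz)); pi /= pi.sum()  # random joint distribution pi_{XYZ}
f = rng.random((nx, ny, nz))                   # arbitrary nonnegative function

pi_xy = pi.sum(axis=2)                 # marginal pi_{XY}
pi_z_cond = pi / pi_xy[:, :, None]     # conditional pi(z | x, y)

# E[Ent_Z f]: average over (x, y) of the entropy of f(x, y, .) under pi(. | x, y)
e_ent_z = sum(pi_xy[x, y] * ent(pi_z_cond[x, y], f[x, y])
              for x in range(nx) for y in range(ny))

# Ent(E_Z f): entropy of the conditional mean, a function of (X, Y) only
ez_f = np.sum(pi_z_cond * f, axis=2)
ent_mean = ent(pi_xy, ez_f)

# Lemma 2.2 (chain rule for entropy): Ent f = E[Ent_Z f] + Ent(E_Z f)
assert np.isclose(ent(pi, f), e_ent_z + ent_mean)
```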

References

  • [AJK+22] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence: optimal mixing of down-up random walks. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1418–1430, 2022.
  • [AL20] Vedat Levi Alev and Lap Chi Lau. Improved analysis of higher order random walks and applications. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1198–1211, 2020.
  • [ALO20] Nima Anari, Kuikui Liu, and Shayan Oveis Gharan. Spectral independence in high-dimensional expanders and applications to the hardcore model. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1319–1330, 2020.
  • [BCŠV22] Antonio Blanca, Zongchen Chen, Daniel Štefankovič, and Eric Vigoda. Complexity of high-dimensional identity testing with coordinate conditional sampling. arXiv preprint arXiv:2207.09102, 2022.
  • [BGGŠ22] Ivona Bezáková, Andreas Galanis, Leslie Ann Goldberg, and Daniel Štefankovič. Fast sampling via spectral independence beyond bounded-degree graphs. In Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP), volume 229, 2022.
  • [BKMP05] Noam Berger, Claire Kenyon, Elchanan Mossel, and Yuval Peres. Glauber dynamics on trees and hyperbolic graphs. Probability Theory and Related Fields, 131(3):311–340, 2005.
  • [Bod98] Hans L. Bodlaender. A partial kk-arboretum of graphs with bounded treewidth. Theoretical Computer Science, 209(1-2):1–45, 1998.
  • [CE22] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 110–122. IEEE, 2022.
  • [Ces01] Filippo Cesi. Quasi-factorization of the entropy and logarithmic Sobolev inequalities for Gibbs random fields. Probability Theory and Related Fields, 120(4):569–584, 2001.
  • [CFYZ21] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Rapid mixing of Glauber dynamics via spectral independence for all degrees. In Proceedings of the 62nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 137–148. IEEE, 2021.
  • [CFYZ22] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Optimal mixing for two-state anti-ferromagnetic spin systems. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 588–599. IEEE, 2022.
  • [CGŠV21] Zongchen Chen, Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Rapid mixing for colorings via spectral independence. In Proceedings of the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1548–1557, 2021.
  • [Che98] Mu-Fa Chen. Trilogy of couplings and general formulas for lower bound of spectral gap. In Probability towards 2000, pages 123–136. Springer, 1998.
  • [CLV20] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Rapid mixing of Glauber dynamics up to uniqueness via contraction. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1307–1318, 2020.
  • [CLV21] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Optimal mixing of Glauber dynamics: Entropy factorization via high-dimensional expansion. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1537–1550, 2021.
  • [CMT15] Pietro Caputo, Georg Menz, and Prasad Tetali. Approximate tensorization of entropy at high temperature. Annales de la Faculté des sciences de Toulouse: Mathématiques, 24(4):691–716, 2015.
  • [CP21] Pietro Caputo and Daniel Parisi. Block factorization of the relative entropy via spatial mixing. Communications in Mathematical Physics, 388(2):793–818, 2021.
  • [DH04] Erik D Demaine and MohammadTaghi Hajiaghayi. Equivalence of local treewidth and linear local treewidth and its algorithmic applications. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 840–849, 2004.
  • [DPPP02] Paolo Dai Pra, Anna Maria Paganoni, and Gustavo Posta. Entropy inequalities for unbounded spin systems. The Annals of Probability, 30(4):1959–1976, 2002.
  • [DSC96] Persi Diaconis and Laurent Saloff-Coste. Logarithmic Sobolev inequalities for finite Markov chains. The Annals of Applied Probability, 6(3):695–750, 1996.
  • [DSVW04] Martin Dyer, Alistair Sinclair, Eric Vigoda, and Dror Weitz. Mixing in time and space for lattice spin systems: A combinatorial view. Random Structures & Algorithms, 24(4):461–479, 2004.
  • [EF21] David Eppstein and Daniel Frishberg. Rapid mixing of the hardcore Glauber dynamics and other Markov chains in bounded-treewidth graphs. arXiv preprint arXiv:2111.03898, 2021.
  • [Epp99] David Eppstein. Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms and Applications, 3(3):1–27, 1999.
  • [FGYZ21] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Rapid mixing from spectral independence beyond the Boolean domain. In Proceedings of the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1558–1577. SIAM, 2021.
  • [GJK10] Leslie Ann Goldberg, Mark Jerrum, and Marek Karpinski. The mixing time of Glauber dynamics for coloring regular trees. Random Structures & Algorithms, 36(4):464–476, 2010.
  • [GKM15] David Gamarnik, Dmitriy Katz, and Sidhant Misra. Strong spatial mixing of list coloring of graphs. Random Structures & Algorithms, 46(4):599–613, 2015.
  • [GMP05] Leslie A. Goldberg, Russell Martin, and Mike Paterson. Strong spatial mixing with fewer colors for lattice graphs. SIAM Journal on Computing, 35(2):486–517, 2005.
  • [Gru12] Hermann Gruber. On balanced separators, treewidth, and cycle rank. Journal of Combinatorics, 3(4):669–681, 2012.
  • [GŠV16] Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Combinatorics, Probability and Computing, 25(4):500–559, 2016.
  • [GZ03] Alice Guionnet and Bogusław Zegarliński. Lectures on Logarithmic Sobolev Inequalities, pages 1–134. Springer, Berlin, 2003.
  • [Hei20] Marc Heinrich. Glauber dynamics for colourings of chordal graphs and graphs of bounded treewidth. arXiv preprint arXiv:2010.16158, 2020.
  • [HW17] Daniel J Harvey and David R Wood. Parameters tied to treewidth. Journal of Graph Theory, 84(4):364–385, 2017.
  • [Jer03] Mark Jerrum. Counting, Sampling and Integrating: Algorithms and Complexity. Lectures in Mathematics, ETH Zürich. Birkhäuser Basel, 2003.
  • [JS89] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM Journal on Computing, 18(6):1149–1178, 1989.
  • [JS93] Mark Jerrum and Alistair Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22(5):1087–1116, 1993.
  • [JSTV04] Mark Jerrum, Jung-Bae Son, Prasad Tetali, and Eric Vigoda. Elementary bounds on Poincaré and log-Sobolev constants for decomposable Markov chains. The Annals of Applied Probability, 14(4):1741–1765, 2004.
  • [JSV04] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM (JACM), 51(4):671–697, 2004.
  • [KHR22] Frederic Koehler, Alexander Heckett, and Andrej Risteski. Statistical efficiency of score matching: The view from isoperimetry. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2022.
  • [LM11] Brendan Lucier and Michael Molloy. The Glauber dynamics for colorings of bounded degree trees. SIAM Journal on Discrete Mathematics, 25(2):827–853, 2011.
  • [LMP09] Brendan Lucier, Michael Molloy, and Yuval Peres. The Glauber dynamics for colourings of bounded degree trees. In International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 631–645. Springer, 2009.
  • [LP17] David A. Levin and Yuval Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.
  • [LS93] Nathan Linial and Michael Saks. Low diameter graph decompositions. Combinatorica, 13(4):441–454, 1993.
  • [LT79] Richard J Lipton and Robert Endre Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2):177–189, 1979.
  • [Mar99] Fabio Martinelli. Lectures on Glauber dynamics for discrete spin models. In Lectures on probability theory and statistics, pages 93–191. Springer, 1999.
  • [Mar19] Katalin Marton. Logarithmic Sobolev inequalities in discrete product spaces. Combinatorics, Probability and Computing, 28(6):919–935, 2019.
  • [MSW04] Fabio Martinelli, Alistair Sinclair, and Dror Weitz. Glauber dynamics on trees: boundary conditions and mixing time. Communications in Mathematical Physics, 250(2):301–334, 2004.
  • [MWW09] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.
  • [Ree03] Bruce A Reed. Algorithmic aspects of tree width. Recent advances in algorithms and combinatorics, pages 85–107, 2003.
  • [RS86] Neil Robertson and Paul D. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309–322, 1986.
  • [Sly10] Allan Sly. Computational transition at the uniqueness threshold. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 287–296, 2010.
  • [SS14] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on dd-regular graphs. The Annals of Probability, 42(6):2383–2416, 2014.
  • [SZ17] Allan Sly and Yumeng Zhang. The Glauber dynamics of colorings on trees is rapidly mixing throughout the nonreconstruction regime. The Annals of Applied Probability, pages 2646–2674, 2017.
  • [TVVY12] Prasad Tetali, Juan C Vera, Eric Vigoda, and Linji Yang. Phase transition for the mixing time of the Glauber dynamics for coloring regular trees. The Annals of Applied Probability, pages 2210–2239, 2012.
  • [Wei06] Dror Weitz. Counting independent sets up to the tree threshold. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pages 140–149, 2006.
  • [WTZL18] Pengfei Wan, Jianhua Tu, Shenggui Zhang, and Binlong Li. Computing the numbers of independent sets and matchings of all sizes for graphs with bounded treewidth. Applied Mathematics and Computation, 332:42–47, 2018.
  • [YZ13] Yitong Yin and Chihao Zhang. Approximate counting via correlation decay on planar graphs. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 47–66. SIAM, 2013.