
Combinatorial Approach for Factorization of Variance and Entropy in Spin Systems

Zongchen Chen Department of Mathematics, Massachusetts Institute of Technology. Email: [email protected].
Abstract

We present a simple combinatorial framework for establishing approximate tensorization of variance and entropy in the setting of spin systems (a.k.a. undirected graphical models) based on balanced separators of the underlying graph. Such approximate tensorization results immediately imply as corollaries many important structural properties of the associated Gibbs distribution, in particular rapid mixing of the Glauber dynamics for sampling. We prove approximate tensorization by recursively establishing block factorization of variance and entropy with a small balanced separator of the graph. Our approach goes beyond the classical canonical path method for variance and the recent spectral independence approach, and allows us to obtain new rapid mixing results. As applications of our approach, we show that:

  1. On graphs of treewidth $t$, the mixing time of the Glauber dynamics is $n^{O(t)}$, which recovers the recent results of Eppstein and Frishberg [EF21] with improved exponents and simpler proofs;

  2. On bounded-degree planar graphs, strong spatial mixing implies $\widetilde{O}(n)$ mixing time of the Glauber dynamics, which gives a faster algorithm than the previous deterministic counting algorithm by Yin and Zhang [YZ13].

1 Introduction

Spin systems, also known as undirected graphical models, are important models for describing the joint distribution of interacting random variables. Spin systems were first studied in statistical physics but have been widely used in many other areas including computer science, social networks, and biology.

Consider a general spin system defined on a graph $G=(V,E)$. The model describes a distribution, called the Gibbs distribution, over all spin configurations where each vertex $v$ is assigned a spin $\sigma_v$ from a finite set $\mathcal{Q}$. Given functions $\phi:\mathcal{Q}\times\mathcal{Q}\to\mathbb{R}_{\geq 0}$ characterizing pairwise interactions and $\psi:\mathcal{Q}\to\mathbb{R}_{+}$ measuring external bias, the Gibbs distribution associated with a spin system is defined as

$$\mu(\sigma)=\frac{1}{Z}\prod_{e=uv\in E}\phi(\sigma_{u},\sigma_{v})\prod_{v\in V}\psi(\sigma_{v}),\quad\forall\sigma:V\to\mathcal{Q} \quad (1)$$

where $Z$ is a normalizing constant known as the partition function.

We mention two classical examples of spin systems which have been extensively studied. The first is the hardcore model of independent sets. The Gibbs distribution is supported on the collection $\mathcal{I}_G$ of all independent sets of $G$, where each $I\in\mathcal{I}_G$ has density $\mu(I)=\lambda^{|I|}/Z$ with $Z=\sum_{I\in\mathcal{I}_G}\lambda^{|I|}$. Observe that this corresponds to $\mathcal{Q}=\{0,1\}$, $\phi(\sigma_u,\sigma_v)=1-\sigma_u\sigma_v$, and $\psi(\sigma_v)=1+(\lambda-1)\sigma_v$ in Eq. 1, with $\sigma$ being the indicator vector of $I$. Another example is random vertex colorings, where the Gibbs distribution is the uniform distribution over all proper $q$-colorings of $G$. This corresponds to $\mathcal{Q}=[q]=\{1,\dots,q\}$, $\phi(\sigma_u,\sigma_v)=\mathbbm{1}\{\sigma_u\neq\sigma_v\}$, and $\psi\equiv 1$ in Eq. 1.
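To make the hardcore example concrete, the Gibbs distribution on a tiny graph can be computed by brute-force enumeration. The following Python sketch is illustrative only (the function name `hardcore_gibbs` is ours, and the enumeration is exponential in $n$): it lists all independent sets, weights each $I$ by $\lambda^{|I|}$, and normalizes by the partition function $Z$.

```python
import itertools

def hardcore_gibbs(n, edges, lam):
    """Brute-force Gibbs distribution of the hardcore model on an n-vertex graph.

    Enumerates all 0/1 indicator vectors, keeps those of independent sets,
    and weights each by lam^{|I|}.  For illustration only (exponential in n).
    """
    weights = {}
    for sigma in itertools.product([0, 1], repeat=n):
        # sigma is the indicator vector of a vertex subset; keep it iff
        # no edge has both endpoints occupied (i.e., the subset is independent)
        if all(not (sigma[u] and sigma[v]) for u, v in edges):
            weights[sigma] = lam ** sum(sigma)
    Z = sum(weights.values())  # partition function
    return {sigma: w / Z for sigma, w in weights.items()}, Z

# Example: the path 0-1-2 with fugacity 1 has 5 independent sets, so Z = 5.
mu, Z = hardcore_gibbs(3, [(0, 1), (1, 2)], 1.0)
print(Z)  # 5.0
```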

We study the problem of sampling from the Gibbs distribution of a spin system. In particular, we consider the single-site Glauber dynamics (also called Gibbs sampling), which is perhaps the simplest and most popular Markov chain Monte Carlo (MCMC) algorithm for sampling from a high-dimensional distribution. The Glauber dynamics is an ergodic Markov chain where in each iteration we choose a vertex $v$ uniformly at random and update the spin $\sigma_v$ conditional on the spin values of all other vertices. The mixing time of the Glauber dynamics is the smallest $t$ such that, starting from any initial configuration $\sigma^{(0)}$, the distribution of $\sigma^{(t)}$ after $t$ steps is $1/4$-close to the target distribution $\mu$ in total variation distance.
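For the hardcore model the single-site update has a simple closed form: if some neighbor of the chosen vertex $v$ is occupied then $\sigma_v$ must be $0$; otherwise $\sigma_v=1$ with probability $\lambda/(1+\lambda)$. A minimal Python sketch of one Glauber step (the helper `glauber_step` and the toy instance are ours):

```python
import random

def glauber_step(sigma, neighbors, lam, rng):
    """One Glauber update for the hardcore model: pick a uniformly random
    vertex v and resample sigma[v] from its conditional distribution given
    the spins of all other vertices."""
    v = rng.randrange(len(sigma))
    if any(sigma[u] for u in neighbors[v]):
        sigma[v] = 0                      # an occupied neighbor forces v = 0
    else:
        # conditional marginal: occupied with probability lam / (1 + lam)
        sigma[v] = 1 if rng.random() < lam / (1 + lam) else 0
    return sigma

# Toy usage on the path 0-1-2: starting from the empty set (an independent
# set), the chain stays inside the independent sets forever.
rng = random.Random(0)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
sigma = [0, 0, 0]
for _ in range(1000):
    glauber_step(sigma, neighbors, 1.0, rng)
print(sigma)  # an (approximate) sample; always an independent set
```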

Bounding the mixing time of the Glauber dynamics is challenging even for the simplest spin systems like the hardcore model or random colorings. One common method for establishing mixing time bounds of Markov chains is to prove associated functional inequalities such as the Poincaré inequality or the standard/modified log-Sobolev inequality. For a function $f:\mathcal{Q}^V\to\mathbb{R}$, the expectation of $f$ with respect to a Gibbs distribution $\mu$ is defined as $\mathbb{E}f=\sum_{\sigma:V\to\mathcal{Q}}\mu(\sigma)f(\sigma)$. The variance and entropy functionals are defined as $\mathrm{Var}f=\mathbb{E}[f^2]-(\mathbb{E}f)^2$ and, for non-negative $f$, $\mathrm{Ent}f=\mathbb{E}[f\log f]-(\mathbb{E}f)\log(\mathbb{E}f)$. For the Glauber dynamics specifically, the Poincaré inequality can be expressed equivalently in the form

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v(f)],\quad\forall f:\mathcal{Q}^V\to\mathbb{R} \quad (2)$$

where on the right-hand side $\mathrm{Var}^\eta_v f$ is the variance of $f$ under the conditional distribution $\mu^\eta_v$ of $\sigma_v$ when all other vertices are fixed to some $\eta:V\setminus\{v\}\to\mathcal{Q}$, and $\mathbb{E}[\mathrm{Var}_v f]$ takes the expectation over $\eta$. The inequality Eq. 2 is called Approximate Tensorization (AT) of variance. The Poincaré inequality, or equivalently AT of variance, implies that the Glauber dynamics mixes in $O(Cn^2)$ steps.

One can also consider the entropy analog of Eq. 2, known as Approximate Tensorization (AT) of entropy:

$$\mathrm{Ent}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_v(f)],\quad\forall f:\mathcal{Q}^V\to\mathbb{R}_{\geq 0}. \quad (3)$$

AT of entropy is a much stronger property. It implies the modified log-Sobolev inequality for the Glauber dynamics, and gives a sharper mixing time bound $O(Cn\log n)$ (see Lemma 2.7), which is optimal if $C$ is constant.

Both Eqs. 2 and 3 were explicitly mentioned and carefully studied in [CMT15], though they were implicitly used in even earlier works, see e.g. [Mar99, GZ03, Ces01]. On a high level, both Eqs. 2 and 3 say that the global fluctuation of a function, quantified as variance or entropy, is always controlled by the sum of local fluctuations at each single variable, which is intuitively true when all variables are sufficiently independent of each other. Indeed, one has $C\geq 1$ in Eqs. 2 and 3, with equality if and only if $\mu$ is a product distribution. See also [BCŠV22, KHR22] for recent applications of AT in learning and testing.

Establishing AT of variance and entropy is a challenging task even for simple distributions. For variance, the canonical path approach is a common way of proving the Poincaré inequality (i.e., bounding the spectral gap) in the setting of spin systems. The high-level idea is to construct a family of canonical paths, or more generally a multi-commodity flow, between each pair of configurations and then use the congestion of the flow to establish Eq. 2. Canonical paths have found many successful applications, such as matchings [JS89], the ferromagnetic Ising model [JS93], and bipartite perfect matchings [JSV04]. However, constructing canonical paths is far from easy and usually involves problem-specific technical complications.

Meanwhile, for entropy it is much more difficult to establish AT or other related functional inequalities like the standard/modified log-Sobolev inequality. In many cases they are proved analytically, relying on the underlying topology being the lattice [Mar99, GZ03, Ces01]. It is also known that for high-temperature models, such as under the Dobrushin uniqueness condition, AT of entropy holds with $C=O(1)$ [CMT15, Mar19].

Recently, the spectral/entropic independence approach was introduced [ALO20, AJK+22] and has become a powerful tool for establishing AT of both variance and entropy. For many families of spin systems it achieves $C=O(1)$ in Eqs. 2 and 3 and thus shows optimal $O(n\log n)$ mixing of the Glauber dynamics. For example, for the hardcore model one obtains AT of variance and entropy with $C=O(1)$ when $\lambda<\lambda_c(\Delta)$ by a sequence of recent works [ALO20, CLV20, CLV21, CFYZ21, AJK+22, CFYZ22, CE22], where $\Delta$ denotes the maximum degree and $\lambda_c(\Delta)$ is the tree uniqueness threshold [Wei06]. It is known that the Glauber dynamics can be exponentially slow when $\lambda>\lambda_c(\Delta)$ [MWW09], and hence $C=e^{\Omega(n)}$ for some graphs. The critical value $\lambda_c(\Delta)$ in fact pinpoints a computational phase transition; see [Wei06, Sly10, SS14, GŠV16] for more discussion. Though the spectral independence approach works well on general graphs for proving optimal mixing time, it does not apply when the mixing time is a larger polynomial instead of nearly linear. For example, for the hardcore model on trees the Glauber dynamics mixes in polynomial time for all $\lambda>0$ [JSTV04], but (constant) spectral independence fails for large $\lambda$.

In this paper, we ask if there is a natural and direct way of proving AT beyond canonical paths or spectral independence, especially when we have extra knowledge of the underlying graph structure. We present a simple combinatorial approach for proving AT of variance and entropy based on the existence of balanced separators of the graph; see Propositions 3.1 and 4.6. Using this approach we are able to obtain new rapid mixing results as immediate corollaries for certain classes of graphs such as bounded-treewidth graphs or planar graphs. We are also able to derive many previous results in a simple and straightforward way, in contrast to the detailed technical proofs that were known previously. For example, we can easily deduce within one page that the Glauber dynamics for $q$-colorings on complete $d$-ary trees is rapidly mixing for all $d\geq 2$, $q\geq 3$; see Proposition 3.7.

Our proof approach is best illustrated on graphs of bounded treewidth. The treewidth of a graph characterizes how close the graph is to a tree, and is an important parameter for obtaining fixed-parameter tractable algorithms for many graph problems. For the hardcore model and random colorings on bounded-treewidth graphs, we obtain rapid mixing of the Glauber dynamics as immediate consequences of Propositions 3.1 and 4.6, which improves the results in [EF21] with better exponents. We remark that we can also obtain similar results for more general spin systems, but for simplicity we only state them for the hardcore model and vertex colorings, which are the most commonly studied examples.

Theorem 1.1.

Let $G=(V,E)$ be an $n$-vertex graph of treewidth $t\geq 1$. The mixing time of the Glauber dynamics for sampling from the hardcore model on $G$ with fugacity $\lambda>0$ is $n^{O(1+t\log(1+\lambda))}$.

Theorem 1.2.

Let $G=(V,E)$ be an $n$-vertex graph of maximum degree $\Delta\geq 3$ and treewidth $t\geq 1$. For any $q\geq\Delta+2$, the mixing time of the Glauber dynamics for sampling uniformly random $q$-colorings of $G$ is $n^{O(t\Delta)}$.

Previously, [EF21] presented mixing time bounds of $n^{O(1+t\log\hat{\lambda})}$ for the hardcore model, where $\hat{\lambda}=\max\{\lambda,1/\lambda\}$, and $n^{O(t\Delta\log q)}$ for random $q$-colorings. Our mixing time bounds have better exponents, and our proof is much simpler, avoiding the technical construction of multi-commodity flows in [EF21]. We note that for bounded-treewidth graphs one can exactly compute the partition function and thus sample in time $e^{O(t)}\cdot\mathrm{poly}(n)$, see e.g. [YZ13, WTZL18] and the references therein. However, the performance of the Glauber dynamics on such graphs was still unclear, and this is the focus of this paper.

We establish Theorems 1.1 and 1.2 by factorizing variance recursively using a balanced separator of the graph; this is inspired by previous works on trees [MSW04] and lattices [Ces01, CP21]. Since $G$ has treewidth $t$, it has a balanced separator of size at most $t$. More specifically, there exists a partition $V=A\cup B\cup S$ such that $|S|\leq t$, $|A|\leq 2n/3$, $|B|\leq 2n/3$, and there is no edge between $A$ and $B$. Given such a partition, we are able to establish a block factorization of variance

$$\mathrm{Var}f\leq C_0\left(\mathbb{E}[\mathrm{Var}_A f]+\mathbb{E}[\mathrm{Var}_B f]+\mathbb{E}[\mathrm{Var}_S f]\right), \quad (4)$$

where $C_0$ is a constant independent of $n$. Since the size of $S$ is bounded, we can factorize the variance on $S$ into single vertices. Then, by recursively applying Eq. 4 to $A$ and $B$ respectively, we prove AT of variance with multiplier $C=C_0^{O(\log n)}=n^{O(1)}$. We present our universal proof approach for AT in Propositions 3.1 and 4.6, and summarize basic tools for establishing the block factorization Eq. 4 in Sections 3.2 and 4.2.2.

We note that [EF21] also used balanced separators to construct multi-commodity flows and obtained similar results. In comparison, our approach establishes AT in a more direct way and is arguably simpler in nature. We remark that we could also establish AT of entropy, but since we are not aiming for an optimal exponent in the mixing time, we choose to work with variance, which is much easier to calculate with.

As another application of our approach, we consider planar graphs and show that the Strong Spatial Mixing (SSM) property (Definition 5.2) implies nearly optimal mixing time of the Glauber dynamics. SSM is an important structural property characterizing the exponential decay of correlations between two subsets of variables as their graph distance grows. There is a rich literature on establishing rapid mixing of the Glauber dynamics from SSM, but mostly for special classes of graphs such as lattices, see e.g. [Mar99, GZ03, Ces01, DSVW04, GMP05]. Here we consider general bounded-degree planar graphs with no restriction on the underlying topology. Previously, [YZ13] presented a deterministic counting algorithm under the same assumptions with a large polynomial running time. Our result can be viewed as a faster sampling algorithm with nearly linear running time. We state our result for the hardcore model, but it extends easily to general spin systems.

Theorem 1.3.

Let $G=(V,E)$ be an $n$-vertex planar graph of maximum degree $\Delta\geq 3$. Consider the hardcore model on $G$ with fugacity $\lambda>0$. If SSM holds, then the mixing time of the Glauber dynamics is $O(n\log^4 n)$.

In fact, we prove Theorem 1.3 more generally for all graphs of polynomially bounded local treewidth; see Theorem 5.3. This is the class of graphs in which every local ball of radius $r$ has treewidth $\mathrm{poly}(r)$. It includes, for example, bounded-treewidth graphs, planar graphs, and graphs with polynomial neighborhood growth such as regions in $\mathbb{Z}^d$ (see Theorem 5.7). We remark that the result in [YZ13] holds only for graphs of linearly bounded local treewidth, and thus our result holds for a larger class of graphs. We establish AT of entropy by combining SSM with the local structure of the underlying graph. In particular, we show a new low-diameter decomposition of graphs (see Lemma 5.4) based on a classical result of Linial and Saks [LS93], which allows us to focus on subgraphs of $\mathrm{poly}(\log n)$ diameter assuming SSM. Note that here we have to work with entropy in order to get an $\widetilde{O}(n)$ mixing time.

Paper organization.

After giving preliminaries in Section 2, we present our new framework for establishing approximate tensorization in Section 3. We prove Theorems 1.1 and 1.2 in Section 4 for Glauber dynamics on bounded-treewidth graphs. We prove Theorem 1.3 more generally for graphs of bounded local treewidth in Section 5. We present missing proofs of basic tools for approximate tensorization and block factorization in Section 6.

Acknowledgments.

The author thanks Kuikui Liu, Eric Vigoda, and Thuy-Duong Vuong for stimulating discussions. The author thanks Eric Vigoda for helpful comments on the manuscript.

2 Preliminaries

2.1 Spin systems

We consider general families of spin systems, also known as undirected graphical models. Let $G=(V,E)$ be a graph. Every vertex $v\in V$ is associated with a finite set $\mathcal{Q}_v$ of spins (colors, labels). Let $\mathcal{X}=\prod_{v\in V}\mathcal{Q}_v$ denote the product space of all spin assignments, called configurations. Let $\Phi=\{\phi_e:e\in E\}$ be a collection of edge interactions such that for every edge $e=uv\in E$, $\phi_e:\mathcal{Q}_u\times\mathcal{Q}_v\to\mathbb{R}_{\geq 0}$ is a function mapping the spins of the two endpoints to a non-negative weight, characterizing the interaction between neighboring vertices. Let $\Psi=\{\psi_v:v\in V\}$ be a collection of external fields such that for every vertex $v\in V$, $\psi_v:\mathcal{Q}_v\to\mathbb{R}_{\geq 0}$ assigns a weight to each color, representing the bias towards each color. The induced Gibbs distribution $\mu=\mu_{G,\Phi,\Psi}$ is given by

$$\mu(\sigma)=\frac{1}{Z}\prod_{e\in E}\phi_e(\sigma_e)\prod_{v\in V}\psi_v(\sigma_v),\quad\forall\sigma\in\mathcal{X}$$

where $\sigma_v$ denotes the color of a vertex $v\in V$, $\sigma_e$ denotes the (partial) spin assignment of the vertices in $e$, and $Z=Z_{G,\Phi,\Psi}$ is the partition function defined as

$$Z=\sum_{\sigma\in\mathcal{X}}\prod_{e\in E}\phi_e(\sigma_e)\prod_{v\in V}\psi_v(\sigma_v).$$

We assume $Z>0$ so that the Gibbs distribution is well-defined.

For a subset $W\subseteq V$, let $\mu_W$ denote the marginal distribution projected on $W$. We say a spin configuration $\eta\in\prod_{v\in\Lambda}\mathcal{Q}_v$ on some subset $\Lambda\subseteq V$ is a (feasible) pinning if $\mu_\Lambda(\eta)>0$. For any pinning $\eta$ on $\Lambda\subseteq V$, the conditional Gibbs distribution $\mu^\eta$, where the configuration on $\Lambda$ is fixed to be $\eta$, can be viewed as another spin system on the graph $G\setminus\Lambda$. For any $W\subseteq V\setminus\Lambda$, we further define $\mu^\eta_W$ to be the marginal on $W$ under the conditional distribution $\mu^\eta$.

The Glauber dynamics is one of the simplest and most popular Markov chain Monte Carlo algorithms for sampling from the Gibbs distribution of a spin system. In each step of the Glauber dynamics, we pick a vertex $v\in V$ uniformly at random and update the spin value $\sigma_v$ conditional on the configuration $\sigma_{V\setminus\{v\}}$ of all other vertices, i.e., from the conditional marginal $\mu_v^{\sigma_{V\setminus\{v\}}}$. Two distinct configurations $\sigma,\tau\in\mathcal{X}$ are said to be adjacent if and only if they differ in the spin value at a single vertex. Hence, the Glauber dynamics only moves between adjacent configurations. We assume that under any pinning $\eta$, the Glauber dynamics for the conditional distribution $\mu^\eta$ is irreducible; namely, any feasible configuration can reach any other through a chain of feasible configurations in which consecutive pairs are adjacent. This is necessary for the Glauber dynamics to be ergodic, and is naturally true for many spin systems of interest, including the hardcore model, random $q$-colorings with $q\geq\Delta+2$, etc.

2.2 Block factorization of variance and entropy

Consider the Gibbs distribution $\mu$ of a spin system $(G,\Phi,\Psi)$ defined on a graph $G=(V,E)$ with state space $\mathcal{X}=\prod_{v\in V}\mathcal{Q}_v$.

Definition 2.1.

Let $f:\mathcal{X}\to\mathbb{R}$ be a function.

  • The expectation of $f$ with respect to $\mu$ is defined as $\mathbb{E}_\mu f=\sum_{\sigma\in\mathcal{X}}\mu(\sigma)f(\sigma)$.

  • The variance of $f$ with respect to $\mu$ is defined as $\mathrm{Var}_\mu f=\mathbb{E}[(f-\mathbb{E}f)^2]$.

  • For non-negative $f$, the entropy of $f$ with respect to $\mu$ is defined as $\mathrm{Ent}_\mu f=\mathbb{E}[f\log f]-(\mathbb{E}f)\log(\mathbb{E}f)$, with the convention that $0\log 0=0$.

We often omit the subscript and write $\mathbb{E}f$, $\mathrm{Var}f$, $\mathrm{Ent}f$ when the underlying distribution $\mu$ is clear from context.
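These three functionals are direct to compute for a small explicit distribution. A minimal Python sketch (the helper names are ours) implements them verbatim from the definitions above:

```python
import math

def E(mu, f):
    """Expectation of f under mu; both are dicts keyed by configurations."""
    return sum(mu[s] * f[s] for s in mu)

def Var(mu, f):
    """Variance: E[(f - E f)^2]."""
    m = E(mu, f)
    return E(mu, {s: (f[s] - m) ** 2 for s in mu})

def Ent(mu, f):
    """Entropy: E[f log f] - (E f) log(E f), with the convention 0 log 0 = 0."""
    xlogx = lambda x: x * math.log(x) if x > 0 else 0.0
    return E(mu, {s: xlogx(f[s]) for s in mu}) - xlogx(E(mu, f))

mu = {"a": 0.5, "b": 0.25, "c": 0.25}
f = {"a": 1.0, "b": 2.0, "c": 4.0}
print(Var(mu, f))  # E[f^2] - (E f)^2 = 5.5 - 4.0 = 1.5
```

Note that $\mathrm{Ent}f\geq 0$ by Jensen's inequality, with equality for constant $f$, matching the convention above.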

Let $B\subseteq V$ be a subset of vertices. For any pinning $\eta$ on $V\setminus B$, the expectation, variance, and entropy of a function $f$ with respect to the conditional Gibbs distribution $\mu^\eta_B$ are denoted by $\mathbb{E}^\eta_B f$, $\mathrm{Var}^\eta_B f$, and $\mathrm{Ent}^\eta_B f$; in particular, we view all of them as functions mapping a full configuration $\sigma\in\mathcal{X}$ to a real number, depending only on the pinning $\eta=\sigma_{V\setminus B}$.

The following lemma summarizes several basic and important properties of variance and entropy in spin systems. The proof can be found in [MSW04] and the references therein.

Lemma 2.2 ([MSW04, Eq. (3), (4), (5)]).

Let $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$ be a function.

  (1) For any subsets $B\subseteq A\subseteq V$, it holds that $\mathbb{E}[\mathrm{Ent}_A f]=\mathbb{E}[\mathrm{Ent}_B f]+\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_B f)]$;

  (2) For $B=\bigcup_{i=1}^m B_i$, where $B_1,\dots,B_m\subseteq V$ are pairwise disjoint subsets such that every distinct pair $B_i,B_j$ is disconnected in $G[B]$, it holds that $\mathbb{E}[\mathrm{Ent}_B f]\leq\sum_{i=1}^m\mathbb{E}[\mathrm{Ent}_{B_i}f]$;

  (3) For any subsets $A,B\subseteq V$ such that there are no edges between $A$ and $B\setminus A$, it holds that $\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_B f)]\leq\mathbb{E}[\mathrm{Ent}_A(\mathbb{E}_{A\cap B}f)]$.

All three properties (1), (2), (3) hold for variance as well.

Definition 2.3.

Let $\mathcal{B}$ be a collection (possibly a multiset) of subsets of $V$. We say that $\mu$ satisfies $\mathcal{B}$-factorization of variance (resp., entropy) with multiplier $C$ if for every function $f:\mathcal{X}\to\mathbb{R}$ (resp., $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$) it holds that

$$\mathrm{Var}f\leq C\sum_{B\in\mathcal{B}}\mathbb{E}[\mathrm{Var}_B f]\qquad\Big(\text{resp., }\ \mathrm{Ent}f\leq C\sum_{B\in\mathcal{B}}\mathbb{E}[\mathrm{Ent}_B f]\Big). \quad (5)$$
Remark 2.4.

Following [BGGŠ22], we call $C$ the “multiplier” instead of “constant” since it may depend on $n$.

Remark 2.5.

The block factorization of variance/entropy can be defined more generally for weighted blocks; we refer to [CP21] for more details.

The $\mathcal{B}$-factorization corresponds to the heat-bath block dynamics for sampling from $\mu$: in each iteration, we pick $B\in\mathcal{B}$ uniformly at random and update $\sigma_B$ from the conditional distribution $\mu_B^{\sigma_{V\setminus B}}$. More specifically, the $\mathcal{B}$-factorization of variance is equivalent to the Poincaré inequality of this block dynamics, and the $\mathcal{B}$-factorization of entropy implies its modified log-Sobolev inequality (but the converse is not true). See [CMT15, CP21].

Definition 2.6.

We say that $\mu$ satisfies approximate tensorization (AT) of variance (resp., entropy) with multiplier $C$ if for every function $f:\mathcal{X}\to\mathbb{R}$ (resp., $f:\mathcal{X}\to\mathbb{R}_{\geq 0}$) it holds that

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f]\qquad\Big(\text{resp., }\ \mathrm{Ent}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_v f]\Big). \quad (6)$$

Observe that AT is exactly $\mathcal{B}$-factorization with $\mathcal{B}=\{\{v\}:v\in V\}$. Establishing AT with a small $C$ allows us to conclude rapid mixing of the Glauber dynamics; see [CLV21] and the references therein.

Lemma 2.7.

Suppose that $\min_{\sigma\in\mathcal{X}:\,\mu(\sigma)>0}\mu(\sigma)=e^{-O(n)}$. If $\mu$ satisfies AT of variance with multiplier $C$, then the mixing time of the Glauber dynamics is $O(Cn^2)$. If $\mu$ satisfies AT of entropy with multiplier $C$, then the mixing time of the Glauber dynamics is $O(Cn\log n)$.

2.3 Separator decomposition

We prove AT via a divide-and-conquer argument. To accomplish this we need the following separator decomposition, slightly modified from [YZ13]. Such ideas also appeared in many previous works to obtain fixed-parameter tractable algorithms in graphs of bounded treewidth.

Definition 2.8 (Separator Decomposition, [YZ13]).

For a graph $G=(V,E)$, a separator decomposition tree $T_{\mathsf{SD}}$ of $G$ is a rooted tree satisfying the following conditions:

  • Every node of $T_{\mathsf{SD}}$ is a pair $(U,S)$ where $U$ is a subset of vertices and $S$ is a separator of $G[U]$;

  • The root node of $T_{\mathsf{SD}}$ is a pair $(V,S_V)$;

  • For every non-leaf node $(U,S)$, the children of $(U,S)$ are the connected components of $G[U\setminus S]$, each paired with a separator of it;

  • Every leaf of $T_{\mathsf{SD}}$ is a pair $(U,U)$.

A separator decomposition tree is said to be balanced if for every internal node $(U,S)$ and every child $(U',S')$ of $(U,S)$, it holds that $|U'|\leq 2|U|/3$.
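For intuition, on a path the middle vertex is always a balanced separator, so a balanced separator decomposition tree can be built greedily. A minimal Python sketch (all names are ours; each node is represented as a triple `(U, S, children)` rather than a pair, to carry the subtree):

```python
def path_decomposition(lo, hi):
    """Balanced separator decomposition of the path on vertices lo..hi-1.
    The middle vertex is a balanced separator; each node is (U, S, children),
    and leaves are (U, U, [])."""
    U = list(range(lo, hi))
    if len(U) <= 1:
        return (U, U, [])              # leaf (U, U)
    mid = (lo + hi) // 2
    S = [mid]                          # removing mid disconnects the path
    children = []
    if lo < mid:
        children.append(path_decomposition(lo, mid))
    if mid + 1 < hi:
        children.append(path_decomposition(mid + 1, hi))
    return (U, S, children)

def height(node):
    _, _, children = node
    return 0 if not children else 1 + max(height(c) for c in children)

tree = path_decomposition(0, 15)
print(height(tree))  # 3
```

One can check that the separators in this tree form a partition of the vertex set, as noted in Remark 2.9, and that every child satisfies the $|U'|\leq 2|U|/3$ balance condition.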

Remark 2.9.

Observe that all separators $S$ appearing in $T_{\mathsf{SD}}$ form a partition of $V$.

It is easy to see that the height of a balanced separator decomposition tree is $O(\log n)$.

Lemma 2.10.

Suppose $G=(V,E)$ is an $n$-vertex graph with $n\geq 2$. If $T_{\mathsf{SD}}$ is a balanced separator decomposition tree of $G$, then the height of $T_{\mathsf{SD}}$ is less than $3\log n$.

Proof.

Suppose for contradiction that the height satisfies $h\geq 3\log n$. Let $(U,U)$ be a leaf of $T_{\mathsf{SD}}$ at distance $h$ from the root node $(V,S_V)$. Since all separators are balanced, we have $1\leq|U|\leq(2/3)^h n\leq n^{1-3\log(3/2)}<1$, a contradiction. ∎
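A quick numeric sanity check of this bound: even in the worst case, where every child retains the maximal allowed $\lfloor 2n/3\rfloor$ vertices, the recursion depth stays below $3\log_2 n$. A small Python check (the helper `worst_depth` is ours):

```python
import math

def worst_depth(n):
    """Recursion depth when each child retains the maximal allowed
    floor(2n/3) vertices -- the worst case for a balanced decomposition."""
    d = 0
    while n > 1:
        n = (2 * n) // 3
        d += 1
    return d

for n in [2, 10, 100, 10**6]:
    assert worst_depth(n) < 3 * math.log2(n)
print(worst_depth(10**6))
```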

3 Combinatorial Approach for Approximate Tensorization

3.1 Approximate tensorization via separator decomposition

Our main step for establishing approximate tensorization of variance and entropy is given by the following proposition. Roughly speaking, if we can find a (small) separator $S\subseteq V$ of the underlying graph $G$ whose removal disconnects $G$, then we can factorize variance/entropy into the block $S$ and all connected components of $V\setminus S$, whose sizes are significantly smaller if the separator $S$ is balanced. Given a (balanced) separator decomposition tree, we can continue this process for each smaller block and, in the end, tensorize into single vertices.

Proposition 3.1.

Let $(G,\Phi,\Psi)$ be a spin system defined on a graph $G=(V,E)$ with associated Gibbs distribution $\mu$. Suppose that $T_{\mathsf{SD}}$ is a separator decomposition tree of $G$ satisfying:

  1. (Block Factorization for Decomposition) For every node $(U,S)$, there exists $C_{U,S}\geq 1$ such that for any function $f:\mathcal{X}\to\mathbb{R}$ we have

    $$\mathbb{E}[\mathrm{Var}_U f]\leq C_{U,S}\left(\mathbb{E}[\mathrm{Var}_S f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). \quad (7)$$

    For all leaves $(U,U)$ we take $C_{U,U}=1$.

  2. (Approximate Tensorization for Separators) For every node $(U,S)$, there exists $C_S\geq 1$ such that for any function $f:\mathcal{X}\to\mathbb{R}$ we have

    $$\mathbb{E}[\mathrm{Var}_S f]\leq C_S\sum_{v\in S}\mathbb{E}[\mathrm{Var}_v f]. \quad (8)$$

Then the Gibbs distribution $\mu$ satisfies approximate tensorization of variance with multiplier $C$ given by

$$C=\max_{(U,S)}\Big\{C_S\prod_{(U',S')}C_{U',S'}\Big\},$$

where the maximum is taken over all nodes of $T_{\mathsf{SD}}$, and the product is over all nodes $(U',S')$ on the unique path from the root $(V,S_V)$ to $(U,S)$. Namely, for every function $f:\mathcal{X}\to\mathbb{R}$ we have

$$\mathrm{Var}f\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f].$$
Remark 3.2.

The entropy version of Proposition 3.1 is also true, and the proof is exactly the same with $\mathrm{Ent}(\cdot)$ replacing $\mathrm{Var}(\cdot)$.

The proof of Proposition 3.1 follows by straightforwardly applying the properties of variance and entropy given in Lemma 2.2. We use it as our basic strategy for obtaining meaningful AT bounds in many applications.

Proof of Proposition 3.1.

The proposition follows by decomposing the variance level by level on the separator decomposition tree $T_{\mathsf{SD}}$, using Lemma 2.2 and Eqs. 7 and 8. More specifically, for the root $(V,S_V)$ we have

$$\mathrm{Var}f\leq C_{V,S_V}\left(\mathbb{E}[\mathrm{Var}_{S_V}f]+\mathbb{E}[\mathrm{Var}_{V\setminus S_V}f]\right)\leq C_{V,S_V}C_{S_V}\sum_{v\in S_V}\mathbb{E}[\mathrm{Var}_v f]+C_{V,S_V}\sum_{U:\text{ c.c. of }G[V\setminus S_V]}\mathbb{E}[\mathrm{Var}_U f],$$

where every $U$ is a connected component of $G[V\setminus S_V]$, and we can factorize $\mathbb{E}[\mathrm{Var}_{V\setminus S_V}f]$ without loss since, conditional on the spins of $S_V$, the distribution is a product distribution over the components (Lemma 2.2). In particular, $U$ (together with its separator) is a child of $(V,S_V)$ in $T_{\mathsf{SD}}$. Continuing the process for each child $(U,S_U)$, we obtain

$$\mathbb{E}[\mathrm{Var}_U f]\leq C_{U,S_U}\left(\mathbb{E}[\mathrm{Var}_{S_U}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S_U}f]\right)\leq C_{U,S_U}C_{S_U}\sum_{v\in S_U}\mathbb{E}[\mathrm{Var}_v f]+C_{U,S_U}\sum_{W:\text{ c.c. of }G[U\setminus S_U]}\mathbb{E}[\mathrm{Var}_W f],$$

where every $W$ is a connected component of $G[U\setminus S_U]$. Iterating this down the tree, in the end we obtain

$$\mathrm{Var}f\leq\sum_{v\in V}C_v\,\mathbb{E}[\mathrm{Var}_v f]\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Var}_v f],$$

where for each $v$,

$$C_v=C_S\prod_{(U',S')}C_{U',S'}\leq C,$$

where $(U,S)$ is the unique node such that $v\in S$ (see Remark 2.9) and the product runs over all $(U',S')$ on the unique path from $(V,S_V)$ to $(U,S)$. This proves the proposition. ∎

3.2 Tools for factorization of variance and entropy

To apply Proposition 3.1, one needs to establish the block factorization of variance/entropy for decomposition, Eq. 7, and approximate tensorization for separators, Eq. 8. In this subsection, we summarize known results and give new ones for factorization of variance and entropy in a very general setting, which are useful for showing Eqs. 7 and 8. The lemmas in this subsection are suitable for establishing AT or block factorization for disjoint blocks; for overlapping blocks we also need Lemma 4.10 from Section 4.2.2.

Two-variable factorization with weak correlation

We first consider AT for two variables. Let $X$ and $Y$ be two random variables with joint distribution $\pi=\pi_{XY}$, fully supported on finite state spaces $\mathcal{X}$ and $\mathcal{Y}$ respectively. For applications such as proving Eq. 7, $X$ and $Y$ each represent a block of vertices, namely $X=\sigma_S$ and $Y=\sigma_{U\setminus S}$, and we consider the joint distribution of $(X,Y)=\sigma_U$ under an arbitrary pinning $\eta$ outside $U$.

Denote the marginal distribution of XX by πX\pi_{X}, and for y𝒴y\in\mathcal{Y} let πXy\pi_{X}^{y} be the distribution of XX conditioned on Y=yY=y. We define the marginal distribution πY\pi_{Y} and for x𝒳x\in\mathcal{X} the conditional distribution πYx\pi_{Y}^{x} in the same way.

We use 𝔼f=𝔼πf\mathbb{E}f=\mathbb{E}_{\pi}f, Varf=Varπf\mathrm{Var}f=\mathrm{Var}_{\pi}f, and Entf=Entπf\mathrm{Ent}f=\mathrm{Ent}_{\pi}f to denote the expectation, variance, and entropy of some function f:𝒳×𝒴f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} or 0\mathbb{R}_{\geq 0} under the distribution π\pi, and use 𝔼Xyf\mathbb{E}_{X}^{y}f, VarXyf\mathrm{Var}_{X}^{y}f, and EntXyf\mathrm{Ent}_{X}^{y}f to denote the expectation, variance, and entropy under the conditional distribution πXy\pi_{X}^{y}. As before, 𝔼[VarX(f)]\mathbb{E}[\mathrm{Var}_{X}(f)] and 𝔼[EntX(f)]\mathbb{E}[\mathrm{Ent}_{X}(f)] represent the expectation of VarXYf\mathrm{Var}_{X}^{Y}f and EntXYf\mathrm{Ent}_{X}^{Y}f where YY is chosen from πY\pi_{Y}.

We first show AT for π\pi when the correlation between XX and YY is bounded pointwise. The following lemma is implicitly given in [Ces01, DPPP02]; here we give a self-contained, simplified proof with an improved constant. The lemma below can also be applied to recover the main results of [Ces01, DPPP02].

Lemma 3.3 ([Ces01, Proposition 2.1] and [DPPP02, Lemma 5.1 & 5.2]).

Suppose there exists a real ε[0,1/2)\varepsilon\in[0,1/2) such that for all y𝒴y\in\mathcal{Y},

|πXy(x)πX(x)1|ε,x𝒳.\left|\frac{\pi_{X}^{y}(x)}{\pi_{X}(x)}-1\right|\leq\varepsilon,\quad\forall x\in\mathcal{X}. (9)

Then we have

Varf\displaystyle\mathrm{Var}f (1+ε12ε)(𝔼[VarXf]+𝔼[VarYf]),f:𝒳×𝒴\displaystyle\leq\left(1+\frac{\varepsilon}{1-2\varepsilon}\right)\left(\mathbb{E}[\mathrm{Var}_{X}f]+\mathbb{E}[\mathrm{Var}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} (10)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f (1+ε12ε)(𝔼[EntXf]+𝔼[EntYf]),f:𝒳×𝒴0.\displaystyle\leq\left(1+\frac{\varepsilon}{1-2\varepsilon}\right)\left(\mathbb{E}[\mathrm{Ent}_{X}f]+\mathbb{E}[\mathrm{Ent}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}. (11)

The proof of Lemma 3.3 can be found in Section 6.1.
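As a quick numerical sanity check of the variance bound Eq. 10 (an illustration only; the toy 2-by-2 distribution and the test function below are arbitrary choices, not from the paper), one can verify the inequality directly:

```python
# Numerical sanity check of Eq. 10 (illustration only): a toy weakly
# correlated 2x2 distribution and an arbitrary test function f.

pi = {(0, 0): 0.26, (0, 1): 0.24, (1, 0): 0.24, (1, 1): 0.26}
f  = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}

def var(dist, g):
    mean = sum(dist[z] * g[z] for z in dist)
    return sum(dist[z] * (g[z] - mean) ** 2 for z in dist)

piX = {x: pi[(x, 0)] + pi[(x, 1)] for x in (0, 1)}
piY = {y: pi[(0, y)] + pi[(1, y)] for y in (0, 1)}

# eps from the pointwise ratio condition (9): pi_X^y(x) / pi_X(x)
eps = max(abs(pi[(x, y)] / piY[y] / piX[x] - 1) for (x, y) in pi)

# E[Var_X f] and E[Var_Y f]: averaged conditional variances
e_var_x = sum(piY[y] * var({x: pi[(x, y)] / piY[y] for x in (0, 1)},
                           {x: f[(x, y)] for x in (0, 1)})
              for y in (0, 1))
e_var_y = sum(piX[x] * var({y: pi[(x, y)] / piX[x] for y in (0, 1)},
                           {y: f[(x, y)] for y in (0, 1)})
              for x in (0, 1))

lhs = var(pi, f)
rhs = (1 + eps / (1 - 2 * eps)) * (e_var_x + e_var_y)
assert eps < 0.5 and lhs <= rhs
```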

Two-variable factorization with strong correlation

Our next result allows stronger correlations between XX and YY, at a cost of larger constants for AT.

Lemma 3.4.

Suppose there exist reals εX,εY[0,1]\varepsilon_{X},\varepsilon_{Y}\in[0,1] with εXεY>0\varepsilon_{X}\varepsilon_{Y}>0 such that

dTV(πXy,πXy)\displaystyle d_{\mathrm{TV}}(\pi_{X}^{y},\pi_{X}^{y^{\prime}}) 1εX,y,y𝒴\displaystyle\leq 1-\varepsilon_{X},\quad\forall y,y^{\prime}\in\mathcal{Y} (12)
anddTV(πYx,πYx)\displaystyle\text{and}\quad d_{\mathrm{TV}}(\pi_{Y}^{x},\pi_{Y}^{x^{\prime}}) 1εY,x,x𝒳.\displaystyle\leq 1-\varepsilon_{Y},\quad\forall x,x^{\prime}\in\mathcal{X}. (13)

Then we have

Varf\displaystyle\mathrm{Var}f 2εX+εY(𝔼[VarXf]+𝔼[VarYf]),f:𝒳×𝒴\displaystyle\leq\frac{2}{\varepsilon_{X}+\varepsilon_{Y}}\left(\mathbb{E}[\mathrm{Var}_{X}f]+\mathbb{E}[\mathrm{Var}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R} (14)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 4+2log(1/πmin)εX+εY(𝔼[EntXf]+𝔼[EntYf]),f:𝒳×𝒴0\displaystyle\leq\frac{4+2\log(1/\pi_{\min})}{\varepsilon_{X}+\varepsilon_{Y}}\left(\mathbb{E}[\mathrm{Ent}_{X}f]+\mathbb{E}[\mathrm{Ent}_{Y}f]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0} (15)

where πmin=min(x,y)𝒳×𝒴:π(x,y)>0π(x,y)\pi_{\min}=\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}:\,\pi(x,y)>0}\pi(x,y).

Remark 3.5.

We remark that the constant for entropy factorization Eq. 15 is not optimal since log(1/πmin)\log(1/\pi_{\min}) depends logarithmically on the size of state space, |𝒳×𝒴|=|𝒳||𝒴||\mathcal{X}\times\mathcal{Y}|=|\mathcal{X}|\cdot|\mathcal{Y}|. Getting rid of this is an interesting open question.
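A numerical sanity check of the variance bound Eq. 14 (illustration only, with an arbitrary strongly correlated toy distribution; here εX\varepsilon_{X} and εY\varepsilon_{Y} are read off from the total-variation conditions Eqs. 12 and 13):

```python
# Numerical sanity check of Eq. 14 (illustration only): a strongly
# correlated 2x2 distribution; eps_X, eps_Y come from conditions (12), (13).

pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
f  = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}

def var(dist, g):
    mean = sum(dist[z] * g[z] for z in dist)
    return sum(dist[z] * (g[z] - mean) ** 2 for z in dist)

def tv(p, q):
    return 0.5 * sum(abs(p[z] - q[z]) for z in p)

piX = {x: pi[(x, 0)] + pi[(x, 1)] for x in (0, 1)}
piY = {y: pi[(0, y)] + pi[(1, y)] for y in (0, 1)}
condX = {y: {x: pi[(x, y)] / piY[y] for x in (0, 1)} for y in (0, 1)}
condY = {x: {y: pi[(x, y)] / piX[x] for y in (0, 1)} for x in (0, 1)}

eps_x = 1 - max(tv(condX[y], condX[yp]) for y in (0, 1) for yp in (0, 1))
eps_y = 1 - max(tv(condY[x], condY[xp]) for x in (0, 1) for xp in (0, 1))

e_var_x = sum(piY[y] * var(condX[y], {x: f[(x, y)] for x in (0, 1)})
              for y in (0, 1))
e_var_y = sum(piX[x] * var(condY[x], {y: f[(x, y)] for y in (0, 1)})
              for x in (0, 1))

lhs = var(pi, f)
rhs = 2 / (eps_x + eps_y) * (e_var_x + e_var_y)
assert lhs <= rhs
```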

Multi-variable factorization

Finally, we consider approximate tensorization for multiple variables, which is helpful for establishing Eq. 8 when the size of the separator is bounded.

Let X1,,XnX_{1},\dots,X_{n} be nn random variables where XiX_{i} is fully supported on a finite set 𝒳i\mathcal{X}_{i} for each ii. The joint distribution of (X1,,Xn)(X_{1},\dots,X_{n}) over the product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i} is denoted by π\pi. For two disjoint subsets A,B[n]A,B\subseteq[n] and a partial assignment xAiA𝒳ix_{A}\in\prod_{i\in A}\mathcal{X}_{i} with π(XA=xA)>0\pi(X_{A}=x_{A})>0, let πBxA=π(XB=XA=xA)\pi_{B}^{x_{A}}=\pi(X_{B}=\cdot\mid X_{A}=x_{A}) denote the conditional distribution of XBX_{B} given that the variables in AA are assigned the values xAx_{A}. In particular, πB\pi_{B} denotes the marginal on BB. For simplicity, we write πi\pi_{i} and xix_{i} when the underlying set is {i}\{i\}.

As before, we write Entixi¯f=EntXixi¯f\mathrm{Ent}_{i}^{x_{\bar{i}}}f=\mathrm{Ent}_{X_{i}}^{x_{\bar{i}}}f for the conditional entropy of ff on XiX_{i} given xi¯=(x1,,xi1,xi+1,,xn)x_{\bar{i}}=(x_{1},\dots,x_{i-1},x_{i+1},\dots,x_{n}), and let 𝔼[Entif]\mathbb{E}[\mathrm{Ent}_{i}f] be its expectation where xi¯x_{\bar{i}} is drawn from π[n]i\pi_{[n]\setminus i}; the variance versions are defined in the same way.

Lemma 3.6.

Suppose there exists a real ε(0,1]\varepsilon\in(0,1] such that for every Λ[n]\Lambda\subseteq[n] with |Λ|n2|\Lambda|\leq n-2, every xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0, every i,j[n]Λi,j\in[n]\setminus\Lambda with iji\neq j, and every xi,xi𝒳ix_{i},x^{\prime}_{i}\in\mathcal{X}_{i} with πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x_{i})>0 and πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x^{\prime}_{i})>0, it holds

dTV(πjxΛ,xi,πjxΛ,xi)1ε.d_{\mathrm{TV}}(\pi_{j}^{x_{\Lambda},x_{i}},\pi_{j}^{x_{\Lambda},x^{\prime}_{i}})\leq 1-\varepsilon.

Then we have

Varf\displaystyle\mathrm{Var}f 1εn1i=1n𝔼[Varif],f:𝒳\displaystyle\leq\frac{1}{\varepsilon^{n-1}}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Var}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R} (16)
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 2+log(1/πmin)εn1i=1n𝔼[Entif],f:𝒳0\displaystyle\leq\frac{2+\log(1/\pi_{\min})}{\varepsilon^{n-1}}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Ent}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}_{\geq 0} (17)

where πmin=minx𝒳:π(x)>0π(x)\pi_{\min}=\min_{x\in\mathcal{X}:\,\pi(x)>0}\pi(x).

Observe that for n=2n=2, Lemma 3.6 recovers Lemma 3.4 in the special case εX=εY=ε\varepsilon_{X}=\varepsilon_{Y}=\varepsilon. The proofs of Lemmas 3.4 and 3.6, which are simple applications of the spectral independence approach based on [AL20, ALO20, FGYZ21], can be found in Section 6.2.

3.3 Example

Here we give a simple example as an application of Proposition 3.1 and tools from Section 3.2. We show polynomial mixing time of the Glauber dynamics for sampling qq-colorings on a complete dd-ary tree for all d2d\geq 2 and q3q\geq 3. This was known previously; see [GJK10, LM11, LMP09, TVVY12, SZ17] for even sharper results. However, Proposition 3.1 allows us to establish this fact in a more straightforward manner, avoiding technical complications such as constructing canonical paths or Markov chain decompositions.

Proposition 3.7 ([GJK10], see [LM11, LMP09, TVVY12, SZ17] for sharper results).

Let d2d\geq 2 and q3q\geq 3 be integers. Suppose TT is a complete dd-ary tree of height hh, and denote the number of vertices by n=i=0hdin=\sum_{i=0}^{h}d^{i}. The mixing time of the Glauber dynamics for sampling uniformly random qq-colorings of TT is nO(1+dqlogd)n^{O\big{(}1+\frac{d}{q\log d}\big{)}}.

Proof.

We apply Proposition 3.1. We may assume that q3dq\leq 3d since otherwise rapid mixing follows by standard path coupling arguments [Jer03]. The separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} can be obtained from the original tree TT: every node of T𝖲𝖣T_{\mathsf{SD}} is of the form (Tv,{v})(T_{v},\{v\}) where TvT_{v} is the subtree rooted at vv and the single-vertex set {v}\{v\} is a separator of TvT_{v}. In particular, we have C{v}=1C_{\{v\}}=1 in Eq. 8 for each separator {v}\{v\}. We claim that CTv,{v}=eO(d/q)C_{T_{v},\{v\}}=e^{O(d/q)} in Eq. 7 for each node (Tv,{v})(T_{v},\{v\}). To see this, consider two pinnings where vv receives colors c1c_{1} and c2c_{2}. With probability (11q1)d=eO(d/q)(1-\frac{1}{q-1})^{d}=e^{-O(d/q)} we can couple all children of vv so that they receive identical colors, none of which is c1c_{1} or c2c_{2}. Hence, we can couple the whole subtree Tv{v}T_{v}\setminus\{v\} with the same probability, implying

dTV(μTv{v}vc1,μTv{v}vc2)1eO(d/q).d_{\mathrm{TV}}\left(\mu_{T_{v}\setminus\{v\}}^{v\leftarrow c_{1}},\mu_{T_{v}\setminus\{v\}}^{v\leftarrow c_{2}}\right)\leq 1-e^{-O(d/q)}.

Thus, we deduce from Lemma 3.4 that CTv,{v}=eO(d/q)C_{T_{v},\{v\}}=e^{O(d/q)}. By Proposition 3.1 we obtain that AT of variance holds with multiplier

C=(eO(d/q))h=nO(dqlogd).C=\left(e^{O(d/q)}\right)^{h}=n^{O\left(\frac{d}{q\log d}\right)}.

The mixing time then follows from Lemma 2.7. ∎
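For concreteness, the Glauber dynamics being analyzed can be implemented in a few lines. The sketch below (an illustration only; it does not certify mixing, and the tree size, q, and step count are arbitrary choices) resamples a uniformly random vertex with a uniformly random color not used by its neighbors:

```python
# A minimal implementation of the Glauber dynamics for proper q-colorings
# (illustration only; the proposition concerns its mixing time, which this
# sketch does not certify).
import random

def glauber_colorings(adj, q, steps, seed=0):
    rng = random.Random(seed)
    # greedy initial proper coloring (valid since q >= max degree + 1)
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(q) if c not in used)
    verts = list(adj)
    for _ in range(steps):
        v = rng.choice(verts)                         # uniform random vertex
        blocked = {color[u] for u in adj[v]}
        color[v] = rng.choice([c for c in range(q) if c not in blocked])
    return color

# complete binary tree (d = 2) of height 2: vertex i has children 2i+1, 2i+2
adj = {i: [] for i in range(7)}
for i in range(3):
    for ch in (2 * i + 1, 2 * i + 2):
        adj[i].append(ch)
        adj[ch].append(i)

col = glauber_colorings(adj, q=5, steps=500)
```

Note that q=5 equals the maximum degree plus two here, the standard ergodicity condition for the Glauber dynamics on colorings.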

4 Rapid Mixing for Graphs of Bounded Treewidth

In this section we consider graphs of bounded treewidth. It is well-known that all bounded-treewidth graphs have a balanced separator decomposition tree with separators of bounded size.

Lemma 4.1 ([RS86, Gru12]).

If GG is a graph of treewidth tt, then there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} for GG such that for every node (U,S)(U,S) in T𝖲𝖣T_{\mathsf{SD}} it holds |S|t|S|\leq t.

See also [Bod98, Ree03, HW17] for surveys on treewidth.
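For the special case of trees (treewidth 1), a balanced separator decomposition tree can be computed explicitly by recursive centroid removal; the sketch below (an illustration using the standard centroid property, not the construction from [RS86, Gru12]) produces nodes (U,{v}) where every separator is a single vertex and every child component contains at most |U|/2 vertices:

```python
# Sketch: for trees (treewidth 1), a balanced separator decomposition tree
# via recursive centroid removal; every separator is a single vertex and
# every remaining component has at most half the vertices.

def components(adj, U):
    seen, comps = set(), []
    for s in U:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u in U and u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def centroid_decomposition(adj, U):
    """Return a node (U, S, children) with S = [centroid of U]."""
    if len(U) == 1:
        return (sorted(U), sorted(U), [])      # leaf node (U, U)
    # centroid: vertex minimizing the largest component of U \ {v}
    v = min(U, key=lambda w: max(len(c)
            for c in components(adj, set(U) - {w})))
    kids = [centroid_decomposition(adj, set(c))
            for c in components(adj, set(U) - {v})]
    return (sorted(U), [v], kids)

# path 0-1-...-14 (a tree); its centroid is the midpoint 7
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 15] for i in range(15)}
root = centroid_decomposition(adj, set(range(15)))
```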

4.1 Hardcore model

In this subsection we prove Theorem 1.1 for the hardcore model via Proposition 3.1. As a byproduct, we also show eO(n)e^{O(\sqrt{n})} mixing time of the Glauber dynamics on arbitrary planar graphs with no restriction on the maximum degree, see Theorem 4.4.

To apply Proposition 3.1, we need to establish block factorization for every decomposition Eq. 7 and approximate tensorization for all separators Eq. 8, which are given by the following two lemmas.

Lemma 4.2.

Consider the hardcore model on a graph G=(V,E)G=(V,E) with fugacity λ>0\lambda>0. Let SUVS\subseteq U\subseteq V be subsets with |S|t|S|\leq t. For any pinning η\eta on VUV\setminus U, the conditional hardcore Gibbs distribution μUη\mu^{\eta}_{U} satisfies {S,US}\{S,U\setminus S\}-factorization of variance with constant C=2(1+λ)tC=2(1+\lambda)^{t}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarUf]C(𝔼[VarSf]+𝔼[VarUSf]).\mathbb{E}[\mathrm{Var}_{U}f]\leq C\left(\mathbb{E}[\mathrm{Var}_{S}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). (18)
Proof.

For any pinning τ\tau on USU\setminus S, we have

μSη,τ(σS=0)1(1+λ)|S|1(1+λ)t.\mu^{\eta,\tau}_{S}(\sigma_{S}=\vec{0})\geq\frac{1}{(1+\lambda)^{|S|}}\geq\frac{1}{(1+\lambda)^{t}}.

Hence, for any two pinnings τ,ξ\tau,\xi on USU\setminus S we have dTV(μSη,τ,μSη,ξ)1(1+λ)td_{\mathrm{TV}}(\mu^{\eta,\tau}_{S},\mu^{\eta,\xi}_{S})\leq 1-(1+\lambda)^{-t}. Therefore, by Lemma 3.4 we have that μUη\mu^{\eta}_{U} satisfies {S,US}\{S,U\setminus S\}-factorization of variance with constant C=2(1+λ)tC=2(1+\lambda)^{t}. Taking expectation over η\eta, we also obtain Eq. 18. ∎
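The lower bound used in this proof can be verified by brute force on a small instance (illustration only; the 5-cycle, the choice S={0,1}, and λ=1.5 are arbitrary):

```python
# Brute-force check (illustration) of the bound above: under any pinning
# tau on U \ S, the all-unoccupied configuration on S has conditional
# probability at least (1 + lambda)^{-|S|}.  Toy instance: 5-cycle,
# S = {0, 1}, lambda = 1.5.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
lam = 1.5

def is_independent(sigma):
    return all(not (sigma[u] and sigma[v]) for u, v in edges)

ratios = []
for tau in product((0, 1), repeat=3):            # pinning on U \ S = {2,3,4}
    weights = {}
    for s in product((0, 1), repeat=2):          # configuration on S
        sigma = (s[0], s[1], tau[0], tau[1], tau[2])
        if is_independent(sigma):
            weights[s] = lam ** sum(sigma)
    if not weights:                              # infeasible pinning, skip
        continue
    Z = sum(weights.values())
    ratios.append(weights[(0, 0)] / Z)           # sigma_S = all-unoccupied

assert min(ratios) >= (1 + lam) ** (-2) - 1e-12
```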

Lemma 4.3.

Consider the hardcore model on a graph G=(V,E)G=(V,E) with fugacity λ>0\lambda>0. Let SVS\subseteq V be a subset with |S|t|S|\leq t. For any pinning η\eta on VSV\setminus S, the conditional hardcore Gibbs distribution μSη\mu^{\eta}_{S} satisfies approximate tensorization of variance with constant C=(1+λ)t1C=(1+\lambda)^{t-1}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarSf]CvS𝔼[Varvf].\mathbb{E}[\mathrm{Var}_{S}f]\leq C\sum_{v\in S}\mathbb{E}[\mathrm{Var}_{v}f]. (19)
Proof.

For any subset ΛS\Lambda\subseteq S with |Λ||S|2|\Lambda|\leq|S|-2 and any pinning τ\tau on Λ\Lambda, let u,vSΛu,v\in S\setminus\Lambda be two distinct vertices and we have

μSΛη,τ(σv=0σu=0)11+λandμSΛη,τ(σv=0σu=1)11+λ.\mu^{\eta,\tau}_{S\setminus\Lambda}(\sigma_{v}=0\mid\sigma_{u}=0)\geq\frac{1}{1+\lambda}\quad\text{and}\quad\mu^{\eta,\tau}_{S\setminus\Lambda}(\sigma_{v}=0\mid\sigma_{u}=1)\geq\frac{1}{1+\lambda}.

Hence, dTV(μvη,τ,u0,μvη,τ,u1)1(1+λ)1d_{\mathrm{TV}}(\mu^{\eta,\tau,u\leftarrow 0}_{v},\mu^{\eta,\tau,u\leftarrow 1}_{v})\leq 1-(1+\lambda)^{-1}. It follows from Lemma 3.6 that μSη\mu^{\eta}_{S} satisfies approximate tensorization of variance with constant

C=(1+λ)|S|1(1+λ)t1.C=(1+\lambda)^{|S|-1}\leq(1+\lambda)^{t-1}.

Taking expectation over η\eta, we also obtain Eq. 19. ∎

We are now ready to prove Theorem 1.1.

Proof of Theorem 1.1.

We deduce the theorem from Proposition 3.1. Since the graph has treewidth tt, there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} by Lemma 4.1 where all separators have size at most tt. The height of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. Block factorization for decomposition Eq. 7 is shown by Lemma 4.2 with CU,S=2(1+λ)tC_{U,S}=2(1+\lambda)^{t} for each node (U,S)(U,S). Approximate tensorization for separators Eq. 8 is shown by Lemma 4.3 with CS=(1+λ)t1C_{S}=(1+\lambda)^{t-1} for each separator SS. Thus, we conclude from Proposition 3.1 that the hardcore Gibbs distribution μ\mu satisfies approximate tensorization of variance with multiplier

C=(1+λ)t1(2(1+λ)t)3logn=nO(1+tlog(1+λ)).C=(1+\lambda)^{t-1}\cdot\left(2(1+\lambda)^{t}\right)^{3\log n}=n^{O(1+t\log(1+\lambda))}.

The mixing time then follows from Lemma 2.7. ∎

As a byproduct, we also show eO(n)e^{O(\sqrt{n})} mixing of the Glauber dynamics for the hardcore model on any planar graph; see [BKMP05, Hei20, EF21] for works that obtain similar results.

Theorem 4.4.

Suppose GG is an nn-vertex planar graph. The mixing time of the Glauber dynamics for sampling from the hardcore model on GG with fugacity λ>0\lambda>0 is (1+λ)O(n)(1+\lambda)^{O(\sqrt{n})}.

Proof.

We apply Proposition 3.1. It is well-known that every planar graph has a balanced separator SVS\subseteq V of size O(|V|)O(\sqrt{|V|}), such that each connected component of G[VS]G[V\setminus S] has size at most 2|V|/32|V|/3; see [LT79]. We can further find balanced separators for each component, and construct a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} recursively. By Lemma 4.2, for every node (U,S)(U,S) we have block factorization for decomposition Eq. 7 with constant

CU,S=2(1+λ)|S|=(1+λ)O(|U|).C_{U,S}=2(1+\lambda)^{|S|}=(1+\lambda)^{O\big{(}\sqrt{|U|}\big{)}}.

By Lemma 4.3, for every separator SS we have approximate tensorization for separators Eq. 8 with constant

CS=(1+λ)|S|1=(1+λ)O(n).C_{S}=(1+\lambda)^{|S|-1}=(1+\lambda)^{O(\sqrt{n})}.

Therefore, we obtain from Proposition 3.1 that AT of variance holds with multiplier

C(1+λ)O(n)i=0(1+λ)O((23)in)=(1+λ)O(n).\displaystyle C\leq(1+\lambda)^{O(\sqrt{n})}\cdot\prod_{i=0}^{\infty}(1+\lambda)^{O\left(\sqrt{\left(\frac{2}{3}\right)^{i}n}\right)}=(1+\lambda)^{O(\sqrt{n})}.

The mixing time then follows from Lemma 2.7. ∎
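The last equality in the displayed bound uses the geometric decay of block sizes down the decomposition tree; spelled out,

```latex
\sum_{i=0}^{\infty} O\Big(\sqrt{(2/3)^{i}\,n}\Big)
  \;=\; O(\sqrt{n}) \sum_{i=0}^{\infty} \Big(\sqrt{2/3}\Big)^{i}
  \;=\; \frac{O(\sqrt{n})}{1-\sqrt{2/3}}
  \;=\; O(\sqrt{n}),
```

so the product of the block-factorization constants along any root-to-leaf path is indeed (1+λ)O(n)(1+\lambda)^{O(\sqrt{n})}.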

4.2 List colorings

In this subsection we prove Theorem 1.2 from the introduction for colorings. We consider the more general setting of list colorings, where each vertex vv is associated with a list LvL_{v} of available colors, and every list coloring assigns to each vertex a color from its list such that adjacent vertices receive distinct colors. The Glauber dynamics is ergodic for list colorings if |Lv|degG(v)+2|L_{v}|\geq\deg_{G}(v)+2 for all vv. One handy feature of list colorings is that any pinning η\eta on a subset ΛV\Lambda\subseteq V of vertices induces a new list coloring instance on the induced subgraph G[VΛ]G[V\setminus\Lambda], for which |Lvη|degG[VΛ](v)+2|L^{\eta}_{v}|\geq\deg_{G[V\setminus\Lambda]}(v)+2 still holds for all unpinned vv where LvηL^{\eta}_{v} is the new list of available colors conditioned on η\eta; see, e.g., [GKM15].

Theorem 4.5.

Let G=(V,E)G=(V,E) be a graph of maximum degree Δ3\Delta\geq 3 and treewidth t1t\geq 1. Suppose each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. The mixing time of the Glauber dynamics for sampling uniformly random list colorings of GG is nO(t(Δ+logq))n^{O(t(\Delta+\log q))}.

4.2.1 A generalized version of Proposition 3.1

For list colorings we are not able to directly apply Proposition 3.1 to prove Theorem 4.5. Unlike the hardcore model, where a vertex can always be unoccupied under any pinning of its neighbors, the lack of such a “universal” color makes it hard to establish block factorization for decomposition Eq. 7 using tools from Section 3.2. In this subsection we present a stronger version of Proposition 3.1 which allows us to establish block factorization more easily for list colorings. Our applications in Section 5 also require this more general version.

For subsets SUVS\subseteq U\subseteq V and an integer r0r\geq 0, define

𝖡(S,r)={vV:distG(v,S)r}\mathsf{B}(S,r)=\{v\in V:\mathrm{dist}_{G}(v,S)\leq r\}

to be the ball around SS of radius rr in GG, and define

𝖡U(S,r)=U𝖡(S,r)={vU:distG(v,S)r}\mathsf{B}_{U}(S,r)=U\cap\mathsf{B}(S,r)=\{v\in U:\mathrm{dist}_{G}(v,S)\leq r\}

to be the portion of the ball contained in UU. In Proposition 4.6 below, we replace SS by 𝖡U(S,r)\mathsf{B}_{U}(S,r) in all suitable places in Proposition 3.1, which makes it easier for us to establish block factorization for decomposition Eq. 20 for hard-constraint models like list colorings.

Proposition 4.6.

Let (G,Φ,Ψ)(G,\Phi,\Psi) be a spin system defined on a graph G=(V,E)G=(V,E) with associated Gibbs distribution μ\mu. Let r0r\geq 0 be an integer. Suppose that T𝖲𝖣T_{\mathsf{SD}} is a separator decomposition tree of GG satisfying

  1. 1.

    (Block Factorization for Decomposition) For every node (U,S)(U,S), there exists CU,S1C_{U,S}\geq 1, such that for any function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

    𝔼[EntUf]CU,S(𝔼[Ent𝖡U(S,r)f]+𝔼[EntUSf]).\mathbb{E}[\mathrm{Ent}_{U}f]\leq C_{U,S}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U\setminus S}f\right]\right). (20)

    For all leaves (U,U)(U,U) we take CU,U=1C_{U,U}=1.

  2. 2.

    (Approximate Tensorization for Separators) For every node (U,S)(U,S), there exists CS1C_{S}\geq 1, such that for any function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

    𝔼[Ent𝖡U(S,r)f]CSv𝖡U(S,r)𝔼[Entvf].\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]\leq C_{S}\sum_{v\in\mathsf{B}_{U}(S,r)}\mathbb{E}[\mathrm{Ent}_{v}f]. (21)
  3. 3.

    (Bounded Coverage) There exists A1A\geq 1 such that for any vertex vVv\in V,

    |{(U,S):v𝖡U(S,r)}|A.|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}|\leq A. (22)

Then the Gibbs distribution μ\mu satisfies approximate tensorization of entropy with multiplier CC given by

C=Amax(U,S){CS(U,S)CU,S},C=A\cdot\max_{(U,S)}\left\{C_{S}\prod_{(U^{\prime},S^{\prime})}C_{U^{\prime},S^{\prime}}\right\},

where the maximum is taken over all nodes of T𝖲𝖣T_{\mathsf{SD}}, and the product is over all nodes (U,S)(U^{\prime},S^{\prime}) in the unique path from the root (V,SV)(V,S_{V}) to (U,S)(U,S). Namely, for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} we have

Entf\displaystyle\mathrm{Ent}f CvV𝔼[Entvf].\displaystyle\leq C\sum_{v\in V}\mathbb{E}[\mathrm{Ent}_{v}f].
Remark 4.7.

The variance version of Proposition 4.6 is also true and the proof is exactly the same.

Remark 4.8.

Observe that if r=0r=0 then 𝖡U(S,r)=S\mathsf{B}_{U}(S,r)=S and we have A=1A=1 in Eq. 22, so we recover exactly Proposition 3.1.

Proof of Proposition 4.6.

The proof is the same as for Proposition 3.1 by decomposing the entropy level by level on the separator decomposition tree T𝖲𝖣T_{\mathsf{SD}}. We similarly obtain

EntfvVCv𝔼[Entvf],\mathrm{Ent}f\leq\sum_{v\in V}C_{v}\mathbb{E}\left[\mathrm{Ent}_{v}f\right],

where for each vv,

Cv=(U,S):v𝖡U(S,r)CS(U,S)CU,SCC_{v}=\sum_{(U,S):\,v\in\mathsf{B}_{U}(S,r)}C_{S}\prod_{(U^{\prime},S^{\prime})}C_{U^{\prime},S^{\prime}}\leq C

where the product runs through all (U,S)(U^{\prime},S^{\prime}) on the unique path from (V,SV)(V,S_{V}) to (U,S)(U,S). This establishes the proposition. ∎

The following lemma is helpful for establishing bounded coverage Eq. 22 when applying Proposition 4.6.

Lemma 4.9.

Under the assumptions of Proposition 4.6:

  • If GG has maximum degree Δ3\Delta\geq 3, then A1+Δi=0r1(Δ1)i=O(Δr)A\leq 1+\Delta\sum_{i=0}^{r-1}(\Delta-1)^{i}=O(\Delta^{r});

  • If n3n\geq 3 and T𝖲𝖣T_{\mathsf{SD}} is a balanced separator decomposition tree, then A4lognA\leq 4\log n.

Proof.

If GG has maximum degree Δ3\Delta\geq 3, we have for any vVv\in V that

|{(U,S):v𝖡U(S,r)}|\displaystyle|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}| |{(U,S):distG(v,S)r}|\displaystyle\leq|\{(U,S):\mathrm{dist}_{G}(v,S)\leq r\}|
|{uV:distG(v,u)r}|\displaystyle\leq|\{u\in V:\mathrm{dist}_{G}(v,u)\leq r\}|
1+Δi=0r1(Δ1)i=O(Δr),\displaystyle\leq 1+\Delta\sum_{i=0}^{r-1}(\Delta-1)^{i}=O(\Delta^{r}),

where the second inequality is because all separators form a partition of VV (see Remark 2.9).

If T𝖲𝖣T_{\mathsf{SD}} is balanced, then the height hh of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. Hence, for any vVv\in V we have

|{(U,S):v𝖡U(S,r)}||{(U,S):vU}|h+14logn,|\{(U,S):v\in\mathsf{B}_{U}(S,r)\}|\leq|\{(U,S):v\in U\}|\leq h+1\leq 4\log n,

as claimed. ∎

4.2.2 Block factorization via marginal distributions

Note that to apply Proposition 4.6, we need to establish the block factorization for decomposition Eq. 20 where the two blocks 𝖡U(S,r)\mathsf{B}_{U}(S,r) and USU\setminus S overlap with each other. In particular, we can no longer apply Lemmas 3.3 and 3.4 directly since fixing the spin assignment in one block greatly changes the conditional distribution on the other block because of the overlap. Instead, we show block factorization for the marginal distribution μS(U𝖡U(S,r))\mu_{S\cup(U\setminus\mathsf{B}_{U}(S,r))} for the two blocks SS and U𝖡U(S,r)U\setminus\mathsf{B}_{U}(S,r), where we essentially exclude the overlap part. It turns out that these two notions of block factorization are equivalent to each other, and for the latter we are able to apply tools from Section 3.2 since the blocks are now disjoint.

We show this equivalence in a general setting. Let X,Y,ZX,Y,Z be three random variables taking values from finite state spaces 𝒳,𝒴,𝒵\mathcal{X},\mathcal{Y},\mathcal{Z} respectively. Their joint distribution is denoted by π=πXYZ\pi=\pi_{XYZ}. Denote the marginal distribution for (X,Y)(X,Y) by πXY\pi_{XY} and similarly for other choices of subsets of variables. We establish block factorization for π\pi into two blocks {X,Z}\{X,Z\} and {Y,Z}\{Y,Z\} from approximate tensorization for the marginal distribution πXY\pi_{XY}. More precisely, we say π\pi satisfies {{X,Z},{Y,Z}}\{\{X,Z\},\{Y,Z\}\}-factorization of entropy with constant CC if for every function f:𝒳×𝒴×𝒵0f:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}, it holds

EntfC(𝔼[EntXZf]+𝔼[EntYZf]).\mathrm{Ent}f\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right). (23)

We say πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy with constant CC if for every function g:𝒳×𝒴0g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}, it holds

EntXYgC(𝔼XY[EntXg]+𝔼XY[EntYg]),\mathrm{Ent}_{XY}g\leq C\left(\mathbb{E}_{XY}\left[\mathrm{Ent}_{X}g\right]+\mathbb{E}_{XY}\left[\mathrm{Ent}_{Y}g\right]\right), (24)

where the reference distribution is the marginal distribution πXY\pi_{XY} over the state space 𝒳×𝒴\mathcal{X}\times\mathcal{Y} and in particular EntXg\mathrm{Ent}_{X}g will be viewed as a function of YY (instead of Y,ZY,Z). The variance versions of Eqs. 23 and 24 are defined in the same way with Var()\mathrm{Var}(\cdot) replacing Ent()\mathrm{Ent}(\cdot).

We show that these two notions of block factorization of entropy are equivalent to each other.

Lemma 4.10.

The joint distribution π\pi satisfies {{X,Z},{Y,Z}}\{\{X,Z\},\{Y,Z\}\}-factorization of entropy (resp., variance) with constant CC if and only if the marginal distribution πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy (resp., variance) with constant CC.

The proof of Lemma 4.10 can be found in Section 6.3.

4.2.3 Proof of Theorem 4.5

We apply Proposition 4.6 with r=1r=1, where we take A=Δ+1A=\Delta+1 in Eq. 22 by Lemma 4.9. We establish block factorization for decomposition Eq. 20 and approximate tensorization for separators Eq. 21 in the following two lemmas.

Lemma 4.11.

Consider list colorings on a graph G=(V,E)G=(V,E) of maximum degree Δ3\Delta\geq 3 where each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. Let SUVS\subseteq U\subseteq V be subsets with |S|t|S|\leq t. For any pinning η\eta on VUV\setminus U, the uniform distribution μUη\mu^{\eta}_{U} over list colorings conditioned on η\eta satisfies {𝖡U(S,1),US}\{\mathsf{B}_{U}(S,1),U\setminus S\}-factorization of variance with constant C=2(2Δq)tC=2(2^{\Delta}q)^{t}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarUf]C(𝔼[Var𝖡U(S,1)f]+𝔼[VarUSf]).\mathbb{E}[\mathrm{Var}_{U}f]\leq C\left(\mathbb{E}[\mathrm{Var}_{\mathsf{B}_{U}(S,1)}f]+\mathbb{E}[\mathrm{Var}_{U\setminus S}f]\right). (25)
Proof.

Let T=U𝖡U(S,1)T=U\setminus\mathsf{B}_{U}(S,1). We first prove {S,T}\{S,T\}-factorization for the marginal distribution μSTη\mu^{\eta}_{S\cup T} on STS\cup T via Lemma 3.4. Let τ\tau be an arbitrary pinning on TT that is feasible under η\eta. For any partial list coloring σ\sigma on SS that is feasible under η\eta, we claim that

μSη,τ(σ)1(2Δq)|S|1(2Δq)t.\mu^{\eta,\tau}_{S}(\sigma)\geq\frac{1}{(2^{\Delta}q)^{|S|}}\geq\frac{1}{(2^{\Delta}q)^{t}}. (26)

Note that σ\sigma must also be feasible under τ\tau since SS and TT are not adjacent, and we can extend στη\sigma\cup\tau\cup\eta to a full list coloring by the assumption |Lv|degG(v)+2|L_{v}|\geq\deg_{G}(v)+2 for all vv. By the chain rule, it suffices to show that for any ΛS\Lambda\subseteq S and vSΛv\in S\setminus\Lambda we have

μvη,τ,σΛ(σv)12Δq.\mu^{\eta,\tau,\sigma_{\Lambda}}_{v}(\sigma_{v})\geq\frac{1}{2^{\Delta}q}. (27)

Consider a resampling procedure: starting from a random full list coloring generated from μη,τ,σΛ\mu^{\eta,\tau,\sigma_{\Lambda}}, we first resample all (free) neighbors of vv and then vv. When resampling neighbors, the probability of avoiding color σv\sigma_{v} is at least 1/2Δ1/2^{\Delta} since there are at least two available colors for each neighbor by assumption, and when resampling vv the probability of getting σv\sigma_{v} is at least 1/|Lv|1/q1/|L_{v}|\geq 1/q, thus establishing Eq. 27 and consequently Eq. 26.

We deduce from Eq. 26 that for any two pinnings τ,ξ\tau,\xi on TT, it holds

dTV(μSη,τ,μSη,ξ)11(2Δq)t.d_{\mathrm{TV}}(\mu^{\eta,\tau}_{S},\mu^{\eta,\xi}_{S})\leq 1-\frac{1}{(2^{\Delta}q)^{t}}.

Therefore, by Lemma 3.4 we have that μSTη\mu^{\eta}_{S\cup T} satisfies {S,T}\{S,T\}-factorization of variance with constant C=2(2Δq)tC=2(2^{\Delta}q)^{t}. Finally, by Lemma 4.10 we conclude that μUη\mu^{\eta}_{U} satisfies {𝖡U(S,1),US}\{\mathsf{B}_{U}(S,1),U\setminus S\}-factorization of variance with the same constant, and Eq. 25 follows by taking expectation over η\eta. ∎
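The marginal lower bound Eq. 27 can be verified by brute force on a small instance (illustration only; the star and the color lists below are arbitrary choices satisfying |Lv|≥deg(v)+2, with Δ=3 and q=5, so the claimed bound is 1/(2^Δ q)=1/40):

```python
# Brute-force check (illustration) of the marginal lower bound (27) on a
# tiny instance: a star with center 0 and leaves 1, 2, 3, with lists
# satisfying |L_v| >= deg(v) + 2.  Here Delta = 3 and q = 5, so the bound
# is 1 / (2^Delta * q) = 1/40.
from itertools import product

adj = {0: (1, 2, 3), 1: (0,), 2: (0,), 3: (0,)}
lists = {0: range(5), 1: range(3), 2: range(3), 3: range(3)}

colorings = [c for c in product(*(lists[v] for v in range(4)))
             if all(c[v] != c[u] for v in adj for u in adj[v])]

bound = 1 / (2 ** 3 * 5)
margs = {(v, col): sum(c[v] == col for c in colorings) / len(colorings)
         for v in range(4) for col in lists[v]}
assert min(margs.values()) >= bound
```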

Lemma 4.12.

Consider list colorings on a graph G=(V,E)G=(V,E) of maximum degree Δ3\Delta\geq 3 where each vertex vVv\in V is associated with a color list LvL_{v} such that degG(v)+2|Lv|q\deg_{G}(v)+2\leq|L_{v}|\leq q. Let BVB\subseteq V be a subset with |B|k|B|\leq k. For any pinning η\eta on VBV\setminus B, the uniform distribution μBη\mu^{\eta}_{B} over list colorings conditioned on η\eta satisfies approximate tensorization of variance with constant C=(2Δq)k1C=(2^{\Delta}q)^{k-1}. In particular, for every function f:𝒳f:\mathcal{X}\to\mathbb{R} it holds

𝔼[VarBf]CvB𝔼[Varvf].\mathbb{E}[\mathrm{Var}_{B}f]\leq C\sum_{v\in B}\mathbb{E}[\mathrm{Var}_{v}f]. (28)
Proof.

Fix a pinning τ\tau on some subset ΛB\Lambda\subseteq B and consider two distinct vertices u,vBΛu,v\in B\setminus\Lambda. For any two colors c1,c2c_{1},c_{2} that are available to uu, there exists a color cc that is available to vv regardless of whether σu=c1\sigma_{u}=c_{1} or σu=c2\sigma_{u}=c_{2}. This is because either all neighbors of vv are pinned, in which case we can choose any color cc available to vv; or vv has at least one free neighbor and thus at least three available colors, in which case we choose one of them that is neither c1c_{1} nor c2c_{2}. For this color cc, we have μvη,τ,ucj(c)(2Δq)1\mu^{\eta,\tau,u\leftarrow c_{j}}_{v}(c)\geq(2^{\Delta}q)^{-1} for j=1,2j=1,2, which was already shown in the proof of Lemma 4.11 (see Eq. 27). This implies that

dTV(μvη,τ,uc1,μvη,τ,uc2)1(2Δq)1.d_{\mathrm{TV}}\left(\mu^{\eta,\tau,u\leftarrow c_{1}}_{v},\mu^{\eta,\tau,u\leftarrow c_{2}}_{v}\right)\leq 1-(2^{\Delta}q)^{-1}.

The lemma then follows from an application of Lemma 3.6, and taking expectation over η\eta we obtain Eq. 28 as claimed. ∎

We are now ready to prove Theorem 4.5 and Theorem 1.2 from the introduction.

Proof of Theorem 4.5.

We deduce the theorem from Proposition 4.6. Since the graph has treewidth tt, there exists a balanced separator decomposition tree T𝖲𝖣T_{\mathsf{SD}} by Lemma 4.1 such that each separator has size at most tt. The height of T𝖲𝖣T_{\mathsf{SD}} is at most 3logn3\log n by Lemma 2.10. We pick r=1r=1 and hence we can take A=Δ+1A=\Delta+1 in Eq. 22 by Lemma 4.9. Block factorization for decomposition Eq. 20 is shown by Lemma 4.11 with CU,S=2(2Δq)tC_{U,S}=2(2^{\Delta}q)^{t} for each node (U,S)(U,S). Approximate tensorization for separators Eq. 21 follows from Lemma 4.12 with CS=(2Δq)(Δ+1)t1C_{S}=(2^{\Delta}q)^{(\Delta+1)t-1} for each separator SS, noting that |𝖡U(S,1)|(Δ+1)t|\mathsf{B}_{U}(S,1)|\leq(\Delta+1)t. Thus, we conclude from Proposition 4.6 that the uniform distribution μ\mu over list colorings satisfies approximate tensorization of variance with multiplier

C=(Δ+1)(2Δq)(Δ+1)t1(2(2Δq)t)3logn=nO(t(Δ+logq)).C=(\Delta+1)\cdot(2^{\Delta}q)^{(\Delta+1)t-1}\cdot\left(2(2^{\Delta}q)^{t}\right)^{3\log n}=n^{O(t(\Delta+\log q))}.

The mixing time then follows from Lemma 2.7. ∎

Proof of Theorem 1.2.

If q>2Δq>2\Delta then optimal mixing of Glauber dynamics follows from standard path coupling arguments [Jer03]. Otherwise, the theorem follows from Theorem 4.5. ∎

5 Rapid Mixing via SSM for Graphs of Bounded Local Treewidth

In this section we prove Theorem 1.3 from the introduction. We consider families of graphs of bounded local treewidth, which include planar graphs as a special case. We establish rapid mixing of Glauber dynamics under SSM for all such families.

Graphs of bounded local treewidth are defined as follows.

Definition 5.1 (Bounded Local Treewidth).

Let G=(V,E)G=(V,E) be a graph and a,d>0a,d>0 be reals. We say GG has bounded local treewidth with parameters aa and dd if it satisfies the following diameter-treewidth property: for any subgraph HH of GG, it holds

tw(H)a(diam(H))d.\mathrm{tw}(H)\leq a\cdot(\mathrm{diam}(H))^{d}.

Examples of families of graphs that have bounded local treewidth include:

  • Graphs of bounded treewidth;

  • Planar graphs, see [Epp99, DH04, YZ13];

  • Graphs of bounded growth, see Section 5.2.

The strong spatial mixing property characterizes the decay of correlations in a quantitative way. Roughly speaking, it says that in a spin system the correlation between the spin on a vertex vv and the configuration on a subset WW decays exponentially fast with their graph distance, and such decay holds uniformly under any pinning on any subset of vertices.

Definition 5.2 (Strong Spatial Mixing (SSM)).

Consider a spin system on a graph G=(V,E)G=(V,E) with Gibbs distribution μ\mu. Let C>0C>0 and δ(0,1)\delta\in(0,1) be reals. We say the spin system satisfies the strong spatial mixing property with exponential decay rate if for every pinning η\eta on ΛV\Lambda\subseteq V, every vertex vVΛv\in V\setminus\Lambda and feasible spin c𝒬vc\in\mathcal{Q}_{v}, and every subset WVΛ{v}W\subseteq V\setminus\Lambda\setminus\{v\} and two feasible configurations τ,ξ\tau,\xi on WW, it holds

|μvη(cτ)μvη(cξ)1|C(1δ)distG(v,W).\left|\frac{\mu^{\eta}_{v}(c\mid\tau)}{\mu^{\eta}_{v}(c\mid\xi)}-1\right|\leq C(1-\delta)^{\mathrm{dist}_{G}(v,W)}.
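For intuition, this decay can be observed numerically (illustration only; the hardcore model with λ=1 on a path is an arbitrary small example): pinning the far endpoint of a path influences the occupation probability of vertex 0 less and less as the distance grows.

```python
# Illustration of the decay in Definition 5.2 for the hardcore model with
# lambda = 1 on a path: the effect of pinning the far endpoint on the
# occupation probability of vertex 0 shrinks as the distance d grows.
from itertools import product

def occ_prob(d, pin):
    """P(sigma_0 = 1 | sigma_d = pin) for the hardcore model (lambda = 1)
    on the path 0-1-...-d, i.e. uniform over independent sets."""
    total = good = 0
    for sigma in product((0, 1), repeat=d + 1):
        if sigma[d] != pin:
            continue
        if any(sigma[i] and sigma[i + 1] for i in range(d)):
            continue
        total += 1
        good += sigma[0]
    return good / total

diffs = [abs(occ_prob(d, 1) - occ_prob(d, 0)) for d in range(2, 8)]
assert all(b < a for a, b in zip(diffs, diffs[1:]))   # strictly shrinking
```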

Our main result in this section is stated as follows. We state it only for the hardcore model, but it extends naturally to other spin systems for which the Glauber dynamics is ergodic under any pinning.

Theorem 5.3.

Let G=(V,E)G=(V,E) be an nn-vertex graph of maximum degree Δ3\Delta\geq 3. Suppose that GG has bounded local treewidth with constant parameters a,d>0a,d>0, and suppose that the hardcore model on GG with fugacity λ>0\lambda>0 satisfies SSM with constant parameters C,δ>0C,\delta>0. Then the mixing time of the Glauber dynamics for the hardcore Gibbs distribution on GG is O(nlog4n)O(n\log^{4}n).

5.1 Proof of Theorem 5.3

For graphs of bounded local treewidth, if the diameter is mildly bounded (say, poly-logarithmic), then the treewidth is also bounded. In general, however, the diameter of GG can be arbitrarily large. We therefore need the following low-diameter decomposition of graphs, which allows us to focus on subgraphs of small diameter and thus of small treewidth. We remark that Lemma 5.4 holds for arbitrary graphs, with no restriction on the local treewidth or maximum degree.

Lemma 5.4.

Let G=(V,E)G=(V,E) be an nn-vertex graph where n10n\geq 10. For any integer r+r\in\mathbb{N}^{+}, there exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set satisfying the following conditions:

  1. 1.

    For each i[m]i\in[m], we have diam(G[𝖡(Vi,r)])6rlogn+2r\mathrm{diam}(G[\mathsf{B}(V_{i},r)])\leq 6r\log n+2r;

  2. 2.

    For each vertex vVv\in V, we have |{i[m]:v𝖡(Vi,r)}|2logn\left|\left\{i\in[m]:v\in\mathsf{B}(V_{i},r)\right\}\right|\leq 2\log n.

The proof of Lemma 5.4, which is based on a classical low-diameter decomposition result of Linial and Saks [LS93], is postponed to Section 5.3.

To obtain block factorization of entropy from Lemma 5.4, we need the strong spatial mixing property, which is given by the following lemma.

Lemma 5.5.

Suppose SSM holds with constant parameters C>0C>0 and δ(0,1)\delta\in(0,1). There exists a constant ρ>0\rho>0 such that for any subsets SUVS\subseteq U\subseteq V and any γ10\gamma\geq 10, if rρ(log|S|+logγ)r\geq\rho(\log|S|+\log\gamma) then it holds for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that,

𝔼[EntUf]e1/γ(𝔼[Ent𝖡U(S,r)f]+𝔼[EntUSf]).\mathbb{E}[\mathrm{Ent}_{U}f]\leq e^{1/\gamma}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U}(S,r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U\setminus S}f\right]\right). (29)
Proof.

First observe that it suffices to establish the inequality under an arbitrary pinning η\eta on VUV\setminus U and then Eq. 29 follows by taking expectation over η\eta. By Lemma 4.10, it then suffices to prove block factorization of entropy for the marginal distribution μSTη\mu^{\eta}_{S\cup T} for the two blocks SS and T=U𝖡U(S,r)T=U\setminus\mathsf{B}_{U}(S,r). Suppose S={v1,,v}S=\{v_{1},\dots,v_{\ell}\} where =|S|\ell=|S| and let σ\sigma be any feasible configuration on SS. For each ii define Si={v1,,vi}S_{i}=\{v_{1},\dots,v_{i}\} and let σSi\sigma_{S_{i}} be the configuration σ\sigma restricted to SiS_{i}. By SSM we have that for any two feasible configurations τ,ξ\tau,\xi on TT,

μSη(στ)μSη(σξ)=i=1μviη,σSi1(σviτ)μviη,σSi1(σviξ)(1+Ceδr)(1+14γ)1+12γ,\frac{\mu^{\eta}_{S}(\sigma\mid\tau)}{\mu^{\eta}_{S}(\sigma\mid\xi)}=\prod_{i=1}^{\ell}\frac{\mu^{\eta,\sigma_{S_{i-1}}}_{v_{i}}(\sigma_{v_{i}}\mid\tau)}{\mu^{\eta,\sigma_{S_{i-1}}}_{v_{i}}(\sigma_{v_{i}}\mid\xi)}\leq\left(1+Ce^{-\delta r}\right)^{\ell}\leq\left(1+\frac{1}{4\gamma\ell}\right)^{\ell}\leq 1+\frac{1}{2\gamma},

where we use rρ(log+logγ)r\geq\rho(\log\ell+\log\gamma) for a large enough constant ρ>0\rho>0 together with (1δ)reδr(1-\delta)^{r}\leq e^{-\delta r}, and we assume γ10\gamma\geq 10 so that the last two inequalities hold. Similarly, we also have

μSη(στ)μSη(σξ)112γ.\frac{\mu^{\eta}_{S}(\sigma\mid\tau)}{\mu^{\eta}_{S}(\sigma\mid\xi)}\geq 1-\frac{1}{2\gamma}.

Then, we deduce from Lemma 3.3 that the marginal distribution μSTη\mu^{\eta}_{S\cup T} satisfies {S,T}\{S,T\}-factorization of entropy with constant C1+1/γe1/γC\leq 1+1/\gamma\leq e^{1/\gamma} (again we assume γ10\gamma\geq 10 so that the first inequality holds). Eq. 29 then follows from Lemma 4.10 and averaging over η\eta. ∎

We now present the proof of Theorem 5.3.

Proof of Theorem 5.3.

By Lemma 5.5, there exists a constant ρ>0\rho>0 such that Eq. 29 holds for all subsets SUVS\subseteq U\subseteq V and for γ=n\gamma=n whenever rρlognr\geq\rho\log n. We apply Lemma 5.4 to GG for r=ρlognr=\left\lceil\rho\log n\right\rceil, and suppose the resulting partition is V=i=1mViV=\bigcup_{i=1}^{m}V_{i}. For each i[m]i\in[m] let Ui=j=imVjU_{i}=\bigcup_{j=i}^{m}V_{j}, and so U1=VU_{1}=V, Um=VmU_{m}=V_{m} and Ui+1=UiViU_{i+1}=U_{i}\setminus V_{i}. Then we deduce from Eq. 29 that for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0},

Entf\displaystyle\mathrm{Ent}f e1/n(𝔼[Ent𝖡U1(V1,r)f]+𝔼[EntU2f])\displaystyle\leq e^{1/n}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{1}}(V_{1},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U_{2}}f\right]\right)
e2/n(𝔼[Ent𝖡U1(V1,r)f]+𝔼[Ent𝖡U2(V2,r)f]+𝔼[EntU3f])\displaystyle\leq e^{2/n}\left(\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{1}}(V_{1},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{2}}(V_{2},r)}f\right]+\mathbb{E}\left[\mathrm{Ent}_{U_{3}}f\right]\right)
\displaystyle\hskip 5.0pt\vdots
em/ni=1m𝔼[Ent𝖡Ui(Vi,r)f]\displaystyle\leq e^{m/n}\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}_{U_{i}}(V_{i},r)}f\right]
3i=1m𝔼[Ent𝖡(Vi,r)f],\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right], (30)

where the last inequality is due to mnm\leq n and that 𝔼[EntBf]𝔼[EntAf]\mathbb{E}[\mathrm{Ent}_{B}f]\leq\mathbb{E}[\mathrm{Ent}_{A}f] for any BAVB\subseteq A\subseteq V (Lemma 2.2). The good news is that each ball 𝖡(Vi,r)\mathsf{B}(V_{i},r) has diameter O(log2n)O(\log^{2}n) and thus treewidth O(log2dn)O(\log^{2d}n), and every vertex is contained in O(logn)O(\log n) balls; both properties are guaranteed by Lemma 5.4. Thus, to prove approximate tensorization of entropy for μ\mu, it suffices to prove it for each ball 𝖡(Vi,r)\mathsf{B}(V_{i},r).

We apply Proposition 4.6 to show AT for each Bi=𝖡(Vi,r)B_{i}=\mathsf{B}(V_{i},r) (to be more precise, by Proposition 4.6 we get AT uniformly under any pinning η\eta outside BiB_{i} and then we take expectation over η\eta). The balanced separator decomposition tree is given by Lemma 4.1. Observe that the size of each separator is O(log2dn)O(\log^{2d}n) since the treewidth of BiB_{i} is O(log2dn)O(\log^{2d}n), and the height hh of the decomposition tree satisfies h=O(log|Bi|)=O(logn)h=O(\log|B_{i}|)=O(\log n) by Lemma 2.10 since all separators are balanced. In Proposition 4.6 we take the radius r=cloglognr=\left\lceil c\log\log n\right\rceil for some sufficiently large constant c>0c>0 (note that this radius is much smaller than the radius ρlogn\left\lceil\rho\log n\right\rceil used to construct the partition), such that block factorization for decomposition Eq. 20 holds with CU,S=e1/hC_{U,S}=e^{1/h} for each node (U,S)(U,S); this again follows from Lemma 5.5 where we have |S|=O(log2dn)|S|=O(\log^{2d}n) and γ=h=O(logn)\gamma=h=O(\log n), and Eq. 29 holds whenever rcloglognr\geq c\log\log n. Also A4lognA\leq 4\log n in Eq. 22 by Lemma 4.9. Therefore, we conclude from Proposition 4.6 that every Bi=𝖡(Vi,r)B_{i}=\mathsf{B}(V_{i},r) satisfies approximate tensorization of entropy with multiplier

C(𝖡(Vi,r))=4logn(e1/h)hmaxSCS12lognmaxSCS,C(\mathsf{B}(V_{i},r))=4\log n\cdot\big{(}e^{1/h}\big{)}^{h}\cdot\max_{S}C_{S}\leq 12\log n\cdot\max_{S}C_{S}, (31)

where CSC_{S} is the AT multiplier in Eq. 21 for the ball 𝖡U(S,r)\mathsf{B}_{U}(S,r) and we take maximum over all separators.

Observe that, for each node (U,S)(U,S) in the decomposition tree we have

|𝖡U(S,r)|=O(|S|Δr)=O(log2dnΔcloglogn)logtn,|\mathsf{B}_{U}(S,r)|=O\left(|S|\cdot\Delta^{r}\right)=O\left(\log^{2d}n\cdot\Delta^{c\log\log n}\right)\leq\log^{t}n,

for some constant t>0t>0 when nn is sufficiently large. Define φ(k)\varphi(k) to be the maximum of optimal AT multipliers over all subsets of vertices of size at most kk; namely, for all WVW\subseteq V with |W|k|W|\leq k, it holds for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that

𝔼[EntWf]φ(k)vW𝔼[Entvf].\mathbb{E}[\mathrm{Ent}_{W}f]\leq\varphi(k)\sum_{v\in W}\mathbb{E}[\mathrm{Ent}_{v}f].

Note that φ(k)\varphi(k) is monotone increasing. Thus, we obtain from Eq. 31 that

C(𝖡(Vi,r))12lognφ(logtn).C(\mathsf{B}(V_{i},r))\leq 12\log n\cdot\varphi(\log^{t}n). (32)

Combining Eqs. 30 and 32 and Lemma 5.4, we obtain

Entf\displaystyle\mathrm{Ent}f 3i=1m𝔼[Ent𝖡(Vi,r)f]\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right]
312lognφ(logtn)i=1mv𝖡(Vi,r)𝔼[Entvf]\displaystyle\leq 3\cdot 12\log n\cdot\varphi(\log^{t}n)\sum_{i=1}^{m}\sum_{v\in\mathsf{B}(V_{i},r)}\mathbb{E}\left[\mathrm{Ent}_{v}f\right]
100log2nφ(logtn)vV𝔼[Entvf].\displaystyle\leq 100\log^{2}n\cdot\varphi(\log^{t}n)\sum_{v\in V}\mathbb{E}\left[\mathrm{Ent}_{v}f\right].

More generally, for any subset WVW\subseteq V of size k0|W|kk_{0}\leq|W|\leq k where k0>0k_{0}>0 is some fixed constant, we have for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0} that

𝔼[EntWf]100log2kφ(logtk)vW𝔼[Entvf].\mathbb{E}[\mathrm{Ent}_{W}f]\leq 100\log^{2}k\cdot\varphi(\log^{t}k)\sum_{v\in W}\mathbb{E}\left[\mathrm{Ent}_{v}f\right].

This is shown by the same arguments applied to μWη\mu^{\eta}_{W} under an arbitrary pinning η\eta outside WW and then taking expectation. Hence, we have established the following recursive bound: for all kk0k\geq k_{0},

φ(k)max{100log2kφ(logtk),φ(k0)}.\varphi(k)\leq\max\left\{100\log^{2}k\cdot\varphi(\log^{t}k),\,\varphi(k_{0})\right\}. (33)

We now solve Eq. 33. By choosing k0k_{0} large enough, we may assume that for all kk0k\geq k_{0},

logtk<kand100t3(loglogk)3logk.\log^{t}k<k\quad\text{and}\quad 100t^{3}(\log\log k)^{3}\leq\log k.

We prove by induction that

φ(k)φ(k0)log3k.\varphi(k)\leq\varphi(k_{0})\cdot\log^{3}k. (34)

Eq. 34 is trivial for k<k0k<k_{0}. Suppose Eq. 34 is true for all k<kk^{\prime}<k. Then we have

100log2kφ(logtk)100log2kφ(k0)t3(loglogk)3φ(k0)log3k.100\log^{2}k\cdot\varphi(\log^{t}k)\leq 100\log^{2}k\cdot\varphi(k_{0})\cdot t^{3}(\log\log k)^{3}\leq\varphi(k_{0})\cdot\log^{3}k.

Thus, we deduce from the recursive bound Eq. 33 that

φ(k)max{φ(k0)log3k,φ(k0)}=φ(k0)log3k,\varphi(k)\leq\max\left\{\varphi(k_{0})\cdot\log^{3}k,\,\varphi(k_{0})\right\}=\varphi(k_{0})\cdot\log^{3}k,

establishing Eq. 34.

Therefore, we have φ(n)=O(log3n)\varphi(n)=O(\log^{3}n). In particular, the hardcore Gibbs distribution on GG satisfies AT with multiplier O(log3n)O(\log^{3}n), and the mixing time follows from Lemma 2.7. ∎
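The recursion Eq. 33 bottoms out extremely quickly, since the map klogtkk\mapsto\log^{t}k collapses even astronomically large kk to a constant within a few iterations. The following sketch (with illustrative parameters t=2t=2, k0=100k_{0}=100, and natural logarithms, not values from the paper) unfolds the recursion and makes the bounded depth explicit; it only illustrates that the solution is a product of a constant number of polylogarithmic factors, since the sharper log3k\log^{3}k bound of Eq. 34 needs kk so large that the condition 100t3(loglogk)3logk100t^{3}(\log\log k)^{3}\leq\log k holds.

```python
import math

def unfold_recursion(k, t=2.0, k0=100.0, phi0=1.0):
    """Unfold phi(k) <= 100 log^2(k) * phi(log^t k) down to the base case phi(k0) = phi0,
    returning the resulting bound and the recursion depth."""
    value, depth = phi0, 0
    while k > k0:
        value *= 100 * math.log(k) ** 2  # one application of Eq. 33
        k = math.log(k) ** t             # recurse on subsets of size log^t k
        depth += 1
    return value, depth

# Even for k = 10^100 the map k -> log^t k reaches the base case in 3 steps,
# so the bound is a product of only a constant number of polylog factors.
value, depth = unfold_recursion(10.0 ** 100)
assert depth == 3 and value > 0
```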

5.2 Rapid mixing via SSM for graphs of bounded growth

In this subsection we consider graphs with polynomially bounded neighborhood growth.

Definition 5.6 (Bounded Growth).

Let G=(V,E)G=(V,E) be a graph and a,d>0a,d>0 be reals. We say GG has polynomially bounded growth if for any vertex vv and any integer r1r\geq 1 it holds

|𝖡(v,r)|ard.|\mathsf{B}(v,r)|\leq a\cdot r^{d}.

Observe that any family of graphs of bounded growth also has bounded maximum degree and bounded local treewidth. To see the latter, for any nontrivial connected subgraph HH of a graph GG of bounded growth, let vv be any vertex of HH and r=diam(H)1r=\mathrm{diam}(H)\geq 1, and we have

tw(H)|H||𝖡(v,r)|ard,\displaystyle\mathrm{tw}(H)\leq|H|\leq|\mathsf{B}(v,r)|\leq a\cdot r^{d}, (35)

showing that GG has bounded local treewidth. Thus, we deduce from Theorem 5.3 that for graphs of bounded growth, SSM implies O(nlog4n)O(n\log^{4}n) mixing of the Glauber dynamics. We can actually improve this mixing time bound slightly.

Theorem 5.7.

Let G=(V,E)G=(V,E) be an nn-vertex graph. Suppose that GG has bounded growth with constant parameters a,d>0a,d>0, and suppose that the hardcore model on GG with fugacity λ>0\lambda>0 satisfies SSM with constant parameters C,δ>0C,\delta>0. Then the mixing time of the Glauber dynamics for the hardcore Gibbs distribution on GG is O(nlog3n)O(n\log^{3}n).

Proof.

Following the proof of Theorem 5.3, we have that for every function f:𝒳0f:\mathcal{X}\to\mathbb{R}_{\geq 0},

Entf3i=1m𝔼[Ent𝖡(Vi,r)f],\mathrm{Ent}f\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right],

which is Eq. 30. For bounded-growth graphs, we have similarly as Eq. 35 that

|𝖡(Vi,r)|a(diam(G[𝖡(Vi,r)]))dtlog2dn,|\mathsf{B}(V_{i},r)|\leq a\cdot\big{(}\mathrm{diam}(G[\mathsf{B}(V_{i},r)])\big{)}^{d}\leq t\log^{2d}n,

where t>0t>0 is some fixed constant, and the last inequality is due to Lemma 5.4. With the same definition of φ()\varphi(\cdot), we deduce that

Entf\displaystyle\mathrm{Ent}f 3i=1m𝔼[Ent𝖡(Vi,r)f]\displaystyle\leq 3\sum_{i=1}^{m}\mathbb{E}\left[\mathrm{Ent}_{\mathsf{B}(V_{i},r)}f\right]
3φ(tlog2dn)i=1mv𝖡(Vi,r)𝔼[Entvf]\displaystyle\leq 3\cdot\varphi(t\log^{2d}n)\sum_{i=1}^{m}\sum_{v\in\mathsf{B}(V_{i},r)}\mathbb{E}\left[\mathrm{Ent}_{v}f\right]
6lognφ(tlog2dn)vV𝔼[Entvf],\displaystyle\leq 6\log n\cdot\varphi(t\log^{2d}n)\sum_{v\in V}\mathbb{E}\left[\mathrm{Ent}_{v}f\right],

where the last inequality follows from Lemma 5.4. This allows us to obtain the following recursive bound: for all kk0k\geq k_{0} where k0>0k_{0}>0 is some fixed constant,

φ(k)max{6logkφ(tlog2dk),φ(k0)}.\varphi(k)\leq\max\left\{6\log k\cdot\varphi(t\log^{2d}k),\,\varphi(k_{0})\right\}. (36)

Solving Eq. 36 as in the proof of Theorem 5.3, we get φ(n)=O(log2n)\varphi(n)=O(\log^{2}n). Thus, AT of entropy holds with multiplier O(log2n)O(\log^{2}n), and the mixing time bound follows from Lemma 2.7. ∎

5.3 Low-diameter decomposition

In this subsection we present the proof of Lemma 5.4.

Given a graph G=(V,E)G=(V,E) and a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set into mm clusters, define the quotient graph (G;V1,,Vm)\mathcal{H}(G;V_{1},\dots,V_{m}) to be the graph with vertex set {Vi:i[m]}\{V_{i}:i\in[m]\} where two clusters Vi,VjV_{i},V_{j} are adjacent iff there exist uVi,vVju\in V_{i},v\in V_{j} such that uvEuv\in E. Namely, ℋ\mathcal{H} is the graph obtained from GG by contracting every cluster into a single vertex, with two vertices adjacent iff the corresponding clusters are adjacent in GG. We need the following classical low-diameter decomposition result due to Linial and Saks [LS93].
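As a concrete rendering of this definition, the following sketch builds the quotient graph from an adjacency dictionary and a partition into clusters (the function name and representation are illustrative choices, not from the paper).

```python
def quotient_graph(adj, clusters):
    """Contract each cluster to a single vertex; clusters i and j are adjacent in
    the quotient iff some edge of the original graph crosses between them.

    adj: dict vertex -> set of neighbors; clusters: list of disjoint vertex sets."""
    owner = {v: i for i, cl in enumerate(clusters) for v in cl}
    quotient = {i: set() for i in range(len(clusters))}
    for u, nbrs in adj.items():
        for v in nbrs:
            if owner[u] != owner[v]:  # a crossing edge of G
                quotient[owner[u]].add(owner[v])
                quotient[owner[v]].add(owner[u])
    return quotient

# A 4-cycle 0-1-2-3-0 with clusters {0,1} and {2,3} contracts to a single edge.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
H = quotient_graph(adj, [{0, 1}, {2, 3}])
assert H == {0: {1}, 1: {0}}
```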

Lemma 5.8 ([LS93, Theorem 2.1]).

Let G=(V,E)G=(V,E) be an nn-vertex graph where n10n\geq 10. There exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} of the vertex set into clusters such that the following conditions hold:

  1. 1.

    For each i[m]i\in[m], we have diam(G[Vi])3logn\mathrm{diam}(G[V_{i}])\leq 3\log n;

  2. 2.

    The quotient graph =(G;V1,,Vm)\mathcal{H}=\mathcal{H}(G;V_{1},\dots,V_{m}) has chromatic number at most 2logn2\log n.

Remark 5.9.

Lemma 5.8 is obtained by taking p=1/2p=1/2 in Theorem 2.1 of [LS93]. Also note that the partition in Lemma 5.8 is presented differently from [LS93]: a cluster in [LS93] corresponds to the union of all clusters of the same color in Lemma 5.8.

Proof of Lemma 5.4.

Let G2rG^{\leq 2r} be the graph with vertex set VV where two vertices are adjacent iff their graph distance is at most 2r2r. By Lemma 5.8, there exists a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} for the graph G2rG^{\leq 2r} such that

  1. 1.

    For each i[m]i\in[m], we have diam(G2r[Vi])3logn\mathrm{diam}(G^{\leq 2r}[V_{i}])\leq 3\log n;

  2. 2.

    The quotient graph (G2r;V1,,Vm)\mathcal{H}(G^{\leq 2r};V_{1},\dots,V_{m}) has chromatic number at most 2logn2\log n.

We claim that such a partition V=i=1mViV=\bigcup_{i=1}^{m}V_{i} satisfies our requirements.

Fix i[m]i\in[m]. For u,vViu,v\in V_{i}, since the diameter of G2r[Vi]G^{\leq 2r}[V_{i}] is at most 3logn3\log n, there exists a path PP in G2r[Vi]G^{\leq 2r}[V_{i}] connecting uu and vv of length at most 3logn3\log n. By replacing every edge of PP with a path in GG of length at most 2r2r, we obtain a path PP^{\prime} in GG of length at most 3logn2r=6rlogn3\log n\cdot 2r=6r\log n. Moreover, this new path PP^{\prime} is contained in the induced subgraph G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)], since every vertex on a replacing path of length at most 2r2r is within distance rr of one of its endpoints, both of which lie in ViV_{i}. Hence, the distance between every pair of vertices u,vViu,v\in V_{i} in G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)] is at most 6rlogn6r\log n. Also, note that every vertex in 𝖡(Vi,r)\mathsf{B}(V_{i},r) is at distance at most rr from ViV_{i} in G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)]. Thus, the diameter of G[𝖡(Vi,r)]G[\mathsf{B}(V_{i},r)] is at most 6rlogn+2r6r\log n+2r, verifying the first condition.

For the second condition, take an arbitrary vertex vVv\in V and consider all clusters at distance at most rr from vv. Then all these clusters have pairwise distance at most 2r2r, meaning that they form a clique in the quotient graph (G2r;V1,,Vm)\mathcal{H}(G^{\leq 2r};V_{1},\dots,V_{m}). Since the quotient graph is (2logn)(2\log n)-colorable, the size of this clique is at most 2logn2\log n, implying the second condition. ∎
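The verification in this proof can be sketched as follows: compute balls via BFS and check the two conditions of Lemma 5.4 against a given partition. The partition below is hand-chosen for a small path (standing in for the partition the proof obtains by applying Linial-Saks to the power graph G2rG^{\leq 2r}); all names are illustrative.

```python
import math
from collections import deque

def bfs_dist(adj, sources):
    """BFS distances from a set of source vertices (adjacency dict of sets)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ball(adj, S, r):
    """B(S, r): all vertices within distance r of the set S."""
    return {v for v, d in bfs_dist(adj, S).items() if d <= r}

def induced(adj, W):
    """Induced subgraph G[W] as an adjacency dict."""
    return {u: adj[u] & W for u in W}

def diameter(adj):
    """Diameter of a connected graph via BFS from every vertex."""
    return max(max(bfs_dist(adj, {u}).values()) for u in adj)

# Toy instance: a path 0-1-...-11 with r = 1 and three contiguous clusters.
n, r = 12, 1
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
clusters = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]

# Condition 1 of Lemma 5.4: diam(G[B(V_i, r)]) <= 6 r log n + 2 r.
bound = 6 * r * math.log2(n) + 2 * r
assert all(diameter(induced(adj, ball(adj, C, r))) <= bound for C in clusters)
# Condition 2 of Lemma 5.4: every vertex lies in at most 2 log n balls B(V_i, r).
assert all(sum(v in ball(adj, C, r) for C in clusters) <= 2 * math.log2(n)
           for v in range(n))
```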

6 Proofs for Variance and Entropy Factorization

In this section, we give missing proofs from Sections 3.2 and 4.2.2.

6.1 Proof of Lemma 3.3

In this subsection we present the proof of Lemma 3.3. Our proof avoids some technical difficulties that appeared in [Ces01, Proposition 2.1] and [DPPP02, Lemma 5.2], and allows us to obtain a slightly better constant for AT.

Proof of Lemma 3.3.

We notice that factorization of entropy Eq. 11 implies factorization of variance Eq. 10 with the same constant by a standard linearization argument (plugging f=1+θgf=1+\theta g into Eq. 11 and then taking θ0\theta\to 0), see [CMT15]. Thus, it suffices to consider only entropy and prove Eq. 11.
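For completeness, here is the standard expansion behind this linearization, included only as a sketch: after recentering we may take 𝔼g=0\mathbb{E}g=0, and expanding to second order in θ\theta gives

```latex
\mathrm{Ent}(1+\theta g)
  = \mathbb{E}\big[(1+\theta g)\log(1+\theta g)\big]
  = \frac{\theta^{2}}{2}\,\mathbb{E}[g^{2}] + o(\theta^{2})
  = \frac{\theta^{2}}{2}\,\mathrm{Var}\,g + o(\theta^{2}).
```

Applying the same expansion to each conditional entropy on the right-hand side of Eq. 11, dividing by θ2/2\theta^{2}/2, and letting θ0\theta\to 0 turns the entropy inequality into the corresponding variance inequality with the same constant.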

By the law of total entropy (Lemma 2.2) we have

Entf=𝔼[EntXf]+Ent[𝔼Xf]=𝔼[EntYf]+Ent[𝔼Yf].\mathrm{Ent}f=\mathbb{E}[\mathrm{Ent}_{X}f]+\mathrm{Ent}[\mathbb{E}_{X}f]=\mathbb{E}[\mathrm{Ent}_{Y}f]+\mathrm{Ent}[\mathbb{E}_{Y}f].

Hence, Eq. 11 is equivalent to

Entf(1ε)(Ent[𝔼Xf]+Ent[𝔼Yf]).\mathrm{Ent}f\geq(1-\varepsilon)\left(\mathrm{Ent}[\mathbb{E}_{X}f]+\mathrm{Ent}[\mathbb{E}_{Y}f]\right). (37)

It suffices to show Eq. 37.

Without loss of generality we may assume that 𝔼f=1\mathbb{E}f=1. Thus, 𝔼[𝔼Xf]=𝔼[𝔼Yf]=𝔼f=1\mathbb{E}[\mathbb{E}_{X}f]=\mathbb{E}[\mathbb{E}_{Y}f]=\mathbb{E}f=1. Let us define

D=EntfEnt[𝔼Xf]Ent[𝔼Yf].D=\mathrm{Ent}f-\mathrm{Ent}[\mathbb{E}_{X}f]-\mathrm{Ent}[\mathbb{E}_{Y}f]. (38)

Then by definition we have

D\displaystyle D =𝔼[flogf]𝔼[(𝔼Xf)log(𝔼Xf)]𝔼[(𝔼Yf)log(𝔼Yf)]\displaystyle=\mathbb{E}[f\log f]-\mathbb{E}[(\mathbb{E}_{X}f)\log(\mathbb{E}_{X}f)]-\mathbb{E}[(\mathbb{E}_{Y}f)\log(\mathbb{E}_{Y}f)]
=𝔼[flogf]𝔼[flog(𝔼Xf)]𝔼[flog(𝔼Yf)]\displaystyle=\mathbb{E}[f\log f]-\mathbb{E}[f\log(\mathbb{E}_{X}f)]-\mathbb{E}[f\log(\mathbb{E}_{Y}f)]
=𝔼[flogfflog((𝔼Xf)(𝔼Yf))].\displaystyle=\mathbb{E}\left[f\log f-f\log\left((\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right)\right].

Let (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y} such that π(x,y)>0\pi(x,y)>0. If (𝔼Xyf)(𝔼Yxf)>0(\mathbb{E}_{X}^{y}f)(\mathbb{E}_{Y}^{x}f)>0, then by the inequality alog(a/b)aba\log(a/b)\geq a-b for all a0a\geq 0 and b>0b>0, we deduce that at the point (x,y)(x,y)

flogfflog((𝔼Xf)(𝔼Yf))f(𝔼Xf)(𝔼Yf).\displaystyle f\log f-f\log\left((\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right)\geq f-(\mathbb{E}_{X}f)(\mathbb{E}_{Y}f). (39)

Meanwhile, if (𝔼Xyf)(𝔼Yxf)=0(\mathbb{E}_{X}^{y}f)(\mathbb{E}_{Y}^{x}f)=0 then it must hold f(x,y)=0f(x,y)=0 since ff is non-negative, and hence Eq. 39 still holds at (x,y)(x,y) with both sides equal to 0 (recall 0log0=00\log 0=0). Thus, we obtain that

D𝔼[f(𝔼Xf)(𝔼Yf)]=𝔼[(𝔼Xf1)(𝔼Yf1)],D\geq\mathbb{E}\left[f-(\mathbb{E}_{X}f)(\mathbb{E}_{Y}f)\right]=-\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right], (40)

where we use 𝔼[𝔼Xf]=𝔼[𝔼Yf]=𝔼f=1\mathbb{E}[\mathbb{E}_{X}f]=\mathbb{E}[\mathbb{E}_{Y}f]=\mathbb{E}f=1. Note that 𝔼[(𝔼Xf1)(𝔼Yf1)]\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right] is the covariance of the two functions 𝔼Xf\mathbb{E}_{X}f and 𝔼Yf\mathbb{E}_{Y}f.

Let us now define a probability distribution over 𝒳×𝒴\mathcal{X}\times\mathcal{Y} by ν=fπ\nu=f\pi, i.e., ν(x,y)=π(x,y)f(x,y)\nu(x,y)=\pi(x,y)f(x,y) for all (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y}. Note that ν\nu is indeed a distribution since 𝔼πf=1\mathbb{E}_{\pi}f=1. We also define the marginal distributions νX,νY\nu_{X},\nu_{Y} and the conditional distributions νXy,νYx\nu_{X}^{y},\nu_{Y}^{x} similarly as for π\pi. Observe that for each y𝒴y\in\mathcal{Y}, we have

𝔼Xyf=x𝒳πXy(x)f(x,y)=x𝒳π(x,y)πY(y)f(x,y)=1πY(y)x𝒳ν(x,y)=νY(y)πY(y).\mathbb{E}_{X}^{y}f=\sum_{x\in\mathcal{X}}\pi_{X}^{y}(x)f(x,y)=\sum_{x\in\mathcal{X}}\frac{\pi(x,y)}{\pi_{Y}(y)}f(x,y)=\frac{1}{\pi_{Y}(y)}\sum_{x\in\mathcal{X}}\nu(x,y)=\frac{\nu_{Y}(y)}{\pi_{Y}(y)}.

Then, we deduce that

𝔼[(𝔼Xf1)(𝔼Yf1)]=(x,y)𝒳×𝒴π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1).\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right]=\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right). (41)

Let 𝒳+={x𝒳:νX(x)πX(x)}\mathcal{X}^{+}=\{x\in\mathcal{X}:\nu_{X}(x)\geq\pi_{X}(x)\} and 𝒳={x𝒳:νX(x)<πX(x)}\mathcal{X}^{-}=\{x\in\mathcal{X}:\nu_{X}(x)<\pi_{X}(x)\}, so (𝒳+,𝒳)(\mathcal{X}^{+},\mathcal{X}^{-}) is a partition of 𝒳\mathcal{X}. Define 𝒴+\mathcal{Y}^{+} and 𝒴\mathcal{Y}^{-} in the same way with respect to YY. Recall that the condition in Eq. 9 is equivalent to the statement that for all (x,y)𝒳×𝒴(x,y)\in\mathcal{X}\times\mathcal{Y},

(1ε)πX(x)πY(y)π(x,y)(1+ε)πX(x)πY(y).(1-\varepsilon)\pi_{X}(x)\pi_{Y}(y)\leq\pi(x,y)\leq(1+\varepsilon)\pi_{X}(x)\pi_{Y}(y).

Hence, we obtain that

(x,y)𝒳+×𝒴+π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{+}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
\displaystyle\leq{} (1+ε)(x,y)𝒳+×𝒴+πX(x)πY(y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle(1+\varepsilon)\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{+}}\pi_{X}(x)\pi_{Y}(y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
=\displaystyle={} (1+ε)(x𝒳+νX(x)πX(x))(y𝒴+νY(y)πY(y))\displaystyle(1+\varepsilon)\left(\sum_{x\in\mathcal{X}^{+}}\nu_{X}(x)-\pi_{X}(x)\right)\left(\sum_{y\in\mathcal{Y}^{+}}\nu_{Y}(y)-\pi_{Y}(y)\right)
=\displaystyle={} (1+ε)dTV(νX,πX)dTV(νY,πY).\displaystyle(1+\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}).

The same upper bound holds for 𝒳×𝒴\mathcal{X}^{-}\times\mathcal{Y}^{-} as well. Meanwhile,

(x,y)𝒳+×𝒴π(x,y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{-}}\pi(x,y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
\displaystyle\leq{} (1ε)(x,y)𝒳+×𝒴πX(x)πY(y)(νX(x)πX(x)1)(νY(y)πY(y)1)\displaystyle(1-\varepsilon)\sum_{(x,y)\in\mathcal{X}^{+}\times\mathcal{Y}^{-}}\pi_{X}(x)\pi_{Y}(y)\left(\frac{\nu_{X}(x)}{\pi_{X}(x)}-1\right)\left(\frac{\nu_{Y}(y)}{\pi_{Y}(y)}-1\right)
=\displaystyle={} (1ε)(x𝒳+νX(x)πX(x))(y𝒴νY(y)πY(y))\displaystyle(1-\varepsilon)\left(\sum_{x\in\mathcal{X}^{+}}\nu_{X}(x)-\pi_{X}(x)\right)\left(\sum_{y\in\mathcal{Y}^{-}}\nu_{Y}(y)-\pi_{Y}(y)\right)
=\displaystyle={} (1ε)dTV(νX,πX)dTV(νY,πY),\displaystyle-(1-\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}),

and the same bound also holds for (x,y)𝒳×𝒴+(x,y)\in\mathcal{X}^{-}\times\mathcal{Y}^{+}. Therefore, plugging into Eq. 41, we obtain that

𝔼[(𝔼Xf1)(𝔼Yf1)]\displaystyle\mathbb{E}\left[(\mathbb{E}_{X}f-1)(\mathbb{E}_{Y}f-1)\right] 2(1+ε)dTV(νX,πX)dTV(νY,πY)2(1ε)dTV(νX,πX)dTV(νY,πY)\displaystyle\leq 2(1+\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y})-2(1-\varepsilon)d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y})
=4εdTV(νX,πX)dTV(νY,πY).\displaystyle=4\varepsilon d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}). (42)

By Pinsker’s inequality, we have

4dTV(νX,πX)dTV(νY,πY)\displaystyle 4d_{\mathrm{TV}}(\nu_{X},\pi_{X})d_{\mathrm{TV}}(\nu_{Y},\pi_{Y}) 2DKL(νXπX)DKL(νYπY)\displaystyle\leq 2\sqrt{D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})}
DKL(νXπX)+DKL(νYπY)\displaystyle\leq D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})+D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})
=Ent[𝔼Yf]+Ent[𝔼Xf]\displaystyle=\mathrm{Ent}[\mathbb{E}_{Y}f]+\mathrm{Ent}[\mathbb{E}_{X}f] (43)

where the last equality follows from the observation that DKL(νXπX)=Ent[νX/πX]=Ent[𝔼Yf]D_{\mathrm{KL}}(\nu_{X}\|\pi_{X})=\mathrm{Ent}[\nu_{X}/\pi_{X}]=\mathrm{Ent}[\mathbb{E}_{Y}f] and similarly DKL(νYπY)=Ent[𝔼Xf]D_{\mathrm{KL}}(\nu_{Y}\|\pi_{Y})=\mathrm{Ent}[\mathbb{E}_{X}f].

Finally, combining Eqs. 40, 42 and 43, we obtain that

Dε(Ent[𝔼Xf]+Ent[𝔼Yf]).D\geq-\varepsilon\left(\mathrm{Ent}[\mathbb{E}_{X}f]+\mathrm{Ent}[\mathbb{E}_{Y}f]\right).

Recalling Eq. 38, we obtain Eq. 37 as claimed and the lemma then follows. ∎
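As a numerical sanity check of Eq. 37 (not a substitute for the proof), the sketch below constructs random distributions π\pi on a small product space satisfying the hypothesis |π/(πXπY)1|ε|\pi/(\pi_{X}\otimes\pi_{Y})-1|\leq\varepsilon and random nonnegative functions ff, and verifies the inequality. The construction of the perturbation via centered vectors is an illustrative device of ours, chosen so that the marginals of π\pi are exactly πX,πY\pi_{X},\pi_{Y}.

```python
import math, random

def ent(prob, vals):
    """Ent f = E[f log f] - (E f) log(E f) with respect to the distribution prob."""
    m = sum(p * v for p, v in zip(prob, vals))
    return sum(p * v * math.log(v) for p, v in zip(prob, vals)) - m * math.log(m)

def centered_unit(prob, m):
    """A random vector s with E[s] = 0 under prob and max |s_i| <= 1."""
    s = [random.uniform(-1, 1) for _ in range(m)]
    mean = sum(p * v for p, v in zip(prob, s))
    s = [v - mean for v in s]
    scale = max(max(abs(v) for v in s), 1.0)
    return [v / scale for v in s]

random.seed(1)
nx, ny, eps = 3, 4, 0.2
for _ in range(100):
    px = [random.uniform(0.1, 1) for _ in range(nx)]
    px = [p / sum(px) for p in px]
    py = [random.uniform(0.1, 1) for _ in range(ny)]
    py = [p / sum(py) for p in py]
    # pi(x,y) = px(x) py(y) (1 + eps s(x) t(y)) has marginals exactly px, py
    # and satisfies |pi/(px py) - 1| <= eps, i.e., the hypothesis Eq. 9.
    s, t = centered_unit(px, nx), centered_unit(py, ny)
    pi = [[px[i] * py[j] * (1 + eps * s[i] * t[j]) for j in range(ny)]
          for i in range(nx)]
    f = [[random.uniform(0.1, 2) for _ in range(ny)] for _ in range(nx)]

    flat_p = [pi[i][j] for i in range(nx) for j in range(ny)]
    flat_f = [f[i][j] for i in range(nx) for j in range(ny)]
    ex_f = [sum(pi[i][j] * f[i][j] for i in range(nx)) / py[j] for j in range(ny)]
    ey_f = [sum(pi[i][j] * f[i][j] for j in range(ny)) / px[i] for i in range(nx)]
    # Eq. 37: Ent f >= (1 - eps) * (Ent[E_X f] + Ent[E_Y f]).
    assert ent(flat_p, flat_f) >= (1 - eps) * (ent(py, ex_f) + ent(px, ey_f)) - 1e-12
```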

6.2 Proofs of Lemmas 3.4 and 3.6

Before presenting the proofs of these two lemmas, we first give some definitions and lemmas that are needed.

Our proof of approximate tensorization is based on the spectral independence approach. The following definitions and theorem are taken from [FGYZ21], which builds upon [AL20, ALO20] (see also [CGŠV21]).

Definition 6.1 (Influence Matrix, [FGYZ21]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Fix a subset Λ[n]\Lambda\subseteq[n] and a feasible partial assignment xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0. For any i,j[n]Λi,j\in[n]\setminus\Lambda with iji\neq j, we define the (pairwise) influence of XiX_{i} on XjX_{j} conditioned on xΛx_{\Lambda} by

ΨπxΛ(i,j)=maxxi,xidTV(πjxΛ,xi,πjxΛ,xi),\Psi_{\pi}^{x_{\Lambda}}(i,j)=\max_{x_{i},x^{\prime}_{i}}d_{\mathrm{TV}}(\pi_{j}^{x_{\Lambda},x_{i}},\pi_{j}^{x_{\Lambda},x^{\prime}_{i}}),

where xi,xix_{i},x^{\prime}_{i} are chosen from 𝒳i\mathcal{X}_{i} such that πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x_{i})>0 and πixΛ(xi)>0\pi_{i}^{x_{\Lambda}}(x^{\prime}_{i})>0.

The (pairwise) influence matrix ΨπxΛ\Psi_{\pi}^{x_{\Lambda}} has off-diagonal entries given as above, and diagonal entries ΨπxΛ(i,i)=0\Psi_{\pi}^{x_{\Lambda}}(i,i)=0 for all i[n]Λi\in[n]\setminus\Lambda.

Definition 6.2 (Spectral Independence, [FGYZ21]).

We say a distribution π\pi over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i} is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, if for every 0kn20\leq k\leq n-2, every subset Λ[n]\Lambda\subseteq[n] of size |Λ|=k|\Lambda|=k, and every feasible partial assignment xΛiΛ𝒳ix_{\Lambda}\in\prod_{i\in\Lambda}\mathcal{X}_{i} with πΛ(xΛ)>0\pi_{\Lambda}(x_{\Lambda})>0, the spectral radius ρ(ΨπxΛ)\rho(\Psi_{\pi}^{x_{\Lambda}}) of the influence matrix ΨπxΛ\Psi_{\pi}^{x_{\Lambda}} satisfies

ρ(ΨπxΛ)ηk.\rho(\Psi_{\pi}^{x_{\Lambda}})\leq\eta_{k}.
Theorem 6.3 ([FGYZ21, Theorem 3.2]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Let η0,η1,,ηn2\eta_{0},\eta_{1},\dots,\eta_{n-2} be a sequence of reals such that 0ηk<nk10\leq\eta_{k}<n-k-1 for each kk. If π\pi is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, then the spectral gap of the Glauber dynamics PP for sampling from π\pi satisfies

λ(P)1nk=0n2(1ηknk1).\lambda(P)\geq\frac{1}{n}\prod_{k=0}^{n-2}\left(1-\frac{\eta_{k}}{n-k-1}\right).
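To make Definitions 6.1 and 6.2 and the bound of Theorem 6.3 concrete, the sketch below brute-forces a tiny fully supported binary distribution: it bounds the spectral radius of each influence matrix by its maximum absolute row sum (valid since the matrix is nonnegative), takes the maximum over all pinnings of kk coordinates to get ηk\eta_{k}, and evaluates the spectral-gap bound. All function names are illustrative; this enumeration is only feasible for very small nn.

```python
import itertools

def cond_marginal(pi, j, cond):
    """P(X_j = 1 | X agrees with cond) under pi (a dict: spin tuple -> probability)."""
    num = sum(p for x, p in pi.items()
              if x[j] == 1 and all(x[i] == v for i, v in cond.items()))
    den = sum(p for x, p in pi.items()
              if all(x[i] == v for i, v in cond.items()))
    return num / den

def max_row_sum(pi, n, cond):
    """Max absolute row sum of the influence matrix under the pinning cond;
    this upper-bounds the spectral radius of the nonnegative matrix Psi."""
    free = [i for i in range(n) if i not in cond]
    best = 0.0
    for i in free:
        row = 0.0
        for j in free:
            if j != i:
                # for binary spins, d_TV of the two conditionals at j is |p1 - p0|
                p1 = cond_marginal(pi, j, {**cond, i: 1})
                p0 = cond_marginal(pi, j, {**cond, i: 0})
                row += abs(p1 - p0)
        best = max(best, row)
    return best

# A generic fully supported distribution on {0,1}^3.
n = 3
weights = [1, 2, 3, 4, 5, 6, 7, 8]
pi = {x: w / sum(weights)
      for x, w in zip(itertools.product([0, 1], repeat=n), weights)}

# eta_k: bound the spectral radius over all pinnings of k coordinates, k = 0,...,n-2.
eta = [max(max_row_sum(pi, n, dict(zip(S, a)))
           for S in itertools.combinations(range(n), k)
           for a in itertools.product([0, 1], repeat=k))
       for k in range(n - 1)]

# Theorem 6.3: spectral gap >= (1/n) * prod_k (1 - eta_k / (n - k - 1)).
gap_bound = 1 / n
for k, e in enumerate(eta):
    gap_bound *= 1 - e / (n - k - 1)
assert gap_bound > 0
```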

We also need the following well-known comparison between the spectral gap and the standard log-Sobolev constant (which holds more generally for any Markov chain).

Lemma 6.4 ([DSC96, Corollary A.4]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Suppose λ\lambda is the spectral gap of the Glauber dynamics for sampling from π\pi, and ρ\rho is the standard log-Sobolev constant of it. Then we have

ρ12πminlog(1/πmin1)λ,\rho\geq\frac{1-2\pi_{\min}}{\log(1/\pi_{\min}-1)}\lambda,

where πmin=minx𝒳:π(x)>0π(x)\pi_{\min}=\min_{x\in\mathcal{X}:\,\pi(x)>0}\pi(x). In particular, if π\pi has positive density on at least two assignments in 𝒳\mathcal{X} then πmin1/2\pi_{\min}\leq 1/2 and we have

ρλ2+log(1/πmin).\rho\geq\frac{\lambda}{2+\log(1/\pi_{\min})}.

Finally, we need the following lemma for deriving AT from the spectral gap and the standard log-Sobolev constant.

Lemma 6.5 ([CMT15, Proposition 1.1]).

Let π\pi be a distribution over a finite product space 𝒳=i=1n𝒳i\mathcal{X}=\prod_{i=1}^{n}\mathcal{X}_{i}. Suppose λ\lambda is the spectral gap of the Glauber dynamics for sampling from π\pi, and ρ\rho is the standard log-Sobolev constant of it. Then we have

Varf\displaystyle\mathrm{Var}f 1λni=1n𝔼[Varif],f:𝒳\displaystyle\leq\frac{1}{\lambda n}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Var}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}
andEntf\displaystyle\text{and}\quad\mathrm{Ent}f 1ρni=1n𝔼[Entif],f:𝒳0.\displaystyle\leq\frac{1}{\rho n}\sum_{i=1}^{n}\mathbb{E}[\mathrm{Ent}_{i}f],\quad\forall f:\mathcal{X}\to\mathbb{R}_{\geq 0}.

We are now ready to prove Lemmas 3.4 and 3.6.

Proof of Lemma 3.4.

The spectral radius of the influence matrix of π\pi, as defined in Definition 6.1, is upper bounded by

ρ(Ψπ)ρ([01εY1εX0])=(1εX)(1εY)1εX+εY2.\rho(\Psi_{\pi})\leq\rho\left(\begin{bmatrix}0&1-\varepsilon_{Y}\\ 1-\varepsilon_{X}&0\\ \end{bmatrix}\right)=\sqrt{(1-\varepsilon_{X})(1-\varepsilon_{Y})}\leq 1-\frac{\varepsilon_{X}+\varepsilon_{Y}}{2}.

Therefore, by Theorem 6.3 the spectral gap of the Glauber dynamics is at least (εX+εY)/4(\varepsilon_{X}+\varepsilon_{Y})/4, and by Lemma 6.4 the standard log-Sobolev constant is at least (εX+εY)/(8+4log(1/πmin))(\varepsilon_{X}+\varepsilon_{Y})/(8+4\log(1/\pi_{\min})). We then deduce the lemma from Lemma 6.5. ∎

Remark 6.6.

Another way of proving Lemma 3.4 is by viewing the (pairwise) influence matrix Ψπ\Psi_{\pi} as the Dobrushin dependency/influence matrix, and showing that the Glauber dynamics is contractive with respect to some weighted Hamming distance with the weight vector given by the principal eigenvector of Ψπ\Psi_{\pi}; the lower bound on the spectral gap then follows from [LP17, Theorem 13.1] or [Che98].

Proof of Lemma 3.6.

By definition we see that π\pi is (η0,η1,,ηn2)(\eta_{0},\eta_{1},\dots,\eta_{n-2})-spectrally independent, where for each kk we have

ηk(nk1)(1ε)\eta_{k}\leq(n-k-1)(1-\varepsilon)

by considering the \ell_{\infty} norm (absolute row sum) of the influence matrices. Hence, we deduce from Theorem 6.3 that the spectral gap of the Glauber dynamics is lower bounded by

λ1nk=0n2(1ηknk1)εn1n.\lambda\geq\frac{1}{n}\prod_{k=0}^{n-2}\left(1-\frac{\eta_{k}}{n-k-1}\right)\geq\frac{\varepsilon^{n-1}}{n}.

Furthermore, by Lemma 6.4 the standard log-Sobolev constant is lower bounded by

ρλ2+log(1/πmin)εn1(2+log(1/πmin))n.\rho\geq\frac{\lambda}{2+\log(1/\pi_{\min})}\geq\frac{\varepsilon^{n-1}}{(2+\log(1/\pi_{\min}))n}.

Finally, the lemma follows from an application of Lemma 6.5. ∎

6.3 Proof of Lemma 4.10

Proof of Lemma 4.10.

Recall that the marginal distribution πXY\pi_{XY} satisfies {{X},{Y}}\{\{X\},\{Y\}\}-factorization (i.e., approximate tensorization) of entropy with constant CC if

EntXYgC(𝔼XY[EntXg]+𝔼XY[EntYg]),g:𝒳×𝒴0.\mathrm{Ent}_{XY}g\leq C\left(\mathbb{E}_{XY}\left[\mathrm{Ent}_{X}g\right]+\mathbb{E}_{XY}\left[\mathrm{Ent}_{Y}g\right]\right),\quad\forall g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}. (44)

We claim that Eq. 44 is equivalent to

Entg¯C(𝔼[EntXZg¯]+𝔼[EntYZg¯]),g¯:𝒳×𝒴×𝒵0 depending only on X and Y\mathrm{Ent}\bar{g}\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}\bar{g}\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}\bar{g}\right]\right),\quad\forall\bar{g}:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}\text{~{}depending \emph{only} on $X$ and $Y$} (45)

where the underlying distribution is π=πXYZ\pi=\pi_{XYZ}. To see this, observe that every function g:𝒳×𝒴0g:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}_{\geq 0} is in one-to-one correspondence with a function g¯:𝒳×𝒴×𝒵0\bar{g}:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0} depending only on X,YX,Y via the relationship g¯(x,y,z)=g(x,y)\bar{g}(x,y,z)=g(x,y). By definition, we have EntXYg=Entg¯\mathrm{Ent}_{XY}g=\mathrm{Ent}\bar{g}, EntπXyg=EntπXZyg¯\mathrm{Ent}_{\pi^{y}_{X}}g=\mathrm{Ent}_{\pi^{y}_{XZ}}\bar{g} for all y𝒴y\in\mathcal{Y}, and EntπYxg=EntπYZxg¯\mathrm{Ent}_{\pi^{x}_{Y}}g=\mathrm{Ent}_{\pi^{x}_{YZ}}\bar{g} for all x𝒳x\in\mathcal{X}, implying that Eqs. 44 and 45 are equivalent.

Therefore, it suffices to show that Eq. 45 is equivalent to $\pi$ satisfying $\{\{X,Z\},\{Y,Z\}\}$-factorization of entropy with constant $C$, i.e.,

\[
\mathrm{Ent}\,f \leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right),\quad\forall f:\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}\to\mathbb{R}_{\geq 0}. \tag{46}
\]

It is trivial that Eq. 46 implies Eq. 45. For the other direction, suppose Eq. 45 is true. Since $\mathbb{E}_{Z}f$ is a function depending only on $X$ and $Y$, we have from Eq. 45 that

\[
\mathrm{Ent}(\mathbb{E}_{Z}f) \leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}(\mathbb{E}_{Z}f)\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}(\mathbb{E}_{Z}f)\right]\right).
\]

Then, we deduce from Lemma 2.2 that

\begin{align*}
\mathrm{Ent}\,f &= \mathbb{E}[\mathrm{Ent}_{Z}f]+\mathrm{Ent}(\mathbb{E}_{Z}f)\\
&\leq \mathbb{E}[\mathrm{Ent}_{Z}f]+C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}(\mathbb{E}_{Z}f)\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}(\mathbb{E}_{Z}f)\right]\right)\\
&= \mathbb{E}[\mathrm{Ent}_{Z}f]+C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]-\mathbb{E}\left[\mathrm{Ent}_{Z}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]-\mathbb{E}\left[\mathrm{Ent}_{Z}f\right]\right)\\
&= C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right)-(2C-1)\,\mathbb{E}[\mathrm{Ent}_{Z}f]\\
&\leq C\left(\mathbb{E}\left[\mathrm{Ent}_{XZ}f\right]+\mathbb{E}\left[\mathrm{Ent}_{YZ}f\right]\right),
\end{align*}

where the last inequality is due to $C\geq 1$ (this can be seen by considering functions depending only on $X$ in Eq. 45). ∎
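The first step of the derivation above is the decomposition of entropy from Lemma 2.2, $\mathrm{Ent}\,f=\mathbb{E}[\mathrm{Ent}_{Z}f]+\mathrm{Ent}(\mathbb{E}_{Z}f)$. This identity can be sanity-checked numerically; the sketch below assumes nothing beyond the definition of $\mathrm{Ent}$, with a random joint distribution standing in for $\pi_{XYZ}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def ent(pi, f):
    """Ent_pi f = E[f log f] - E[f] log E[f], with the convention 0 log 0 = 0."""
    ef = float(np.sum(pi * f))
    safe = np.where(f > 0, f, 1.0)  # avoid log(0); those terms contribute 0 anyway
    return float(np.sum(pi * np.where(f > 0, f * np.log(safe), 0.0))) - ef * np.log(ef)

nx, ny, nz = 3, 4, 5
pi = rng.random((nx, ny, nz)); pi /= pi.sum()  # random joint distribution pi_{XYZ}
f = rng.random((nx, ny, nz))                   # arbitrary nonnegative function

pi_xy = pi.sum(axis=2)                 # marginal pi_{XY}
pi_z_cond = pi / pi_xy[:, :, None]     # conditional pi(z | x, y)

# E[Ent_Z f]: average over (x, y) of the entropy of f(x, y, .) under pi(. | x, y)
e_ent_z = sum(pi_xy[x, y] * ent(pi_z_cond[x, y], f[x, y])
              for x in range(nx) for y in range(ny))

# Ent(E_Z f): entropy of the conditional mean, a function of (X, Y) only
ez_f = np.sum(pi_z_cond * f, axis=2)
ent_mean = ent(pi_xy, ez_f)

# Lemma 2.2 (chain rule for entropy): Ent f = E[Ent_Z f] + Ent(E_Z f)
assert np.isclose(ent(pi, f), e_ent_z + ent_mean)
```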

References

  • [AJK+22] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence: optimal mixing of down-up random walks. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1418–1430, 2022.
  • [AL20] Vedat Levi Alev and Lap Chi Lau. Improved analysis of higher order random walks and applications. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1198–1211, 2020.
  • [ALO20] Nima Anari, Kuikui Liu, and Shayan Oveis Gharan. Spectral independence in high-dimensional expanders and applications to the hardcore model. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1319–1330, 2020.
  • [BCŠV22] Antonio Blanca, Zongchen Chen, Daniel Štefankovič, and Eric Vigoda. Complexity of high-dimensional identity testing with coordinate conditional sampling. arXiv preprint arXiv:2207.09102, 2022.
  • [BGGŠ22] Ivona Bezáková, Andreas Galanis, Leslie Ann Goldberg, and Daniel Štefankovič. Fast sampling via spectral independence beyond bounded-degree graphs. In Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP), volume 229, 2022.
  • [BKMP05] Noam Berger, Claire Kenyon, Elchanan Mossel, and Yuval Peres. Glauber dynamics on trees and hyperbolic graphs. Probability Theory and Related Fields, 131(3):311–340, 2005.
  • [Bod98] Hans L. Bodlaender. A partial kk-arboretum of graphs with bounded treewidth. Theoretical Computer Science, 209(1-2):1–45, 1998.
  • [CE22] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 110–122. IEEE, 2022.
  • [Ces01] Filippo Cesi. Quasi-factorization of the entropy and logarithmic Sobolev inequalities for Gibbs random fields. Probability Theory and Related Fields, 120(4):569–584, 2001.
  • [CFYZ21] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Rapid mixing of Glauber dynamics via spectral independence for all degrees. In Proceedings of the 62nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 137–148. IEEE, 2021.
  • [CFYZ22] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Optimal mixing for two-state anti-ferromagnetic spin systems. In Proceedings of the 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 588–599. IEEE, 2022.
  • [CGŠV21] Zongchen Chen, Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Rapid mixing for colorings via spectral independence. In Proceedings of the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1548–1557, 2021.
  • [Che98] Mu-Fa Chen. Trilogy of couplings and general formulas for lower bound of spectral gap. In Probability towards 2000, pages 123–136. Springer, 1998.
  • [CLV20] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Rapid mixing of Glauber dynamics up to uniqueness via contraction. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 1307–1318, 2020.
  • [CLV21] Zongchen Chen, Kuikui Liu, and Eric Vigoda. Optimal mixing of Glauber dynamics: Entropy factorization via high-dimensional expansion. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1537–1550, 2021.
  • [CMT15] Pietro Caputo, Georg Menz, and Prasad Tetali. Approximate tensorization of entropy at high temperature. Annales de la Faculté des sciences de Toulouse: Mathématiques, 24(4):691–716, 2015.
  • [CP21] Pietro Caputo and Daniel Parisi. Block factorization of the relative entropy via spatial mixing. Communications in Mathematical Physics, 388(2):793–818, 2021.
  • [DH04] Erik D Demaine and MohammadTaghi Hajiaghayi. Equivalence of local treewidth and linear local treewidth and its algorithmic applications. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 840–849, 2004.
  • [DPPP02] Paolo Dai Pra, Anna Maria Paganoni, and Gustavo Posta. Entropy inequalities for unbounded spin systems. The Annals of Probability, 30(4):1959–1976, 2002.
  • [DSC96] Persi Diaconis and Laurent Saloff-Coste. Logarithmic Sobolev inequalities for finite Markov chains. The Annals of Applied Probability, 6(3):695–750, 1996.
  • [DSVW04] Martin Dyer, Alistair Sinclair, Eric Vigoda, and Dror Weitz. Mixing in time and space for lattice spin systems: A combinatorial view. Random Structures & Algorithms, 24(4):461–479, 2004.
  • [EF21] David Eppstein and Daniel Frishberg. Rapid mixing of the hardcore Glauber dynamics and other Markov chains in bounded-treewidth graphs. arXiv preprint arXiv:2111.03898, 2021.
  • [Epp99] David Eppstein. Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms and Applications, 3(3):1–27, 1999.
  • [FGYZ21] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Rapid mixing from spectral independence beyond the Boolean domain. In Proceedings of the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1558–1577. SIAM, 2021.
  • [GJK10] Leslie Ann Goldberg, Mark Jerrum, and Marek Karpinski. The mixing time of Glauber dynamics for coloring regular trees. Random Structures & Algorithms, 36(4):464–476, 2010.
  • [GKM15] David Gamarnik, Dmitriy Katz, and Sidhant Misra. Strong spatial mixing of list coloring of graphs. Random Structures & Algorithms, 46(4):599–613, 2015.
  • [GMP05] Leslie A. Goldberg, Russell Martin, and Mike Paterson. Strong spatial mixing with fewer colors for lattice graphs. SIAM Journal on Computing, 35(2):486–517, 2005.
  • [Gru12] Hermann Gruber. On balanced separators, treewidth, and cycle rank. Journal of Combinatorics, 3(4):669–681, 2012.
  • [GŠV16] Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Combinatorics, Probability and Computing, 25(4):500–559, 2016.
  • [GZ03] Alice Guionnet and Bogusław Zegarliński. Lectures on Logarithmic Sobolev Inequalities, pages 1–134. Springer, Berlin, 2003.
  • [Hei20] Marc Heinrich. Glauber dynamics for colourings of chordal graphs and graphs of bounded treewidth. arXiv preprint arXiv:2010.16158, 2020.
  • [HW17] Daniel J Harvey and David R Wood. Parameters tied to treewidth. Journal of Graph Theory, 84(4):364–385, 2017.
  • [Jer03] Mark Jerrum. Counting, Sampling and Integrating: Algorithms and Complexity. Lectures in Mathematics, ETH Zürich. Birkhäuser Basel, 2003.
  • [JS89] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM Journal on Computing, 18(6):1149–1178, 1989.
  • [JS93] Mark Jerrum and Alistair Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22(5):1087–1116, 1993.
  • [JSTV04] Mark Jerrum, Jung-Bae Son, Prasad Tetali, and Eric Vigoda. Elementary bounds on Poincaré and log-Sobolev constants for decomposable Markov chains. The Annals of Applied Probability, 14(4):1741–1765, 2004.
  • [JSV04] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM (JACM), 51(4):671–697, 2004.
  • [KHR22] Frederic Koehler, Alexander Heckett, and Andrej Risteski. Statistical efficiency of score matching: The view from isoperimetry. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2022.
  • [LM11] Brendan Lucier and Michael Molloy. The Glauber dynamics for colorings of bounded degree trees. SIAM Journal on Discrete Mathematics, 25(2):827–853, 2011.
  • [LMP09] Brendan Lucier, Michael Molloy, and Yuval Peres. The Glauber dynamics for colourings of bounded degree trees. In International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 631–645. Springer, 2009.
  • [LP17] David A. Levin and Yuval Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.
  • [LS93] Nathan Linial and Michael Saks. Low diameter graph decompositions. Combinatorica, 13(4):441–454, 1993.
  • [LT79] Richard J Lipton and Robert Endre Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2):177–189, 1979.
  • [Mar99] Fabio Martinelli. Lectures on Glauber dynamics for discrete spin models. In Lectures on probability theory and statistics, pages 93–191. Springer, 1999.
  • [Mar19] Katalin Marton. Logarithmic Sobolev inequalities in discrete product spaces. Combinatorics, Probability and Computing, 28(6):919–935, 2019.
  • [MSW04] Fabio Martinelli, Alistair Sinclair, and Dror Weitz. Glauber dynamics on trees: boundary conditions and mixing time. Communications in Mathematical Physics, 250(2):301–334, 2004.
  • [MWW09] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.
  • [Ree03] Bruce A Reed. Algorithmic aspects of tree width. Recent advances in algorithms and combinatorics, pages 85–107, 2003.
  • [RS86] Neil Robertson and Paul D. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309–322, 1986.
  • [Sly10] Allan Sly. Computational transition at the uniqueness threshold. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 287–296, 2010.
  • [SS14] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on dd-regular graphs. The Annals of Probability, 42(6):2383–2416, 2014.
  • [SZ17] Allan Sly and Yumeng Zhang. The Glauber dynamics of colorings on trees is rapidly mixing throughout the nonreconstruction regime. The Annals of Applied Probability, pages 2646–2674, 2017.
  • [TVVY12] Prasad Tetali, Juan C Vera, Eric Vigoda, and Linji Yang. Phase transition for the mixing time of the Glauber dynamics for coloring regular trees. The Annals of Applied Probability, pages 2210–2239, 2012.
  • [Wei06] Dror Weitz. Counting independent sets up to the tree threshold. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pages 140–149, 2006.
  • [WTZL18] Pengfei Wan, Jianhua Tu, Shenggui Zhang, and Binlong Li. Computing the numbers of independent sets and matchings of all sizes for graphs with bounded treewidth. Applied Mathematics and Computation, 332:42–47, 2018.
  • [YZ13] Yitong Yin and Chihao Zhang. Approximate counting via correlation decay on planar graphs. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 47–66. SIAM, 2013.