
The relative $f$-invariant and non-uniform random sofic approximations

Christopher Shriver
Abstract

The $f$-invariant is an isomorphism invariant of free-group measure-preserving actions introduced by Lewis Bowen in [Bow10b], where it was used to show that two finite-entropy Bernoulli shifts over a finitely generated free group can be isomorphic only if their base measures have the same Shannon entropy. In [Bow10a] Bowen showed that the $f$-invariant is a variant of sofic entropy; in particular, it is the exponential growth rate of the expected number of good models over a uniform random homomorphism.

In this paper we present an analogous formula for the relative $f$-invariant and use it to prove a formula for the exponential growth rate of the expected number of good models over a random sofic approximation which is a type of stochastic block model.

1 Introduction, Main Results

Let $G=\langle S\rangle$ denote the rank-$r$ free group with generating set $S=\{s_{1},\ldots,s_{r}\}$ and identity $e$, and let $(X,\mu,T)$ be a measure-preserving $G$-system, i.e. $T$ is a homomorphism from $G$ to the automorphism group of the standard probability space $(X,\mu)$. We will not need to make explicit use of the $\sigma$-algebra on $X$, so we leave it unnamed.

An observable on $X$ is a measurable map with domain $X$. In this paper the codomain will be a finite set endowed with the discrete $\sigma$-algebra; in this case we call the map a finite observable and the codomain an alphabet.

Any observable $\alpha\colon X\to\mathtt{A}$ induces a map $\alpha^{G}\colon X\to\mathtt{A}^{G}$ by setting

(\alpha^{G}(x))_{g}=\alpha(T_{g}x)\quad\text{for all }g\in G.

The $\mathtt{A}$-coloring $\alpha^{G}(x)$ of $G$ is sometimes called the itinerary of $x$, since it records the observations that will be made over the entire orbit of $x$ under the action of $G$. We also similarly define the map $\alpha^{H}\colon X\to\mathtt{A}^{H}$ for any subset $H$ of $G$. We abbreviate $\alpha^{n}\coloneqq\alpha^{\mathrm{B}(e,n)}$, where $\mathrm{B}(e,n)$ is the closed ball of radius $n$ centered at the identity in $G$, which is endowed with the word-length metric. If $\beta\colon X\to\mathtt{B}$ is a second finite observable, we denote by $\alpha\beta\colon X\to\mathtt{A}\times\mathtt{B}$ the map $\alpha\beta(x)=(\alpha(x),\beta(x))$.
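
For example, $\mathrm{B}(e,1)=\{e,s_{1}^{\pm 1},\ldots,s_{r}^{\pm 1}\}$, so the observable $\alpha^{1}$ records the observations made at a point and at each of its $2r$ nearest orbit neighbours:

\alpha^{1}(x)=\big(\alpha(T_{g}x)\big)_{g\in\{e,\,s_{1}^{\pm 1},\ldots,\,s_{r}^{\pm 1}\}}\in\mathtt{A}^{\mathrm{B}(e,1)}.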

The (Shannon) entropy of a finite observable $\alpha\colon X\to\mathtt{A}$ is defined by

\mathrm{H}_{\mu}(\alpha)=-\sum_{a\in\mathtt{A}}\alpha_{*}\mu(a)\log\alpha_{*}\mu(a),

where $\alpha_{*}\mu\in\operatorname{Prob}(\mathtt{A})$ is the pushforward measure; we take the convention $0\log 0=0$. The entropy of $\alpha$ can be interpreted as the expected amount of information revealed by observing $\alpha$, assuming its distribution $\alpha_{*}\mu$ is known.
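
As a small illustration of this convention, here is a direct computation of the entropy of a pushforward distribution; this is only an illustrative sketch, and the function names are ours:

    import math
    from collections import Counter

    def pushforward(observations):
        """Empirical distribution of a finite list of observed values (a stand-in for alpha_* mu)."""
        n = len(observations)
        return {a: c / n for a, c in Counter(observations).items()}

    def shannon_entropy(p):
        """H(p) = -sum_a p(a) log p(a), with the convention 0 log 0 = 0."""
        return -sum(q * math.log(q) for q in p.values() if q > 0)

    print(shannon_entropy(pushforward(['a', 'a', 'b', 'c'])))  # entropy of (1/2, 1/4, 1/4)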

An early application of Shannon’s entropy to ergodic theory was its use by Kolmogorov and Sinai to show that there exist nonisomorphic Bernoulli shifts over $\mathbb{Z}$. A Bernoulli shift over $\mathbb{Z}$ is a system of the form $(\mathtt{A}^{\mathbb{Z}},\mu^{\mathbb{Z}},S)$ for some alphabet $\mathtt{A}$ and $\mu\in\operatorname{Prob}(\mathtt{A})$; $S$ is the shift action of $\mathbb{Z}$. They did this by defining an entropy rate for $\mathbb{Z}$-systems, which can be interpreted as the average information per unit time revealed by observing the system. For a Bernoulli shift $(\mathtt{A}^{\mathbb{Z}},\mu^{\mathbb{Z}},S)$, the entropy rate is simply the “base entropy” $\mathrm{H}_{\mu}(\alpha)$, where $\alpha\colon\mathtt{A}^{\mathbb{Z}}\to\mathtt{A}$ is the “time zero” observable.

Isomorphism invariance of the KS entropy rate is typically proven using the fact that entropy rate is nonincreasing under factor maps (which are surjective homomorphisms of measure-preserving systems). This fact can be interpreted as stating that a system cannot simulate another system that is “more random.”

The entropy rate was soon generalized to systems acted on by an arbitrary amenable group (such as $\mathbb{Z}^{d}$). Extending beyond amenable groups proved more difficult, and in fact it was found to be impossible for such an extension to preserve all desirable properties of the KS entropy rate. In particular, an entropy rate for nonamenable groups which assigns Bernoulli shifts their base entropy cannot be nonincreasing under factor maps [OW87, Appendix C].

The first invariant to distinguish between Bernoulli shifts over free groups is Lewis Bowen’s $f$-invariant. Following [Bow10a], this can be defined by

F_{\mu}(T,\alpha) = (1-2r)\mathrm{H}_{\mu}(\alpha)+\sum_{i=1}^{r}\mathrm{H}_{\mu}(\alpha^{\{e,s_{i}\}}),
f_{\mu}(T,\alpha) = \inf_{n}F_{\mu}(T,\alpha^{n})=\lim_{n\to\infty}F_{\mu}(T,\alpha^{n}).

The main theorem of [Bow10b] is that $f_{\mu}(T,\alpha)$ depends on the observable $\alpha$ only through the $\sigma$-algebra it generates. In particular, the common value of $f_{\mu}(T,\alpha)$ among all $\alpha$ which generate the Borel $\sigma$-algebra on $X$ (assuming such $\alpha$ exist) is a measure-conjugacy invariant of the system $(X,\mu,T)$. In the same paper, he showed that the $f$-invariant of a Bernoulli shift is the Shannon entropy of the base measure; in particular, Bernoulli shifts with different base entropies are nonisomorphic.
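
As a quick illustration of the formula, consider a Bernoulli shift $(\mathtt{A}^{G},\nu^{G},S)$ with finite base alphabet and let $\alpha$ be the time-zero observable. Independence of the coordinates gives $\mathrm{H}_{\nu^{G}}(\alpha^{n})=\lvert\mathrm{B}(e,n)\rvert\,\mathrm{H}(\nu)$ and $\mathrm{H}_{\nu^{G}}\big((\alpha^{n})^{\{e,s_{i}\}}\big)=\lvert\mathrm{B}(e,n)\cup\mathrm{B}(s_{i},n)\rvert\,\mathrm{H}(\nu)$, and a short count in the Cayley graph of $G$ (a $2r$-regular tree) gives $(1-2r)\lvert\mathrm{B}(e,n)\rvert+\sum_{i=1}^{r}\lvert\mathrm{B}(e,n)\cup\mathrm{B}(s_{i},n)\rvert=1$, so

F_{\nu^{G}}(S,\alpha^{n})=\Big((1-2r)\lvert\mathrm{B}(e,n)\rvert+\sum_{i=1}^{r}\lvert\mathrm{B}(e,n)\cup\mathrm{B}(s_{i},n)\rvert\Big)\mathrm{H}(\nu)=\mathrm{H}(\nu)\quad\text{for every }n,\qquad\text{hence}\quad f_{\nu^{G}}(S,\alpha)=\mathrm{H}(\nu).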

In [Bow10a], Bowen gave an alternate formula for the $f$-invariant, which we now introduce.

For any homomorphism $\sigma\colon G\to\operatorname{Sym}(n)$ we have a $G$-system $([n],\operatorname{Unif}(n),\sigma)$, and we can consider a labeling $\mathbf{x}\in\mathtt{A}^{n}$ as an observable on this system. We denote the law of its itinerary by $P^{\sigma}_{\mathbf{x}}=\mathbf{x}^{G}_{*}\operatorname{Unif}(n)$ and call this the empirical distribution of $\mathbf{x}$. We say that $\mathbf{x}$ is a good model for $\alpha$ over $\sigma$ if it is difficult to distinguish the $G$-systems $(X,\mu,T)$ and $([n],\operatorname{Unif}(n),\sigma)$ via their respective observables $\alpha$ and $\mathbf{x}$. To make this precise, we denote

\Omega(\sigma,\mathcal{O})\coloneqq\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P^{\sigma}_{\mathbf{x}}\in\mathcal{O}\},

which is a set of good models for $\alpha$ over $\sigma$ if $\mathcal{O}$ is a weak-open neighborhood of $\alpha^{G}_{*}\mu\in\operatorname{Prob}(\mathtt{A}^{G})$; the particular set $\mathcal{O}$ quantifies how good the models are. The alphabet $\mathtt{A}$ is given the discrete topology and $\mathtt{A}^{G}$ the product topology, so “weak-close” means that marginals on some finite sets are close in total variation norm.

For each $n\in\mathbb{N}$, let $\mu_{n}=\operatorname{Unif}(\operatorname{Hom}(G,\operatorname{Sym}(n)))$. Bowen showed in [Bow10a] that the $f$-invariant is given by

f_{\mu}(T,\alpha)=\inf_{\mathcal{O}\ni\alpha^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\Omega(\sigma,\mathcal{O})\rvert.

To make an analogy with statistical physics, we can think of $\alpha^{G}_{*}\mu$ as a macroscopic statistical distribution of the state of a system; the $f$-invariant is then the exponential growth rate of the number of “microstates” that are consistent with these statistics. What we here call good models are often called microstates for this reason.

If $\beta\colon X\to\mathtt{B}$ is a second observable, the conditional entropy is

\mathrm{H}_{\mu}(\alpha\mid\beta)=\mathrm{H}_{\mu}(\alpha\beta)-\mathrm{H}_{\mu}(\beta).

This can be interpreted as the expected amount of information revealed by observing $\alpha$ if both the value of $\beta$ and the joint distribution of $\alpha$ and $\beta$ are known. By analogy we define

F_{\mu}(T,\alpha\mid\beta) = F_{\mu}(T,\alpha\beta)-F_{\mu}(T,\beta) = (1-2r)\mathrm{H}_{\mu}(\alpha\mid\beta)+\sum_{i=1}^{r}\mathrm{H}_{\mu}(\alpha^{\{e,s_{i}\}}\mid\beta^{\{e,s_{i}\}}),
f_{\mu}(T,\alpha\mid\beta) = \inf_{k_{1}\in\mathbb{N}}\sup_{k_{2}\in\mathbb{N}}F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}}).

Both the infimum and the supremum can be replaced by limits; this follows from Lemma 3.2 below. It follows from Corollary 3.5 that we could also directly define

f_{\mu}(T,\alpha\mid\beta)=f_{\mu}(T,\alpha\beta)-f_{\mu}(T,\beta),

as long as $f_{\mu}(T,\beta)>-\infty$.

A few more definitions are required to state our main theorems. If $H$ is a finite subset of $G$, we denote by $d^{H}(\mu,\nu)$ the total variation distance between the marginals of $\mu$ and $\nu$ on $\mathtt{A}^{H}$. Our convention for the total variation distance between measures $\mu,\nu\in\operatorname{Prob}(\mathtt{A})$ is

\|\mu-\nu\|_{\operatorname{TV}}=\frac{1}{2}\sum_{a\in\mathtt{A}}\lvert\mu\{a\}-\nu\{a\}\rvert.

For each $k\in\mathbb{N}$ we define a pseudometric on $\operatorname{Prob}(\mathtt{A}^{G})$ by

d^{*}_{k}(\mu,\nu)=\sum_{i\in[r]}d^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)}(\mu,\nu).

Note that the pseudometrics $\{d_{k}^{*}\}_{k\in\mathbb{N}}$ together generate the weak topology on $\operatorname{Prob}(\mathtt{A}^{G})$. They generalize the almost-pseudometric $d_{\sigma}^{*}$ from [Bow10a] (essentially a pseudometric, except that its first and second arguments come from different sets), which corresponds to the case $k=0$. For $\mathcal{O}=\{\nu\in\operatorname{Prob}(\mathtt{A}^{G})\,:\,d^{*}_{k}(\alpha^{G}_{*}\mu,\nu)<\varepsilon\}$ we write

\Omega(\sigma,\mathcal{O})\eqqcolon\Omega^{*}_{k}(\sigma,\alpha,\varepsilon)\subseteq\mathtt{A}^{n}.

In the present paper, instead of picking a homomorphism $\sigma$ uniformly at random we will use the following type of stochastic block model: given $\mathbf{y}_{0}\in\mathtt{B}^{n}$, $\sigma_{0}\in\operatorname{Hom}(G,\operatorname{Sym}(n))$, and $k\in\mathbb{N}$, let

\mathtt{SBM}(\sigma_{0},\mathbf{y}_{0},k)\coloneqq\operatorname{Unif}\big(\{\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n))\,:\,d^{*}_{k}(P_{\mathbf{y}_{0}}^{\sigma},P_{\mathbf{y}_{0}}^{\sigma_{0}})=0\}\big).

The labeling $\mathbf{y}_{0}$ partitions the elements of $[n]$ into $\lvert\mathtt{B}\rvert$ communities, and we can think of the random homomorphism $\sigma$ as a random choice of directed edges between and within the communities. Certain statistics of these random edge choices are determined by the reference homomorphism $\sigma_{0}$; note that for $k>0$ these statistics are more precise than those specified by a standard stochastic block model. In Section 2 we define weights, which are the objects used to record the relevant statistics.
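
To make the definition concrete, the $k=0$ case can be sampled by rejection: since $G$ is free, a homomorphism is just an $r$-tuple of permutations of $[n]$, and we keep a uniformly drawn tuple exactly when every pair statistic of $\mathbf{y}_{0}$ matches the reference homomorphism. This is only an illustrative sketch (the function names are ours, and rejection sampling is hopelessly inefficient for large $n$):

    import random
    from collections import Counter

    def pair_stats(perms, y):
        """For each generator i, the counts of pairs (y[j], y[perm_i(j)]); this is n times W_{sigma,y}."""
        return [Counter((y[j], y[p[j]]) for j in range(len(y))) for p in perms]

    def sample_sbm_k0(perms0, y0, max_tries=100_000):
        """Rejection sampler for SBM(sigma0, y0, 0): uniform over r-tuples of permutations of [n]
        whose pair statistics relative to y0 agree with those of the reference tuple perms0."""
        n, r = len(y0), len(perms0)
        target = pair_stats(perms0, y0)
        for _ in range(max_tries):
            cand = [random.sample(range(n), n) for _ in range(r)]  # r independent uniform permutations
            if pair_stats(cand, y0) == target:
                return cand
        raise RuntimeError("no sample found; increase max_tries or shrink n")

    # Tiny example: n = 4, two communities {0,1} and {2,3}, r = 2 generators.
    y0 = ['a', 'a', 'b', 'b']
    perms0 = [[1, 0, 3, 2], [2, 3, 0, 1]]
    sigma = sample_sbm_k0(perms0, y0)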

We first prove our formula for the relative $f$-invariant under a Markov assumption: in this case, our stochastic block model only needs to take into account “one-step statistics.”

Theorem A.

Let $\alpha\colon X\to\mathtt{A}$ and $\beta\colon X\to\mathtt{B}$ be finite observables, and for each $n$ let $\mathbf{y}_{n}\in\mathtt{B}^{n}$ and $\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n))$ be such that

\lim_{n\to\infty}d_{0}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=0.

Suppose that $\beta^{G}_{*}\mu$ is a Markov measure. With $\mu_{n}=\mathtt{SBM}(\sigma_{n},\mathbf{y}_{n},0)$, we have

f_{\mu}(T,\alpha\mid\beta)=\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\right\rvert.
Proposition A.

The assumptions of Theorem A are nonvacuous; that is, for any finite observable $\beta\colon X\to\mathtt{B}$ there exist sequences $\{\mathbf{y}_{n}\in\mathtt{B}^{n}\}_{n=1}^{\infty}$ and $\{\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n))\}_{n=1}^{\infty}$ such that $\lim_{n\to\infty}d_{0}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=0$.

If $\beta^{G}_{*}\mu$ is not Markov, then the same formula holds with a more precise type of stochastic block model:

Theorem B.

Let $\alpha\colon X\to\mathtt{A}$ and $\beta\colon X\to\mathtt{B}$ be finite observables. Let $m_{n}$ approach infinity as $n$ goes to infinity while satisfying $m_{n}=o(\log\log n)$. For each $n$ let $\mathbf{y}_{n}\in\mathtt{B}^{n}$ and $\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n))$ be such that

d_{m_{n}}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=O\big(\tfrac{1}{\log n}\big).

Suppose that $f_{\mu}(T,\beta)>-\infty$. With $\mu_{n}=\mathtt{SBM}(\sigma_{n},\mathbf{y}_{n},m_{n})$,

f_{\mu}(T,\alpha\mid\beta)=\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert.
Proposition B.

The assumptions of Theorem B are nonvacuous; that is, for any finite observable $\beta\colon X\to\mathtt{B}$ and any sequence $\{m_{n}\in\mathbb{N}\}_{n=1}^{\infty}$ approaching infinity while satisfying $m_{n}=o(\log\log n)$, there exist sequences $\{\mathbf{y}_{n}\in\mathtt{B}^{n}\}_{n=1}^{\infty}$ and $\{\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n))\}_{n=1}^{\infty}$ such that $d_{m_{n}}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=O\left(\frac{1}{\log n}\right)$.

The expressions appearing on the right-hand sides of Theorems A and B are very closely related to Ben Hayes’ definition of “relative sofic entropy in the presence” [Hay16, Definition 2.5]. Some differences are that we consider expected numbers of good models over random sofic approximations, and that Hayes takes a supremum inside the logarithm over which good model is to be extended, while we fix a sequence $\{\mathbf{y}_{n}\}$ of planted good models. Hayes also does not restrict to shift systems as we do here.

Using Theorem B we prove the following formula for the growth rate of the expected number of good models over a homomorphism drawn from a stochastic block model:

Theorem C.

Let $\mu_{n},\alpha,\beta$ be as in the statement of Theorem B. Then

\inf_{\mathcal{O}\ni\alpha^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\Omega(\sigma,\mathcal{O})\rvert=\sup_{\lambda\in\mathsf{J}(\alpha_{*}^{G}\mu,\,\beta_{*}^{G}\mu)}f_{\lambda}(S,\mathtt{a}\mid\mathtt{b}).

Here $\mathsf{J}(\alpha_{*}^{G}\mu,\,\beta_{*}^{G}\mu)$ is the set of joinings of the $G$-systems $(\mathtt{A}^{G},\alpha^{G}_{*}\mu,S)$ and $(\mathtt{B}^{G},\beta^{G}_{*}\mu,S)$, i.e. shift-invariant probability measures on $(\mathtt{A}\times\mathtt{B})^{G}$ whose $\mathtt{A}^{G}$ and $\mathtt{B}^{G}$ marginals are $\alpha_{*}^{G}\mu$ and $\beta_{*}^{G}\mu$, respectively. $S$ denotes the shift action of $G$. We use $\mathtt{a},\mathtt{b}$ to denote the maps

\mathtt{a}\colon(\mathtt{A}\times\mathtt{B})^{G}\to\mathtt{A},\quad\big((a_{g},b_{g})\big)_{g\in G}\mapsto a_{e},\qquad\qquad\mathtt{b}\colon(\mathtt{A}\times\mathtt{B})^{G}\to\mathtt{B},\quad\big((a_{g},b_{g})\big)_{g\in G}\mapsto b_{e},

which observe the $\mathtt{A}$ (resp. $\mathtt{B}$) label at the identity.

Note: the supremum is always greater than or equal to $f_{\mu}(T,\alpha)$, since this value is attained by the product joining; this means that the expected number of good models for $\alpha$ over a block model with built-in good models for any $\beta$ is at least the expected number of good models over a uniformly random homomorphism. It is possible for the supremum to be strictly larger, however. For example, suppose $f_{\mu}(T,\alpha)<0$ and $\alpha=\beta$, and let $\lambda$ be the diagonal joining. Then

f_{\lambda}(S,\mathtt{a}\mid\mathtt{b})=0>f_{\mu}(T,\alpha).
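
To see the first equality, note that the diagonal joining identifies the $\mathtt{A}$- and $\mathtt{B}$-coordinates $\lambda$-a.s., so for $k_{2}\geq k_{1}$ the observable $\mathtt{a}^{k_{1}}$ is a $\lambda$-a.s. function of $\mathtt{b}^{k_{2}}$, and likewise over $\{e,s_{i}\}$; hence

\mathrm{H}_{\lambda}(\mathtt{a}^{k_{1}}\mid\mathtt{b}^{k_{2}})=\mathrm{H}_{\lambda}\big((\mathtt{a}^{k_{1}})^{\{e,s_{i}\}}\mid(\mathtt{b}^{k_{2}})^{\{e,s_{i}\}}\big)=0,\qquad\text{so}\qquad F_{\lambda}(S,\mathtt{a}^{k_{1}}\mid\mathtt{b}^{k_{2}})=0\ \text{ for all }k_{2}\geq k_{1},

and taking $\sup_{k_{2}}$ and then $\inf_{k_{1}}$ gives $f_{\lambda}(S,\mathtt{a}\mid\mathtt{b})=0$.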

1.1 Random sofic approximations

The $f$-invariant is closely related to another invariant of measure-preserving systems called sofic entropy, which was introduced by Lewis Bowen in [Bow10c].

A homomorphism $\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n))$ is called $(D,\delta)$-sofic for some finite $D\subset G$ and $\delta>0$ if

\left\lvert\{j\in[n]\,:\,\sigma(\gamma)j\neq j\ \forall\gamma\in D\setminus\{e\}\}\right\rvert>(1-\delta)n.

A sequence of homomorphisms $\Sigma=\big(\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n))\big)_{n\in\mathbb{N}}$ is called a sofic approximation if for every $(D,\delta)$ the homomorphism $\sigma_{n}$ is $(D,\delta)$-sofic for all large enough $n$.
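
For concreteness, here is a minimal sketch of checking the $(D,\delta)$-soficity condition, representing $\sigma$ by its list of generator permutations and a group element by a reduced word in the generators (all names are ours):

    def act(sigma, word, j):
        """Apply sigma(gamma) to the point j, where gamma is a reduced word such as
        [(0, +1), (1, -1)] meaning s_1 s_2^{-1}; the rightmost letter acts first."""
        inv = [{p[x]: x for x in range(len(p))} for p in sigma]
        for gen, exp in reversed(word):
            j = sigma[gen][j] if exp == 1 else inv[gen][j]
        return j

    def is_sofic(sigma, D, delta):
        """Check (D, delta)-soficity: more than (1 - delta) n points are moved by every
        word in D, where D is a list of nonempty reduced words (the identity excluded)."""
        n = len(sigma[0])
        good = sum(1 for j in range(n) if all(act(sigma, w, j) != j for w in D))
        return good > (1 - delta) * n

    # Example: n = 4, r = 2, D = {s_1, s_2, s_1 s_2}.
    sigma = [[1, 2, 3, 0], [2, 3, 0, 1]]
    print(is_sofic(sigma, [[(0, 1)], [(1, 1)], [(0, 1), (1, 1)]], 0.1))  # True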

The sofic entropy relative to $\Sigma$ is the exponential growth rate of the number of good models over $\sigma_{n}$. Specifically, if there is some finite observable $\alpha$ which generates the Borel $\sigma$-algebra on $X$ then we have

\mathrm{h}_{\Sigma}(\mu,T)=\inf_{\mathcal{O}\ni\alpha_{*}^{G}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\left\lvert\Omega(\sigma_{n},\mathcal{O})\right\rvert.

By analogy with this expression, we might call the sequences of random homomorphisms appearing in the expressions above “random sofic approximations.” The following proposition provides further justification for this terminology.

Proposition 1.1.

If $(\mu_{n})$ is any of the sequences appearing in Theorems A, B, and C, then for any $(D,\delta)$ there exists $\varepsilon>0$ such that

\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}\big(\sigma\text{ is }(D,\delta)\text{-sofic}\big)\geq 1-n^{-\varepsilon n}

for all large enough $n$.

In particular, if $\sigma_{1}\sim\mu_{1}$, $\sigma_{2}\sim\mu_{2}$, etc. are independent then $(\sigma_{n})$ is a sofic approximation with probability 1.

Organization

In Section 2 we define weights and discuss some of their useful properties. In Section 3 we prove a few basic results about the functions $f$ and $F$. Some of the results of these two sections are used in Section 4 to show that the assumptions of the main theorems are not vacuous. In Section 5 we show how the function $F$ is related to the number of homomorphism-labeling pairs $(\sigma,\mathbf{y})$ that realize a given weight, which is the main ingredient of the proofs of Theorems A and B given in the next two sections. In Section 8 we show how to deduce Theorem C from Theorem B. Section 9 contains a proof of Proposition 1.1. The final section contains a proof of Lemma 2.3, which asserts that a weight can be approximated by a denominator-$n$ weight with a specified marginal.

Acknowledgements

Thanks to Tim Austin for suggesting that theorems similar to B and C should hold, for many helpful discussions, and for providing comments on an earlier draft. Thanks also to Ben Hayes for sharing helpful references.

This material is based upon work supported by the National Science Foundation under Grant No. DMS-1855694.

2 Weights

If α:X𝙰\alpha\colon X\to\mathtt{A} is a finite observable, for a,a𝙰a,a^{\prime}\in\mathtt{A} and i[r]i\in[r] let

Wα(a,a;i)=α{e,si}μ(a,a)=μ{xX:α(x)=a,α(Tsix)=a}W_{\alpha}(a,a^{\prime};i)=\alpha_{*}^{\{e,s_{i}\}}\mu(a,a^{\prime})=\mu\{x\in X\,:\,\alpha(x)=a,\ \alpha(T_{s_{i}}x)=a^{\prime}\}

and also denote

Wα(a)=αμ(a).W_{\alpha}(a)=\alpha_{*}\mu(a).

For 𝐱𝙰n\mathbf{x}\in\mathtt{A}^{n} and σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) let

Wσ,𝐱(a,a;i)=P𝐱σ,{e,si}(a,a)W_{\sigma,\mathbf{x}}(a,a^{\prime};i)=P^{\sigma,\{e,s_{i}\}}_{\mathbf{x}}(a,a^{\prime})

and Wσ,𝐱(a)=P𝐱σ,{e}(a)W_{\sigma,\mathbf{x}}(a)=P^{\sigma,\{e\}}_{\mathbf{x}}(a).

More abstractly, any W(Prob(𝙰2))rW\in\big{(}\operatorname{Prob}(\mathtt{A}^{2})\big{)}^{r} is called an 𝙰\mathtt{A}-weight if

a𝙰W(a,a;i)=a𝙰W(a,a;j)\sum_{a^{\prime}\in\mathtt{A}}W(a,a^{\prime};i)=\sum_{a^{\prime}\in\mathtt{A}}W(a^{\prime},a;j)

for all i,j[r]i,j\in[r] and a𝙰a\in\mathtt{A}. For each a𝙰a\in\mathtt{A} we denote this common value W(a)W(a). Note that the objects WαW_{\alpha} and Wσ,𝐱W_{\sigma,\mathbf{x}} defined above satisfy this condition.

We say that WW has denominator nn if nW(a,a;i)n\cdot W(a,a^{\prime};i)\in\mathbb{N} for all a,a,ia,a^{\prime},i.

The measures W(,;i)W(\cdot,\cdot;i) for i[r]i\in[r] are called the edge measures of WW, and W()W(\cdot) is called the vertex measure.

For any alphabet 𝙰\mathtt{A}, we use the metric on 𝙰\mathtt{A}-weights defined by

d(W1,W2)\displaystyle d(W_{1},W_{2}) i[r]W1(,;i)W2(,;i)TV\displaystyle\coloneqq\sum_{i\in[r]}\left\|W_{1}(\cdot,\cdot;i)-W_{2}(\cdot,\cdot;i)\right\|_{\operatorname{TV}}
=12i[r]a,a𝙰|W1(a,a;i)W2(a,a;i)|.\displaystyle=\frac{1}{2}\sum_{i\in[r]}\sum_{a,a^{\prime}\in\mathtt{A}}\lvert W_{1}(a,a^{\prime};i)-W_{2}(a,a^{\prime};i)\rvert.
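
To make weights concrete, here is a minimal sketch (our own naming, with a weight stored as a dictionary keyed by triples $(a,a^{\prime},i)$) that computes $W_{\sigma,\mathbf{x}}$ from a homomorphism and a labeling and evaluates the metric $d$:

    from collections import Counter

    def weight_from_labeling(sigma, x):
        """Edge measures W_{sigma,x}(a, a'; i) = (1/n) #{j : x_j = a, x_{sigma(s_i) j} = a'}."""
        n = len(x)
        W = Counter()
        for i, p in enumerate(sigma):
            for j in range(n):
                W[(x[j], x[p[j]], i)] += 1.0 / n
        return W

    def vertex_measure(W, i=0):
        """W(a) = sum_{a'} W(a, a'; i); independent of i for a genuine weight."""
        V = Counter()
        for (a, a2, k), w in W.items():
            if k == i:
                V[a] += w
        return V

    def weight_distance(W1, W2):
        """d(W1, W2) = (1/2) sum_i sum_{a,a'} |W1(a, a'; i) - W2(a, a'; i)|."""
        keys = set(W1) | set(W2)
        return 0.5 * sum(abs(W1.get(key, 0.0) - W2.get(key, 0.0)) for key in keys)

    # Example: the denominator-4 weight of a labeling over two generators.
    W = weight_from_labeling([[1, 0, 3, 2], [2, 3, 0, 1]], ['a', 'a', 'b', 'b'])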

We can use weights to count good models up to equivalence under the pseudometrics dkd^{*}_{k} using the following proposition:

Proposition 2.1.

If σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) and 𝐱𝙰n\mathbf{x}\in\mathtt{A}^{n}, then for any observable α:X𝙰\alpha\colon X\to\mathtt{A}

d(Wσ,𝐱k,Wαk)=dk(P𝐱σ,αGμ).d(W_{\sigma,\mathbf{x}^{k}},W_{\alpha^{k}})=d^{*}_{k}(P_{\mathbf{x}}^{\sigma},\alpha^{G}_{*}\mu).

Note this implies also that

dk(P𝐱σ,αGμ)=d0(P𝐱kσ,(αk)Gμ).d^{*}_{k}(P_{\mathbf{x}}^{\sigma},\alpha^{G}_{*}\mu)=d^{*}_{0}(P_{\mathbf{x}^{k}}^{\sigma},(\alpha^{k})^{G}_{*}\mu).
Proof.

By definition of the distance between weights,

d(Wσ,𝐱k,Wαk)\displaystyle d(W_{\sigma,\mathbf{x}^{k}},W_{\alpha^{k}}) =12i[r]𝐚,𝐚𝙰B(e,k)|Wσ,𝐱k(𝐚,𝐚;i)Wαk(𝐚,𝐚;i)|\displaystyle=\frac{1}{2}\sum_{i\in[r]}\sum_{\mathbf{a},\mathbf{a}^{\prime}\in\mathtt{A}^{\mathrm{B}(e,k)}}\left\lvert W_{\sigma,\mathbf{x}^{k}}(\mathbf{a},\mathbf{a}^{\prime};i)-W_{\alpha^{k}}(\mathbf{a},\mathbf{a}^{\prime};i)\right\rvert
=12i[r]𝐚,𝐚𝙰B(e,k)|1n|{j[n]:(𝐱k)j=𝐚(𝐱k)σ(si)j=𝐚}|\displaystyle=\frac{1}{2}\sum_{i\in[r]}\sum_{\mathbf{a},\mathbf{a}^{\prime}\in\mathtt{A}^{\mathrm{B}(e,k)}}\Bigg{\lvert}\,\frac{1}{n}\left\lvert\left\{j\in[n]\,:\,\begin{array}[]{c}(\mathbf{x}^{k})_{j}=\mathbf{a}\\ (\mathbf{x}^{k})_{\sigma(s_{i})j}=\mathbf{a}^{\prime}\end{array}\right\}\right\rvert
μ{xX:αk(x)=𝐚αk(Tsix)=𝐚}|.\displaystyle\phantom{=\sum_{i\in[r]}\sum_{\mathbf{a},\mathbf{a}^{\prime}\in\mathtt{A}^{\mathrm{B}(e,k)}}}\qquad-\mu\left\{x\in X\,:\,\begin{array}[]{c}\alpha^{k}(x)=\mathbf{a}\\ \alpha^{k}(T_{s_{i}}x)=\mathbf{a}^{\prime}\end{array}\right\}\Bigg{\rvert}.

For many ‘incompatible’ pairs 𝐚,𝐚\mathbf{a},\mathbf{a}^{\prime}, both terms will be zero: suppose gB(e,k)B(si,k)g\in\mathrm{B}(e,k)\cap\mathrm{B}(s_{i},k), so that gsi1B(e,k)gs_{i}^{-1}\in\mathrm{B}(e,k). If the second term in the absolute value is nonzero, then for some xXx\in X we have αk(x)=𝐚\alpha^{k}(x)=\mathbf{a} and αk(Tsix)=𝐚\alpha^{k}(T_{s_{i}}x)=\mathbf{a}^{\prime}, and therefore

𝐚gsi1=(αk(Tsix))gsi1=α(Tgsi1Tsix)=α(Tgx)=(αk(x))g=𝐚g.\mathbf{a}^{\prime}_{gs_{i}^{-1}}=(\alpha^{k}(T_{s_{i}}x))_{gs_{i}^{-1}}=\alpha(T_{gs_{i}^{-1}}T_{s_{i}}x)=\alpha(T_{g}x)=(\alpha^{k}(x))_{g}=\mathbf{a}_{g}.

The same argument shows that 𝐚gsi1=𝐚g\mathbf{a}^{\prime}_{gs_{i}^{-1}}=\mathbf{a}_{g} for all gB(e,k)B(si,k)g\in\mathrm{B}(e,k)\cap\mathrm{B}(s_{i},k) whenever the first term is nonzero. Therefore we can restrict the sum to pairs 𝐚,𝐚\mathbf{a},\mathbf{a}^{\prime} with 𝐚gsi1=𝐚g\mathbf{a}^{\prime}_{gs_{i}^{-1}}=\mathbf{a}_{g} for all gB(e,k)B(si,k)g\in\mathrm{B}(e,k)\cap\mathrm{B}(s_{i},k). Equivalently, we can sum over all 𝐀𝙰B(e,k)B(si,k)\mathbf{A}\in\mathtt{A}^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)} to get

d(Wσ,𝐱k,Wαk)\displaystyle d(W_{\sigma,\mathbf{x}^{k}},W_{\alpha^{k}}) =12i[r]𝐀𝙰B(e,k)B(si,k)|1n|{j[n]:(𝐱B(e,k)B(si,k))j=𝐀}|\displaystyle=\frac{1}{2}\sum_{i\in[r]}\sum_{\mathbf{A}\in\mathtt{A}^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)}}\Bigg{\lvert}\frac{1}{n}\left\lvert\left\{j\in[n]\,:\,\big{(}\mathbf{x}^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)}\big{)}_{j}=\mathbf{A}\right\}\right\rvert
μ{xX:αB(e,k)B(si,k)(x)=𝐀}|\displaystyle\phantom{=\sum_{i\in[r]}\sum_{\mathbf{a},\mathbf{a}^{\prime}\in\mathtt{A}^{\mathrm{B}(e,k)}}}\qquad-\mu\left\{x\in X\,:\,\alpha^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)}(x)=\mathbf{A}\right\}\Bigg{\rvert}
=i[r]dB(e,k)B(si,k)(P𝐱σ,αGμ).\displaystyle=\sum_{i\in[r]}d^{\mathrm{B}(e,k)\cup\mathrm{B}(s_{i},k)}(P_{\mathbf{x}}^{\sigma},\alpha_{*}^{G}\mu).\qed

It will be useful to consider the pushforward map induced by a map between alphabets: if π:𝙰𝙱\pi\colon\mathtt{A}\to\mathtt{B} is a measurable map and WW is an 𝙰\mathtt{A}-weight, then πW\pi W is the 𝙱\mathtt{B}-weight given by

πW(b,b;i)=aπ1{b}aπ1{b}W(a,a;i).\pi W(b,b^{\prime};i)=\sum_{a\in\pi^{-1}\{b\}}\sum_{a^{\prime}\in\pi^{-1}\{b^{\prime}\}}W(a,a^{\prime};i).

Note that this implies that the vertex measure of WW is

πW(b)=aπ1{b}W(a).\pi W(b)=\sum_{a\in\pi^{-1}\{b\}}W(a).

For example, let π𝙱:𝙰×𝙱𝙱\pi_{\mathtt{B}}\colon\mathtt{A}\times\mathtt{B}\to\mathtt{B} be the projection map. If WW is an 𝙰×𝙱\mathtt{A}\times\mathtt{B}-weight then π𝙱W\pi_{\mathtt{B}}W is given by

π𝙱W(b1)=a𝙰W((a,b1))π𝙱W(b1,b2;i)=a1,a2𝙰W((a1,b1),(a2,b2);i).\pi_{\mathtt{B}}W(b_{1})=\sum_{a\in\mathtt{A}}W\big{(}(a,b_{1})\big{)}\qquad\pi_{\mathtt{B}}W(b_{1},b_{2};i)=\sum_{a_{1},a_{2}\in\mathtt{A}}W\big{(}(a_{1},b_{1}),(a_{2},b_{2});i\big{)}.

We call this the 𝙱\mathtt{B}-marginal of WW.

All weights in the present paper will be over alphabets of the form 𝙰B(e,k)×𝙱B(e,k)\mathtt{A}^{\mathrm{B}(e,k)}\times\mathtt{B}^{\mathrm{B}(e,k^{\prime})}. We use this fact to introduce some simplified notation for projections:

  • πA\pi_{A} denotes projection onto the entire 𝙰\mathtt{A} factor 𝙰B(e,k)\mathtt{A}^{\mathrm{B}(e,k)}; πB\pi_{B} is used similarly.

  • For m<km<k and m<km^{\prime}<k^{\prime}, πm,m\pi_{m,m^{\prime}} denotes projection onto 𝙰B(e,m)×𝙱B(e,m)\mathtt{A}^{\mathrm{B}(e,m)}\times\mathtt{B}^{\mathrm{B}(e,m^{\prime})}.

  • πm\pi_{m} denotes the projection 𝙰B(e,k)𝙰B(e,m)\mathtt{A}^{\mathrm{B}(e,k)}\to\mathtt{A}^{\mathrm{B}(e,m)}, except that if m=0m=0 we write πe\pi_{e}.

We define F(W)F(W) for an abstract weight WW by

F(W)=(12r)H(W())+i[r]H(W(,;i))F(W)=(1-2r)\mathrm{H}\big{(}W(\cdot)\big{)}+\sum_{i\in[r]}\mathrm{H}\big{(}W(\cdot,\cdot;i)\big{)}

where HH is the Shannon entropy. Note that this is consistent with the above definitions in that, for example,

F(Wα)=Fμ(T,α).F(W_{\alpha})=F_{\mu}(T,\alpha).
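
Continuing the sketch from the weight definitions above, $F(W)$ can be computed directly from the edge and vertex measures of a weight stored as a dictionary keyed by $(a,a^{\prime},i)$; this is only an illustrative sketch with our own naming:

    import math
    from collections import Counter

    def entropy(p):
        """Shannon entropy of a dictionary of probabilities, with the convention 0 log 0 = 0."""
        return -sum(q * math.log(q) for q in p.values() if q > 0)

    def F_of_weight(W, r):
        """F(W) = (1 - 2r) H(W(.)) + sum_{i in [r]} H(W(., .; i))."""
        vertex, edges = Counter(), [Counter() for _ in range(r)]
        for (a, a2, i), w in W.items():
            edges[i][(a, a2)] += w
            if i == 0:
                vertex[a] += w          # W(a) = sum_{a'} W(a, a'; 0)
        return (1 - 2 * r) * entropy(vertex) + sum(entropy(e) for e in edges)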

We can revisit the definition of our version of the stochastic block model using weights: Let HGH\subset G and let WW be a denominator-nn 𝙱B(e,k)\mathtt{B}^{\mathrm{B}(e,k)}-weight. Suppose there exist 𝐲𝙱n\mathbf{y}\in\mathtt{B}^{n} and σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) such that W=Wσ,𝐲kW=W_{\sigma,\mathbf{y}^{k}}. Then

𝚂𝙱𝙼(σ,𝐲,k)=Unif({σHom(G,Sym(n)):Wσ,𝐲k=W}),\mathtt{SBM}(\sigma,\mathbf{y},k)=\operatorname{Unif}(\{\sigma^{\prime}\in\operatorname{Hom}(G,\operatorname{Sym}(n))\,:\,W_{\sigma^{\prime},\mathbf{y}^{k}}=W\}),

so we can also denote this distribution by 𝚂𝙱𝙼(𝐲,W)\mathtt{SBM}(\mathbf{y},W). Specifying the distribution by a weight rather than a specific homomorphism will occasionally be more convenient.

2.1 Constructing Weights and Good Models

We borrow the first result of this type from [Bow10a]; it allows us to find a denominator-nn approximation to a given weight.

Lemma 2.2 (Lemma 2.3 of [Bow10a]).

There is a constant CC such that for any 𝙰\mathtt{A}-weight WW there is a denominator-nn 𝙰\mathtt{A}-weight within distance C|𝙰|2r/nC\lvert\mathtt{A}\rvert^{2}r/n of WW.

The following lemma allows us not only to construct a denominator-nn approximation to a given weight, but also to specify a marginal of this approximation:

Lemma 2.3.

Let WW be an 𝙰×𝙱\mathtt{A}\times\mathtt{B}-weight. If W𝙱W_{\mathtt{B}} is a 𝙱\mathtt{B}-weight of denominator nn with d(W𝙱,π𝙱W)<δd(W_{\mathtt{B}},\pi_{\mathtt{B}}W)<\delta then there is an 𝙰×𝙱\mathtt{A}\times\mathtt{B}-weight W𝙰𝙱W_{\mathtt{A}\mathtt{B}} with denominator nn such that π𝙱W𝙰𝙱=W𝙱\pi_{\mathtt{B}}W_{\mathtt{A}\mathtt{B}}=W_{\mathtt{B}} and d(W𝙰𝙱,W)<265r(|𝙰×𝙱|2/n+δ)d(W_{\mathtt{A}\mathtt{B}},W)<265r(\lvert\mathtt{A}\times\mathtt{B}\rvert^{2}/n+\delta).

The construction is fairly involved, so is postponed to Section 10. The constant 265 is not intended to be optimal.

The definition of a weight Wσ,𝐱kW_{\sigma,\mathbf{x}^{k}} in terms of a homomorphism σ\sigma and a labeling 𝐱\mathbf{x} is straightforward. However, we will also need to know whether a given weight can be realized in this way. The next two results address this inverse problem.

Proposition 2.4.

If WW is a denominator-nn 𝙰\mathtt{A}-weight, then there exist 𝐱𝙰n\mathbf{x}\in\mathtt{A}^{n} and σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) such that W=Wσ,𝐱W=W_{\sigma,\mathbf{x}}.

Proof.

This is implied by Proposition 2.1 of [Bow10a]. ∎

Unfortunately, this does not imply that for every denominator-nn 𝙰B(e,k)\mathtt{A}^{\mathrm{B}(e,k)}-weight WW there is some σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) and 𝐱𝙰n\mathbf{x}\in\mathtt{A}^{n} such that W=Wσ,𝐱kW=W_{\sigma,\mathbf{x}^{k}}; instead it provides 𝐗(𝙰B(e,k))n\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n} such that W=Wσ,𝐗W=W_{\sigma,\mathbf{X}}.

However, if we already know that WW is close to a weight of the form WαkW_{\alpha^{k}} for some observable α\alpha, then the following proposition shows that WW is also close to a weight of the form Wσ,𝐱kW_{\sigma,\mathbf{x}^{k}}.

Proposition 2.5.

Let α:X𝙰\alpha\colon X\to\mathtt{A}, σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)), and 𝐗(𝙰B(e,k))n\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n} be such that d(Wσ,𝐗,Wαk)εd(W_{\sigma,\mathbf{X}},W_{\alpha^{k}})\leq\varepsilon for some ε0\varepsilon\geq 0. Writing 𝐱=πe𝐗𝙰n\mathbf{x}=\pi_{e}\mathbf{X}\in\mathtt{A}^{n}, we have

d(Wσ,𝐗,Wσ,𝐱k)2r|B(e,k)|ε.d(W_{\sigma,\mathbf{X}},W_{\sigma,\mathbf{x}^{k}})\leq 2r\lvert\mathrm{B}(e,k)\rvert\varepsilon.

An immediate consequence is that 𝐗Ω0(σ,αk,ε)\mathbf{X}\in\Omega_{0}^{*}(\sigma,\alpha^{k},\varepsilon) implies πe𝐗Ωk(σ,α,cε)\pi_{e}\mathbf{X}\in\Omega_{k}^{*}(\sigma,\alpha,c\varepsilon) where c=1+2r|B(e,k)|c=1+2r\lvert\mathrm{B}(e,k)\rvert; cf. Claim 2 in the proof of Proposition 3.2 of [Bow10a].

Proof.

Claim 4 in the proof of Proposition 3.2 of [Bow10a] implies that

|{j[n]:𝐗(j)𝐱k(j)}|n|B(e,k)|ε.\lvert\{j\in[n]\,:\,\mathbf{X}(j)\neq\mathbf{x}^{k}(j)\}\rvert\leq n\lvert\mathrm{B}(e,k)\rvert\varepsilon.

It follows that for any i[r]i\in[r]

|{j[n]:𝐗{e,si}(j)(𝐱k){e,si}(j)}|\displaystyle\lvert\{j\in[n]\,:\,\mathbf{X}^{\{e,s_{i}\}}(j)\neq(\mathbf{x}^{k})^{\{e,s_{i}\}}(j)\}\rvert
|{j[n]:𝐗(j)𝐱k(j)}|+|{j[n]:𝐗(σ(si)j)𝐱k(σ(si)j)}|\displaystyle\leq\lvert\{j\in[n]\,:\,\mathbf{X}(j)\neq\mathbf{x}^{k}(j)\}\rvert+\lvert\{j\in[n]\,:\,\mathbf{X}(\sigma(s_{i})j)\neq\mathbf{x}^{k}(\sigma(s_{i})j)\}\rvert
2n|B(e,k)|ε,\displaystyle\leq 2n\lvert\mathrm{B}(e,k)\rvert\varepsilon,

so

d(Wσ,𝐗,Wσ,𝐱k)\displaystyle d(W_{\sigma,\mathbf{X}},W_{\sigma,\mathbf{x}^{k}}) =i[r](𝐗{e,si})Unif(n)((𝐱k){e,si})Unif(n)TV\displaystyle=\sum_{i\in[r]}\left\|\left(\mathbf{X}^{\{e,s_{i}\}}\right)_{*}\operatorname{Unif}(n)-\left((\mathbf{x}^{k})^{\{e,s_{i}\}}\right)_{*}\operatorname{Unif}(n)\right\|_{\operatorname{TV}}
i[r]2|B(e,k)|ε=2r|B(e,k)|ε.\displaystyle\leq\sum_{i\in[r]}2\lvert\mathrm{B}(e,k)\rvert\varepsilon=2r\lvert\mathrm{B}(e,k)\rvert\varepsilon.\qed

3 Properties of FF and ff

Lemma 3.1 (Continuity as weight function).

If W1,W2W_{1},W_{2} are 𝙰\mathtt{A}-weights with d(W1,W2)ε1d(W_{1},W_{2})\leq\varepsilon\leq 1 then

|F(W1)F(W2)|4r(H(ε)+εlog2|𝙰|).\lvert F(W_{1})-F(W_{2})\rvert\leq 4r\big{(}\mathrm{H}(\varepsilon)+\varepsilon\log_{2}\lvert\mathtt{A}\rvert\big{)}.

where H(p)\mathrm{H}(p) denotes the entropy of the probability measure (p,1p)Prob({0,1})(p,1-p)\in\operatorname{Prob}(\{0,1\}).

Proof.

We use Fano’s inequality in the following form (Equation (2.139) of [CT06]): suppose X,YX,Y are 𝙰\mathtt{A}-valued random variables defined on the same probability space and let pe=(XY)p_{e}=\operatorname*{\mathbb{P}}(X\neq Y) be their probability of disagreement. Then

H(XY)H(pe)+pelog|𝙰|.\mathrm{H}(X\mid Y)\leq\mathrm{H}(p_{e})+p_{e}\log\lvert\mathtt{A}\rvert.

Using the chain rule and nonnegativity of Shannon entropy (indeed, $\mathrm{H}(X)\leq\mathrm{H}(X,Y)=\mathrm{H}(Y)+\mathrm{H}(X\mid Y)$, and the same bound holds with the roles of $X$ and $Y$ exchanged), we can deduce that

\lvert\mathrm{H}(X)-\mathrm{H}(Y)\rvert\leq\mathrm{H}(p_{e})+p_{e}\log\lvert\mathtt{A}\rvert.

Now let $\mu_{1},\mu_{2}\in\operatorname{Prob}(\mathtt{A})$. Because $\|\mu_{1}-\mu_{2}\|_{\operatorname{TV}}$ is the minimum value of $\operatorname*{\mathbb{P}}(X\neq Y)$ over all couplings $(X,Y)$ with $X\sim\mu_{1}$ and $Y\sim\mu_{2}$, if $\|\mu_{1}-\mu_{2}\|_{\operatorname{TV}}<\varepsilon$ then

|H(μ1)H(μ2)|H(ε)+εlog2|𝙰|.\lvert\mathrm{H}(\mu_{1})-\mathrm{H}(\mu_{2})\rvert\leq\mathrm{H}(\varepsilon)+\varepsilon\log_{2}\lvert\mathtt{A}\rvert.

The assumed bound d(W1,W2)εd(W_{1},W_{2})\leq\varepsilon implies that each vertex and edge measure of W1W_{1} is within total variation distance ε\varepsilon of its counterpart in W2W_{2}, so

|F(W1)F(W2)|\displaystyle\lvert F(W_{1})-F(W_{2})\rvert |12r||H(W1())H(W2())|\displaystyle\leq\lvert 1-2r\rvert\cdot\left\lvert\mathrm{H}\big{(}W_{1}(\cdot)\big{)}-\mathrm{H}\big{(}W_{2}(\cdot)\big{)}\right\rvert
+i[r]|H(W1(,;i))H(W2(,;i))|\displaystyle\qquad+\sum_{i\in[r]}\left\lvert\mathrm{H}\big{(}W_{1}(\cdot,\cdot;i)\big{)}-\mathrm{H}\big{(}W_{2}(\cdot,\cdot;i)\big{)}\right\rvert
(2r1)(H(ε)+εlog2|𝙰|)\displaystyle\leq(2r-1)\left(\mathrm{H}(\varepsilon)+\varepsilon\log_{2}\lvert\mathtt{A}\rvert\right)
+r(H(ε)+εlog2|𝙰|2)\displaystyle\qquad+r\cdot\left(\mathrm{H}(\varepsilon)+\varepsilon\log_{2}\lvert\mathtt{A}\rvert^{2}\right)
4r(H(ε)+εlog2|𝙰|).\displaystyle\leq 4r\big{(}\mathrm{H}(\varepsilon)+\varepsilon\log_{2}\lvert\mathtt{A}\rvert\big{)}.\qed

Let α:X𝙰\alpha\colon X\to\mathtt{A} and β:X𝙱\beta\colon X\to\mathtt{B} be observables. We say that β\beta is a coarsening of α\alpha if each part of the partition of XX induced by β\beta is a union of parts of the partition induced by α\alpha (up to null sets). Equivalently, there is some function g:𝙰𝙱g\colon\mathtt{A}\to\mathtt{B} such that β=gα\beta=g\circ\alpha almost surely. In this situation we can also call α\alpha a refinement of β\beta.

A useful property of the Shannon entropy Hμ(α)\mathrm{H}_{\mu}(\alpha) is monotonicity under refinement. The function FF does not share this property, but it is monotone under the following particular kind of refinement introduced in [Bow10b]:

We say that β\beta is a simple splitting of α\alpha if there is some s{s1±1,,sr±1}s\in\{s_{1}^{\pm 1},\ldots,s_{r}^{\pm 1}\} and a coarsening α~\tilde{\alpha} of α\alpha such that, up to null sets, the partition induced by β\beta is the coarsest common refinement of the partitions induced by α\alpha and α~Ts\tilde{\alpha}\circ T_{s}.

We say that β\beta is a splitting of α\alpha if there are observables α=β0,β1,,βn=β\alpha=\beta_{0},\beta_{1},\ldots,\beta_{n}=\beta such that βi\beta_{i} is a simple splitting of βi1\beta_{i-1} for i=1,2,,ni=1,2,\ldots,n. We will use the following monotonicity properties of the relative version of FF:

Lemma 3.2 (Monotonicity under splitting).
  1. 1.

    If α1\alpha_{1} is a splitting of α2\alpha_{2} then F(α1|β)F(α2|β)F(\alpha_{1}|\beta)\leq F(\alpha_{2}|\beta).

  2. 2.

    If β1\beta_{1} is a splitting of β2\beta_{2} then F(α|β1)F(α|β2)F(\alpha|\beta_{1})\geq F(\alpha|\beta_{2}).

Proof.
  1. 1.

    This is essentially Proposition 5.1 of [Bow10b]; conditioning on β\beta makes no difference to the proof.

  2. 2.

    The proof is based on the proof of Part 1, but in place of the chain rule for conditional entropy we use the following bound:

    H(αβ2)\displaystyle\mathrm{H}(\alpha\mid\beta_{2}) H(α,β1β2)\displaystyle\leq\mathrm{H}(\alpha,\beta_{1}\mid\beta_{2}) (monotonicity)
    =H(β1β2)+H(αβ1,β2)\displaystyle=\mathrm{H}(\beta_{1}\mid\beta_{2})+\mathrm{H}(\alpha\mid\beta_{1},\beta_{2}) (chain rule)
    H(β1β2)+H(αβ1)\displaystyle\leq\mathrm{H}(\beta_{1}\mid\beta_{2})+\mathrm{H}(\alpha\mid\beta_{1}) (monotonicity).\displaystyle\text{(monotonicity)}\mathrlap{.}

    We will also use the following consequence of the previous bound:

    H(α{e,si}β1{e,si})H(α{e,si}β2{e,si})\displaystyle\mathrm{H}(\alpha^{\{e,s_{i}\}}\mid\beta_{1}^{\{e,s_{i}\}})-\mathrm{H}(\alpha^{\{e,s_{i}\}}\mid\beta_{2}^{\{e,s_{i}\}})
    H(β1{e,si}β2{e,si})\displaystyle\geq-\mathrm{H}(\beta_{1}^{\{e,s_{i}\}}\mid\beta_{2}^{\{e,s_{i}\}}) (previous bound)
    (H(β1{si}β2{e,si})+H(β1β2{e,si}))\displaystyle\geq-\big{(}\mathrm{H}(\beta_{1}^{\{s_{i}\}}\mid\beta_{2}^{\{e,s_{i}\}})+\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s_{i}\}})\big{)} (subadditivity)
    =(H(β1β2{e,si1})+H(β1β2{e,si}))\displaystyle=-\big{(}\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s_{i}^{-1}\}})+\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s_{i}\}})\big{)} (T-invariance of μ).\displaystyle\text{($T$-invariance of $\mu$)}\mathrlap{.}

    It suffices to check the case where β1\beta_{1} is a simple splitting of β2\beta_{2}: let t{s1±1,,sr±1}t\in\{s_{1}^{\pm 1},\ldots,s_{r}^{\pm 1}\} and let β~\tilde{\beta} be a coarsening of β2\beta_{2} such that the partition induced by β1\beta_{1} is the same as the coarsest common refinement of the partitions induced by β2\beta_{2} and β~Tt\tilde{\beta}\circ T_{t} up to null sets. Then, using the two bounds just derived,

    F(\alpha|\beta_{1})-F(\alpha|\beta_{2}) = (1-2r)\left(\mathrm{H}(\alpha|\beta_{1})-\mathrm{H}(\alpha|\beta_{2})\right)+\sum_{i\in[r]}\left(\mathrm{H}(\alpha^{\{e,s_{i}\}}\mid\beta_{1}^{\{e,s_{i}\}})-\mathrm{H}(\alpha^{\{e,s_{i}\}}\mid\beta_{2}^{\{e,s_{i}\}})\right)
    \geq(1-2r)\left(-\mathrm{H}(\beta_{1}|\beta_{2})\right)-\sum_{i\in[r]}\left(\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s_{i}^{-1}\}})+\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s_{i}\}})\right)
    =(2r-1)\mathrm{H}(\beta_{1}|\beta_{2})-\sum_{s\in\{s_{1}^{\pm 1},\ldots,s_{r}^{\pm 1}\}}\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s\}})

    But

    H(β1β2{e,t})H(β1β2β~{t})=0,\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,t\}})\leq\mathrm{H}(\beta_{1}\mid\beta_{2}\tilde{\beta}^{\{t\}})=0,

    so we can remove the tt term from the sum to get

    F(α|β1)F(α|β2)\displaystyle F(\alpha|\beta_{1})-F(\alpha|\beta_{2}) (2r1)H(β1|β2)s{s1±1sr±1}{t}H(β1β2{e,s})\displaystyle\geq(2r-1)\mathrm{H}(\beta_{1}|\beta_{2})-\sum_{s\in\{s_{1}^{\pm 1}\ldots s_{r}^{\pm 1}\}\setminus\{t\}}\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s\}})
    =s{s1±1sr±1}{t}(H(β1|β2)H(β1β2{e,s}))\displaystyle=\sum_{s\in\{s_{1}^{\pm 1}\ldots s_{r}^{\pm 1}\}\setminus\{t\}}\left(\mathrm{H}(\beta_{1}|\beta_{2})-\mathrm{H}(\beta_{1}\mid\beta_{2}^{\{e,s\}})\right)
    0.\displaystyle\geq 0.\qed

One corollary is the following convenient formula:

Corollary 3.3.

Let α,β\alpha,\beta be finite observables such that βGμ\beta^{G}_{*}\mu is a Markov measure. Then Fμ(T,αk1βk2)F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}}) is independent of k2k_{2}. In particular,

fμ(T,αβ)=infkFμ(T,αkβ).f_{\mu}(T,\alpha\mid\beta)=\inf_{k}F_{\mu}(T,\alpha^{k}\mid\beta).
Proof.

By the previous proposition, for any kk2k\leq k_{2} we have

Fμ(T,αk1βk)Fμ(T,αk1βk2).F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k})\leq F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}}).

On the other hand, by Theorem 6.1 of [Bow10d] Fμ(T,βk)=Fμ(T,βk2)F_{\mu}(T,\beta^{k})=F_{\mu}(T,\beta^{k_{2}}) so

Fμ(T,αk1βk)=Fμ(T,αk1βk)Fμ(T,βk2).F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k})=F_{\mu}(T,\alpha^{k_{1}}\beta^{k})-F_{\mu}(T,\beta^{k_{2}}).

Applying monotonicity under splitting to the first term on the right gives

Fμ(T,αk1βk)Fμ(T,αk1βk2)Fμ(T,βk2)=Fμ(T,αk1βk2).F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k})\geq F_{\mu}(T,\alpha^{k_{1}}\beta^{k_{2}})-F_{\mu}(T,\beta^{k_{2}})=F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}}).

This establishes independence of k2k_{2}; the formula for ff follows. ∎

Proposition 3.4.

Let α,β\alpha,\beta be finite observables. Then for any kk\in\mathbb{N},

Fμ(T,αkβ)Hμ(αβ).F_{\mu}(T,\alpha^{k}\mid\beta)\leq\mathrm{H}_{\mu}\big{(}\alpha\mid\beta\big{)}.

It follows that

fμ(T,αβ)Hμ(αβ).f_{\mu}(T,\alpha\mid\beta)\leq\mathrm{H}_{\mu}\big{(}\alpha\mid\beta\big{)}.
Proof.

By Lemma 3.2, Fμ(T,αkβ)Fμ(T,αβ)F_{\mu}(T,\alpha^{k}\mid\beta)\leq F_{\mu}(T,\alpha\mid\beta). Using elementary properties of Shannon entropy, we have

Fμ(T,αβ)\displaystyle F_{\mu}(T,\alpha\mid\beta) =(12r)Hμ(αβ)+i[r]Hμ(α{e,si}β{e,si})\displaystyle=(1-2r)\mathrm{H}_{\mu}(\alpha\mid\beta)+\sum_{i\in[r]}\mathrm{H}_{\mu}\big{(}\alpha^{\{e,s_{i}\}}\mid\beta^{\{e,s_{i}\}}\big{)}
(12r)Hμ(αβ)+i[r][Hμ(αβ{e,si})+Hμ(α{si}β{e,si})]\displaystyle\leq(1-2r)\mathrm{H}_{\mu}(\alpha\mid\beta)+\sum_{i\in[r]}\left[\mathrm{H}_{\mu}\big{(}\alpha\mid\beta^{\{e,s_{i}\}}\big{)}+\mathrm{H}_{\mu}\big{(}\alpha^{\{s_{i}\}}\mid\beta^{\{e,s_{i}\}}\big{)}\right]
(12r)Hμ(αβ)+i[r][Hμ(αβ)+Hμ(α{si}β{si})].\displaystyle\leq(1-2r)\mathrm{H}_{\mu}(\alpha\mid\beta)+\sum_{i\in[r]}\left[\mathrm{H}_{\mu}\big{(}\alpha\mid\beta\big{)}+\mathrm{H}_{\mu}\big{(}\alpha^{\{s_{i}\}}\mid\beta^{\{s_{i}\}}\big{)}\right].

By TT-invariance of μ\mu we have

Hμ(α{si}β{si})=Hμ(αβ),\mathrm{H}_{\mu}\big{(}\alpha^{\{s_{i}\}}\mid\beta^{\{s_{i}\}}\big{)}=\mathrm{H}_{\mu}(\alpha\mid\beta),

so the first inequality follows.

For any k1,k2k_{1},k_{2}\in\mathbb{N} this gives

Fμ(T,αk1βk2)Hμ(αβk2)Hμ(αβ),F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}})\leq\mathrm{H}_{\mu}(\alpha\mid\beta^{k_{2}})\leq\mathrm{H}_{\mu}(\alpha\mid\beta),

so the second inequality follows upon taking the supremum over k2k_{2} then the infimum over k1k_{1}. ∎

We can use this bound to give a proof of the chain rule for the relative ff-invariant, a version of which first appeared in [Bow10d] (there it is called the Abramov-Rokhlin formula; see also [BG13]):

Corollary 3.5 (Chain rule).
fμ(T,αβ)=fμ(T,αβ)+fμ(T,β).f_{\mu}(T,\alpha\beta)=f_{\mu}(T,\alpha\mid\beta)+f_{\mu}(T,\beta).
Proof.

By definition of the relative version of FF and the chain rule for conditional entropy, for each k1,k2k_{1},k_{2} we have

Fμ(T,αk1βk2)=Fμ(T,αk1βk2)+Fμ(T,βk2).F_{\mu}(T,\alpha^{k_{1}}\beta^{k_{2}})=F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}})+F_{\mu}(T,\beta^{k_{2}}).

By Lemma 3.2 each term is monotone in k2k_{2}, so the limits as k2k_{2}\to\infty exist. By Proposition 3.4 all terms are bounded above (recall we only consider finite observables, so in particular all observables have finite entropy), so we can split the limit across the sum on the right to get

limk2Fμ(T,αk1βk2)=limk2Fμ(T,αk1βk2)+fμ(T,β).\lim_{k_{2}\to\infty}F_{\mu}(T,\alpha^{k_{1}}\beta^{k_{2}})=\lim_{k_{2}\to\infty}F_{\mu}(T,\alpha^{k_{1}}\mid\beta^{k_{2}})+f_{\mu}(T,\beta).

Taking k1k_{1} to infinity gives the result. ∎

4 Non-vacuity of Main Theorems

4.1 Theorem A

Here we prove Proposition A, which asserts the nonvacuity of Theorem A. Given β:X𝙱\beta\colon X\to\mathtt{B}, we need to show that there exist 𝐲n𝙱n\mathbf{y}_{n}\in\mathtt{B}^{n} and σnHom(G,Sym(n))\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n)) such that limnd0(P𝐲nσn,βGμ)=0\lim_{n\to\infty}d_{0}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=0.

By Lemma 2.2, there is a sequence {Wn}n=1\{W_{n}\}_{n=1}^{\infty} of 𝙱\mathtt{B}-weights such that WnW_{n} has denominator nn for each nn and d(Wn,Wβ)=o(1)d(W_{n},W_{\beta})=o(1). By Proposition 2.4, for each nn we can pick 𝐲n,σn\mathbf{y}_{n},\sigma_{n} such that Wσn,𝐲n=WnW_{\sigma_{n},\mathbf{y}_{n}}=W_{n}. Since d0(P𝐲nσn,βGμ)=d(Wσn,𝐲n,Wβ)d_{0}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=d(W_{\sigma_{n},\mathbf{y}_{n}},W_{\beta}), these suffice.

4.2 Theorems B and C

Here we prove Proposition B, which asserts the nonvacuity of Theorem B (and by extension Theorem C, since the assumptions are the same).

Let mnm_{n} approach infinity as nn approaches infinity while satisfying mn=o(loglogn)m_{n}=o(\log\log n) and let β:X𝙱\beta\colon X\to\mathtt{B} be a finite observable. We need to show that there exist 𝐲n𝙱n\mathbf{y}_{n}\in\mathtt{B}^{n} and σnHom(G,Sym(n))\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n)) such that dmn(P𝐲nσn,βGμ)=O(1logn)d_{m_{n}}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=O(\frac{1}{\log n}).

By Lemma 2.2, there is a sequence {Wn}n=1\{W_{n}\}_{n=1}^{\infty} of weights such that WnW_{n} is a denominator-nn 𝙱B(e,mn)\mathtt{B}^{\mathrm{B}(e,m_{n})}-weight for each nn and d(Wn,Wβmn)=O(|𝙱B(e,mn)|2n)d(W_{n},W_{\beta^{m_{n}}})=O(\frac{\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}{n}). By Proposition 2.4, for each nn we can pick 𝐘n,σn\mathbf{Y}_{n},\sigma_{n} such that Wσn,𝐘n=WnW_{\sigma_{n},\mathbf{Y}_{n}}=W_{n}. Let 𝐲n=πe𝐘n\mathbf{y}_{n}=\pi_{e}\mathbf{Y}_{n}. By Proposition 2.5,

dmn(P𝐲nσn,βGμ)=d(Wσn,𝐲nmn,Wβmn)=O(|B(e,mn)||𝙱B(e,mn)|2n)=O(1logn).d_{m_{n}}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=d(W_{\sigma_{n},\mathbf{y}_{n}^{m_{n}}},W_{\beta^{m_{n}}})=O\!\left(\!\lvert\mathrm{B}(e,m_{n})\rvert\cdot\frac{\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}{n}\right)=O\!\left(\frac{1}{\log n}\right).

5 Counting Lemmas

For a 𝙱\mathtt{B}-weight WW, let Zn(W)Z_{n}(W) denote the number of pairs (σ,𝐲)Hom(G,Sym(n))×𝙱n(\sigma,\mathbf{y})\in\operatorname{Hom}(G,\operatorname{Sym}(n))\times\mathtt{B}^{n} such that Wσ,𝐲=WW_{\sigma,\mathbf{y}}=W.

Proposition 5.1.

If WW is a 𝙱\mathtt{B}-weight with denominator nn then

(3n)r|𝙱|2Zn(W)eF(W)n(n!)rn(1r)/2(3n)r|𝙱|2.(3\sqrt{n})^{-r\lvert\mathtt{B}\rvert^{2}}\leq\frac{Z_{n}(W)}{e^{F(W)n}(n!)^{r}n^{(1-r)/2}}\leq(3\sqrt{n})^{r\lvert\mathtt{B}\rvert^{2}}.
Proof.

We write

Zn(W)=σ|{𝐲𝙱n:Wσ,𝐲=W}|=(n!)r𝔼σ|{𝐲𝙱n:Wσ,𝐲=W}|.Z_{n}(W)=\sum_{\sigma}\lvert\{\mathbf{y}\in\mathtt{B}^{n}\,:\,W_{\sigma,\mathbf{y}}=W\}\rvert=(n!)^{r}\operatorname*{\mathbb{E}}_{\sigma}\lvert\{\mathbf{y}\in\mathtt{B}^{n}\,:\,W_{\sigma,\mathbf{y}}=W\}\rvert.

where 𝔼σ\operatorname*{\mathbb{E}}_{\sigma} denotes the expectation over a uniform choice of σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)).

Proposition 2.1 of [Bow10a] states that

𝔼σ|{𝐲𝙱n:Wσ,𝐲=W}|=n!1rb𝙱(nW(b))!2r1i=1rb,b𝙱(nW(b,b;i))!.\operatorname*{\mathbb{E}}_{\sigma}\lvert\{\mathbf{y}\in\mathtt{B}^{n}\,:\,W_{\sigma,\mathbf{y}}=W\}\rvert=\frac{n!^{1-r}\prod_{b\in\mathtt{B}}(nW(b))!^{2r-1}}{\prod_{i=1}^{r}\prod_{b,b^{\prime}\in\mathtt{B}}(nW(b,b^{\prime};i))!}.

Lemma 2.2 of the same paper gives an estimate of this quantity, but for our purposes we need to be more careful about how the estimate depends on the size of the alphabet.

We use the version of Stirling’s approximation

kk+1/2ekk!3kk+1/2ek,k^{k+1/2}e^{-k}\leq k!\leq 3\cdot k^{k+1/2}e^{-k},

valid for k1k\geq 1. To estimate the products that appear in the expectation, we will need to omit all factors which equal 0!=10!=1 since Stirling’s approximation is not valid for these. To do this carefully, let

𝙱={b𝙱:W(b)0}\mathtt{B}^{\prime}=\{b\in\mathtt{B}\,:\,W(b)\neq 0\}

and for each i[r]i\in[r] let

𝙱i={(b,b)𝙱2:W(b,b;i)0}.\mathtt{B}^{\prime}_{i}=\{(b,b^{\prime})\in\mathtt{B}^{2}\,:\,W(b,b^{\prime};i)\neq 0\}.

For the numerator of the above expectation we get

n!1rb𝙱(nW(b))!2r1\displaystyle n!^{1-r}\prod_{b\in\mathtt{B}^{\prime}}(nW(b))!^{2r-1} (3nn+1/2en)1rb𝙱(3(nW(b))nW(b)+1/2enW(b))2r1\displaystyle\leq(3n^{n+1/2}\,e^{-n})^{1-r}\prod_{b\in\mathtt{B}^{\prime}}\left(3(nW(b))^{nW(b)+1/2}e^{-nW(b)}\right)^{2r-1}
=31r+|𝙱|(2r1)nrn+1/2r/2+(2r1)|𝙱|/2\displaystyle=3^{1-r+\lvert\mathtt{B}^{\prime}\rvert(2r-1)}\,n^{rn+1/2-r/2+(2r-1)\lvert\mathtt{B}^{\prime}\rvert/2}
×ern+(2r1)[nb𝙱W(b)logW(b)+12b𝙱logW(b)]\displaystyle\qquad\times e^{-rn+(2r-1)[n\sum_{b\in\mathtt{B}^{\prime}}W(b)\log W(b)+\frac{1}{2}\sum_{b\in\mathtt{B}^{\prime}}\log W(b)]}

and a lower bound which is identical except missing the first factor. For the denominator, let S=i[r]|𝙱i|S=\sum_{i\in[r]}\lvert\mathtt{B}^{\prime}_{i}\rvert. We get

i=1r(b,b)𝙱i(nW(b,b;i))!\displaystyle\prod_{i=1}^{r}\prod_{(b,b^{\prime})\in\mathtt{B}^{\prime}_{i}}(nW(b,b^{\prime};i))! i=1r(b,b)𝙱i3(nW(b,b;i))nW(b,b;i)+1/2enW(b,b;i)\displaystyle\leq\prod_{i=1}^{r}\prod_{(b,b^{\prime})\in\mathtt{B}^{\prime}_{i}}3(nW(b,b^{\prime};i))^{nW(b,b^{\prime};i)+1/2}e^{-nW(b,b^{\prime};i)}
=3Snnr+S/2\displaystyle=3^{S}\,n^{nr+S/2}
×enib,bW(b,b;i)logW(b,b;i)+12i,b,blogW(b,b;i)nr,\displaystyle\qquad\times e^{n\sum_{i}\sum_{b,b^{\prime}}W(b,b^{\prime};i)\log W(b,b^{\prime};i)+\frac{1}{2}\sum_{i,b,b^{\prime}}\log W(b,b^{\prime};i)-nr},

and again we have a lower bound which is identical except missing the first factor 3S3^{S}. Therefore the quotient is bounded above by

31r+|𝙱|(2r1)n(1r)/2+(2r1)|𝙱|/2S/2enF(W)+(2r1)12blogW(b)12i,b,blogW(b,b;i)3^{1-r+\lvert\mathtt{B}^{\prime}\rvert(2r-1)}\,n^{(1-r)/2+(2r-1)\lvert\mathtt{B}^{\prime}\rvert/2-S/2}\,e^{-nF(W)+(2r-1)\frac{1}{2}\sum_{b}\log W(b)-\frac{1}{2}\sum_{i,b,b^{\prime}}\log W(b,b^{\prime};i)}

and below by

3Sn(1r)/2+(2r1)|𝙱|/2S/2enF(W)+(2r1)12blogW(b)12i,b,blogW(b,b;i).3^{-S}\,n^{(1-r)/2+(2r-1)\lvert\mathtt{B}^{\prime}\rvert/2-S/2}\,e^{-nF(W)+(2r-1)\frac{1}{2}\sum_{b}\log W(b)-\frac{1}{2}\sum_{i,b,b^{\prime}}\log W(b,b^{\prime};i)}.

Since WW has denominator nn, we have

0(2r1)12b𝙱logW(b)(2r1)12b𝙱log1n=2r12|𝙱|logn0\geq(2r-1)\frac{1}{2}\sum_{b\in\mathtt{B}^{\prime}}\log W(b)\geq(2r-1)\frac{1}{2}\sum_{b\in\mathtt{B}^{\prime}}\log\frac{1}{n}=-\frac{2r-1}{2}\lvert\mathtt{B}^{\prime}\rvert\log n

and

012i(b,b)BilogW(b,b;i)12i(b,b)𝙱ilog1n=S2logn.0\leq-\frac{1}{2}\sum_{i}\sum_{(b,b^{\prime})\in B^{\prime}_{i}}\log W(b,b^{\prime};i)\leq-\frac{1}{2}\sum_{i}\sum_{(b,b^{\prime})\in\mathtt{B}^{\prime}_{i}}\log\frac{1}{n}=\frac{S}{2}\log n.

Therefore Zn(W)Z_{n}(W) satisfies

3Sn((1r)S)/2eF(W)n(n!)rZn(W)31r+|𝙱|(2r1)n((1r)+(2r1)|𝙱|)/2eF(W)n(n!)r.3^{-S}n^{\left((1-r)-S\right)/2}e^{F(W)n}(n!)^{r}\leq Z_{n}(W)\leq 3^{1-r+\lvert\mathtt{B}^{\prime}\rvert(2r-1)}n^{\left((1-r)+(2r-1)\lvert\mathtt{B}^{\prime}\rvert\right)/2}e^{F(W)n}(n!)^{r}.

Since Sr|𝙱|2S\leq r\lvert\mathtt{B}\rvert^{2} and |𝙱||𝙱|\lvert\mathtt{B}^{\prime}\rvert\leq\lvert\mathtt{B}\rvert, we conclude that

3r|𝙱|2n((1r)r|𝙱|2)/2eF(W)n(n!)rZn(W)31r+|𝙱|(2r1)n((1r)+(2r1)|𝙱|)/2eF(W)n(n!)r,3^{-r\lvert\mathtt{B}\rvert^{2}}n^{\left((1-r)-r\lvert\mathtt{B}\rvert^{2}\right)/2}e^{F(W)n}(n!)^{r}\leq Z_{n}(W)\leq 3^{1-r+\lvert\mathtt{B}\rvert(2r-1)}n^{\left((1-r)+(2r-1)\lvert\mathtt{B}\rvert\right)/2}e^{F(W)n}(n!)^{r},

and the stated inequality follows. ∎
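
As a sanity check on these bounds, for very small $n$ and $r$ the count $Z_{n}(W)$ can be computed by brute force and compared against $e^{F(W)n}(n!)^{r}n^{(1-r)/2}$. The sketch below does this (helper names are ours; the enumeration is only feasible for tiny parameters):

    import itertools, math
    from collections import Counter

    def counts_of(perms, y):
        """Integer pair counts n * W_{sigma,y}(b, b'; i), with sigma given by its generator permutations."""
        return Counter((y[j], y[p[j]], i) for i, p in enumerate(perms) for j in range(len(y)))

    def brute_force_Z(target, n, r, alphabet):
        """Number of pairs (sigma, y) in Hom(G, Sym(n)) x B^n whose weight has the target counts."""
        perms = list(itertools.permutations(range(n)))
        return sum(1 for sigma in itertools.product(perms, repeat=r)
                     for y in itertools.product(alphabet, repeat=n)
                     if counts_of(sigma, y) == target)

    def F_from_counts(target, n, r):
        """F(W) computed from the integer counts n * W(b, b'; i)."""
        H = lambda ps: -sum(p * math.log(p) for p in ps if p > 0)
        edge = lambda k: [c / n for (b, b2, i), c in target.items() if i == k]
        vertex = Counter()
        for (b, b2, i), c in target.items():
            if i == 0:
                vertex[b] += c / n
        return (1 - 2 * r) * H(vertex.values()) + sum(H(edge(k)) for k in range(r))

    n, r, B = 3, 2, (0, 1)
    target = counts_of(((1, 2, 0), (0, 2, 1)), (0, 0, 1))   # a denominator-3 weight realized by some (sigma, y)
    Z = brute_force_Z(target, n, r, B)
    estimate = math.exp(n * F_from_counts(target, n, r)) * math.factorial(n) ** r * n ** ((1 - r) / 2)
    print(Z, estimate)  # Proposition 5.1: the ratio lies between (3 sqrt(n))^{-r|B|^2} and (3 sqrt(n))^{r|B|^2}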

The following proposition establishes the connection between the relative version of FF and expected numbers of good models over stochastic block models.

Proposition 5.2.

Given any denominator-nn (𝙰×𝙱B(e,k))(\mathtt{A}\times\mathtt{B}^{\mathrm{B}(e,k)})-weight W𝙰𝙱W_{\mathtt{A}\mathtt{B}}, let W𝙱W_{\mathtt{B}} denote the 𝙱B(e,k)\mathtt{B}^{\mathrm{B}(e,k)}-weight π𝙱W𝙰𝙱\pi_{\mathtt{B}}W_{\mathtt{A}\mathtt{B}}. Let 𝐲𝙱n\mathbf{y}\in\mathtt{B}^{n} be a fixed labeling with p𝐲=πeW𝙱()p_{\mathbf{y}}=\pi_{e}W_{\mathtt{B}}(\cdot), and let

μ=𝚂𝙱𝙼(𝐲,W𝙱)=Unif({σHom(G,Sym(n)):Wσ,𝐲k=W𝙱}),\mu=\mathtt{SBM}(\mathbf{y},W_{\mathtt{B}})=\operatorname{Unif}(\{\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n))\,:\,W_{\sigma,\mathbf{y}^{k}}=W_{\mathtt{B}}\}),

assuming W𝙱W_{\mathtt{B}} is such that the desired support is nonempty. Then

𝔼σμ|{𝐱𝙰n:Wσ,(𝐱,𝐲k)=W𝙰𝙱}|=Zn(W𝙰𝙱)Zn(W𝙱).\mathcal{E}\coloneqq\operatorname*{\mathbb{E}}_{\sigma\sim\mu}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,W_{\sigma,(\mathbf{x},\mathbf{y}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert=\frac{Z_{n}(W_{\mathtt{A}\mathtt{B}})}{Z_{n}(W_{\mathtt{B}})}.

In particular,

en(F(W𝙰𝙱)F(W𝙱))((9n)r|𝙱|2(|𝙰|2+1),(9n)r|𝙱|2(|𝙰|2+1)).\frac{\mathcal{E}}{e^{n(F(W_{\mathtt{A}\mathtt{B}})-F(W_{\mathtt{B}}))}}\in\left((9n)^{-r\lvert\mathtt{B}\rvert^{2}(\lvert\mathtt{A}\rvert^{2}+1)},\ (9n)^{r\lvert\mathtt{B}\rvert^{2}(\lvert\mathtt{A}\rvert^{2}+1)}\right).
Lemma 5.3.

Let W𝙰𝙱W_{\mathtt{A}\mathtt{B}} be a 𝙰×𝙱B(e,k)\mathtt{A}\times\mathtt{B}^{\mathrm{B}(e,k)} weight of denominator nn. Then

|{(σ,𝐱,𝐲):Wσ,(𝐱,𝐲k)=W𝙰𝙱}|{0,|{(σ,𝐱,𝐘):Wσ,(𝐱,𝐘)=W𝙰𝙱}|}.\left\lvert\{(\sigma,\mathbf{x},\mathbf{y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{y}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert\in\big{\{}0,\ \left\lvert\{(\sigma,\mathbf{x},\mathbf{Y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{Y})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert\big{\}}.
Proof.

Suppose |{(σ,𝐱,𝐲):Wσ,(𝐱,𝐲k)=W𝙰𝙱}|0\left\lvert\{(\sigma,\mathbf{x},\mathbf{y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{y}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert\neq 0; we then need to show

|{(σ,𝐱,𝐲):Wσ,(𝐱,𝐲k)=W𝙰𝙱}|=|{(σ,𝐱,𝐘):Wσ,(𝐱,𝐘)=W𝙰𝙱}|.\left\lvert\{(\sigma,\mathbf{x},\mathbf{y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{y}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert=\left\lvert\{(\sigma,\mathbf{x},\mathbf{Y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{Y})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert.

The inequality \leq is clear, since we have an injection (σ,𝐱,𝐲)(σ,𝐱,𝐲k)(\sigma,\mathbf{x},\mathbf{y})\mapsto(\sigma,\mathbf{x},\mathbf{y}^{k}).

The converse inequality holds because $(\sigma,\mathbf{x},\mathbf{Y})\mapsto(\sigma,\mathbf{x},\mathbf{Y}_{e})$ is an injection from the set on the right to the set on the left. This follows from the remark at the beginning of the proof of Proposition 2.5. ∎

Proof of Proposition.

Let

μ~=Unif({(σ,𝐲~):Wσ,𝐲~k=W𝙱});\tilde{\mu}=\operatorname{Unif}(\{(\sigma,\mathbf{\tilde{y}})\,:\,W_{\sigma,\mathbf{\tilde{y}}^{k}}=W_{\mathtt{B}}\});

then, since |{𝐱𝙰n:Wσ,(𝐱,𝐲~k)=W𝙰𝙱}|\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,W_{\sigma,(\mathbf{x},\mathbf{\tilde{y}}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert is independent of the choice of 𝐲~\mathbf{\tilde{y}} with p𝐲~=πeW𝙱()p_{\mathbf{\tilde{y}}}=\pi_{e}W_{\mathtt{B}}(\cdot),

\displaystyle\mathcal{E} =𝔼(σ,𝐲~)μ~|{𝐱𝙰n:Wσ,(𝐱,𝐲~k)=W𝙰𝙱}|\displaystyle=\operatorname*{\mathbb{E}}_{(\sigma,\mathbf{\tilde{y}})\sim\tilde{\mu}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,W_{\sigma,(\mathbf{x},\mathbf{\tilde{y}}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert
=σ,𝐲~|{𝐱𝙰n:Wσ,(𝐱,𝐲~k)=W𝙰𝙱}||{(σ,𝐲~):Wσ,𝐲~k=W𝙱}|\displaystyle=\frac{\sum_{\sigma,\mathbf{\tilde{y}}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,W_{\sigma,(\mathbf{x},\mathbf{\tilde{y}}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert}{\left\lvert\{(\sigma,\mathbf{\tilde{y}})\,:\,W_{\sigma,\mathbf{\tilde{y}}^{k}}=W_{\mathtt{B}}\}\right\rvert}
=|{(σ,𝐱,𝐲~):Wσ,(𝐱,𝐲~k)=W𝙰𝙱}||{(σ,𝐲~):Wσ,𝐲~k=W𝙱}|\displaystyle=\frac{\left\lvert\{(\sigma,\mathbf{x},\mathbf{\tilde{y}})\,:\,W_{\sigma,(\mathbf{x},\mathbf{\tilde{y}}^{k})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert}{\left\lvert\{(\sigma,\mathbf{\tilde{y}})\,:\,W_{\sigma,\mathbf{\tilde{y}}^{k}}=W_{\mathtt{B}}\}\right\rvert}
=|{(σ,𝐱,𝐘):Wσ,(𝐱,𝐘)=W𝙰𝙱}||{(σ,𝐘):Wσ,𝐘=W𝙱}|\displaystyle=\frac{\left\lvert\{(\sigma,\mathbf{x},\mathbf{Y})\,:\,W_{\sigma,(\mathbf{x},\mathbf{Y})}=W_{\mathtt{A}\mathtt{B}}\}\right\rvert}{\left\lvert\{(\sigma,\mathbf{Y})\,:\,W_{\sigma,\mathbf{Y}}=W_{\mathtt{B}}\}\right\rvert} (previous lemma)
=Zn(W𝙰𝙱)Zn(W𝙱).\displaystyle=\frac{Z_{n}(W_{\mathtt{A}\mathtt{B}})}{Z_{n}(W_{\mathtt{B}})}.

Note that our assumption that the intended support of μ\mu is nonempty allows us to rule out the “0” case in the application of the lemma.

The rest of the result then follows from our estimates on ZnZ_{n} in Proposition 5.1. ∎

6 Proof of Theorem A

6.1 Upper bound

Note that we will not rely on the Markov assumption for the upper bound.

For each kk\in\mathbb{N},

inf𝒪(αβ)Gμ\displaystyle\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu} lim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ω(σ,𝒪)}|\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert
infεlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|\displaystyle\leq\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\rvert
=infεlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱k,𝐲nk)Ω0(σ,(αβ)k,ε)}|\displaystyle=\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x}^{k},\mathbf{y}_{n}^{k})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon)\}\rvert
infεlim supn1nlog𝔼σμn|{𝐗(𝙰B(e,k))n:(𝐗,𝐲nk)Ω0(σ,(αβ)k,ε)}|.\displaystyle\leq\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,(\mathbf{X},\mathbf{y}_{n}^{k})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon)\}\rvert.

Write

k(n,ε)\displaystyle\mathcal{E}_{k}(n,\varepsilon) 𝔼σμn|{𝐗(𝙰B(e,k))n:(𝐗,𝐲nk)Ω0(σ,(αβ)k,ε)}|\displaystyle\coloneqq\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,(\mathbf{X},\mathbf{y}_{n}^{k})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon)\}\rvert
=𝔼σμn|{𝐗(𝙰B(e,k))n:d(Wσ,(𝐗,𝐲nk),W(αβ)k)<ε}|\displaystyle=\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,d\big{(}W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})},W_{(\alpha\beta)^{k}}\big{)}<\varepsilon\}\rvert

and assume that nn is large enough that mnkm_{n}\geq k.

Writing 𝒲n(αβ,k,ε)\mathcal{W}_{n}(\alpha\beta,k,\varepsilon) for the set of all denominator-nn weights WW with d(W,W(αβ)k)<εd(W,W_{(\alpha\beta)^{k}})<\varepsilon,

k(n,ε)\displaystyle\mathcal{E}_{k}(n,\varepsilon) =𝔼σμnW𝒲n(αβ,k,ε)|{𝐗(𝙰B(e,k))n:Wσ,(𝐗,𝐲nk)=W}|\displaystyle=\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\!\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\!\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})}=W\}\rvert
=W𝒲n(αβ,k,ε)𝔼σμn[|{𝐗(𝙰B(e,k))n:Wσ,(𝐗,𝐲nk)=W}||Wσ,𝐲nk=π𝙱W]σμn(Wσ,𝐲nk=π𝙱W)\displaystyle=\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\big{[}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})}=W\}\rvert\big{|}W_{\sigma,\mathbf{y}_{n}^{k}}=\pi_{\mathtt{B}}W\big{]}\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\pi_{\mathtt{B}}W)

since if Wσ,𝐲nkπ𝙱WW_{\sigma,\mathbf{y}_{n}^{k}}\neq\pi_{\mathtt{B}}W then Wσ,(𝐗,𝐲nk)WW_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})}\neq W. But μn\mu_{n} conditioned on {Wσ,𝐲nk=π𝙱W}\{W_{\sigma,\mathbf{y}_{n}^{k}}=\pi_{\mathtt{B}}W\} is 𝚂𝙱𝙼(𝐲n,π𝙱W)\mathtt{SBM}(\mathbf{y}_{n},\pi_{\mathtt{B}}W), so we can bound the expectation above using Proposition 5.2, getting

k(n,ε)(9n)r|𝙱B(e,k)|2(|𝙰B(e,k)|+1)W𝒲n(αβ,k,ε)en(F(W)F(π𝙱W))σμn(Wσ,𝐲nk=π𝙱W).\mathcal{E}_{k}(n,\varepsilon)\leq(9n)^{r\lvert\mathtt{B}^{\mathrm{B}(e,k)}\rvert^{2}(\lvert\mathtt{A}^{\mathrm{B}(e,k)}\rvert+1)}\!\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\!e^{n(F(W)-F(\pi_{\mathtt{B}}W))}\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\pi_{\mathtt{B}}W).

Note (9n)r|𝙱B(e,k)|2(|𝙰B(e,k)|+1)eon(n)(9n)^{r\lvert\mathtt{B}^{\mathrm{B}(e,k)}\rvert^{2}(\lvert\mathtt{A}^{\mathrm{B}(e,k)}\rvert+1)}\leq e^{o_{n\to\infty}(n)}. Fix δ>0\delta>0. By continuity of FF, for all small enough ε\varepsilon (possibly depending on kk) we have

k(n,ε)en(Fμ(T,αkβk)+δ+on(1))W𝒲n(αβ,k,ε)σμn(Wσ,𝐲nk=π𝙱W).\mathcal{E}_{k}(n,\varepsilon)\leq e^{n(F_{\mu}(T,\alpha^{k}\mid\beta^{k})+\delta+o_{n\to\infty}(1))}\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\pi_{\mathtt{B}}W).

Bounding each probability by 1, we get

k(n,ε)en(Fμ(T,αkβk)+δ+on(1))|𝒲n(αβ,k,ε)|.\mathcal{E}_{k}(n,\varepsilon)\leq e^{n(F_{\mu}(T,\alpha^{k}\mid\beta^{k})+\delta+o_{n\to\infty}(1))}\left\lvert\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)\right\rvert.

But

|𝒲n(αβ,k,ε)|nr|(𝙰×𝙱)B(e,k)|2eon(n),\left\lvert\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)\right\rvert\leq n^{r\left\lvert(\mathtt{A}\times\mathtt{B})^{\mathrm{B}(e,k)}\right\rvert^{2}}\leq e^{o_{n\to\infty}(n)},

so this implies

lim supn1nlogk(n,ε)\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon) Fμ(T,αkβk)+δ\displaystyle\leq F_{\mu}(T,\alpha^{k}\mid\beta^{k})+\delta
Fμ(T,αkβk2)+δ\displaystyle\leq F_{\mu}(T,\alpha^{k}\mid\beta^{k_{2}})+\delta

for any k2kk_{2}\geq k, by monotonicity under splitting. Taking the limit as k2k_{2}\to\infty followed by the infimum over ε\varepsilon (which takes δ\delta to 0) and kk gives

infε,klim supn1nlogk(n,ε)fμ(T,αβ).\inf_{\varepsilon,k}\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon)\leq f_{\mu}(T,\alpha\mid\beta).

Since

inf𝒪(αβ)Gμlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ω(σ,𝒪)}|infεlim supn1nlogk(n,ε)\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert\leq\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon)

for every kk, this completes the upper bound.

6.2 Lower bound

Fix kk\in\mathbb{N}. To estimate

\displaystyle\mathcal{E} 𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|\displaystyle\coloneqq\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert

we bound below using the expected size of

𝒳k(σ,αβ,ε𝐲n){𝐗(𝙰B(e,k))n:(𝐗,𝐲nk)Ω0(σ,(αβ)k,ε)}.\mathcal{X}_{k}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\coloneqq\{\mathbf{X}\in\left(\mathtt{A}^{\mathrm{B}(e,k)}\right)^{n}\,:\,(\mathbf{X},\mathbf{y}_{n}^{k})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon)\}.

This is not a true lower bound but, by Equation 1 below, there are constants C,d,cC,d,c independent of nn such that

|𝒳k(σ,αβ,ε𝐲n)|Cexp(ndε+nH(2|B(e,k)|ε))|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|.\left\lvert\mathcal{X}_{k}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\right\rvert\leq C\exp\big{(}nd\varepsilon+n\mathrm{H}(2\lvert\mathrm{B}(e,k)\rvert\varepsilon)\big{)}\cdot\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert.

The ‘error’ factor has an exponential growth rate which vanishes as ε0\varepsilon\to 0, so it will not cause a problem.

We now find a lower bound for the expectation of |𝒳k|\lvert\mathcal{X}_{k}\rvert. Applying Proposition 5.2 as above, we have

𝔼σμn\displaystyle\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}} |𝒳k(σ,αβ,ε𝐲n)|\displaystyle\lvert\mathcal{X}_{k}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\rvert
=W𝒲n(αβ,k,ε)𝔼σμn|{𝐗(𝙰B(e,k))n:Wσ,(𝐗,𝐲nk)=W}|\displaystyle=\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\!\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})}=W\}\rvert
W𝒲n(αβ,k,ε)exp[n(F(W)F(π𝙱W)on(1))]σμn(π𝙱W=Wσ,𝐲nk).\displaystyle\geq\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\!\exp\!\big{[}n(F(W)-F(\pi_{\mathtt{B}}W)-o_{n}(1))\big{]}\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}\big{(}\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{k}}\big{)}.

For any δ>0\delta>0, for small enough ε>0\varepsilon>0 (independent of nn), by continuity of FF this is at least

exp[n(Fμ(αkβk)δon(1))]W𝒲n(αβ,k,ε)σμn(π𝙱W=Wσ,𝐲nk).\exp\big{[}n(F_{\mu}(\alpha^{k}\mid\beta^{k})-\delta-o_{n}(1))\big{]}\sum_{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)}\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}\big{(}\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{k}}\big{)}.

We give a lower bound for the sum by first rewriting it as

W𝙱 denom.-n𝙱B(e,k)weight|{W𝒲n(αβ,k,ε):π𝙱W=W𝙱}|σμn(Wσ,𝐲nk=W𝙱).\sum_{\mathclap{W_{\mathtt{B}}\text{ denom.-}n\ \mathtt{B}^{\mathrm{B}(e,k)}-\text{weight}}}\ \left\lvert\{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)\,:\,\pi_{\mathtt{B}}W=W_{\mathtt{B}}\}\right\rvert\cdot\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=W_{\mathtt{B}}).

Fix η>0\eta>0. By Lemma 2.3, for all large enough nn the 𝙱\mathtt{B}-weight Wσn,𝐲nW_{\sigma_{n},\mathbf{y}_{n}} can be extended to a 𝙱B(e,k)\mathtt{B}^{\mathrm{B}(e,k)}-weight W𝙱W_{\mathtt{B}} with d(W𝙱,Wβk)ηd(W_{\mathtt{B}},W_{\beta^{k}})\leq\eta; to apply the lemma we can think of the extended weight W𝙱W_{\mathtt{B}} as having alphabet 𝙱B(e,k){e}×𝙱\mathtt{B}^{\mathrm{B}(e,k)\setminus\{e\}}\times\mathtt{B}, and recall that we assume limnd(Wσn,𝐲n,Wβ)=0\lim_{n\to\infty}d(W_{\sigma_{n},\mathbf{y}_{n}},W_{\beta})=0. Choose σ,𝐘\sigma,\mathbf{Y} such that W𝙱=Wσ,𝐘W_{\mathtt{B}}=W_{\sigma,\mathbf{Y}}. Since W𝙱W_{\mathtt{B}} is an extension of Wσn,𝐲nW_{\sigma_{n},\mathbf{y}_{n}}, we can make this choice in such a way that πe𝐘=𝐲n\pi_{e}\mathbf{Y}=\mathbf{y}_{n}.

Let W𝙱~=Wσ,𝐲nk\widetilde{W_{\mathtt{B}}}=W_{\sigma,\mathbf{y}_{n}^{k}}. By Proposition 2.5,

d(W𝙱~,Wβk)d(W𝙱~,W𝙱)+d(W𝙱,Wβk)2r|B(e,k)|η+η.d(\widetilde{W_{\mathtt{B}}},W_{\beta^{k}})\leq d(\widetilde{W_{\mathtt{B}}},W_{\mathtt{B}})+d(W_{\mathtt{B}},W_{\beta^{k}})\leq 2r\lvert\mathrm{B}(e,k)\rvert\eta+\eta.

So, as long as η\eta is small enough and nn is large enough (depending on ε,k\varepsilon,k), by Lemma 2.3

|{W𝒲n(αβ,k,ε):π𝙱W=W𝙱}|1.\left\lvert\{W\in\mathcal{W}_{n}(\alpha\beta,k,\varepsilon)\,:\,\pi_{\mathtt{B}}W=W_{\mathtt{B}}\}\right\rvert\geq 1.

Now consider the probability appearing in the W𝙱~\widetilde{W_{\mathtt{B}}} term:

σμn(Wσ,𝐲nk=W~𝙱)=|{σ:Wσ,𝐲nk=W~𝙱}||{σ:Wσ,𝐲n=Wσn,𝐲n}|.\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\widetilde{W}_{\mathtt{B}})=\frac{\lvert\{\sigma\,:\,W_{\sigma,\mathbf{y}_{n}^{k}}=\widetilde{W}_{\mathtt{B}}\}\rvert}{\lvert\{\sigma\,:\,W_{\sigma,\mathbf{y}_{n}}=W_{\sigma_{n},\mathbf{y}_{n}}\}\rvert}.

By symmetry in choice of 𝐲\mathbf{y} with the correct letter frequencies, we can write this as

σμn(Wσ,𝐲nk=W~𝙱)\displaystyle\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\widetilde{W}_{\mathtt{B}}) =|{(σ,𝐲):Wσ,𝐲k=W~𝙱}||{(σ,𝐲):Wσ,𝐲=Wσn,𝐲n}|\displaystyle=\frac{\left\lvert\{(\sigma,\mathbf{y})\,:\,W_{\sigma,\mathbf{y}^{k}}=\widetilde{W}_{\mathtt{B}}\}\right\rvert}{\left\lvert\{(\sigma,\mathbf{y})\,:\,W_{\sigma,\mathbf{y}}=W_{\sigma_{n},\mathbf{y}_{n}}\}\right\rvert}
=|{(σ,𝐘):Wσ,𝐘=W~𝙱}||{(σ,𝐲):Wσ,𝐲=Wσn,𝐲n}|\displaystyle=\frac{\left\lvert\{(\sigma,\mathbf{Y})\,:\,W_{\sigma,\mathbf{Y}}=\widetilde{W}_{\mathtt{B}}\}\right\rvert}{\left\lvert\{(\sigma,\mathbf{y})\,:\,W_{\sigma,\mathbf{y}}=W_{\sigma_{n},\mathbf{y}_{n}}\}\right\rvert} (Prop. 2.5)
=Zn(W~𝙱)Zn(Wσn,𝐲n)\displaystyle=\frac{Z_{n}(\widetilde{W}_{\mathtt{B}})}{Z_{n}(W_{\sigma_{n},\mathbf{y}_{n}})} (definition of ZnZ_{n})
exp(n[F(W~𝙱)F(Wσn,𝐲n)])(3n)r(|𝙱B(e,k)|2|𝙱|)\displaystyle\geq\exp\!\left(n[F(\widetilde{W}_{\mathtt{B}})-F(W_{\sigma_{n},\mathbf{y}_{n}})]\right)\cdot(3\sqrt{n})^{-r(\lvert\mathtt{B}^{\mathrm{B}(e,k)}\rvert^{2}-\lvert\mathtt{B}\rvert)} (Prop. 5.1)
=exp(n[F(W~𝙱)F(Wσn,𝐲n)o(1)]).\displaystyle=\exp\!\left(n[F(\widetilde{W}_{\mathtt{B}})-F(W_{\sigma_{n},\mathbf{y}_{n}})-o(1)]\right).

By continuity of FF, we then get

σμn(Wσ,𝐲nk=W~𝙱)expn(Fμ(βk)Fμ(β)2δ+o(1))\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(W_{\sigma,\mathbf{y}_{n}^{k}}=\widetilde{W}_{\mathtt{B}})\geq\exp n\!\left(F_{\mu}(\beta^{k})-F_{\mu}(\beta)-2\delta+o(1)\right)

for all large enough nn and small enough η\eta (again depending on k,εk,\varepsilon), with δ>0\delta>0 the same as chosen above. Since βGμ\beta^{G}_{*}\mu is a Markov chain, Fμ(βk)=Fμ(β)F_{\mu}(\beta^{k})=F_{\mu}(\beta).

Putting this all together: for any kk\in\mathbb{N}, for all δ>0\delta>0 we have

𝔼σμn|𝒳k(σ,αβ,ε𝐲n)|exp[n(Fμ(αkβk)3δo(1))]\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\mathcal{X}_{k}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\rvert\geq\exp\big{[}n(F_{\mu}(\alpha^{k}\mid\beta^{k})-3\delta-o(1))\big{]}

for all large enough nn and small enough ε>0\varepsilon>0.

It follows that for any kk\in\mathbb{N}

infεlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|Fμ(T,αkβk).\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert\geq F_{\mu}(T,\alpha^{k}\mid\beta^{k}).

Taking the limit as kk\to\infty gives the desired bound, using Corollary 3.3 and that the family of pseudometrics {dk:k}\{d^{*}_{k}\,:\,k\in\mathbb{N}\} generates the weak topology.

7 Proof of Theorem B

Let Wn=Wσn,𝐲nmnW_{n}=W_{\sigma_{n},\mathbf{y}_{n}^{m_{n}}}, so that

μn=𝚂𝙱𝙼(𝐲n,Wn).\mu_{n}=\mathtt{SBM}(\mathbf{y}_{n},W_{n}).

Note that, by definition of μn\mu_{n},

σμn(Wσ,𝐲nmn=Wn)=1.\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}\left(W_{\sigma,\mathbf{y}_{n}^{m_{n}}}=W_{n}\right)=1.
Lemma 7.1.

With WnW_{n} as just defined in terms of mnm_{n}, σn\sigma_{n}, and 𝐲n\mathbf{y}_{n}, we have

limnF(Wn)=fμ(T,β).\lim_{n\to\infty}F(W_{n})=f_{\mu}(T,\beta).
Proof.

The assumption in the theorem statement that dmn(P𝐲nσn,βGμ)=O(1logn)d_{m_{n}}^{*}(P_{\mathbf{y}_{n}}^{\sigma_{n}},\beta^{G}_{*}\mu)=O\big{(}\tfrac{1}{\log n}\big{)} implies the existence of a constant CC such that

d(Wn,Wβmn)Clogn.d(W_{n},W_{\beta^{m_{n}}})\leq\frac{C}{\log n}.

By Lemma 3.1 we have

|F(Wn)F(Wβmn)|4r(H(Clogn)+Clogn|B(e,mn)|log2|𝙱|)=o(1)\lvert F(W_{n})-F(W_{\beta^{m_{n}}})\rvert\leq 4r\big{(}\mathrm{H}(\tfrac{C}{\log n})+\tfrac{C}{\log n}\lvert\mathrm{B}(e,m_{n})\rvert\log_{2}\lvert\mathtt{B}\rvert\big{)}=o(1)

using that mn=o(loglogn)m_{n}=o(\log\log n). Since mnm_{n} approaches infinity as nn goes to infinity we have fμ(T,β)=limnF(Wβmn)f_{\mu}(T,\beta)=\lim_{n\to\infty}F(W_{\beta^{m_{n}}}), so the result follows. ∎

Lemma 7.2.

If mn=o(loglogn)m_{n}=o(\log\log n), then for any k>0k>0 and ε>0\varepsilon>0 we have |𝙱B(e,mn)|k=o(nε)\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{k}=o(n^{\varepsilon}).

Proof.

This is certainly true if |𝙱|=1\lvert\mathtt{B}\rvert=1; assume therefore that |𝙱|2\lvert\mathtt{B}\rvert\geq 2.

Our assumption mn=o(loglogn)m_{n}=o(\log\log n) guarantees that

(2r1)mn<r1rεklog|𝙱|logn(2r-1)^{m_{n}}<\frac{r-1}{r}\frac{\varepsilon}{k\log\lvert\mathtt{B}\rvert}\log n

for all large enough nn. Therefore

|B(e,mn)|=r(2r1)mn1r1<εklog|𝙱|logn.\lvert\mathrm{B}(e,m_{n})\rvert=\frac{r(2r-1)^{m_{n}}-1}{r-1}<\frac{\varepsilon}{k\log\lvert\mathtt{B}\rvert}\log n.

Multiplying both sides by klog|𝙱|k\log\lvert\mathtt{B}\rvert and exponentiating gives

|𝙱B(e,mn)|k<nε.\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{k}<n^{\varepsilon}.

Since this bound holds for every ε>0\varepsilon>0, and in particular with ε/2\varepsilon/2 in place of ε\varepsilon, the result follows. ∎
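As a quick sanity check on the ball-size formula used in the proof above (this snippet is ours and not part of the argument), the following Python code compares the closed form with a direct count of reduced words for a few small values of rr and mm.

def ball_size_formula(r, m):
    # |B(e, m)| = (r(2r - 1)^m - 1) / (r - 1) in the rank-r free group, for r >= 2.
    return (r * (2 * r - 1) ** m - 1) // (r - 1)


def ball_size_count(r, m):
    # Count reduced words of length at most m by extending words one letter at a
    # time, never appending the inverse of the previous letter.
    gens = [(i, s) for i in range(1, r + 1) for s in (1, -1)]
    total, frontier = 1, [()]  # start from the identity (the empty word)
    for _ in range(m):
        new_frontier = []
        for w in frontier:
            for g in gens:
                if w and g == (w[-1][0], -w[-1][1]):
                    continue  # this letter would cancel, so the word is not reduced
                new_frontier.append(w + (g,))
        total += len(new_frontier)
        frontier = new_frontier
    return total


assert all(ball_size_formula(r, m) == ball_size_count(r, m)
           for r in (2, 3) for m in (0, 1, 2, 3))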

In the remainder of this section we prove Theorem B by first proving that the right-hand side is an upper bound for the left-hand side, and then proving that it is also a lower bound.

7.1 Upper bound

Just as in the proof of the upper bound in Theorem A, for each kk\in\mathbb{N} and ε>0\varepsilon>0 we have

inf𝒪(αβ)Gμlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ω(σ,𝒪)}|lim supn1nlogk(n,ε),\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert\leq\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon),

where

k(n,ε)\displaystyle\mathcal{E}_{k}(n,\varepsilon) 𝔼σμn|{𝐗(𝙰B(e,k))n:(𝐗,𝐲nk)Ω0(σ,(αβ)k,ε)}|\displaystyle\coloneqq\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,(\mathbf{X},\mathbf{y}_{n}^{k})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon)\}\rvert
=𝔼σμn|{𝐗(𝙰B(e,k))n:d(Wσ,(𝐗,𝐲nk),W(αβ)k)<ε}|.\displaystyle=\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,d\big{(}W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{k})},W_{(\alpha\beta)^{k}}\big{)}<\varepsilon\}\rvert.

We assume that nn is large enough that mnkm_{n}\geq k.

Since μn\mu_{n} is 𝚂𝙱𝙼(σn,𝐲n,mn)\mathtt{SBM}(\sigma_{n},\mathbf{y}_{n},m_{n}) rather than 𝚂𝙱𝙼(σn,𝐲n,k)\mathtt{SBM}(\sigma_{n},\mathbf{y}_{n},k), we cannot apply Proposition 5.2 directly to this expression. We get around this as follows: Let

𝒲n(m,m){Wσ,(𝐗,𝐲m):σHom(G,Sym(n)),𝐗(𝙰B(e,m))n,𝐲𝙱n}.\mathcal{W}_{n}(m,m^{\prime})\coloneqq\left\{W_{\sigma,(\mathbf{X},\mathbf{y}^{m^{\prime}})}\,:\,\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)),\ \mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,m)})^{n},\ \mathbf{y}\in\mathtt{B}^{n}\right\}.

All elements of this set are denominator-nn 𝙰B(e,m)×𝙱B(e,m)\mathtt{A}^{\mathrm{B}(e,m)}\times\mathtt{B}^{\mathrm{B}(e,m^{\prime})}-weights; we avoid the question of exactly which weights are in this set, but call such weights attainable. For kmk\leq m and kmk^{\prime}\leq m^{\prime} let

𝒲n(m,m;αβ,k,k;ε)={W𝒲n(m,m):d(πk,kW,Wαkβk)<ε}\mathcal{W}_{n}(m,m^{\prime};\alpha\beta,k,k^{\prime};\varepsilon)=\left\{W\in\mathcal{W}_{n}(m,m^{\prime})\,:\,d\!\left(\pi_{k,k^{\prime}}W,\ W_{\alpha^{k}\beta^{k^{\prime}}}\right)<\varepsilon\right\}

denote the set of such weights whose appropriate marginal is within ε\varepsilon of the (𝙰B(e,k)×𝙱B(e,k))(\mathtt{A}^{\mathrm{B}(e,k)}\times\mathtt{B}^{\mathrm{B}(e,k^{\prime})})-weight WαkβkW_{\alpha^{k}\beta^{k^{\prime}}}. For now we take m=k=km=k=k^{\prime} but we will need more generality below. Then

k(n,ε)=𝔼σμnW𝒲n(k,mn;αβ,k,k;ε)|{𝐗(𝙰B(e,k))n:Wσ,(𝐗,𝐲nmn)=W}|\mathcal{E}_{k}(n,\varepsilon)=\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\!\sum_{W\in\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)}\!\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}\,:\,W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{m_{n}})}=W\}\rvert

so we can apply Proposition 5.2 to get

k(n,ε)\displaystyle\mathcal{E}_{k}(n,\varepsilon) (9n)r|𝙱B(e,mn)|2(|𝙰B(e,k)|+1)W𝒲n(k,mn;αβ,k,k;ε)en(F(W)F(π𝙱W))𝟏{π𝙱W=Wn}.\displaystyle\leq(9n)^{r\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}(\lvert\mathtt{A}^{\mathrm{B}(e,k)}\rvert+1)}\!\sum_{W\in\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)}\!e^{n(F(W)-F(\pi_{\mathtt{B}}W))}\mathbf{1}_{\{\pi_{\mathtt{B}}W=W_{n}\}}.

By Lemma 7.2 we have (9n)r|𝙱B(e,mn)|2(|𝙰B(e,k)|+1)eon(n)(9n)^{r\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}(\lvert\mathtt{A}^{\mathrm{B}(e,k)}\rvert+1)}\leq e^{o_{n\to\infty}(n)}. Using this and Lemma 7.1 we have

k(n,ε)W𝒲n(k,mn;αβ,k,k;ε)en(F(W)f(T,β)+on(1))𝟏{π𝙱W=Wn},\mathcal{E}_{k}(n,\varepsilon)\leq\sum_{W\in\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)}\!e^{n(F(W)-f(T,\beta)+o_{n\to\infty}(1))}\mathbf{1}_{\{\pi_{\mathtt{B}}W=W_{n}\}},

where the little oo is uniform over all terms in the sum. Here we use the assumption that fμ(T,β)f_{\mu}(T,\beta) is finite.

By definition of 𝒲n(k,mn)\mathcal{W}_{n}(k,m_{n}), for any W𝒲n(k,mn;αβ,k,k;ε)W\in\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon) we can pick σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)), 𝐗(𝙰B(e,k))n\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k)})^{n}, and 𝐲𝙱n\mathbf{y}\in\mathtt{B}^{n} so that W=Wσ,(𝐗,𝐲mn)W=W_{\sigma,(\mathbf{X},\mathbf{y}^{m_{n}})}. Then since 𝐗𝐲mn\mathbf{X}\mathbf{y}^{m_{n}} is a splitting of 𝐗𝐲k\mathbf{X}\mathbf{y}^{k}, by Lemma 3.2 we have

F(W)=F(σ,𝐗𝐲mn)F(σ,𝐗𝐲k)=F(πk,kW).F(W)=F(\sigma,\mathbf{X}\mathbf{y}^{m_{n}})\leq F(\sigma,\mathbf{X}\mathbf{y}^{k})=F(\pi_{k,k}W).

Fix δ>0\delta>0. By continuity of FF, for all small enough ε\varepsilon (depending on kk) we have

F(πk,kW)F(W(αβ)k)+δ=Fμ(T,(αβ)k)+δ.F(\pi_{k,k}W)\leq F(W_{(\alpha\beta)^{k}})+\delta=F_{\mu}(T,(\alpha\beta)^{k})+\delta.

Along with the above, this implies that

k(n,ε)en(F(T,(αβ)k)f(T,β)+on(1)+δ)W𝒲n(k,mn;αβ,k,k;ε)𝟏{π𝙱W=Wn}.\mathcal{E}_{k}(n,\varepsilon)\leq e^{n(F(T,(\alpha\beta)^{k})-f(T,\beta)+o_{n}(1)+\delta)}\!\sum_{W\in\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)}\mathbf{1}_{\{\pi_{\mathtt{B}}W=W_{n}\}}.

Bounding all terms in the sum by 1, we get

k(n,ε)en(F(T,(αβ)k)fμ(T,β)+on(1)+δ)|𝒲n(k,mn;αβ,k,k;ε)|.\mathcal{E}_{k}(n,\varepsilon)\leq e^{n(F(T,(\alpha\beta)^{k})-f_{\mu}(T,\beta)+o_{n}(1)+\delta)}\,\left\lvert\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)\right\rvert.

Using Lemma 7.2 we have

|𝒲n(k,mn;αβ,k,k;ε)||𝒲n(k,mn)|nr|𝙰B(e,k)×𝙱B(e,mn)|2eon(n),\left\lvert\mathcal{W}_{n}(k,m_{n};\alpha\beta,k,k;\varepsilon)\right\rvert\leq\left\lvert\mathcal{W}_{n}(k,m_{n})\right\rvert\leq n^{r\left\lvert\mathtt{A}^{\mathrm{B}(e,k)}\times\mathtt{B}^{\mathrm{B}(e,m_{n})}\right\rvert^{2}}\leq e^{o_{n\to\infty}(n)},

so this implies

lim supn1nlogk(n,ε)Fμ(T,(αβ)k)fμ(T,β)+δ.\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon)\leq F_{\mu}(T,(\alpha\beta)^{k})-f_{\mu}(T,\beta)+\delta.

Taking the infimum over ε\varepsilon and kk, and using the chain rule for ff (Corollary 3.5, again using the assumption that fμ(T,β)f_{\mu}(T,\beta) is finite), gives

infε,klim supn1nlogk(n,ε)fμ(T,αβ)fμ(T,β)=fμ(T,αβ).\inf_{\varepsilon,k}\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon)\leq f_{\mu}(T,\alpha\beta)-f_{\mu}(T,\beta)=f_{\mu}(T,\alpha\mid\beta).

Since

inf𝒪(αβ)Gμlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ω(σ,𝒪)}|infεlim supn1nlogk(n,ε),\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert\leq\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\mathcal{E}_{k}(n,\varepsilon),

for every kk, this completes the upper bound.

7.2 Lower bound

In this section we denote

𝒳k1,k2(σ,αβ,ε𝐲)\displaystyle\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y})\coloneqq{} {𝐗(𝙰B(e,k1))n:(𝐗,𝐲k2)Ω0(σ,αk1βk2,ε)}\displaystyle\{\mathbf{X}\in\big{(}\mathtt{A}^{\mathrm{B}(e,k_{1})}\big{)}^{n}\,:\,(\mathbf{X},\mathbf{y}^{k_{2}})\in\Omega_{0}^{*}(\sigma,\alpha^{k_{1}}\beta^{k_{2}},\varepsilon)\}
Ωk(σ,αβ,ε𝐲)\displaystyle\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y})\coloneqq{} {𝐱𝙰n:(𝐱,𝐲)Ωk(σ,αβ,ε)}\displaystyle\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}
(note the dependence on nn is implicitly specified by σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) and 𝐲𝙱n\mathbf{y}\in\mathtt{B}^{n}), and with Σ={μn}n=1\Sigma=\{\mu_{n}\}_{n=1}^{\infty}
hΣ(μ,αβ:k,ε)\displaystyle\mathrm{h}_{\Sigma}(\mu,\alpha\mid\beta\,:\,k,\varepsilon)\coloneqq{} lim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲)Ωk(σ,αβ,ε)}|\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert
=\displaystyle={} lim supn1nlog𝔼σμn|Ωk(σ,αβ,ε𝐲)|.\displaystyle\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y})\right\rvert.

The following two claims are used to relate the sizes of the sets defined above.

Claim 1.

Let kmin(k1,k2)k\leq\min(k_{1},k_{2}). For any σ,𝐲\sigma,\mathbf{y} we have

πe[𝒳k1,k2(σ,αβ,ε𝐲)]Ωk(σ,αβ,cε𝐲)\pi_{e}\left[\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y})\right]\subseteq\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon\mid\mathbf{y})

where c=1+|B(e,k)|c=1+\lvert\mathrm{B}(e,k)\rvert.

Proof.

If (𝐗,𝐲k2)Ω0(σ,αk1βk2,ε)(\mathbf{X},\mathbf{y}^{k_{2}})\in\Omega_{0}^{*}(\sigma,\alpha^{k_{1}}\beta^{k_{2}},\varepsilon), then

πk,k(𝐗,𝐲k2)Ω0(σ,(αβ)k,ε);\pi_{k,k}(\mathbf{X},\mathbf{y}^{k_{2}})\in\Omega_{0}^{*}(\sigma,(\alpha\beta)^{k},\varepsilon);

this follows from the fact that total variation distance is nonincreasing under pushforwards. Applying Proposition 2.5, we get

(πe𝐗,𝐲)=πe(πk,k(𝐗,𝐲k2))Ωk(σ,αβ,cε).(\pi_{e}\mathbf{X},\mathbf{y})=\pi_{e}\left(\pi_{k,k}(\mathbf{X},\mathbf{y}^{k_{2}})\right)\in\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon).\qed
Claim 2.

Fix σ,𝐲\sigma,\mathbf{y}, and kmin(k1,k2)k\leq\min(k_{1},k_{2}). As established in the previous claim, we can consider πe\pi_{e} as a map from 𝒳k1,k2(σ,αβ,ε𝐲)\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}) to Ωk(σ,αβ,cε𝐲)\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon\mid\mathbf{y}). There are constants C,dC,d independent of nn such that πe\pi_{e} is at most Cexp(ndε+nH(2|B(e,k)|ε))C\exp\big{(}nd\varepsilon+n\mathrm{H}(2\lvert\mathrm{B}(e,k)\rvert\varepsilon)\big{)}-to-one.

Proof.

If Ωk(σ,αβ,cε𝐲)\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon\mid\mathbf{y}) is empty, then the claim is vacuously true. Otherwise, fix 𝐱Ωk(σ,αβ,cε𝐲)\mathbf{x}\in\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon\mid\mathbf{y}). If 𝐗πe1{𝐱}\mathbf{X}\in\pi_{e}^{-1}\{\mathbf{x}\}, then πe(𝐗,𝐲k)=(𝐱,𝐲)\pi_{e}(\mathbf{X},\mathbf{y}^{k})=(\mathbf{x},\mathbf{y}). By Claim 3 in the proof of Proposition 3.2 of [Bow10a] the number of such pairs (𝐗,𝐲k)(\mathbf{X},\mathbf{y}^{k}), and therefore the number of such 𝐗\mathbf{X}, is bounded above by

32|𝙰×𝙱||B(e,k)|(n|B(e,k)|ε1)exp(nH(2|B(e,k)|ε))3\sqrt{2}\lvert\mathtt{A}\times\mathtt{B}\rvert^{\lvert\mathrm{B}(e,k)\rvert\big{(}n\lvert\mathrm{B}(e,k)\rvert\varepsilon-1\big{)}}\exp\big{(}n\mathrm{H}(2\lvert\mathrm{B}(e,k)\rvert\varepsilon)\big{)}

where H\mathrm{H} is the Shannon entropy. (We give more explicit constants here than in [Bow10a] to make the dependence on nn clear). ∎

Claim 2 implies that

|𝒳k1,k2(σ,αβ,ε𝐲)|Cexp(ndε+nH(2|B(e,k)|ε))|Ωk(σ,αβ,cε𝐲)|,\left\lvert\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y})\right\rvert\leq C\exp\big{(}nd\varepsilon+n\mathrm{H}(2\lvert\mathrm{B}(e,k)\rvert\varepsilon)\big{)}\cdot\left\lvert\Omega_{k}^{*}(\sigma,\alpha\beta,c\varepsilon\mid\mathbf{y})\right\rvert, (1)

where C,dC,d are independent of nn.

We now find a lower bound for the expectation of |𝒳|\lvert\mathcal{X}\rvert. Fix k1,k2k_{1},k_{2}\in\mathbb{N}, and suppose nn is large enough that mnmax(k1,k2)m_{n}\geq\max(k_{1},k_{2}). Using Proposition 5.2 and Lemma 7.2, we have

𝔼σμn\displaystyle\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}} |𝒳k1,k2(σ,αβ,ε𝐲n)|\displaystyle\lvert\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\rvert
=W𝒲n(k1,mn;αβ,k1,k2;ε)𝔼σμn|{𝐗(𝙰B(e,k1))n:Wσ,(𝐗,𝐲nmn)=W}|\displaystyle=\sum_{W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)}\!\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{X}\in(\mathtt{A}^{\mathrm{B}(e,k_{1})})^{n}\,:\,W_{\sigma,(\mathbf{X},\mathbf{y}_{n}^{m_{n}})}=W\}\rvert
W𝒲n(k1,mn;αβ,k1,k2;ε)exp[n(F(W)F(π𝙱W)on(1))]𝟏{π𝙱W=Wσ,𝐲nmn}\displaystyle\geq\sum_{W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)}\!\exp\!\big{[}n(F(W)-F(\pi_{\mathtt{B}}W)-o_{n}(1))\big{]}\mathbf{1}_{\{\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{m_{n}}}\}}
infW𝒲n(k1,mn;αβ,k1,k2;ε)exp[n(F(W)F(π𝙱W)on(1))]\displaystyle\geq\inf_{W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)}\exp\!\big{[}n(F(W)-F(\pi_{\mathtt{B}}W)-o_{n}(1))\big{]}
×W𝒲n(k1,mn;αβ,k1,k2;ε)𝟏{π𝙱W=Wσ,𝐲nmn}\displaystyle\quad\times\sum_{W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)}\mathbf{1}_{\{\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{m_{n}}}\}}

We bound the infimum below as follows: Given any W𝒲n(k1,mn;αβ,k1,k2;ε)W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon), we can let 𝐗,𝐲,σ\mathbf{X},\mathbf{y},\sigma be such that W=Wσ,(𝐗,𝐲mn)W=W_{\sigma,(\mathbf{X},\mathbf{y}^{m_{n}})}. Then by Lemma 3.2 and continuity of FF

F(W)F(π𝙱W)\displaystyle F(W)-F(\pi_{\mathtt{B}}W) =F(σ,𝐗|𝐲mn)\displaystyle=F(\sigma,\mathbf{X}|\mathbf{y}^{m_{n}})
F(σ,𝐗|𝐲k2)\displaystyle\geq F(\sigma,\mathbf{X}|\mathbf{y}^{k_{2}})
=F(πk1,k2W)F(π𝙱πk1,k2W)\displaystyle=F(\pi_{k_{1},k_{2}}W)-F(\pi_{\mathtt{B}}\pi_{k_{1},k_{2}}W)
F(T,αk1|βk2)δ\displaystyle\geq F(T,\alpha^{k_{1}}|\beta^{k_{2}})-\delta

for any δ>0\delta>0 and all small enough ε\varepsilon (with “small enough” depending only on k1,k2k_{1},k_{2}). This implies that the infimum is bounded below by

exp[n(F(T,αk1|βk2)on(1)δ)].\exp\!\big{[}n(F(T,\alpha^{k_{1}}|\beta^{k_{2}})-o_{n}(1)-\delta)\big{]}.

We bound the sum below by first rewriting it as

|{W𝒲n(k1,mn;αβ,k1,k2;ε):π𝙱W=Wσ,𝐲nmn}|.\left\lvert\{W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)\,:\,\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{m_{n}}}\}\right\rvert.

The following claim, then, implies that the sum is bounded below by 1.

Claim 3.

For all large enough nn,

{W𝒲n(k1,mn;αβ,k1,k2;ε):π𝙱W=Wσ,𝐲nmn}.\big{\{}W\in\mathcal{W}_{n}(k_{1},m_{n};\alpha\beta,k_{1},k_{2};\varepsilon)\,:\,\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{m_{n}}}\big{\}}\neq\varnothing.
Proof.

By Lemma 2.3, if

n>680|𝙰B(e,k1)×𝙱B(e,mn)|2r/εn>680\lvert\mathtt{A}^{\mathrm{B}(e,k_{1})}\times\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}r/\varepsilon

and d(Wσ,𝐲nmn,Wβmn)<ε530rd(W_{\sigma,\mathbf{y}_{n}^{m_{n}}},W_{\beta^{m_{n}}})<\frac{\varepsilon}{530r} then there is a (𝙰B(e,k1)×𝙱B(e,mn))(\mathtt{A}^{\mathrm{B}(e,k_{1})}\times\mathtt{B}^{\mathrm{B}(e,m_{n})})-weight WW with π𝙱W=Wσ,𝐲nmn\pi_{\mathtt{B}}W=W_{\sigma,\mathbf{y}_{n}^{m_{n}}} and d(W,Wαk1βmn)<εd(W,W_{\alpha^{k_{1}}\beta^{m_{n}}})<\varepsilon. By definition of μn\mu_{n} and Lemma 7.2, both conditions are met for all large enough nn.

The claim will follow if we show that WW is attainable.

With WW as chosen above, by Proposition 2.4 we can choose σ~Hom(G,Sym(n))\tilde{\sigma}\in\operatorname{Hom}(G,\operatorname{Sym}(n)), 𝐗~(𝙰B(e,k1))n\mathbf{\tilde{X}}\in(\mathtt{A}^{\mathrm{B}(e,k_{1})})^{n}, and 𝐘~(𝙱B(e,mn))n\mathbf{\tilde{Y}}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n} such that W=Wσ~,(𝐗~,𝐘~)W=W_{\tilde{\sigma},(\mathbf{\tilde{X}},\mathbf{\tilde{Y}})}.

Let 𝐲~=πe𝐘~𝙱n\mathbf{\tilde{y}}=\pi_{e}\mathbf{\tilde{Y}}\in\mathtt{B}^{n}. To complete the proof we show that 𝐲~mn=𝐘~\mathbf{\tilde{y}}^{m_{n}}=\mathbf{\tilde{Y}}, i.e.

𝐲~(σ~(g)i)=(𝐘~(i))g\mathbf{\tilde{y}}\big{(}\tilde{\sigma}(g)i\big{)}=\left(\mathbf{\tilde{Y}}(i)\right)_{g}

for all i[n]i\in[n] and gB(e,mn)g\in\mathrm{B}(e,m_{n}). We prove this by induction on the word length |g|\lvert g\rvert.

The base case |g|=0\lvert g\rvert=0 (i.e. g=eg=e) follows immediately from the definition of 𝐲~\mathbf{\tilde{y}}.

For the inductive step, write g=htg=ht with |h|=|g|1\lvert h\rvert=\lvert g\rvert-1 and t{s1±1,,sr±1}t\in\{s_{1}^{\pm 1},\ldots,s_{r}^{\pm 1}\}. Then, assuming the result holds for hh,

𝐲~(σ~(g)i)=𝐲~(σ~(h)σ~(t)i)=(𝐘~(σ~(t)i))h.\mathbf{\tilde{y}}\big{(}\tilde{\sigma}(g)i\big{)}=\mathbf{\tilde{y}}\big{(}\tilde{\sigma}(h)\tilde{\sigma}(t)i\big{)}=\left(\mathbf{\tilde{Y}}(\tilde{\sigma}(t)i)\right)_{h}.

Now since Wσ~,𝐘~=Wσn,𝐲nmnW_{\tilde{\sigma},\mathbf{\tilde{Y}}}=W_{\sigma_{n},\mathbf{y}_{n}^{m_{n}}}, we can pick j[n]j\in[n] such that

𝐘~(i)=𝐲nmn(j)and𝐘~(σ~(t)i)=𝐲nmn(σn(t)j).\mathbf{\tilde{Y}}(i)=\mathbf{y}_{n}^{m_{n}}(j)\quad\text{and}\quad\mathbf{\tilde{Y}}(\tilde{\sigma}(t)i)=\mathbf{y}_{n}^{m_{n}}(\sigma_{n}(t)j).

This implies

(𝐘~(σ~(t)i))h=(𝐲nmn(σn(t)j))h=𝐲n(σn(g)j)=(𝐲nmn(j))g=(𝐘~(i))g.\left(\mathbf{\tilde{Y}}(\tilde{\sigma}(t)i)\right)_{h}=\big{(}\mathbf{y}_{n}^{m_{n}}(\sigma_{n}(t)j)\big{)}_{h}=\mathbf{y}_{n}(\sigma_{n}(g)j)=\big{(}\mathbf{y}_{n}^{m_{n}}(j)\big{)}_{g}=\left(\mathbf{\tilde{Y}}(i)\right)_{g}.\qed

Hence for all large enough nn we have

𝔼σμn|𝒳k1,k2(σ,αβ,ε𝐲n)|exp[n(F(T,αk1βk2)on(1)δ)],\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\rvert\geq\exp\big{[}n(F(T,\alpha^{k_{1}}\mid\beta^{k_{2}})-o_{n}(1)-\delta)\big{]},

and therefore

lim supn1nlog𝔼σμn|𝒳k1,k2(σ,αβ,ε𝐲n)|F(T,αk1βk2)δ.\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\mathcal{X}_{k_{1},k_{2}}(\sigma,\alpha\beta,\varepsilon\mid\mathbf{y}_{n})\rvert\geq F(T,\alpha^{k_{1}}\mid\beta^{k_{2}})-\delta.

Combining this lower bound with Equation (1) and the definition of hΣ(μ,αβ:k,cε)\mathrm{h}_{\Sigma}(\mu,\alpha\mid\beta\,:\,k,c\varepsilon), we get

dε+H(2|B(e,k)|ε)+hΣ(μ,αβ:k,cε)F(T,αk1βk2)δ.d\varepsilon+\mathrm{H}(2\lvert\mathrm{B}(e,k)\rvert\varepsilon)+\mathrm{h}_{\Sigma}(\mu,\alpha\mid\beta\,:\,k,c\varepsilon)\geq F(T,\alpha^{k_{1}}\mid\beta^{k_{2}})-\delta.

Taking the inf in ε\varepsilon then letting δ\delta go to zero gives

infεlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|F(T,αk1βk2)\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert\geq F(T,\alpha^{k_{1}}\mid\beta^{k_{2}})

for kmin(k1,k2)k\leq\min(k_{1},k_{2}). First take k2k_{2}\to\infty, then k1k_{1}\to\infty, then take the infimum over kk. We get

fμ(T,αβ)\displaystyle f_{\mu}(T,\alpha\mid\beta) infε,klim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ωk(σ,αβ,ε)}|\displaystyle\leq\inf_{\varepsilon,k}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\left\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega_{k}^{*}(\sigma,\alpha\beta,\varepsilon)\}\right\rvert
=inf𝒪(αβ)Gμlim supn1nlog𝔼σμn|{𝐱𝙰n:(𝐱,𝐲n)Ω(σ,𝒪)}|\displaystyle=\inf_{\mathcal{O}\ni(\alpha\beta)^{G}_{*}\mu}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,(\mathbf{x},\mathbf{y}_{n})\in\Omega(\sigma,\mathcal{O})\}\rvert

where the last line follows because the collection of pseudometrics {dk:k}\{d_{k}^{*}\,:\,k\in\mathbb{N}\} generates the weak topology on Prob((𝙰×𝙱)G)\operatorname{Prob}((\mathtt{A}\times\mathtt{B})^{G}).

8 Proof of Theorem C

By analogy with sofic entropy, we denote Σ{μn}n=1\Sigma\coloneqq\{\mu_{n}\}_{n=1}^{\infty} and denote the left-hand side of the formula in the theorem statement as hΣ(μ,α)\mathrm{h}_{\Sigma}(\mu,\alpha).

Endow Prob(𝙰G)\operatorname{Prob}(\mathtt{A}^{G}) with the metric

d(λ,ν)r=12rdB(e,r)(λ,ν).d(\lambda,\nu)\coloneqq\sum_{r=1}^{\infty}2^{-r}d^{\mathrm{B}(e,r)}(\lambda,\nu).

Note that this induces the weak* topology (where 𝙰\mathtt{A} is given the discrete topology and 𝙰G\mathtt{A}^{G} the product topology).
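To make this metric concrete, the following minimal Python sketch (ours, not from the paper) computes a truncated version of it: we take dB(e,r)d^{\mathrm{B}(e,r)} to be the total variation distance between the marginals on 𝙰B(e,r)\mathtt{A}^{\mathrm{B}(e,r)}, assume a hypothetical helper marginal(mu, r) returning such a marginal as a dictionary from patterns to probabilities, and truncate the series at a finite radius R, which changes the value by at most 2R2^{-R}.

def tv_distance(p, q):
    # Total variation distance between two finitely supported distributions,
    # each given as a dict mapping outcomes to probabilities.
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


def weak_star_metric(lam, nu, marginal, R=8):
    # Sketch of d(lambda, nu) = sum_{r >= 1} 2^{-r} d^{B(e,r)}(lambda, nu), truncated
    # at radius R.  Here marginal(mu, r) is assumed to return the marginal of mu on
    # A^{B(e,r)} as a dict from patterns to probabilities; the neglected tail of the
    # series is at most 2^{-R}.
    return sum(2.0 ** (-r) * tv_distance(marginal(lam, r), marginal(nu, r))
               for r in range(1, R + 1))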

Writing μ𝙰=αGμProb(𝙰G)\mu_{\mathtt{A}}=\alpha^{G}_{*}\mu\in\operatorname{Prob}(\mathtt{A}^{G}), we then have

hΣ(μ,α)=infε>0lim supn1nlog𝔼σμn|{𝐱𝙰n:d(P𝐱σ,μ𝙰)<ε}|.\mathrm{h}_{\Sigma}(\mu,\alpha)=\inf_{\varepsilon>0}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,d(P_{\mathbf{x}}^{\sigma},\mu_{\mathtt{A}})<\varepsilon\}\rvert.

We will similarly denote μ𝙱=βGμProb(𝙱G)\mu_{\mathtt{B}}=\beta^{G}_{*}\mu\in\operatorname{Prob}(\mathtt{B}^{G}).

8.1 Lower bound

Let λProb((𝙰×𝙱)G)\lambda\in\operatorname{Prob}((\mathtt{A}\times\mathtt{B})^{G}) be any joining of (the shift systems with respective measures) μ𝙰\mu_{\mathtt{A}} and μ𝙱\mu_{\mathtt{B}}. Then for any 𝐱𝙰n\mathbf{x}\in\mathtt{A}^{n} and 𝐲𝙱n\mathbf{y}\in\mathtt{B}^{n} we have

d(P𝐱σ,μ𝙰)d(P(𝐱,𝐲)σ,λ),d(P_{\mathbf{x}}^{\sigma},\mu_{\mathtt{A}})\leq d(P_{(\mathbf{x},\mathbf{y})}^{\sigma},\lambda),

where dd is defined on Prob((𝙰×𝙱)G)\operatorname{Prob}((\mathtt{A}\times\mathtt{B})^{G}) analogously to the definition given on Prob(𝙰G)\operatorname{Prob}(\mathtt{A}^{G}) above. This inequality holds because total variation distance is nonincreasing under pushforwards. Consequently

hΣ(μ,α)infε>0lim supn1nlog𝔼σμn|{𝐱𝙰n:d(P(𝐱,𝐲n)σ,λ)<ε}|=fλ(S,𝚊𝚋).\mathrm{h}_{\Sigma}(\mu,\alpha)\geq\inf_{\varepsilon>0}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,d(P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma},\lambda)<\varepsilon\}\rvert=f_{\lambda}(S,\mathtt{a}\mid\mathtt{b}).

Taking the supremum over joinings λ\lambda gives the lower bound.

8.2 Upper bound

For ε>0\varepsilon>0, let

𝖩ε{λProbS((𝙰×𝙱)G):d(𝚊Gλ,μ𝙰)<ε and d(𝚋Gλ,μ𝙱)<ε}\mathsf{J}_{\varepsilon}\coloneqq\{\lambda\in\operatorname{Prob}^{S}((\mathtt{A}\times\mathtt{B})^{G})\,:\,d(\mathtt{a}^{G}_{*}\lambda,\mu_{\mathtt{A}})<\varepsilon\text{ and }d(\mathtt{b}^{G}_{*}\lambda,\mu_{\mathtt{B}})<\varepsilon\}

be the set of shift-invariant “approximate joinings” of μ𝙰\mu_{\mathtt{A}} and μ𝙱\mu_{\mathtt{B}}. Since Prob((𝙰×𝙱)G)\operatorname{Prob}((\mathtt{A}\times\mathtt{B})^{G}) is compact, for each ε>0\varepsilon>0 there exist λ1,,λm𝖩ε\lambda_{1},\ldots,\lambda_{m}\in\mathsf{J}_{\varepsilon} such that

𝖩εi=1mB(λi,ε).\mathsf{J}_{\varepsilon}\subseteq\bigcup_{i=1}^{m}\mathrm{B}(\lambda_{i},\varepsilon).

By definition of μn\mu_{n} we have σμn(d(P𝐲nσ,μ𝙱)<ε)=1\operatorname*{\mathbb{P}}_{\sigma\sim\mu_{n}}(d(P_{\mathbf{y}_{n}}^{\sigma},\mu_{\mathtt{B}})<\varepsilon)=1 for all large enough nn. Therefore

hΣ(μ,α)\displaystyle\mathrm{h}_{\Sigma}(\mu,\alpha) =infεlim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σ𝖩ε}|\displaystyle=\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathsf{J}_{\varepsilon}\}\rvert
infεlim supn1nlogi=1m𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λi,ε)}|\displaystyle\leq\inf_{\varepsilon}\limsup_{n\to\infty}\frac{1}{n}\log\sum_{i=1}^{m}\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda_{i},\varepsilon)\}\rvert
=infεmax1imlim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λi,ε)}|\displaystyle=\inf_{\varepsilon}\max_{1\leq i\leq m}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda_{i},\varepsilon)\}\rvert
infεsupλ𝖩εlim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λ,ε)}|.\displaystyle\leq\inf_{\varepsilon}\sup_{\lambda\in\mathsf{J}_{\varepsilon}}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda,\varepsilon)\}\rvert.

Note that the entire expression in the inf is decreasing as ε0\varepsilon\to 0, so we may replace the inf with a limit. Rather than taking a continuous limit we write

hΣ(μ,α)limmsupλ𝖩1/mlim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λ,1/m)}|.\mathrm{h}_{\Sigma}(\mu,\alpha)\leq\lim_{m\to\infty}\sup_{\lambda\in\mathsf{J}_{1/m}}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda,1/m)\}\rvert.

For each mm pick λm𝖩1/m\lambda_{m}\in\mathsf{J}_{1/m} to get within 1/m1/m of the supremum. Then the right-hand side is equal to

limmlim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λm,1/m)}|.\lim_{m\to\infty}\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda_{m},1/m)\}\rvert.

Let λmj\lambda_{m_{j}} be a subsequence with weak* limit λ0\lambda_{0}. By weak* continuity of pushforwards under projection we have λ0𝖩(μ𝙰,μ𝙱)\lambda_{0}\in\mathsf{J}(\mu_{\mathtt{A}},\mu_{\mathtt{B}}). Now for any δ>0\delta>0, for all large enough jj we have both 1/mj<δ/21/m_{j}<\delta/2 and d(λmj,λ0)<δ/2d(\lambda_{m_{j}},\lambda_{0})<\delta/2, so by the triangle inequality

B(λmj,1/mj)B(λ0,δ).\mathrm{B}(\lambda_{m_{j}},1/m_{j})\subseteq\mathrm{B}(\lambda_{0},\delta).

It follows that the limit over mm displayed above, and hence hΣ(μ,α)\mathrm{h}_{\Sigma}(\mu,\alpha), is bounded above by

lim supn1nlog𝔼σμn|{𝐱𝙰n:P(𝐱,𝐲n)σB(λ0,δ)}|.\limsup_{n\to\infty}\frac{1}{n}\log\operatorname*{\mathbb{E}}_{\sigma\sim\mu_{n}}\lvert\{\mathbf{x}\in\mathtt{A}^{n}\,:\,P_{(\mathbf{x},\mathbf{y}_{n})}^{\sigma}\in\mathrm{B}(\lambda_{0},\delta)\}\rvert.

Taking the infimum over δ\delta shows that

hΣ(μ,α)fλ0(S,𝚊𝚋)supλ𝖩(μ𝙰,μ𝙱)fλ(S,𝚊𝚋).h_{\Sigma}(\mu,\alpha)\leq f_{\lambda_{0}}(S,\mathtt{a}\mid\mathtt{b})\leq\sup_{\lambda\in\mathsf{J}(\mu_{\mathtt{A}},\mu_{\mathtt{B}})}f_{\lambda}(S,\mathtt{a}\mid\mathtt{b}).

9 Proof of Proposition 1.1

All sequences of interest are of the form

μn=𝚂𝙱𝙼(σn,𝐲n,mn)=Unif({σHom(G,Sym(n)):Wσ,𝐲nmn=Wn})\mu_{n}=\mathtt{SBM}(\sigma_{n},\mathbf{y}_{n},m_{n})=\operatorname{Unif}(\{\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n))\,:\,W_{\sigma,\mathbf{y}_{n}^{m_{n}}}=W_{n}\})

with 𝐲n𝙱n\mathbf{y}_{n}\in\mathtt{B}^{n}, σnHom(G,Sym(n))\sigma_{n}\in\operatorname{Hom}(G,\operatorname{Sym}(n)), mn=o(loglogn)m_{n}=o(\log\log n), and where WnW_{n} is the 𝙱B(e,mn)\mathtt{B}^{\mathrm{B}(e,m_{n})}-weight Wσn,𝐲nmnW_{\sigma_{n},\mathbf{y}_{n}^{m_{n}}}. In the case of Theorem A we simply have mn=0m_{n}=0 for all nn.

Proposition 1.1 will follow from the following lemma:

Lemma 9.1.

Let ζn\zeta_{n} denote the uniform measure on Hom(G,Sym(n))\operatorname{Hom}(G,\operatorname{Sym}(n)). Then for any finite DGD\subset G and δ>0\delta>0 there exists ε>0\varepsilon>0 such that

σζn(σ is (D,δ)-sofic)1nεn\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(\sigma\text{ is }(D,\delta)\text{-sofic})\geq 1-n^{-\varepsilon n}

for all large enough nn. ∎

This can be proven by making superficial changes to the proof of the similar result in [Bow20].

To prove Proposition 1.1, it now suffices to show that for any ε>0\varepsilon>0

σζn(Wσ,𝐲nmn=Wn)nεn\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(W_{\sigma,\mathbf{y}_{n}^{m_{n}}}=W_{n})\geq n^{-\varepsilon n}

for all large enough nn. To do this, first note that the left-hand side here depends only on the vector p𝐲nProb(𝙱)p_{\mathbf{y}_{n}}\in\operatorname{Prob}(\mathtt{B}) of letter frequencies. Therefore

σζn(𝐲𝙱n s.t. Wσ,𝐲mn=Wn)\displaystyle\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(\exists\mathbf{y}\in\mathtt{B}^{n}\text{ s.t. }W_{\sigma,\mathbf{y}^{m_{n}}}=W_{n}) 𝐲:p𝐲=p𝐲nσζn(Wσ,𝐲mn=Wn)\displaystyle\leq\sum_{\mathbf{y}\,:\,p_{\mathbf{y}}=p_{\mathbf{y}_{n}}}\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(W_{\sigma,\mathbf{y}^{m_{n}}}=W_{n})
=exp{nH(p𝐲n)+o(n)}σζn(Wσ,𝐲nmn=Wn).\displaystyle=\exp\{n\mathrm{H}(p_{\mathbf{y}_{n}})+o(n)\}\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(W_{\sigma,\mathbf{y}_{n}^{m_{n}}}=W_{n}).

But by Proposition 2.5, if σHom(G,Sym(n))\sigma\in\operatorname{Hom}(G,\operatorname{Sym}(n)) and 𝐘(𝙱B(e,mn))n\mathbf{Y}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n} are such that Wσ,𝐘=Wn=Wσn,𝐲nmnW_{\sigma,\mathbf{Y}}=W_{n}=W_{\sigma_{n},\mathbf{y}_{n}^{m_{n}}}, then the projection 𝐘e𝙱n\mathbf{Y}_{e}\in\mathtt{B}^{n} satisfies (𝐘e)mn=𝐘(\mathbf{Y}_{e})^{m_{n}}=\mathbf{Y}. Therefore for each σ\sigma

|{𝐘(𝙱B(e,mn))n:Wσ,𝐘=Wn}|=|{𝐲𝙱n:Wσ,𝐲mn=Wn}|.\left\lvert\{\mathbf{Y}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n}\,:\,W_{\sigma,\mathbf{Y}}=W_{n}\}\right\rvert=\left\lvert\{\mathbf{y}\in\mathtt{B}^{n}\,:\,W_{\sigma,\mathbf{y}^{m_{n}}}=W_{n}\}\right\rvert.

Hence

𝔼σζn|{𝐘(𝙱B(e,mn))n:Wσ,𝐘=Wn}|\displaystyle\operatorname*{\mathbb{E}}_{\sigma\sim\zeta_{n}}\left\lvert\{\mathbf{Y}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n}\,:\,W_{\sigma,\mathbf{Y}}=W_{n}\}\right\rvert =𝔼σζn|{𝐲𝙱n:Wσ,𝐲mn=Wn}|\displaystyle=\operatorname*{\mathbb{E}}_{\sigma\sim\zeta_{n}}\left\lvert\{\mathbf{y}\in\mathtt{B}^{n}\,:\,W_{\sigma,\mathbf{y}^{m_{n}}}=W_{n}\}\right\rvert
|𝙱|nσζn(𝐲𝙱n s.t. Wσ,𝐲mn=Wn).\displaystyle\leq\lvert\mathtt{B}\rvert^{n}\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(\exists\mathbf{y}\in\mathtt{B}^{n}\text{ s.t. }W_{\sigma,\mathbf{y}^{m_{n}}}=W_{n}).

Combining these last few statements, we see that

σζn(Wσ,𝐲nmn=Wn)exp{2nlog|𝙱|+o(n)}𝔼σζn|{𝐘(𝙱B(e,mn))n:Wσ,𝐘=Wn}|.\operatorname*{\mathbb{P}}_{\sigma\sim\zeta_{n}}(W_{\sigma,\mathbf{y}_{n}^{m_{n}}}=W_{n})\geq\exp\{-2n\log\lvert\mathtt{B}\rvert+o(n)\}\operatorname*{\mathbb{E}}_{\sigma\sim\zeta_{n}}\left\lvert\{\mathbf{Y}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n}\,:\,W_{\sigma,\mathbf{Y}}=W_{n}\}\right\rvert.

We can ignore the first factor here since it decays only exponentially fast, while the bound we are aiming for is of the form nεnn^{-\varepsilon n}. By Proposition 5.1,

𝔼σζn|{𝐘(𝙱B(e,mn))n:Wσ,𝐘=Wn}|=Zn(Wn)(n!)r(3n)r|𝙱B(e,mn)|2eF(Wn)nn(1r)/2.\operatorname*{\mathbb{E}}_{\sigma\sim\zeta_{n}}\left\lvert\{\mathbf{Y}\in(\mathtt{B}^{\mathrm{B}(e,m_{n})})^{n}\,:\,W_{\sigma,\mathbf{Y}}=W_{n}\}\right\rvert=\frac{Z_{n}(W_{n})}{(n!)^{r}}\geq(3\sqrt{n})^{-r\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}e^{F(W_{n})n}n^{(1-r)/2}.

The third factor is clearly not a problem and can also be ignored. For the first factor,

1nlognlog(3n)r|𝙱B(e,mn)|2=r|𝙱B(e,mn)|2nlog3nlogn0 as n\frac{1}{n\log n}\log(3\sqrt{n})^{-r\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}=-r\frac{\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}{n}\frac{\log 3\sqrt{n}}{\log n}\to 0\text{ as }n\to\infty

using Lemma 7.2. For the second factor, first note that by definition of F(Wn)F(W_{n}) we have

F(Wn)\displaystyle F(W_{n}) =(12r)H(Wn())+i[r]H(Wn(,;i))\displaystyle=(1-2r)\mathrm{H}\big{(}W_{n}(\cdot)\big{)}+\sum_{i\in[r]}\mathrm{H}\big{(}W_{n}(\cdot,\cdot;i)\big{)}
2rH(Wn())\displaystyle\geq-2r\mathrm{H}\big{(}W_{n}(\cdot)\big{)}
2rlog|𝙱B(e,mn)|.\displaystyle\geq-2r\log\left\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\right\rvert.

Here the first inequality holds because all of the entropies appearing are nonnegative, and the second because the entropy of the vertex measure is at most the logarithm of the size of its alphabet 𝙱B(e,mn)\mathtt{B}^{\mathrm{B}(e,m_{n})}. So

1nlognlogeF(Wn)n=F(Wn)logn2rlog|𝙱B(e,mn)|logn0 as n,\frac{1}{n\log n}\log e^{F(W_{n})n}=\frac{F(W_{n})}{\log n}\geq-2r\frac{\log\left\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\right\rvert}{\log n}\to 0\text{ as }n\to\infty,

again using Lemma 7.2. This implies that for every ε>0\varepsilon>0 we have

(3n)r|𝙱B(e,mn)|2eF(Wn)nnεn(3\sqrt{n})^{-r\lvert\mathtt{B}^{\mathrm{B}(e,m_{n})}\rvert^{2}}e^{F(W_{n})n}\geq n^{-\varepsilon n}

for all large enough nn, which implies the result.

10 Proof of Lemma 2.3

We show how to construct a denominator-nn weight W𝙰𝙱W_{\mathtt{A}\mathtt{B}} that has a given 𝙱\mathtt{B}-marginal W𝙱W_{\mathtt{B}} and is close to a given (𝙰×𝙱)(\mathtt{A}\times\mathtt{B})-weight WW whose 𝙱\mathtt{B}-marginal π𝙱W\pi_{\mathtt{B}}W is close to W𝙱W_{\mathtt{B}}. As in the statement of the lemma, we assume

d(π𝙱W,W𝙱)<δ.d(\pi_{\mathtt{B}}W,W_{\mathtt{B}})<\delta.

To minimize the appearance of factors of 12\frac{1}{2}, in this section we work with the 1\ell^{1} distance on weights, which is twice the distance defined above. Therefore the previous assumption becomes

d1(π𝙱W,W𝙱)=i[r]b,b𝙱|π𝙱W(b,b;i)W𝙱(b,b;i)|<2δ.d_{1}(\pi_{\mathtt{B}}W,W_{\mathtt{B}})=\sum_{i\in[r]}\sum_{b,b^{\prime}\in\mathtt{B}}\lvert\pi_{\mathtt{B}}W(b,b^{\prime};i)-W_{\mathtt{B}}(b,b^{\prime};i)\rvert<2\delta.

We fix distinguished elements a0𝙰a_{0}\in\mathtt{A} and b0𝙱b_{0}\in\mathtt{B} which will be referred to throughout this section.

10.1 The vertex measure

We first define the weight’s vertex measure by

W𝙰𝙱((a,b))\displaystyle W_{\mathtt{A}\mathtt{B}}((a,b)) =1nnW((a,b))\displaystyle=\tfrac{1}{n}\left\lfloor n\cdot W((a,b))\right\rfloor a𝙰{a0},b𝙱\displaystyle a\in\mathtt{A}\setminus\{a_{0}\},\ b\in\mathtt{B}
W𝙰𝙱((a0,b))\displaystyle W_{\mathtt{A}\mathtt{B}}((a_{0},b)) =W𝙱(b)aa0W𝙰𝙱((a,b))\displaystyle=W_{\mathtt{B}}(b)-\sum_{a\neq a_{0}}W_{\mathtt{A}\mathtt{B}}((a,b)) b𝙱.\displaystyle b\in\mathtt{B}\mathrlap{.}

See Table 1.

a0a_{0} a1a_{1} \cdots
b0b_{0} \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor
b1b_{1} \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor
\vdots \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor
Table 1: Picking entries of the vertex measure W𝙰𝙱()W_{\mathtt{A}\mathtt{B}}(\cdot). First choose entries of the form W𝙰𝙱((a,b))W_{\mathtt{A}\mathtt{B}}((a,b)) for aa0a\neq a_{0} by rounding down W((a,b))W((a,b)), then fill in the first column in a way that guarantees the correct 𝙱\mathtt{B}-marginal.

Note that |W𝙰𝙱((a,b))W((a,b))|1/n\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert\leq 1/n for aa0a\neq a_{0} and

|W𝙰𝙱((a0,b))W((a0,b))|\displaystyle\left\lvert W_{\mathtt{A}\mathtt{B}}((a_{0},b))-W((a_{0},b))\right\rvert |W𝙱(b)π𝙱W(b)|+|𝙰|/n.\displaystyle\leq\left\lvert W_{\mathtt{B}}(b)-\pi_{\mathtt{B}}W(b)\right\rvert+\lvert\mathtt{A}\rvert/n.

Therefore the 1\ell^{1} distance between the vertex measures is

a,b|W𝙰𝙱((a,b))W((a,b))|\displaystyle\sum_{a,b}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert |𝙰||𝙱|/n+b𝙱(|W𝙱(b)π𝙱W(b)|+|𝙰|/n)\displaystyle\leq\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n+\sum_{b\in\mathtt{B}}\big{(}\left\lvert W_{\mathtt{B}}(b)-\pi_{\mathtt{B}}W(b)\right\rvert+\lvert\mathtt{A}\rvert/n\big{)}
2δ+2|𝙰||𝙱|/n.\displaystyle\leq 2\delta+2\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n.

10.1.1 Nonnegativity

The terms defined by rounding down WW using the floor function are guaranteed to be nonnegative, but the others are not. In the following we show how to repair any negativity.

Let R/n-R/n denote the sum of all negative terms in the vertex measure. Since WW contains only nonnegative terms we have

𝟏{W𝙰𝙱((a,b))<0}|W𝙰𝙱((a,b))||W𝙰𝙱((a,b))W((a,b))|for all a,b.\mathbf{1}_{\{W_{\mathtt{A}\mathtt{B}}((a,b))<0\}}\cdot\lvert W_{\mathtt{A}\mathtt{B}}((a,b))\rvert\leq\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\rvert\quad\text{for all }a,b.

Therefore

R/nb𝙱|W𝙰𝙱((a0,b))W((a0,b))|2δ+|𝙰||𝙱|/n.R/n\leq\sum_{b\in\mathtt{B}}\lvert W_{\mathtt{A}\mathtt{B}}((a_{0},b))-W((a_{0},b))\rvert\leq 2\delta+\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n.

Suppose there is some b𝙱b\in\mathtt{B} such that W𝙰𝙱((a0,b))<0W_{\mathtt{A}\mathtt{B}}((a_{0},b))<0. Since W𝙰𝙱W_{\mathtt{A}\mathtt{B}} has denominator nn, we must have W𝙰𝙱((a0,b))1/nW_{\mathtt{A}\mathtt{B}}((a_{0},b))\leq-1/n. By construction, we have

aAW𝙰𝙱((a,b))=W𝙱(b)0,\sum_{a\in A}W_{\mathtt{A}\mathtt{B}}((a,b))=W_{\mathtt{B}}(b)\geq 0,

so there exists some a+𝙰a^{+}\in\mathtt{A} with W𝙰𝙱((a+,b))1/nW_{\mathtt{A}\mathtt{B}}((a^{+},b))\geq 1/n. Increase W𝙰𝙱((a0,b))W_{\mathtt{A}\mathtt{B}}((a_{0},b)) by 1/n1/n and decrease W𝙰𝙱((a+,b))W_{\mathtt{A}\mathtt{B}}((a^{+},b)) by 1/n1/n.

The number of times we must repeat this step before all terms are nonnegative is exactly RR, and each step moves the measure by 1\ell^{1} distance 2/n2/n; therefore the final edited vertex measure is distance at most 2R/n2R/n from the original W𝙰𝙱W_{\mathtt{A}\mathtt{B}}. If we now let W𝙰𝙱W_{\mathtt{A}\mathtt{B}} denote the new, nonnegative vertex measure, by the above bound on R/nR/n we get

a,b|W𝙰𝙱((a,b))W((a,b))|6δ+4|𝙰||𝙱|/n.\sum_{a,b}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert\leq 6\delta+4\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n.
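The construction just described — round down the entries away from the a0a_{0} column, fill in the a0a_{0} column to force the 𝙱\mathtt{B}-marginal, then repair negative entries one unit of mass at a time — is summarized in the following minimal Python sketch. It is not taken from the paper: it works with integer counts out of nn rather than with fractions, and the names (vertex_measure, W_B_counts, and so on) are ours.

from math import floor

def vertex_measure(W, W_B_counts, n, A, B, a0):
    # W          : dict (a, b) -> probability, the vertex measure W((a, b)) to approximate
    # W_B_counts : dict b -> integer count, so that W_B(b) = W_B_counts[b] / n
    # Returns a dict (a, b) -> integer count, i.e. W_AB((a, b)) = count / n, whose
    # B-marginal is exactly W_B and whose entries are all nonnegative.
    counts = {}
    for b in B:
        # Entries with a != a0: round n * W((a, b)) down (the floor entries of Table 1).
        for a in A:
            if a != a0:
                counts[(a, b)] = floor(n * W[(a, b)])
        # The a0 entry is chosen so that summing over a in A gives W_B_counts[b].
        counts[(a0, b)] = W_B_counts[b] - sum(counts[(a, b)] for a in A if a != a0)
    # Repair negativity: only the a0 entries can be negative.  Each step moves one
    # unit of mass within a fixed b, so the B-marginal is preserved.
    for b in B:
        while counts[(a0, b)] < 0:
            # The entries at b sum to W_B_counts[b] >= 0, so some other entry is >= 1.
            a_plus = next(a for a in A if a != a0 and counts[(a, b)] >= 1)
            counts[(a0, b)] += 1
            counts[(a_plus, b)] -= 1
    return counts

Working in integer counts makes the denominator-nn constraint automatic and avoids comparing floating-point multiples of 1/n1/n.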

10.2 The 𝙱\mathtt{B} half-marginal

For the purposes of this construction we use the 𝙱\mathtt{B} “half-marginal,” which we denote

W(b,(a,b);i)\displaystyle W(b,(a^{\prime},b^{\prime});i) a𝙰W((a,b),(a,b);i).\displaystyle\coloneqq\sum_{a\in\mathtt{A}}W((a,b),(a^{\prime},b^{\prime});i).

This is an element of Prob((𝙱×(𝙰×𝙱))r)\operatorname{Prob}\big{(}(\mathtt{B}\times(\mathtt{A}\times\mathtt{B}))^{r}\big{)}.

Before constructing the edge measure of W𝙰𝙱W_{\mathtt{A}\mathtt{B}}, in this section we first construct what will be its half-marginal.

For each i[r]i\in[r], b,b𝙱b,b^{\prime}\in\mathtt{B}, and a𝙰a^{\prime}\in\mathtt{A} we define

W𝙰𝙱(b,(a,b);i)\displaystyle W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i) =1nnW(b,(a,b);i)\displaystyle=\tfrac{1}{n}\left\lfloor n\cdot W(b,(a^{\prime},b^{\prime});i)\right\rfloor for aa0,bb0,\displaystyle\text{for }a^{\prime}\neq a_{0},\ b\neq b_{0}, (2)
W𝙰𝙱(b,(a0,b);i)\displaystyle W_{\mathtt{A}\mathtt{B}}(b,(a_{0},b^{\prime});i) =W𝙱(b,b;i)aa0W𝙰𝙱(b,(a,b);i)\displaystyle=W_{\mathtt{B}}(b,b^{\prime};i)-\sum_{a^{\prime}\neq a_{0}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i) for bb0,\displaystyle\text{for }b\neq b_{0}, (3)
W𝙰𝙱(b0,(a,b);i)\displaystyle W_{\mathtt{A}\mathtt{B}}(b_{0},(a^{\prime},b^{\prime});i) =W𝙰𝙱((a,b))bb0W𝙰𝙱(b,(a,b);i).\displaystyle=W_{\mathtt{A}\mathtt{B}}((a^{\prime},b^{\prime}))-\sum_{b\neq b_{0}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i). (4)

See Table 2 for a representation of which terms are defined by each equation.

(a0,b0)(a_{0},b_{0}) (a1,b0)(a_{1},b_{0}) (a2,b0)(a_{2},b_{0}) (a0,b1)(a_{0},b_{1}) (a1,b1)(a_{1},b_{1}) (a2,b1)(a_{2},b_{1}) (a0,b2)(a_{0},b_{2}) (a1,b2)(a_{1},b_{2}) (a2,b2)(a_{2},b_{2})
b0b_{0} \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow
b1b_{1} \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor
b2b_{2} \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor
Table 2: A diagram of how the half-marginal W𝙰𝙱(,(,);i)W_{\mathtt{A}\mathtt{B}}(\cdot,(\cdot,\cdot);i) is chosen if 𝙰={a0,a1,a2}\mathtt{A}=\{a_{0},a_{1},a_{2}\} and 𝙱={b0,b1,b2}\mathtt{B}=\{b_{0},b_{1},b_{2}\}. First obtain the entries marked \lfloor\cdot\rfloor by rounding down WW. Then choose the entries marked \rightarrow according to Equation 3 which ensures that the 𝙱\mathtt{B}-marginal is W𝙱W_{\mathtt{B}}. Then choose the entries marked \downarrow according to Equation 4 which ensures that the vertex weight is the one we chose above.

The definition of the terms in (4) ensures that

b𝙱W𝙰𝙱(b,(a,b);i)=W𝙰𝙱((a,b))for all a,b,i.\sum_{b\in\mathtt{B}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)=W_{\mathtt{A}\mathtt{B}}((a^{\prime},b^{\prime}))\quad\text{for all }a^{\prime},b^{\prime},i.

This will ensure that W𝙰𝙱W_{\mathtt{A}\mathtt{B}} has the correct vertex measure. Note also that by line (3)

a𝙰W𝙰𝙱(b,(a,b);i)=W𝙱(b,b;i)for all b𝙱{b0} and b𝙱.\sum_{a^{\prime}\in\mathtt{A}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)=W_{\mathtt{B}}(b,b^{\prime};i)\quad\text{for all }b\in\mathtt{B}\setminus\{b_{0}\}\text{ and }b^{\prime}\in\mathtt{B}.

Using this and definition (4) we also get

a𝙰W𝙰𝙱(b0,(a,b);i)\displaystyle\sum_{a^{\prime}\in\mathtt{A}}W_{\mathtt{A}\mathtt{B}}(b_{0},(a^{\prime},b^{\prime});i) =W𝙱(b0,b;i).\displaystyle=W_{\mathtt{B}}(b_{0},b^{\prime};i).

This will ensure that the 𝙱\mathtt{B}-marginal of W𝙰𝙱W_{\mathtt{A}\mathtt{B}} is W𝙱W_{\mathtt{B}}.
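Continuing the sketch from Section 10.1, the three-stage fill pattern of Table 2 (for a single generator ii) can be written out as follows. Again this is only an illustrative sketch with our own names, it works in integer counts out of nn, and it does not include the negativity repair of Section 10.2.1 below.

from math import floor

def half_marginal(W_half, W_B_edge, W_AB_vertex, n, A, B, a0, b0):
    # W_half      : dict (b, a2, b2) -> probability, the half-marginal W(b, (a2, b2); i)
    # W_B_edge    : dict (b, b2) -> integer count, so that W_B(b, b2; i) = count / n
    # W_AB_vertex : dict (a2, b2) -> integer count, the vertex measure chosen in Section 10.1
    # Returns a dict (b, a2, b2) -> integer count; entries may still be negative.
    counts = {}
    # (2): round down, for b != b0 and a2 != a0 (the floor entries of Table 2).
    for b in B:
        for a2 in A:
            for b2 in B:
                if b != b0 and a2 != a0:
                    counts[(b, a2, b2)] = floor(n * W_half[(b, a2, b2)])
    # (3): for b != b0, choose the a0 entries so that summing over a2 gives W_B(b, b2; i).
    for b in B:
        for b2 in B:
            if b != b0:
                counts[(b, a0, b2)] = W_B_edge[(b, b2)] - sum(
                    counts[(b, a2, b2)] for a2 in A if a2 != a0)
    # (4): fill the b0 row so that summing over b gives the vertex measure at (a2, b2).
    for a2 in A:
        for b2 in B:
            counts[(b0, a2, b2)] = W_AB_vertex[(a2, b2)] - sum(
                counts[(b, a2, b2)] for b in B if b != b0)
    return counts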

We show now that the half-marginal W𝙰𝙱(,(,);i)W_{\mathtt{A}\mathtt{B}}(\cdot,(\cdot,\cdot);i) is 1\ell^{1}-close to W(,(,);i)W(\cdot,(\cdot,\cdot);i) by considering separately the contributions to the 1\ell^{1} distance from terms defined using Equations 2, 3, and 4.

  1. (2) terms:

    Each of the terms of W𝙰𝙱W_{\mathtt{A}\mathtt{B}} defined using the floor in equation (2) is distance at most 1/n1/n from the corresponding term of WW; therefore the total contribution of these terms to the 1\ell^{1} distance is

    b𝙱{b0}a𝙰{a0},b𝙱i[r]|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|\displaystyle\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\}\\ a^{\prime}\in\mathtt{A}\setminus\{a_{0}\},b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert |𝙰||𝙱|2r/n.\displaystyle\leq\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.
  2. (3) terms:

    By the triangle inequality,

    |W𝙰𝙱(b,(a0,b);i)W(b,(a0,b);i)|\displaystyle\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a_{0},b^{\prime});i)-W(b,(a_{0},b^{\prime});i)\right\rvert
    =|(W𝙱(b,b;i)aa0W𝙰𝙱(b,(a,b);i))(π𝙱W(b,b;i)aa0W(b,(a,b);i))|\displaystyle=\left\lvert\left(W_{\mathtt{B}}(b,b^{\prime};i)-\sum_{a^{\prime}\neq a_{0}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)\right)-\left(\pi_{\mathtt{B}}W(b,b^{\prime};i)-\sum_{a^{\prime}\neq a_{0}}W(b,(a^{\prime},b^{\prime});i)\right)\right\rvert
    |W𝙱(b,b;i)π𝙱W(b,b;i)|+aa0|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|.\displaystyle\leq\left\lvert W_{\mathtt{B}}(b,b^{\prime};i)-\pi_{\mathtt{B}}W(b,b^{\prime};i)\right\rvert+\sum_{a^{\prime}\neq a_{0}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert.

    The total contribution of such terms is therefore

    b𝙱{b0},b𝙱i[r]|W𝙰𝙱(b,(a0,b);i)W(b,(a0,b);i)|\displaystyle\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\},\ b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a_{0},b^{\prime});i)-W(b,(a_{0},b^{\prime});i)\right\rvert
    b𝙱{b0},b𝙱i[r]|W𝙱(b,b;i)(π𝙱)W(b,b;i)|d1(W𝙱,π𝙱W)\displaystyle\leq\overbrace{\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\},\ b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{B}}(b,b^{\prime};i)-(\pi_{\mathtt{B}})_{*}W(b,b^{\prime};i)\right\rvert}^{\leq d_{1}(W_{\mathtt{B}},\pi_{\mathtt{B}}W)}
    +b𝙱{b0}a𝙰{a0},b𝙱i[r]|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|=contribution from (2) terms\displaystyle\qquad+\overbrace{\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\}\\ a^{\prime}\in\mathtt{A}\setminus\{a_{0}\},\ b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert}^{=\text{contribution from (\ref{eqn:halfmarg1}) terms}}
    2δ+|𝙰||𝙱|2r/n.\displaystyle\leq 2\delta+\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.
  3. (4) terms:

    Again applying the triangle inequality,

    |W𝙰𝙱(b0,(a,b);i)W(b0,(a,b);i)|\displaystyle\left\lvert W_{\mathtt{A}\mathtt{B}}(b_{0},(a,b^{\prime});i)-W(b_{0},(a,b^{\prime});i)\right\rvert
    |W𝙰𝙱((a,b))W((a,b))|+bb0|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|.\displaystyle\quad\leq\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b^{\prime}))-W((a,b^{\prime}))\right\rvert+\sum_{b\neq b_{0}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a,b^{\prime});i)-W(b,(a,b^{\prime});i)\right\rvert.

    Summing over all a𝙰a\in\mathtt{A}, b𝙱b^{\prime}\in\mathtt{B} and i[r]i\in[r], we see that the total contribution of such terms is bounded by

    a𝙰,b𝙱i[r]\displaystyle\sum_{\begin{subarray}{c}a\in\mathtt{A},b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}} [|W𝙰𝙱((a,b))W((a,b))|+bb0|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|]\displaystyle\left[\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b^{\prime}))-W((a,b^{\prime}))\right\rvert+\sum_{b\neq b_{0}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a,b^{\prime});i)-W(b,(a,b^{\prime});i)\right\rvert\right]
\displaystyle=\sum_{i\in[r]}\overbrace{\sum_{\begin{subarray}{c}a\in\mathtt{A}\\ b\in\mathtt{B}\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert}^{\text{vertex measure}}+\overbrace{\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\}\\ a^{\prime}\in\mathtt{A}\setminus\{a_{0}\},\ b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert}^{\text{(2) terms}}
\displaystyle\qquad+\overbrace{\sum_{\begin{subarray}{c}b\in\mathtt{B}\setminus\{b_{0}\},\ b^{\prime}\in\mathtt{B}\\ i\in[r]\end{subarray}}\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a_{0},b^{\prime});i)-W(b,(a_{0},b^{\prime});i)\right\rvert}^{\text{(3) terms}}
    r[6δ+4|𝙰||𝙱|/n]+[|𝙰||𝙱|2r/n]+[2δ+|𝙰||𝙱|2r/n]\displaystyle\leq r\cdot\left[6\delta+4\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n\right]+\left[\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n\right]+\left[2\delta+\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n\right]
    8rδ+6|𝙰||𝙱|2r/n.\displaystyle\leq 8r\delta+6\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.

Adding up the contributions of the three types of terms, we see that the 1\ell^{1} distance between the half-marginals of WW and W𝙰𝙱W_{\mathtt{A}\mathtt{B}} is bounded by

10rδ+8|𝙰||𝙱|2r/n.10r\delta+8\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.
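
In detail, the three contributions combine (using r\geq 1) as

\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n+\left(2\delta+\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n\right)+\left(8r\delta+6\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n\right)=(8r+2)\delta+8\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n\leq 10r\delta+8\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.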

10.2.1 Nonnegativity

Again, the preceding construction does not guarantee that all terms are nonnegative; we now describe how to correct any negative entries.

Let -R/n be the sum of all negative terms of the half-marginal. The terms defined using equation (2) are floors of nonnegative numbers and hence nonnegative, so every negative term is one defined using equation (3) or (4), and its absolute value is at most its distance to the corresponding (nonnegative) term of W. As above, we therefore get

R/n\leq 10r\delta+7\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.

Suppose there is some b_{-}\in\mathtt{B}, (a^{\prime}_{-},b^{\prime}_{-})\in\mathtt{A}\times\mathtt{B}, and i\in[r] such that W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime}_{-},b^{\prime}_{-});i)<0. Since every entry of W_{\mathtt{A}\mathtt{B}} is an integer multiple of 1/n, in fact W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime}_{-},b^{\prime}_{-});i)\leq-1/n. Since

a𝙰W𝙰𝙱(b,(a,b);i)=W𝙱(b,b;i)0\sum_{a^{\prime}\in\mathtt{A}}W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime},b^{\prime}_{-});i)=W_{\mathtt{B}}(b_{-},b^{\prime}_{-};i)\geq 0

and

b𝙱W𝙰𝙱(b,(a,b);i)=W𝙰𝙱((a,b))0\sum_{b\in\mathtt{B}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime}_{-},b^{\prime}_{-});i)=W_{\mathtt{A}\mathtt{B}}((a^{\prime}_{-},b^{\prime}_{-}))\geq 0

there exist a+𝙰a^{\prime}_{+}\in\mathtt{A} and b+𝙱b_{+}\in\mathtt{B} such that

W𝙰𝙱(b,(a+,b);i)1/nandW𝙰𝙱(b+,(a,b);i)1/n.W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime}_{+},b^{\prime}_{-});i)\geq 1/n\quad\text{and}\quad W_{\mathtt{A}\mathtt{B}}(b_{+},(a^{\prime}_{-},b^{\prime}_{-});i)\geq 1/n.

Decrease both of these terms by 1/n, and increase both W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime}_{-},b^{\prime}_{-});i) and W_{\mathtt{A}\mathtt{B}}(b_{+},(a^{\prime}_{+},b^{\prime}_{-});i) by 1/n. This moves the half-marginal by \ell^{1} distance 4/n while preserving the identities

\sum_{a^{\prime}\in\mathtt{A}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)=W_{\mathtt{B}}(b,b^{\prime};i)\quad\text{and}\quad\sum_{b\in\mathtt{B}}W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)=W_{\mathtt{A}\mathtt{B}}((a^{\prime},b^{\prime})).

Each such step reduces the total mass of the negative terms by 1/n and creates no new negative terms, so it must be performed at most R times to eliminate all negative entries; the final half-marginal therefore satisfies

i[r]b𝙱(a,b)𝙰×𝙱|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|\displaystyle\sum_{i\in[r]}\sum_{b\in\mathtt{B}}\sum_{(a^{\prime},b^{\prime})\in\mathtt{A}\times\mathtt{B}}\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\rvert (10rδ+8|𝙰||𝙱|2r/n)+R4/n\displaystyle\leq(10r\delta+8\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n)+R\cdot 4/n
50rδ+36|𝙰||𝙱|2r/n.\displaystyle\leq 50r\delta+36\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert^{2}r/n.
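
For concreteness, here is a minimal Python sketch of this correction step, under the assumption that, for a fixed generator i, the scaled half-marginal n\cdot W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i) is stored as an integer array M indexed by (b,a^{\prime},b^{\prime}); the array name, the indexing convention, and the function name fix_half_marginal are illustrative, not part of the construction above.

```python
import numpy as np

def fix_half_marginal(M):
    """Correct negative entries of the scaled half-marginal M, where
    M[b, ap, bp] stands for n * W_AB(b, (a', b'); i).  Sums over the middle
    index (giving n * W_B(b, b'; i)) and over the first index (giving the
    vertex measure n * W_AB((a', b'))) are assumed nonnegative; both are
    preserved by the swap below."""
    while (M < 0).any():
        b_m, a_m, bp = map(int, np.argwhere(M < 0)[0])  # a negative entry, value <= -1
        # sum over a' of M[b_m, a', bp] >= 0, so some entry in that line is >= 1:
        a_p = int(np.argmax(M[b_m, :, bp]))
        # sum over b of M[b, a_m, bp] >= 0, so some entry in that line is >= 1:
        b_p = int(np.argmax(M[:, a_m, bp]))
        # 2x2 swap: raises the negative entry, preserves both families of sums,
        # and moves the half-marginal by l1 distance 4/n.
        M[b_m, a_m, bp] += 1
        M[b_p, a_p, bp] += 1
        M[b_m, a_p, bp] -= 1
        M[b_p, a_m, bp] -= 1
    return M
```

Each pass through the loop is exactly the 2\times 2 swap described above; it removes 1/n of negative mass without creating new negative entries, so at most R passes are needed.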

10.3 The edge measure

Finally, we define the edge measure of W𝙰𝙱W_{\mathtt{A}\mathtt{B}} by

\displaystyle W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)=\tfrac{1}{n}\left\lfloor n\cdot W((a,b),(a^{\prime},b^{\prime});i)\right\rfloor\quad\text{for }a\neq a_{0}\text{ and }(a^{\prime},b^{\prime})\neq(a_{0},b_{0}), (5)
\displaystyle W_{\mathtt{A}\mathtt{B}}((a_{0},b),(a^{\prime},b^{\prime});i)=W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-\sum_{a\neq a_{0}}W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)\quad\text{for }(a^{\prime},b^{\prime})\neq(a_{0},b_{0}), (6)
\displaystyle W_{\mathtt{A}\mathtt{B}}((a,b),(a_{0},b_{0});i)=W_{\mathtt{A}\mathtt{B}}((a,b))-\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i). (7)

See Table 3.

(a0,b0)(a_{0},b_{0}) (a1,b0)(a_{1},b_{0}) (a2,b0)(a_{2},b_{0}) (a0,b1)(a_{0},b_{1}) (a1,b1)(a_{1},b_{1}) (a2,b1)(a_{2},b_{1}) (a0,b2)(a_{0},b_{2}) (a1,b2)(a_{1},b_{2}) (a2,b2)(a_{2},b_{2})
(a0,b0)(a_{0},b_{0}) \rightarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow
(a1,b0)(a_{1},b_{0}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
(a2,b0)(a_{2},b_{0}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
(a0,b1)(a_{0},b_{1}) \rightarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow
(a1,b1)(a_{1},b_{1}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
(a2,b1)(a_{2},b_{1}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
(a0,b2)(a_{0},b_{2}) \rightarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow \downarrow
(a1,b2)(a_{1},b_{2}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
(a2,b2)(a_{2},b_{2}) \rightarrow \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor \lfloor\cdot\rfloor
Table 3: A diagram of how the edge measure W𝙰𝙱((,),(,);i)W_{\mathtt{A}\mathtt{B}}((\cdot,\cdot),(\cdot,\cdot);i) is chosen if 𝙰={a0,a1,a2}\mathtt{A}=\{a_{0},a_{1},a_{2}\} and 𝙱={b0,b1,b2}\mathtt{B}=\{b_{0},b_{1},b_{2}\}. First obtain the entries marked \lfloor\cdot\rfloor by rounding down entries of WW. Then choose entries marked \downarrow according to Equation 6, which ensures that the 𝙱\mathtt{B} half-marginal is the one chosen above. Then choose entries marked \rightarrow according to Equation 7, which ensures that the vertex measure is the one chosen above.

It follows from this definition that W𝙰𝙱W_{\mathtt{A}\mathtt{B}} is a (signed) weight with 𝙱\mathtt{B}-marginal W𝙱W_{\mathtt{B}}.
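
The pattern of equations (5)–(7) and Table 3 can be summarized by the following Python sketch for a fixed generator i. Here W, vertex, and half are arrays holding W((a,b),(a^{\prime},b^{\prime});i), the already-chosen vertex measure W_{\mathtt{A}\mathtt{B}}((a,b)), and the already-chosen half-marginal W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i); the array names, the function name build_edge_measure, and the convention that index 0 of an \mathtt{A}- or \mathtt{B}-axis plays the role of a_{0} or b_{0} are illustrative assumptions, not notation from the construction itself.

```python
import numpy as np

def build_edge_measure(W, vertex, half, n):
    """Sketch of equations (5)-(7) for one generator i.

    W      : shape (A, B, A, B), entries W((a,b),(a',b');i)
    vertex : shape (A, B),       entries W_AB((a,b))
    half   : shape (B, A, B),    entries W_AB(b,(a',b');i)
    Index 0 of an A-axis plays the role of a_0; index 0 of a B-axis plays b_0.
    The result may have negative entries, which are corrected afterwards."""
    A, B = vertex.shape
    WAB = np.zeros_like(W)
    targets = [(ap, bp) for ap in range(A) for bp in range(B) if (ap, bp) != (0, 0)]

    # (5): for sources with a != a_0 and targets != (a_0, b_0), round n*W down.
    for a in range(1, A):
        for b in range(B):
            for ap, bp in targets:
                WAB[a, b, ap, bp] = np.floor(n * W[a, b, ap, bp]) / n

    # (6): sources (a_0, b): make the sum over a match the chosen half-marginal.
    for b in range(B):
        for ap, bp in targets:
            WAB[0, b, ap, bp] = half[b, ap, bp] - WAB[1:, b, ap, bp].sum()

    # (7): target (a_0, b_0): make each row sum match the chosen vertex measure.
    for a in range(A):
        for b in range(B):
            WAB[a, b, 0, 0] = vertex[a, b] - WAB[a, b].sum()

    return WAB
```

The three loops correspond, in order, to the entries marked \lfloor\cdot\rfloor, \downarrow, and \rightarrow in Table 3.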

We now check that W_{\mathtt{A}\mathtt{B}} is \ell^{1}-close to W, considering separately the contributions to the \ell^{1} distance from terms defined in equations (5), (6), and (7):

  1. (5) terms:

    Each term of W𝙰𝙱W_{\mathtt{A}\mathtt{B}} defined using the floor function in equation (5) is distance at most 1/n1/n from the corresponding WW term. The total contribution of these terms to the 1\ell^{1} distance is therefore at most |𝙰|2|𝙱|2r/n\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n.

  2. (6) terms:

    Applying the triangle inequality to terms defined in equation (6),

    |W𝙰𝙱((a0,b),(a,b);i)W((a0,b),(a,b);i)|\displaystyle\left\lvert W_{\mathtt{A}\mathtt{B}}((a_{0},b),(a^{\prime},b^{\prime});i)-W((a_{0},b),(a^{\prime},b^{\prime});i)\right\rvert
    |W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|\displaystyle\leq\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert
    +aa0|W𝙰𝙱((a,b),(a,b);i)W((a,b),(a,b);i)|\displaystyle\qquad+\sum_{a\neq a_{0}}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)-W((a,b),(a^{\prime},b^{\prime});i)\right\rvert
    |W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|+|𝙰|/n.\displaystyle\leq\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert+\lvert\mathtt{A}\rvert/n.

    By the 1\ell^{1} bound on the distance between the half-marginals, the total contribution of all such terms is therefore

    i[r]b(a,b)(a0,b0)(|W𝙰𝙱(b,(a,b);i)W(b,(a,b);i)|+|𝙰|/n)\displaystyle\sum_{i\in[r]}\sum_{b}\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}\left(\left\lvert W_{\mathtt{A}\mathtt{B}}(b,(a^{\prime},b^{\prime});i)-W(b,(a^{\prime},b^{\prime});i)\right\rvert+\lvert\mathtt{A}\rvert/n\right)
    [50rδ+36|𝙰|2|𝙱|2r/n]+|𝙰|2|𝙱|2r/n\displaystyle\leq[50r\delta+36\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n]+\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n
    =50rδ+37|𝙰|2|𝙱|2r/n\displaystyle=50r\delta+37\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n
  3. (7) terms:

    Applying the triangle inequality to terms defined in equation (7):

\displaystyle\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a_{0},b_{0});i)-W((a,b),(a_{0},b_{0});i)\right\rvert
    |W𝙰𝙱((a,b))W((a,b))|+(a,b)(a0,b0)|W𝙰𝙱((a,b),(a,b);i)W((a,b),(a,b);i)|.\displaystyle\leq\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert+\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)-W((a,b),(a^{\prime},b^{\prime});i)\right\rvert.

    Therefore the total contribution of all such terms is

\displaystyle\sum_{i\in[r]}\sum_{a,b}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a_{0},b_{0});i)-W((a,b),(a_{0},b_{0});i)\right\rvert
    =i[r]a,b[|W𝙰𝙱((a,b))W((a,b))|\displaystyle=\sum_{i\in[r]}\sum_{a,b}\Bigg{[}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert
    +(a,b)(a0,b0)|W𝙰𝙱((a,b),(a,b);i)W((a,b),(a,b);i)|]\displaystyle\qquad+\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)-W((a,b),(a^{\prime},b^{\prime});i)\right\rvert\Bigg{]}
    =i[r]a,b|W𝙰𝙱((a,b))W((a,b))|vertex measure\displaystyle=\overbrace{\sum_{i\in[r]}\sum_{a,b}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b))-W((a,b))\right\rvert}^{\text{vertex measure}}
\displaystyle\qquad+\overbrace{\sum_{i\in[r]}\sum_{a\neq a_{0}}\sum_{b}\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}\left\lvert W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i)-W((a,b),(a^{\prime},b^{\prime});i)\right\rvert}^{\text{(5) terms}}
\displaystyle\qquad+\overbrace{\sum_{i\in[r]}\sum_{b}\sum_{(a^{\prime},b^{\prime})\neq(a_{0},b_{0})}\left\lvert W_{\mathtt{A}\mathtt{B}}((a_{0},b),(a^{\prime},b^{\prime});i)-W((a_{0},b),(a^{\prime},b^{\prime});i)\right\rvert}^{\text{(6) terms}}
    r[6δ+3|𝙰||𝙱|/n]+[|𝙰|2|𝙱|2r/n]+[50rδ+37|𝙰|2|𝙱|2r/n]\displaystyle\leq r\cdot\left[6\delta+3\lvert\mathtt{A}\rvert\lvert\mathtt{B}\rvert/n\right]+\left[\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n\right]+\left[50r\delta+37\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n\right]
    56rδ+41|𝙰|2|𝙱|2r/n.\displaystyle\leq 56r\delta+41\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n.

Summing up the contributions from terms of all three types, we get that

d1(W𝙰𝙱,W)106rδ+79|𝙰|2|𝙱|2r/n.d_{1}(W_{\mathtt{A}\mathtt{B}},W)\leq 106r\delta+79\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n.

10.3.1 Nonnegativity

If the edge measure has negative entries, we can correct them similarly to above. Let -R/n be the sum of all negative entries; since the terms defined using equation (5) are floors of nonnegative numbers, only terms defined using equations (6) and (7) can be negative, and as above

R/n\leq 106r\delta+78\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n.

Suppose there is some entry

W𝙰𝙱((a,b),(a,b);i)1/n.W_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}),(a^{\prime}_{-},b^{\prime}_{-});i)\leq-1/n.

We want to increase this entry by 1/n without affecting the vertex measure or the \mathtt{B} half-marginal. Since

(a,b)𝙰×𝙱W𝙰𝙱((a,b),(a,b);i)=W𝙰𝙱((a,b))0\sum_{(a^{\prime},b^{\prime})\in\mathtt{A}\times\mathtt{B}}W_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}),(a^{\prime},b^{\prime});i)=W_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}))\geq 0

there exists some (a+,b+)𝙰×𝙱(a^{\prime}_{+},b^{\prime}_{+})\in\mathtt{A}\times\mathtt{B} such that W𝙰𝙱((a,b),(a+,b+);i)1/nW_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}),(a^{\prime}_{+},b^{\prime}_{+});i)\geq 1/n; similarly since

\sum_{a\in\mathtt{A}}W_{\mathtt{A}\mathtt{B}}((a,b_{-}),(a^{\prime}_{-},b^{\prime}_{-});i)=W_{\mathtt{A}\mathtt{B}}(b_{-},(a^{\prime}_{-},b^{\prime}_{-});i)\geq 0

there exists some a+a_{+} such that W𝙰𝙱((a+,b),(a,b);i)1/nW_{\mathtt{A}\mathtt{B}}((a_{+},b_{-}),(a^{\prime}_{-},b^{\prime}_{-});i)\geq 1/n. Increase

W𝙰𝙱((a,b),(a,b);i)andW𝙰𝙱((a+,b),(a+,b+);i)W_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}),(a^{\prime}_{-},b^{\prime}_{-});i)\quad\text{and}\quad W_{\mathtt{A}\mathtt{B}}((a_{+},b_{-}),(a^{\prime}_{+},b^{\prime}_{+});i)

by 1/n1/n, and decrease

W𝙰𝙱((a,b),(a+,b+);i)andW𝙰𝙱((a+,b),(a,b);i)W_{\mathtt{A}\mathtt{B}}((a_{-},b_{-}),(a^{\prime}_{+},b^{\prime}_{+});i)\quad\text{and}\quad W_{\mathtt{A}\mathtt{B}}((a_{+},b_{-}),(a^{\prime}_{-},b^{\prime}_{-});i)

by 1/n1/n. This moves the weight by 1\ell^{1} distance 4/n4/n.

Each such step reduces the total mass of the negative entries by 1/n and creates no new negative entries, so it must be performed at most R times; the final weight therefore satisfies

d1(W𝙰𝙱,W)106rδ+79|𝙰|2|𝙱|2r/n+4R/n530rδ+391|𝙰|2|𝙱|2r/n.d_{1}(W_{\mathtt{A}\mathtt{B}},W)\leq 106r\delta+79\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n+4R/n\leq 530r\delta+391\lvert\mathtt{A}\rvert^{2}\lvert\mathtt{B}\rvert^{2}r/n.

To simplify, we write

d1(W𝙰𝙱,W)530r(δ+|𝙰×𝙱|2/n),d_{1}(W_{\mathtt{A}\mathtt{B}},W)\leq 530r(\delta+\lvert\mathtt{A}\times\mathtt{B}\rvert^{2}/n),

or

d(W𝙰𝙱,W)265r(δ+|𝙰×𝙱|2/n).d(W_{\mathtt{A}\mathtt{B}},W)\leq 265r(\delta+\lvert\mathtt{A}\times\mathtt{B}\rvert^{2}/n).
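
For concreteness, the correction step of Section 10.3.1 admits a Python sketch analogous to the one for the half-marginal, again storing the scaled edge measure n\cdot W_{\mathtt{A}\mathtt{B}}((a,b),(a^{\prime},b^{\prime});i) for a fixed generator i as an integer array M; the name fix_edge_measure and the indexing conventions are illustrative.

```python
import numpy as np

def fix_edge_measure(M):
    """Correct negative entries of the scaled edge measure M, where
    M[a, b, ap, bp] stands for n * W_AB((a,b),(a',b');i).  The row sums
    (the vertex measure) and the sums over a for fixed (b, a', b') (the
    B half-marginal) are assumed nonnegative and are preserved."""
    while (M < 0).any():
        a_m, b_m, ap_m, bp_m = map(int, np.argwhere(M < 0)[0])
        # Row (a_-, b_-) sums to n * W_AB((a_-, b_-)) >= 0, so some entry is >= 1:
        ap_p, bp_p = np.unravel_index(int(np.argmax(M[a_m, b_m])), M.shape[2:])
        # Sum over a of M[a, b_-, a'_-, b'_-] = n * W_AB(b_-, (a'_-, b'_-); i) >= 0,
        # so some entry in that line is >= 1 as well:
        a_p = int(np.argmax(M[:, b_m, ap_m, bp_m]))
        # The 2x2 swap from the text: moves the weight by l1 distance 4/n
        # while preserving the vertex measure and the B half-marginal.
        M[a_m, b_m, ap_m, bp_m] += 1
        M[a_p, b_m, ap_p, bp_p] += 1
        M[a_m, b_m, ap_p, bp_p] -= 1
        M[a_p, b_m, ap_m, bp_m] -= 1
    return M
```

As in the argument above, each iteration removes 1/n of negative mass without creating new negative entries, so at most R iterations occur.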
