The relative f-invariant and non-uniform random sofic approximations
Abstract
The f-invariant is an isomorphism invariant of free-group measure-preserving actions introduced by Lewis Bowen in [Bow10b], where it was used to show that two finite-entropy Bernoulli shifts over a finitely generated free group can be isomorphic only if their base measures have the same Shannon entropy. In [Bow10a] Bowen showed that the f-invariant is a variant of sofic entropy; in particular, it is the exponential growth rate of the expected number of good models over a uniformly random homomorphism.
In this paper we present an analogous formula for the relative f-invariant and use it to prove a formula for the exponential growth rate of the expected number of good models over a random sofic approximation that is a type of stochastic block model.
1 Introduction, Main Results
Let 𝔽_r denote the rank-r free group with a fixed free generating set and identity e, and let be a measure-preserving 𝔽_r-system, i.e. is a homomorphism from 𝔽_r to the automorphism group of the standard probability space . We will not need to make explicit use of the σ-algebra on , so we leave it unnamed.
An observable on is a measurable map with domain . In this paper the codomain will be a finite set endowed with the discrete σ-algebra; in this case we call the map a finite observable and the codomain an alphabet.
Any observable induces a map by setting
The -coloring of is sometimes called the itinerary of , since it records the observations that will be made over the entire orbit of under the action of . We also similarly define the map for any subset of . We abbreviate , where is the closed ball of radius centered at the identity in , which is endowed with the word-length metric. If is a second finite observable, we denote by the map .
The (Shannon) entropy of a finite observable is defined by
where is the pushforward measure; we take the convention 0 log 0 = 0. The entropy of can be interpreted as the expected amount of information revealed by observing , assuming its distribution is known.
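Since the displayed formula did not survive extraction, it may help to see the standard definition in executable form. The following is a minimal sketch; the function name and the dict representation of the pushforward measure are our own assumptions, and the natural logarithm is used.

```python
import math

def shannon_entropy(dist):
    """Shannon entropy -sum_a p(a) log p(a) of a finitely supported
    distribution, with the convention 0 log 0 = 0 (natural logarithm)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# A uniform two-letter observable reveals log 2 nats per observation;
# a deterministic observable reveals nothing.
print(shannon_entropy({"a": 0.5, "b": 0.5}))  # log 2 ≈ 0.693147...
print(shannon_entropy({"a": 1.0}))            # 0
```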
An early application of Shannon’s entropy to ergodic theory was its use by Kolmogorov and Sinai to show that there exist nonisomorphic Bernoulli shifts over ℤ. A Bernoulli shift over ℤ is a system of the form for some alphabet and ; is the shift action of ℤ. They did this by defining an entropy rate for ℤ-systems, which can be interpreted as the average information per unit time revealed by observing the system. For a Bernoulli shift , the entropy rate is simply the “base entropy” , where is the “time zero” observable.
Isomorphism invariance of the KS entropy rate is typically proven using the fact that entropy rate is nonincreasing under factor maps (which are surjective homomorphisms of measure-preserving systems). This fact can be interpreted as stating that a system cannot simulate another system that is “more random.”
The entropy rate was soon generalized to systems acted on by an arbitrary amenable group (such as ℤ). Extending beyond amenable groups proved more difficult, and in fact it was found to be impossible for such an extension to preserve all desirable properties of the KS entropy rate. In particular, an entropy rate for nonamenable groups which assigns Bernoulli shifts their base entropy cannot be nonincreasing under factor maps [OW87, Appendix C].
The first invariant to distinguish between Bernoulli shifts over free groups is Lewis Bowen’s f-invariant. Following [Bow10a], this can be defined by
The main theorem of [Bow10b] is that this quantity depends on the observable only through the σ-algebra it generates. In particular, the common value among all finite observables which generate the Borel σ-algebra (assuming such exist) is a measure-conjugacy invariant of the system . In the same paper, he showed that the f-invariant of a Bernoulli shift is the Shannon entropy of the base measure; in particular, Bernoulli shifts with different base entropies are nonisomorphic.
In [Bow10a], Bowen gave an alternate formula for the f-invariant, which we now introduce.
For any homomorphism we have a -system , and we can consider a labeling as an observable on this system. We denote the law of its itinerary by and call this the empirical distribution of . We say that is a good model for over if it is difficult to distinguish the -systems and via their respective observables and . To make this precise, we denote
which is a set of good models for over if is a weak∗-open neighborhood of ; the particular set quantifies how good the models are. The alphabet is given the discrete topology and the product topology, so “weak∗-close” means marginals on some finite sets are close in total variation norm.
For each , let . Bowen showed in [Bow10a] that the f-invariant is given by
To make an analogy with statistical physics, we can think of as a macroscopic statistical distribution of the state of a system; then the f-invariant is the exponential growth rate of the number of “microstates” that are consistent with these statistics. What we here call good models are often called microstates for this reason.
If is a second observable, the conditional entropy is
This can be interpreted as the expected amount of information revealed by observing if both the value of and the joint distribution of and are known. By analogy we define
Both the infimum and supremum can be replaced by limits; this follows from Lemma 3.2 below. It follows from Corollary 3.5 that we could also directly define
as long as .
A few more definitions are required to state our main theorems. If is a finite subset of , we denote by the total variation distance between the marginals of and on . Our convention for the total variation distance between measures is
For each we define a pseudometric on by
Note that together generate the weak∗ topology on . These generalize the almost-pseudometric from [Bow10a] (Bowen’s version is essentially a pseudometric, except that its first and second arguments come from different sets), which corresponds to the case . For we write
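For concreteness, the total variation convention can be sketched in code. The 1/2 normalization below is an assumption consistent with the remark in Section 10 that the ℓ1 distance on weights is twice the distance used here; the function name and dict representation of measures are our own.

```python
def total_variation(mu, nu):
    """Total variation distance (1/2) * sum_a |mu(a) - nu(a)| between
    two finitely supported measures, given as dicts of point masses."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(a, 0.0) - nu.get(a, 0.0)) for a in support)

# Mutually singular probability measures are at distance 1;
# identical measures are at distance 0.
print(total_variation({"a": 1.0}, {"b": 1.0}))                  # 1.0
print(total_variation({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}))  # 0.0
```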
In the present paper, instead of picking a homomorphism uniformly at random we will use the following type of stochastic block model: given , , and , let
The labeling partitions the elements of into communities, and we can think of the random homomorphism as a random choice of directed edges between and within the communities. Certain statistics of these random edge choices are determined by the reference homomorphism ; note that for these statistics are more precise than those specified by a standard stochastic block model. In Section 2 we define weights, which are the objects used to record the relevant statistics.
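Since the displayed definition did not survive extraction, here is a hedged sketch of the sampling procedure just described. A homomorphism into Sym(n) is represented by its list of generator permutations of range(n); for each generator we sample π ∘ σ*_i ∘ ρ, where σ*_i is the reference permutation and π, ρ are independent uniform label-preserving permutations. This preserves, for each generator, the counts of edges within and between the communities determined by the labeling. All names are our own.

```python
import random
from collections import defaultdict

def label_preserving_permutation(labels):
    """Uniform random permutation of range(len(labels)) fixing each
    label class (community) setwise."""
    classes = defaultdict(list)
    for v, a in enumerate(labels):
        classes[a].append(v)
    perm = [None] * len(labels)
    for vs in classes.values():
        targets = vs[:]
        random.shuffle(targets)
        for v, t in zip(vs, targets):
            perm[v] = t
    return perm

def block_model_homomorphism(reference, labels):
    """Sample sigma_i = pi o sigma*_i o rho for each generator, with pi and
    rho independent uniform label-preserving permutations.  The multiset of
    community pairs (label(v), label(sigma_i v)) matches the reference."""
    out = []
    for sigma_star in reference:
        rho = label_preserving_permutation(labels)
        pi = label_preserving_permutation(labels)
        out.append([pi[sigma_star[rho[v]]] for v in range(len(labels))])
    return out
```

One design note: conjugating the reference by label-preserving permutations is just one convenient way to randomize while provably keeping the per-generator community edge counts fixed.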
We first prove our formula for the relative -invariant under a Markov assumption: in this case, our stochastic block model only needs to take into account “one-step statistics.”
Theorem A.
Let and be finite observables, and for each let and be such that
Suppose that is a Markov measure. With , we have
Proposition A.
The assumptions of Theorem A are nonvacuous; that is, for any finite observable there exist sequences and such that .
If is not Markov, then the same formula holds with a more precise type of stochastic block model:
Theorem B.
Let and be finite observables. Let approach infinity as goes to infinity while satisfying . For each let and be such that
Suppose that . With ,
Proposition B.
The assumptions of Theorem B are nonvacuous; that is, for any finite observable and any sequence approaching infinity while satisfying , there exist sequences and such that .
The expressions appearing on the right-hand sides of Theorems A and B are very closely related to Ben Hayes’ definition of “relative sofic entropy in the presence” [Hay16, Definition 2.5]. Some differences are that we consider expected numbers of good models over random sofic approximations, and that Hayes takes a supremum inside the logarithm over which good model is to be extended, while we fix a sequence of planted good models. Hayes also does not restrict to shift systems as we do here.
Using Theorem B we prove the following formula for the growth rate of the expected number of good models over a homomorphism drawn from a stochastic block model:
Theorem C.
Let be as in the statement of Theorem B. Then
Here is the set of joinings of the -systems and , i.e. shift-invariant probability measures on whose marginals are , respectively. denotes the shift action of . We use to denote the maps
which observe the (resp. ) label at the identity.
Note: the supremum is always greater than or equal to , with equality attained by the product joining; this means that the expected number of good models for over a block model with built-in good models for any is at least the expected number of good models over a uniformly random homomorphism. It is possible for the supremum to be strictly larger, however. For example, suppose and , and let be the diagonal joining. Then
1.1 Random sofic approximations
The f-invariant is closely related to another invariant of measure-preserving systems called sofic entropy, which was introduced by Lewis Bowen in [Bow10c].
A homomorphism is called -sofic for some finite and if
A sequence of homomorphisms is called a sofic approximation if for every the homomorphism is -sofic for all large enough .
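The displayed soficity condition is elided; the standard condition (assumed in the sketch below) is that every nonidentity word w in the finite set F fixes fewer than a δ-fraction of the n points under σ(w). Words are represented as lists of (generator index, ±1) pairs; all names are our own.

```python
def apply_word(word, gens, v):
    """Apply sigma(w) to the point v, where gens[i] is the permutation of
    range(n) assigned to the i-th generator and word lists
    (generator index, exponent) pairs, applied right to left."""
    for i, exp in reversed(word):
        v = gens[i][v] if exp == 1 else gens[i].index(v)
    return v

def is_sofic(gens, words, delta):
    """Check that each given (nonidentity) word fixes fewer than a
    delta-fraction of the points -- the assumed soficity condition."""
    n = len(gens[0])
    return all(
        sum(1 for v in range(n) if apply_word(w, gens, v) == v) / n < delta
        for w in words
    )

# A single 5-cycle: no nonidentity power of the generator fixes a point.
five_cycle = [[1, 2, 3, 4, 0]]
print(is_sofic(five_cycle, [[(0, 1)], [(0, 1), (0, 1)]], 0.1))  # True
```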
The sofic entropy relative to is the exponential growth rate of the number of good models over . Specifically, if there is some finite observable which generates the Borel σ-algebra on then we have
By analogy with this expression, we might call the sequences of random homomorphisms appearing in the expressions above “random sofic approximations.” The following proposition provides further justification for this terminology.
Proposition 1.1.
In particular, if , etc. are independent then is a sofic approximation with probability 1.
Organization
In Section 2 we define weights and discuss some of their useful properties. In Section 3 we prove a few basic results about the functions and . Some of the results of these two sections are used in Section 4 to show that the assumptions of the main theorems are not vacuous. In Section 5 we show how the function is related to the number of homomorphism-labeling pairs that realize a given weight, which is the main ingredient of the proofs of Theorems A and B given in the next two sections. In Section 8 we show how to deduce Theorem C from Theorem B. Section 9 contains a proof of Proposition 1.1. The final section contains a proof of Lemma 2.3, which asserts that a weight can be approximated by a denominator- weight with a specified marginal.
Acknowledgements
Thanks to Tim Austin for suggesting that theorems similar to B and C should hold, for many helpful discussions, and for providing comments on an earlier draft. Thanks also to Ben Hayes for sharing helpful references.
This material is based upon work supported by the National Science Foundation under Grant No. DMS-1855694.
2 Weights
If is a finite observable, for and let
and also denote
For and let
and .
More abstractly, any is called an -weight if
for all and . For each we denote this common value . Note that the objects and defined above satisfy this condition.
We say that has denominator if for all .
The measures for are called the edge measures of , and is called the vertex measure.
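In code, the weight determined by a homomorphism (given by its generator permutations) and a labeling can be computed as follows. This is a sketch with our own names; measures are normalized counts, and the defining weight condition is that both marginals of each edge measure agree with the vertex measure.

```python
from collections import Counter

def empirical_weight(gens, labeling):
    """Vertex measure: empirical distribution of the labels.
    Edge measure for generator i: empirical distribution of the pairs
    (x(v), x(sigma_i v)) over all n vertices v."""
    n = len(labeling)
    vertex = {a: c / n for a, c in Counter(labeling).items()}
    edges = [
        {ab: c / n for ab, c in
         Counter((labeling[v], labeling[g[v]]) for v in range(n)).items()}
        for g in gens
    ]
    return vertex, edges

# One generator acting as a transposition on two labeled points.
vertex, edges = empirical_weight([[1, 0]], ["a", "b"])
print(vertex)    # {'a': 0.5, 'b': 0.5}
print(edges[0])  # {('a', 'b'): 0.5, ('b', 'a'): 0.5}
```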
For any alphabet , we use the metric on -weights defined by
We can use weights to count good models up to equivalence under the pseudometrics using the following proposition:
Proposition 2.1.
If and , then for any observable
Note this implies also that
Proof.
By definition of the distance between weights,
For many ‘incompatible’ pairs , both terms will be zero: suppose , so that . If the second term in the absolute value is nonzero, then for some we have and , and therefore
The same argument shows that for all whenever the first term is nonzero. Therefore we can restrict the sum to pairs with for all . Equivalently, we can sum over all to get
It will be useful to consider the pushforward map induced by a map between alphabets: if is a measurable map and is an -weight, then is the -weight given by
Note that this implies that the vertex measure of is
For example, let be the projection map. If is an -weight then is given by
We call this the -marginal of .
All weights in the present paper will be over alphabets of the form . We use this fact to introduce some simplified notation for projections:
• denotes projection onto the entire factor ; is used similarly.
• For and , denotes projection onto .
• denotes the projection , except that if we write .
We define for an abstract weight by
where is the Shannon entropy. Note that this is consistent with the above definitions in that, for example,
We can revisit the definition of our version of the stochastic block model using weights: Let and let be a denominator- -weight. Suppose there exist and such that . Then
so we can also denote this distribution by . Specifying the distribution by a weight rather than a specific homomorphism will occasionally be more convenient.
2.1 Constructing Weights and Good Models
We borrow the first result of this type from [Bow10a]; it allows us to find a denominator- approximation to a given weight.
Lemma 2.2 (Lemma 2.3 of [Bow10a]).
There is a constant such that for any -weight there is a denominator- -weight within distance of .
The following lemma allows us not only to construct a denominator- approximation to a given weight, but also to specify a marginal of this approximation:
Lemma 2.3.
Let be an -weight. If is a -weight of denominator with then there is an -weight with denominator such that and .
The construction is fairly involved, so it is postponed to Section 10. The constant 265 is not intended to be optimal.
The definition of a weight in terms of a homomorphism and a labeling is straightforward. However, we will also need to know whether a given weight can be realized in this way. The next two results address this inverse problem.
Proposition 2.4.
If is a denominator- -weight, then there exist and such that .
Proof.
This is implied by Proposition 2.1 of [Bow10a]. ∎
Unfortunately, this does not imply that for every denominator- -weight there is some and such that ; instead it provides such that .
However, if we already know that is close to a weight of the form for some observable , then the following proposition shows that is also close to a weight of the form .
Proposition 2.5.
Let , , and be such that for some . Writing , we have
An immediate consequence is that implies where ; cf. Claim 2 in the proof of Proposition 3.2 of [Bow10a].
Proof.
3 Properties of F and f
Lemma 3.1 (Continuity as weight function).
If are -weights with then
where denotes the entropy of the probability measure .
Proof.
We use Fano’s inequality in the following form (Equation (2.139) of [CT06]): suppose are -valued random variables defined on the same probability space and let be their probability of disagreement. Then
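The display stating Fano’s inequality did not survive extraction. For reference, a standard form (assumed here, with both variables taking values in the finite alphabet A; the variant in [CT06, Eq. (2.139)] may use log(|A| − 1) in place of log|A|) is:

```latex
% Fano's inequality: X, Y are A-valued random variables on a common
% probability space, and \rho = \mathbb{P}(X \neq Y).
H(X \mid Y) \;\le\; H(\rho, 1-\rho) + \rho \log \lvert A \rvert,
% where H(\rho, 1-\rho) = -\rho\log\rho - (1-\rho)\log(1-\rho).
```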
Using the chain rule and nonnegativity of Shannon entropy, we can deduce that
Let be the respective distributions of . Because is the minimum value of over all possible couplings, if then
The assumed bound implies that each vertex and edge measure of is within total variation distance of its counterpart in , so
Let and be observables. We say that is a coarsening of if each part of the partition of induced by is a union of parts of the partition induced by (up to null sets). Equivalently, there is some function such that almost surely. In this situation we can also call a refinement of .
A useful property of the Shannon entropy is monotonicity under refinement. The function does not share this property, but it is monotone under the following particular kind of refinement introduced in [Bow10b]:
We say that is a simple splitting of if there is some and a coarsening of such that, up to null sets, the partition induced by is the coarsest common refinement of the partitions induced by and .
We say that is a splitting of if there are observables such that is a simple splitting of for . We will use the following monotonicity properties of the relative version of :
Lemma 3.2 (Monotonicity under splitting).
1. If is a splitting of then .
2. If is a splitting of then .
Proof.
1. This is essentially Proposition 5.1 of [Bow10b]; conditioning on makes no difference to the proof.
2. The proof is based on the proof of Part 1, but in place of the chain rule for conditional entropy we use the following bound:
The steps above use monotonicity and the chain rule. We will also use the following consequence of the previous bound:
The steps use the previous bound and subadditivity. It suffices to check the case where is a simple splitting of : let and let be a coarsening of such that the partition induced by is the same as the coarsest common refinement of the partitions induced by and up to null sets. Then, using the two bounds just derived,
But
so we can remove the term from the sum to get
One corollary is the following convenient formula:
Corollary 3.3.
Let be finite observables such that is a Markov measure. Then is independent of . In particular,
Proof.
By the previous proposition, for any we have
On the other hand, by Theorem 6.1 of [Bow10d] so
Applying monotonicity under splitting to the first term on the right gives
This establishes independence of ; the formula for follows. ∎
Proposition 3.4.
Let be finite observables. Then for any ,
It follows that
Proof.
By Lemma 3.2, . Using elementary properties of Shannon entropy, we have
By -invariance of we have
so the first inequality follows.
For any this gives
so the second inequality follows upon taking the supremum over then the infimum over . ∎
We can use this bound to give a proof of the chain rule for the relative -invariant, a version of which first appeared in [Bow10d] (there it is called the Abramov-Rokhlin formula; see also [BG13]):
Corollary 3.5 (Chain rule).
Proof.
By definition of the relative version of and the chain rule for conditional entropy, for each we have
By Lemma 3.2 each term is monotone in , so the limits as exist. By Proposition 3.4 all terms are bounded above (recall we only consider finite observables, so in particular all observables have finite entropy), so we can split the limit across the sum on the right to get
Taking to infinity gives the result. ∎
4 Non-vacuity of Main Theorems
4.1 Theorem A
4.2 Theorems B and C
Here we prove Proposition B, which asserts the nonvacuity of Theorem B (and by extension Theorem C, since the assumptions are the same).
Let approach infinity as approaches infinity while satisfying and let be a finite observable. We need to show that there exist and such that .
5 Counting Lemmas
For a -weight , let denote the number of pairs such that .
Proposition 5.1.
If is a -weight with denominator then
Proof.
We write
where denotes the expectation over a uniform choice of .
Proposition 2.1 of [Bow10a] states that
Lemma 2.2 of the same paper gives an estimate of this quantity, but for our purposes we need to be more careful about how the estimate depends on the size of the alphabet.
We use the version of Stirling’s approximation
valid for . To estimate the products that appear in the expectation, we will need to omit all factors which equal since Stirling’s approximation is not valid for these. To do this carefully, let
and for each let
For the numerator of the above expectation we get
and a lower bound which is identical except missing the first factor. For the denominator, let . We get
and again we have a lower bound which is identical except missing the first factor . Therefore the quotient is bounded above by
and below by
Since has denominator , we have
and
Therefore satisfies
Since and , we conclude that
and the stated inequality follows. ∎
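The exact version of Stirling’s approximation used in the proof above did not survive extraction; a common two-sided version valid for all n ≥ 1 (Robbins’ form, an assumption here) can be sanity-checked numerically:

```python
import math

def stirling_bounds(n):
    """Robbins' two-sided Stirling bounds, valid for all n >= 1:
    sqrt(2 pi n) (n/e)^n  <=  n!  <=  sqrt(2 pi n) (n/e)^n * e^(1/(12n))."""
    base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    return base, base * math.exp(1 / (12 * n))

# Verify the sandwich for a few small n.
for n in (1, 2, 5, 10, 20):
    lo, hi = stirling_bounds(n)
    assert lo <= math.factorial(n) <= hi
print("bounds verified")
```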
The following proposition establishes the connection between the relative version of and expected numbers of good models over stochastic block models.
Proposition 5.2.
Given any denominator- -weight , let denote the -weight . Let be a fixed labeling with , and let
assuming is such that the desired support is nonempty. Then
In particular,
Lemma 5.3.
Let be a weight of denominator . Then
Proof.
Suppose ; we then need to show
The inequality is clear, since we have an injection .
The converse inequality holds because there is an injection from the set on the right to the set on the left. This follows from the remark at the beginning of the proof of Proposition 2.5. ∎
Proof of Proposition.
Let
then, since is independent of the choice of with ,
(by the previous lemma)
Note that our assumption that the intended support of is nonempty allows us to rule out the “0” case in the application of the lemma.
The rest of the result then follows from our estimates on in Proposition 5.1. ∎
6 Proof of Theorem A
6.1 Upper bound
Note that we will not rely on the Markov assumption for the upper bound.
For each ,
Write
and assume that is large enough that .
Writing for the set of all denominator- weights with ,
since if then . But conditioned on is , so we can bound the expectation above using Proposition 5.2, getting
Note . Fix . By continuity of , for all small enough (possibly depending on ) we have
Bounding each probability by 1, we get
But
so this implies
for any , by monotonicity under splitting. Taking the limit as followed by the infimum over (which takes to 0) and gives
Since
for every , this completes the upper bound.
6.2 Lower bound
Fix . To estimate
we bound below using the expected size of
This is not a true lower bound but, by Equation (1) below, there are constants independent of such that
The ‘error’ factor has an exponential growth rate which vanishes as , so will not be a problem.
We now find a lower bound for the expectation of . Applying Proposition 5.2 as above, we have
For any , for small enough (independent of ), by continuity of this is at least
We give a lower bound for the sum by first rewriting it as
Fix . By Lemma 2.3, for all large enough the -weight can be extended to a -weight with ; to apply the lemma we can think of the extended weight as having alphabet , and recall that we assume . Choose such that . Since is an extension of , we can make this choice in such a way that .
Let . By Proposition 2.5,
So, as long as is small enough and is large enough (depending on ), by Lemma 2.3
Now consider the probability appearing in the term:
By symmetry in choice of with the correct letter frequencies, we can write this as
(by Prop. 2.5)
(by definition of )
(by Prop. 5.1)
By continuity of , we then get
for all large enough and small enough (again depending on ), with the same as chosen above. Since is a Markov chain, .
Putting this all together: for any , for all we have
for all large enough and small enough .
It follows that for any
Taking the limit as gives the desired bound, using Corollary 3.3 and that the family of pseudometrics generates the weak∗ topology.
7 Proof of Theorem B
Let , so that
Note that, by definition of ,
Lemma 7.1.
With as just defined in terms of , , and , we have
Proof.
The assumption in the theorem statement that implies the existence of a constant such that
By Lemma 3.1 we have
using that . Since approaches infinity as goes to infinity we have , so the result follows. ∎
Lemma 7.2.
If , then for any and we have .
Proof.
This is certainly true if ; assume therefore that .
Our assumption guarantees that
for all large enough . Therefore
This inequality can be rearranged to give
Since is arbitrary, the result follows. ∎
In the remainder of this section we prove Theorem B by first proving that the right-hand side is an upper bound for the left, then proving that it is also a lower bound.
7.1 Upper bound
Just as in the proof of the upper bound in Theorem A, for each and we have
where
We assume that is large enough that .
Since is rather than , we cannot apply Proposition 5.2 directly to this expression. We get around this as follows: Let
All elements of this set are denominator- -weights; we avoid the question of exactly which weights are in this set, but call such weights attainable. For and let
denote the set of such weights whose appropriate marginal is within of the -weight . For now we take but we will need more generality below. Then
so we can apply Proposition 5.2 to get
By Lemma 7.2 we have . Using this and Lemma 7.1 we have
where the little-o is uniform over all terms in the sum. Here we use the assumption that is finite.
By definition of , for any we can pick , , and so that . Then since is a splitting of , by Lemma 3.2 we have
By continuity of , for all small enough (depending on ) we have
Along with the above, this implies that
Bounding all terms in the sum by 1, we get
Using Lemma 7.2 we have
so this implies
Taking the infimum over and , and using the chain rule for (Corollary 3.5, again using the assumption that is finite), gives
Since
for every , this completes the upper bound.
7.2 Lower bound
In this section we denote
(note the dependence on is implicitly specified by and ), and with
The following two claims are used to relate the sizes of the sets defined above.
Claim 1.
Let . For any we have
where .
Proof.
If , then
this follows from the fact that total variation distance is nonincreasing under pushforwards. Applying Proposition 2.5, we get
Claim 2.
Fix , and . As established in the previous claim, we can consider as a map from to . There are constants independent of such that is at most -to-one.
Proof.
If is empty, then the claim is vacuously true. Otherwise, fix . If , then . By Claim 3 in the proof of Proposition 3.2 of [Bow10a] the number of such pairs , and therefore the number of such , is bounded above by
where is the Shannon entropy. (We give more explicit constants here than in [Bow10a] to make the dependence on clear). ∎
Claim 2 implies that
(1)
where are independent of .
We now find a lower bound for the expectation of . Fix , and suppose is large enough that . Using Proposition 5.2 and Lemma 7.2, we have
We bound the infimum below as follows: Given any , we can let be such that . Then by Lemma 3.2 and continuity of
for any for all small enough (with “small enough” dependent only on ). This implies that the infimum is bounded below by
We bound the sum below by first rewriting it as
The following claim, then, implies that the sum is bounded below by 1.
Claim 3.
For all large enough ,
Proof.
By Lemma 2.3, if
and then there is a -weight with and . By definition of and Lemma 7.2, both conditions are met for all large enough .
The claim will follow if we show that is attainable.
With as chosen above, by Proposition 2.4 we can choose , , and such that .
Let . To complete the proof we show that , i.e.
for all and . We prove this by induction on the word length .
The base case (i.e. ) follows immediately from the definition of .
For the inductive step, write with and . Then, assuming the result holds for ,
Now since , we can pick such that
This implies
Hence for all large enough we have
and therefore
Combining this lower bound with Equation (1) and the definition of , we get
Taking the inf in then letting go to zero gives
for . First take , then , then take the infimum over . We get
where the last line follows because the collection of pseudometrics generates the weak∗ topology on .
8 Proof of Theorem C
By analogy with sofic entropy, we denote and denote the left-hand side of the formula in the theorem statement as .
Endow with the metric
Note that this induces the weak* topology (where is given the discrete topology and the product topology).
Writing , we then have
We will similarly denote .
8.1 Lower bound
Let be any joining of (the shift systems with respective measures) and . Then for any and we have
where is defined on analogously to the definition given on above. This inequality holds because total variation distance is nonincreasing under pushforwards. Consequently
Taking the supremum over joinings gives the lower bound.
8.2 Upper bound
For , let
be the set of shift-invariant “approximate joinings” of and . Since is compact, for each there exist such that
By definition of we have for all large enough . Therefore
Note that the entire expression in the inf is decreasing as , so we may replace the inf with a limit. Rather than taking a continuous limit we write
For each pick to get within of the supremum. Then the right-hand side is equal to
Let be a subsequence with weak* limit . By weak* continuity of pushforwards under projection we have . Now for any , for all large enough we have both and , so by the triangle inequality
It follows that the expression in , and hence , is bounded above by
Taking the infimum over shows that
9 Proof of Proposition 1.1
All sequences of interest are of the form
with , , , and where is the -weight . In the case of Theorem A we simply have for all .
The proposition will follow from the following lemma:
Lemma 9.1.
Let denote the uniform measure on . Then for any finite and there exists such that
for all large enough .
This can be proven by making superficial changes to the proof of the similar result in [Bow20].
To prove Proposition 1.1, it now suffices to show that for any
for all large enough . To do this, first note that the left-hand side here depends only on the vector of letter frequencies. Therefore
But by Proposition 2.5, if and are such that , then the projection satisfies . Therefore for each
Hence
Combining these last few statements, we see that
We can ignore the first factor here since it only decays exponentially fast. By Proposition 5.1,
The third factor is clearly not a problem and can also be ignored. For the first factor,
using Lemma 7.2. For the second factor, first note that by definition of we have
So
again using Lemma 7.2. This implies that for every we have
for all large enough , which implies the result.
10 Proof of Lemma 2.3
We show how to construct a denominator- weight that has a given -marginal and is close to a given -weight whose -marginal is close to . As in the theorem statement, we assume
To minimize the appearance of factors of , in this section we work with the distance on weights, which is twice the distance defined above. Therefore the previous assumption becomes
We fix distinguished elements and which will be referred to throughout this section.
10.1 The vertex measure
Note that for and
Therefore the distance between the vertex measures is
10.1.1 Nonnegativity
The terms defined by rounding down using the floor function are guaranteed to be nonnegative, but the others are not. In the following we show how to repair any negativity.
Let denote the sum of all negative terms in the vertex measure. Since contains only nonnegative terms we have
Therefore
Suppose there is some such that . Since has denominator , we must have . By construction, we have
so there exists some with . Increase by and decrease by .
The number of times we must repeat this step before all terms are nonnegative is exactly , and each step moves the measure by distance ; therefore the final edited vertex measure is distance at most from the original . If we now let denote the new, nonnegative vertex measure, by the above bound on we get
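The repair step just described can be sketched as follows. This is our own illustrative code: entries are assumed to be integer multiples of 1/n summing to 1, so whenever a negative entry remains, some entry holds mass at least 1/n and can donate.

```python
from fractions import Fraction

def repair_negativity(measure, n):
    """Move mass 1/n from a maximal entry to a negative entry until all
    entries are nonnegative.  Each step reduces the total negative mass
    by 1/n, preserves the total mass, and moves the measure by
    l1-distance 2/n."""
    m = dict(measure)
    step = Fraction(1, n)
    while any(v < 0 for v in m.values()):
        bad = next(k for k, v in m.items() if v < 0)
        donor = max(m, key=m.get)  # holds >= 1/n while negatives remain
        m[bad] += step
        m[donor] -= step
    return m

fixed = repair_negativity({"a": Fraction(5, 4), "b": Fraction(-1, 4)}, 4)
print(fixed == {"a": Fraction(1), "b": Fraction(0)})  # True
```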
10.2 The half-marginal
For the purposes of this construction we use the “half-marginal,” which we denote
This is an element of .
Before constructing the edge measure of , in this section we first construct what will be its half-marginal.
For each , , and we define
(2)
(3)
(4)
See Table 2 for a representation of which terms are defined by each equation.
The definition of the terms in (4) ensures that
This will ensure that has the correct vertex measure. Note also that by line (3)
Using this and definition (4) we also get
This will ensure that the -marginal of is .
We show now that the half-marginal is -close to by considering separately the contributions to the distance from terms defined using Equations 2, 3, and 4.
• (2) terms: Each of the terms of defined using the floor in equation (2) is distance at most from the corresponding term of ; therefore the total contribution of these terms to the distance is
• (3) terms: By the triangle inequality,
The total contribution of such terms is therefore
• (4) terms: Again applying the triangle inequality,
Summing over all , and , we see that the total contribution of such terms is bounded by
Adding up the contributions of the three types of terms, we see that the distance between the half-marginals of and is bounded by
10.2.1 Nonnegativity
Again, the preceding construction does not guarantee that all terms are nonnegative. In the following we describe how to correct negativity.
Let be the sum of all negative terms of the half-marginal. As above, we get
Suppose there is some , , and such that . Then . Since
and
there exist and such that
Decrease both of these terms by , and increase both and by . This moves the half-marginal by distance .
This step must be done at most times to eliminate all negative entries, so the final half-marginal satisfies
10.3 The edge measure
It follows from this definition that is a (signed) weight with -marginal .
We now check that is -close to . We consider separately the contribution to the distance of terms defined in equations (5), (6), and (7):
• (5) terms: Each term of defined using the floor function in equation (5) is distance at most from the corresponding term. The total contribution of these terms to the distance is therefore at most .
• (6) terms: Applying the triangle inequality to terms defined in equation (6),
By the bound on the distance between the half-marginals, the total contribution of all such terms is therefore
• (7) terms: Applying the triangle inequality to terms defined in equation (7):
Therefore the total contribution of all such terms is
Summing up the contributions from terms of all three types, we get that
10.3.1 Nonnegativity
We can modify a solution with negative entries to get a nonnegative one similarly to above. Let be the sum of all negative entries; then
Suppose there is some entry
We want to increment this term by without affecting the vertex measure or the marginal. Since
there exists some such that ; similarly since
there exists some such that . Increase
by , and decrease
by . This moves the weight by distance .
Since is the maximum number of times we need to do this before there are no more negative entries, the final weight satisfies
To simplify, we write
or
References
- [BG13] Lewis Bowen and Yonatan Gutman. Nonabelian free group actions: Markov processes, the Abramov–Rohlin formula and Yuzvinskii’s formula – CORRIGENDUM. Ergodic Theory and Dynamical Systems, 33(2):643–645, April 2013.
- [Bow10a] Lewis Bowen. The ergodic theory of free group actions: Entropy and the f-invariant. Groups, Geometry, and Dynamics, pages 419–432, 2010.
- [Bow10b] Lewis Bowen. A measure-conjugacy invariant for free group actions. Annals of Mathematics, 171(2):1387–1400, March 2010.
- [Bow10c] Lewis Bowen. Measure conjugacy invariants for actions of countable sofic groups. Journal of the American Mathematical Society, 23(1):217–245, January 2010.
- [Bow10d] Lewis Bowen. Non-abelian free group actions: Markov processes, the Abramov–Rohlin formula and Yuzvinskii’s formula. Ergodic Theory and Dynamical Systems, 30(6):1629–1663, December 2010.
- [Bow20] Lewis Bowen. Sofic homological invariants and the Weak Pinsker Property. arXiv:1807.08191 [math], February 2020.
- [CT06] T. M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley-Interscience, Hoboken, N.J, 2nd ed edition, 2006.
- [Hay16] Ben Hayes. Relative entropy and the Pinsker product formula for sofic groups. arXiv:1605.01747 [math], May 2016.
- [OW87] Donald S. Ornstein and Benjamin Weiss. Entropy and isomorphism theorems for actions of amenable groups. Journal d’Analyse Mathématique, 48(1):1–141, December 1987.