Pasadena CA 91125, USA
11email: [email protected]
On the entropy of rectifiable and stratified measures
Abstract
We summarize some results of geometric measure theory concerning rectifiable sets and measures. Combined with the entropic chain rule for disintegrations (Vigneaux, 2021), they account for some properties of the entropy of rectifiable measures with respect to the Hausdorff measure first studied by (Koliander et al., 2016). Then we present some recent work on stratified measures, which are convex combinations of rectifiable measures. These generalize discrete-continuous mixtures and may have a singular continuous part. Their entropy obeys a chain rule, whose “conditional term” is an average of the entropies of the rectifiable measures involved. We state an asymptotic equipartition property (AEP) for stratified measures that shows concentration on strata of a few “typical dimensions” and that links the conditional term of the chain rule to the volume growth of typical sequences in each stratum.
Keywords:
Entropy stratified measures rectifiable measures chain rule asymptotic equipartition property1 Introduction
The starting point of our considerations is the asymptotic equipartition property:
Proposition 1 (AEP)
Let be a -finite measure space, and a probability measure on such that . Suppose that the entropy
(1) |
is finite. For every , define the set of weakly -typical realizations
(2) |
Then, for every , there exists such that, for all ,
-
1.
and
-
2.
The proof of this proposition only depends on the weak law of large numbers, which ensures the convergence in probability of to its mean , see [11, Ch. 12] and [3, Ch. 8].
Shannon’s discrete entropy and differential entropy are particular cases of (2): the former when is the counting measure on a discrete set equipped with the -algebra of all subsets; the latter when , is generated by the open sets, and is the Lebesgue measure. Up to a sign, the Kullback-Leibler divergence, also called relative entropy, is another particular case, that arises when is a probability measure.
One can imagine other examples, geometric in nature, that involve measures on that are singular continuous. For instance, could be a Riemannian manifold equipped with the Borel measure given by integration of its Riemannian volume form. The measure-theoretic nature of Proposition 1 makes the smoothness in this example irrelevant. It is more natural to work with geometric measure theory.
2 Some elements of Geometric Measure Theory
Geometric measure theory “could be described as differential geometry, generalized through measure theory to deal with maps and surfaces that are not necessarily smooth, and applied to the calculus of variations” [9]. The place of smooth maps is taken by Lipschitz maps, which are differentiable almost everywhere [2, p. 46]. In turn, manifolds are replaced by rectifiable sets, and the natural notion of volume for such sets is the Hausdorff measure.
2.1 Hausdorff measure and dimension
To define the Hausdorff measure, recall first that the diameter of a subset of the Euclidean space is For any and any , define
(3) |
where is the volume of the unit ball , and the infimum is taken over all countable coverings of such that each set has diameter at most . This is an outer measure and its restriction to the Borel -algebra is a measure i.e. a -additive -valued set-function [2, Thm. 1.49]. Moreover, the measure equals the standard Lebesgue measure [2, Thm. 2.53] and is the counting measure. More generally, when is an integer between and , gives a natural notion of -dimensional volume. The - and -dimensional volumes coincide, respectively, with the classical notions of length and area, see Examples 1 and 2 below.
For a given set , the number is either or , except possibly for a single value of , which is then called the Hausdorff dimension of . More precisely [2, Def. 2.51],
(4) |
Finally, we remark here that the Hausdorff measure interacts very naturally with Lipschitz maps.
Lemma 1 ([2, Prop. 2.49])
If is a Lipschitz function,111A function between metric spaces and is called Lipschitz if there exists such that The Lipschitz constant of , denoted , is the smallest that satisfies this condition. with Lipschitz constant , then for all and every subset of , .
2.2 Rectifiable sets
Definition 1 ([6, 3.2.14])
A subset of is called -rectifiable (for ) if it is the image of a bounded subset of under a Lipschitz map and countably -rectifiable if it is a countable union of -rectifiable sets. The subset is countably -rectifiable if there exist countable -rectifiable set containing -almost all of , this is, if there are bounded sets and Lipschitz functions , enumerated by , such that .
By convention, is a point, so that a countably -rectifiable set is simply a countable set. An example of countable -rectifiable set is an -dimensional -submanifold of , see [1, App. A]. A set that differs from an -dimensional -submanifold by a -null set is countably -rectifiable.
For every countably -rectifiable set , there exists (see [13, pp. 16-17] and the references therein) an -null set , compact sets and injective Lipschitz functions such that the sets are pairwise disjoint and
(5) |
It follows from Lemma 1 and the boundedness of the sets that every -rectifiable set has -finite -measure. Because of monotonicity of , any subset of an -rectifiable set is -rectifiable. Also, the countable union of -rectifiable subsets is -rectifiable.
The area and coarea formulas may be used to compute integrals on an -rectifiable set with respect to the -measure. As a preliminary, we define the area and coarea factors. Let , be finite-dimensional Hilbert spaces and be a linear map. Recall that the inner product gives explicit identifications and with their duals.
Proposition 2 (Area formula, cf. [2, Thm. 2.71] and [6, Thm. 3.2.3])
Let be integers such that and a Lipschitz function. For any Lebesgue measurable subset of and -integrable function , the function on is -measurable, and
(6) |
This implies in particular that, if is injective on , then is -measurable and .
Example 1 (Curves)
Let be a -curve; recall that a -map defined on a compact set is Lipschitz. Then , and . So we obtain the standard formula for the length of a curve,
(7) |
Example 2 (Surfaces)
Let be a Lipschitz, differentiable, and injective map defining a surface. In this case, where is the column vector
and similarly for . Then, if denotes the angle between and ,
(8) |
Therefore,
(9) |
which is again the classical formula for the area of a parametric surface.
In many situations, it is useful to compute an integral over a countably -rectifiable set as an iterated integral, first over the level sets of a Lipschitz function with , and then over . In particular, if and is a projection onto the space generated by some vectors of the canonical basis of , this procedure corresponds to Fubini’s theorem. Its generalization to the rectifiable case is the coarea formula.
Proposition 3 (Coarea formula, [2, Thm. 2.93])
Let be a Lipschitz function, a countably -rectifiable subset of (with ) and a Borel function. Then, the set is countably -rectifiable and -measurable, the function is -measurable on , and
(10) |
Here is the tangential differential [2, Def. 2.89], the restriction of to the approximate tangent space to at . The precise computation of this function is not essential here, but rather the fact that -almost surely, hence has a -disintegration given by the measures
, which are well-defined -almost surely.222Given a measure on a -algebra and , denotes the restricted measure .
Remark 1 (On disintegrations)
Disintegrations are an even broader generalization of Fubini’s theorem.
Let be a measurable map and let and be -finite measures on and respectively. The measure has a -disintegration if
-
1.
is a -finite measure on such that for -almost every ;
-
2.
for each measurable nonnegative function , the map is measurable, and .
In case such a disintegration exists, any probability measure has a -disintegration such that each is a probability measure with density w.r.t. , and the following chain rule holds [12, Prop. 3]:
(11) |
3 Rectifiable measures and their entropy
Let be a locally finite measure and a nonnegative real number. Marstrand proved that if the limiting density exists and is strictly positive and finite for -almost every , then is an integer not greater than . Later Preiss proved that such a measure is also -rectifiable in the sense of the following definition. For details, see e.g. [5].
Definition 2 ([8, Def. 16.6])
A Radon outer measure on is called -rectifiable if and there exists a countably -rectifiable Borel set such that .
The study of these measures from the viewpoint of information theory, particularly the properties of the entropy of an -rectifiable probability measure , was carried out relatively recently by Koliander, Pichler, Riegler, and Hlawatsch in [7]. We provide here an idiosyncratic summary of some of their results.
First, remark that in virtue of (5), an -rectifiable measure is absolutely continuous with respect to the restricted measure , where is countably -rectifiable and has the form with injective and Borel and bounded. (A refinement of this construction gives a similar set such that, additionally, the density of is strictly positive [7, App. A].) Although the product of an -rectifiable set and an rectifiable set is not -rectifiable—see [6, 3.2.24]—the carriers behave better.
Lemma 2 (See [7, Lem. 27] and [13, Lem. 6])
If is a carrier of an -rectifiable measure (for ), then is a carrier of , of Hausdorff dimension . Additionally, the Hausdorff measure equals .
Let be an -rectifiable measure, with carrier (we drop the hereon). It holds that and is -finite. If moreover , Proposition 1 gives estimates for . Lemma 2 tells us that is -rectifiable and that , which is desirable because the Hausdorff dimension of is and is the only nontrivial measure on it as well as on , which as a subset of is rectifiable too.
To apply the AEP we need to compute . In some cases, one can use the area formula (Proposition 2) to “change variables” and express in terms of the usual differential entropy. For instance, suppose is a bounded Borel subset of of nontrival -measure and is an injective Lipschitz function on . The set is -rectifiable. Moreover, if is a probability measure such that with density , then the area formula applied to and , for some Borel subset of , shows that with density , which is well-defined -almost surely. A simple computation yields
(12) |
There is a more general formula of this kind when is a rectifiable subset of .
Finally, we deduce the chain rule for the entropy of rectifiable measures as a consequence of our general theorem for disintegrations (Remark 1). Let be a countably -rectifiable subset of , a Borel function (with ), and a probability measure such that . Because has an -disintegration , with , then , and
(13) |
The probabilities are described in Remark 1. If one insists in only using the Hausdorff measures as reference measures, one must rewrite the integrand in (13) using the chain rule for the Radon-Nikodym derivative:
One recovers in this way the formula (50) in [7].
4 Stratified measures
Definition 3 (-stratified measure)
A measure on is -stratified, for , if there are integers such that and can be expressed as a sum , where each is a nonzero -rectifiable measure.
Thus -stratified measures are rectifiable measures. If is -stratified for some we simply say that is a stratified measure.
A fundamental nontrival example to bear in mind is a discrete-continuous mixture, which corresponds to , countable, and . More generally, a stratified measure has a Lebesgue decomposition with a singular continuous part provided some is strictly between and .
Let be a probability measure that is stratified in the sense above. We can always put it in the standard form , where each is a rectifiable probability measure with carrier of dimension (so that ), the carriers are disjoint, , and is a probability vector with strictly positive entries. The carriers can be taken to be disjoint because if has Hausdorff dimension , then for , hence one can prove [13, Sec. IV-B] that is a carrier for , for .
We can regard as the law of a random variable valued in and the vector as the law of the discrete random variable induced by the projection from to that maps to . We denote by the random variable , with expectation .
The measure is absolutely continuous with respect to , where , so it makes sense to consider the entropy ; it has a concrete probabilistic meaning in the sense of Proposition 1. Moreover, one can prove that [13, Lem. 3] and therefore
(14) |
holds [13, Lem. 4]. This formula also follows form the chain rule for general disintegrations (Remark 13), because is a -disintegration of .
The powers of are also stratified. In fact, remark that
(15) |
where counts the appearances of the symbol in the word . Each measure is absolutely continuous with respect to . It follows from Lemma 2 that for any , the stratum is also a carrier, of dimension , and the product measure equals . Therefore each measure is rectifiable. We can group together the of the same dimension to put as in Definition 3.
By Proposition 1, one might approximate with an arbitrary level of accuracy by its restriction to the weakly typical realizations of , provided is big enough. In order to get additional control on the dimensions appearing in this approximation, we restrict it further, retaining only the strata that correspond to strongly typical realizations of the random variable .
Let us denote by the probability mass function (p.m.f) of . Recall that induces a probability law on , known as empirical distribution, given by Csiszár and Körner [4, Ch. 2] define to be strongly -typical if , with p.m.f. , is such that and, for all , . We denote by the set of these sequences when . In virtue of the union bound and Hoeffding’s inequality, , where ; the choice of ensures that as . Moreover, the continuity of the discrete entropy in the total-variation distance implies that is a subset of with , which explains our notation. See [13, Sec. III-D]
We introduce the set of doubly typical sequences in , and call a doubly typical stratum for any .
The main result of [13] is a refined version of the AEP for stratified measures that gives an interpretation for the conditional term in the chain rule (14).
Theorem 4.1
(Setting introduced above) For any there exists an such that for any the restriction of to , satisfies . Moreover, the measure equals a sum of -rectifiable measures for . The conditional entropy quantifies the volume growth of most doubly typical fibers in the following sense:
-
1.
For any , one has
-
2.
For any , the set of such that
satisfies
References
- [1] G. Alberti, H. Bölcskei, C. De Lellis, G. Koliander, and E. Riegler. Lossless analog compression. IEEE Transactions on Information Theory, 65(11):7480–7513, 2019.
- [2] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Science Publications. Clarendon Press, 2000.
- [3] T. M. Cover and J. A. Thomas. Elements of Information Theory. A Wiley-Interscience publication. Wiley, 2006.
- [4] I. Csiszár and J. Körner. Information theory: Coding theorems for discrete memoryless systems. Probability and mathematical statistics. Academic Press, 1981.
- [5] C. De Lellis. Rectifiable Sets, Densities and Tangent Measures. Zurich lectures in advanced mathematics. European Mathematical Society, 2008.
- [6] H. Federer. Geometric Measure Theory. Classics in Mathematics. Springer Berlin Heidelberg, 1969.
- [7] G. Koliander, G. Pichler, E. Riegler, and F. Hlawatsch. Entropy and source coding for integer-dimensional singular random variables. IEEE Transactions on Information Theory, 62(11):6124–6154, Nov. 2016.
- [8] P. Mattila. Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1995.
- [9] F. Morgan. Geometric Measure Theory: A Beginner’s Guide. Elsevier Science, 2008.
- [10] A. Rényi. On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica, 10(1):193–215, 1959.
- [11] J. P. Vigneaux. Topology of Statistical Systems: A Cohomological Approach to Information Theory. PhD thesis, Université de Paris, 2019.
- [12] J. P. Vigneaux. Entropy under disintegrations. In F. ”Nielsen and F. Barbaresco, editors, GSI 2021: Geometric Science of Information, volume 12829 of Lecture Notes in Computer Science, pages 340–349. Springer, 2021.
- [13] J. P. Vigneaux. Typicality for stratified measures. arXiv preprint arXiv:2212.10809, 2022.