Department of Mathematics, California Institute of Technology,
Pasadena CA 91125, USA
Email: [email protected]

On the entropy of rectifiable and stratified measures

Juan Pablo Vigneaux
Abstract

We summarize some results of geometric measure theory concerning rectifiable sets and measures. Combined with the entropic chain rule for disintegrations (Vigneaux, 2021), they account for some properties of the entropy of rectifiable measures with respect to the Hausdorff measure first studied by (Koliander et al., 2016). Then we present some recent work on stratified measures, which are convex combinations of rectifiable measures. These generalize discrete-continuous mixtures and may have a singular continuous part. Their entropy obeys a chain rule, whose “conditional term” is an average of the entropies of the rectifiable measures involved. We state an asymptotic equipartition property (AEP) for stratified measures that shows concentration on strata of a few “typical dimensions” and that links the conditional term of the chain rule to the volume growth of typical sequences in each stratum.

Keywords:
Entropy · stratified measures · rectifiable measures · chain rule · asymptotic equipartition property

1 Introduction

The starting point of our considerations is the asymptotic equipartition property:

Proposition 1 (AEP)

Let $(E_X,\mathfrak{B},\mu)$ be a $\sigma$-finite measure space, and $\rho$ a probability measure on $(E_X,\mathfrak{B})$ such that $\rho\ll\mu$. Suppose that the entropy

\[ H_{\mu}(\rho) := -\int_{E_X} \ln\frac{\mathrm{d}\rho}{\mathrm{d}\mu}\,\mathrm{d}\rho = \mathbb{E}_{\rho}\left(-\ln\frac{\mathrm{d}\rho}{\mathrm{d}\mu}\right) \tag{1} \]

is finite. For every $\delta>0$, define the set of weakly $\delta$-typical realizations

\[ W^{(n)}_{\delta}(\rho;\mu) = \left\{\, (x_1,\ldots,x_n)\in E_X^n \,:\, \left| -\frac{1}{n}\ln\prod_{i=1}^{n}\frac{\mathrm{d}\rho}{\mathrm{d}\mu}(x_i) - H_{\mu}(\rho) \right| < \delta \,\right\}. \tag{2} \]

Then, for every $\varepsilon>0$, there exists $n_0\in\mathbb{N}$ such that, for all $n\geq n_0$,

  1. $\mathbb{P}\left(W_{\delta}^{(n)}(\rho;\mu)\right) > 1-\varepsilon$, and

  2. $(1-\varepsilon)\exp\{n(H_{\mu}(\rho)-\delta)\} \leq \mu^{\otimes n}(W_{\delta}^{(n)}(\rho;\mu)) \leq \exp\{n(H_{\mu}(\rho)+\delta)\}$.

The proof of this proposition depends only on the weak law of large numbers, which ensures the convergence in probability of $-\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i)$, where $f=\mathrm{d}\rho/\mathrm{d}\mu$, to its mean $H_{\mu}(\rho)$; see [11, Ch. 12] and [3, Ch. 8].
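As a rough numerical illustration of Proposition 1 (a sketch under our own assumptions, not part of the cited proofs), one can take $\mu$ to be the counting measure on a three-point set, so that $\mathrm{d}\rho/\mathrm{d}\mu$ is the p.m.f. of $\rho$, and estimate by Monte Carlo the probability of the weakly typical set in (2). The p.m.f. `p`, the sample sizes, and all function names below are hypothetical choices made only for this illustration.

```python
import math
import random


def entropy_nats(p):
    """Entropy H_mu(rho) of a p.m.f. p w.r.t. the counting measure, in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


def is_weakly_typical(xs, p, delta):
    """Check | -(1/n) * sum_i ln p[x_i] - H_mu(rho) | < delta, as in (2)."""
    empirical = -sum(math.log(p[x]) for x in xs) / len(xs)
    return abs(empirical - entropy_nats(p)) < delta


def typical_fraction(p, n, delta, trials, seed=0):
    """Monte Carlo estimate of the rho^{(x)n}-probability of W_delta^{(n)}."""
    rng = random.Random(seed)
    symbols = range(len(p))
    hits = 0
    for _ in range(trials):
        xs = rng.choices(symbols, weights=p, k=n)
        hits += is_weakly_typical(xs, p, delta)
    return hits / trials


p = [0.5, 0.3, 0.2]  # assumed example p.m.f.
small = typical_fraction(p, n=10, delta=0.05, trials=500)
large = typical_fraction(p, n=1000, delta=0.05, trials=500)
print(f"n=10: {small:.3f}   n=1000: {large:.3f}")
```

For a fixed $\delta$, the estimated probability should move towards 1 as $n$ grows, in line with item 1 of the proposition.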

Shannon’s discrete entropy and differential entropy are particular cases of (1): the former when $\mu$ is the counting measure on a discrete set equipped with the $\sigma$-algebra of all subsets; the latter when $E_X=\mathbb{R}^d$, $\mathfrak{B}$ is generated by the open sets, and $\mu$ is the Lebesgue measure. Up to a sign, the Kullback-Leibler divergence, also called relative entropy, is another particular case, which arises when $\mu$ is a probability measure.

One can imagine other examples, geometric in nature, that involve measures $\mu$ on $\mathbb{R}^d$ that are singular continuous. For instance, $E_X$ could be a Riemannian manifold equipped with the Borel measure given by integration of its Riemannian volume form. The measure-theoretic nature of Proposition 1 makes the smoothness in this example irrelevant. It is more natural to work with geometric measure theory.

2 Some elements of Geometric Measure Theory

Geometric measure theory “could be described as differential geometry, generalized through measure theory to deal with maps and surfaces that are not necessarily smooth, and applied to the calculus of variations” [9]. The place of smooth maps is taken by Lipschitz maps, which are differentiable almost everywhere [2, p. 46]. In turn, manifolds are replaced by rectifiable sets, and the natural notion of volume for such sets is the Hausdorff measure.

2.1 Hausdorff measure and dimension

To define the Hausdorff measure, recall first that the diameter of a subset $S$ of the Euclidean space $(\mathbb{R}^d,\|\cdot\|_2)$ is $\operatorname{diam}(S)=\sup\{\,\|x-y\|_2 \,:\, x,y\in S\,\}$. For any $m\geq 0$ and any $A\subset\mathbb{R}^d$, define

\[ \mathcal{H}^m(A) = \lim_{\delta\to 0}\,\inf_{\{S_i\}_{i\in I}} \sum_{i\in I} w_m\left(\frac{\operatorname{diam}(S_i)}{2}\right)^m, \tag{3} \]

where $w_m=\pi^{m/2}/\Gamma(m/2+1)$ is the volume of the unit ball $B(0,1)\subset\mathbb{R}^m$, and the infimum is taken over all countable coverings $\{S_i\}_{i\in I}$ of $A$ such that each set $S_i$ has diameter at most $\delta$. This is an outer measure, and its restriction to the Borel $\sigma$-algebra is a measure, i.e. a $\sigma$-additive $[0,\infty]$-valued set function [2, Thm. 1.49]. Moreover, the measure $\mathcal{H}^d$ equals the standard Lebesgue measure $\mathcal{L}^d$ [2, Thm. 2.53] and $\mathcal{H}^0$ is the counting measure. More generally, when $m$ is an integer between $0$ and $d$, $\mathcal{H}^m$ gives a natural notion of $m$-dimensional volume. The $1$- and $2$-dimensional volumes coincide, respectively, with the classical notions of length and area; see Examples 1 and 2 below.

For a given set $A\subset\mathbb{R}^d$, the number $\mathcal{H}^m(A)$ is either $0$ or $\infty$, except possibly for a single value of $m$, which is then called the Hausdorff dimension $\dim_H A$ of $A$. More precisely [2, Def. 2.51],

\[ \dim_H A := \inf\{\,k\in[0,\infty) \,:\, \mathcal{H}^k(A)=0\,\} = \sup\{\,k\in[0,\infty) \,:\, \mathcal{H}^k(A)=\infty\,\}. \tag{4} \]
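To make the role of the exponent $m$ in (3) and (4) concrete, the following sketch evaluates the covering sum in (3) for one particular family of coverings of $A=[0,1]$, namely $N$ intervals of equal length. These sums are only upper bounds for the infimum in (3), so this is a heuristic illustration and not a computation of $\mathcal{H}^m$; the function names are ours.

```python
import math


def unit_ball_volume(m):
    """w_m = pi^(m/2) / Gamma(m/2 + 1), the normalizing constant in (3)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def covering_term(diameter, m):
    """Contribution w_m * (diam/2)^m of a single covering set to the sum in (3)."""
    return unit_ball_volume(m) * (diameter / 2) ** m


def covering_sum(n_pieces, m):
    """Sum in (3) for the cover of [0, 1] by n_pieces intervals of equal length."""
    return n_pieces * covering_term(1.0 / n_pieces, m)


for m in (0.5, 1.0, 2.0):
    sums = [covering_sum(n, m) for n in (10, 100, 1000)]
    print(m, [round(s, 4) for s in sums])
```

For $m=1$ the sum stays equal to 1 for every $N$, for $m=2$ it decreases towards 0, and for $m=1/2$ it grows without bound, which is consistent with $\dim_H[0,1]=1$.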

Finally, we remark here that the Hausdorff measure interacts very naturally with Lipschitz maps.

Lemma 1 ([2, Prop. 2.49])

If $f:\mathbb{R}^d\to\mathbb{R}^{d'}$ is a Lipschitz function¹ with Lipschitz constant $\operatorname{Lip}(f)$, then for all $m\geq 0$ and every subset $E$ of $\mathbb{R}^d$, $\mathcal{H}^m(f(E))\leq\operatorname{Lip}(f)^m\,\mathcal{H}^m(E)$.

Footnote 1: A function $f:X\to Y$ between metric spaces $(X,d_X)$ and $(Y,d_Y)$ is called Lipschitz if there exists $C>0$ such that $d_Y(f(x),f(x'))\leq C\,d_X(x,x')$ for all $x,x'\in X$. The Lipschitz constant of $f$, denoted $\operatorname{Lip}(f)$, is the smallest $C$ that satisfies this condition.

2.2 Rectifiable sets

Definition 1 ([6, 3.2.14])

A subset $S$ of $\mathbb{R}^d$ is called $m$-rectifiable (for $m\leq d$) if it is the image of a bounded subset of $\mathbb{R}^m$ under a Lipschitz map, and countably $m$-rectifiable if it is a countable union of $m$-rectifiable sets. The subset $S$ is countably $(\mathcal{H}^m,m)$-rectifiable if there exists a countably $m$-rectifiable set containing $\mathcal{H}^m$-almost all of $S$, that is, if there are bounded sets $A_k\subset\mathbb{R}^m$ and Lipschitz functions $f_k:A_k\to\mathbb{R}^d$, enumerated by $k\in\mathbb{N}$, such that $\mathcal{H}^m(S\setminus\bigcup_k f_k(A_k))=0$.

By convention, $\mathbb{R}^0$ is a point, so that a countably $0$-rectifiable set is simply a countable set. An example of a countably $m$-rectifiable set is an $m$-dimensional $C^1$-submanifold $E$ of $\mathbb{R}^d$; see [1, App. A]. A set that differs from an $m$-dimensional $C^1$-submanifold by an $\mathcal{H}^m$-null set is countably $(\mathcal{H}^m,m)$-rectifiable.

For every countably $(\mathcal{H}^m,m)$-rectifiable set $E\subset\mathbb{R}^d$, there exist (see [13, pp. 16-17] and the references therein) an $\mathcal{H}^m$-null set $E_0$, compact sets $(K_i)_{i\in\mathbb{N}}$ in $\mathbb{R}^m$ and injective Lipschitz functions $(f_i:K_i\to\mathbb{R}^d)_{i\in\mathbb{N}}$ such that the sets $f_i(K_i)$ are pairwise disjoint and

\[ E\subset E_0\cup\bigcup_{i\in\mathbb{N}} f_i(K_i). \tag{5} \]

It follows from Lemma 1 and the boundedness of the sets $K_i$ that every $(\mathcal{H}^m,m)$-rectifiable set has $\sigma$-finite $\mathcal{H}^m$-measure. Because of the monotonicity of $\mathcal{H}^m$, any subset of an $(\mathcal{H}^m,m)$-rectifiable set is $(\mathcal{H}^m,m)$-rectifiable. Also, a countable union of $(\mathcal{H}^m,m)$-rectifiable subsets is $(\mathcal{H}^m,m)$-rectifiable.

The area and coarea formulas may be used to compute integrals on an $m$-rectifiable set with respect to the $\mathcal{H}^m$-measure. As a preliminary, we define the area and coarea factors. Let $V$, $W$ be finite-dimensional Hilbert spaces and $L:V\to W$ be a linear map. Recall that the inner product gives explicit identifications $V\cong V^*$ and $W\cong W^*$ with their duals.

  1. If $k:=\dim V\leq\dim W$, the $k$-dimensional Jacobian or area factor [2, Def. 2.68] is $J_kL=\sqrt{\det(L^*\circ L)}$, where $L^*:W^*\to V^*$ is the transpose of $L$.

  2. If $\dim V\geq\dim W=:d$, then the $d$-dimensional coarea factor [2, Def. 2.92] is $C_dL=\sqrt{\det(L\circ L^*)}$.
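The two factors just defined are easy to evaluate for explicit matrices. The sketch below assumes that $V=\mathbb{R}^{c}$ and $W=\mathbb{R}^{r}$ carry their standard inner products, so that $L$ is an $r\times c$ matrix stored as a list of rows and $L^*$ is its transpose; the helper names are ours and the determinant routine is a plain Gaussian elimination written only for this illustration.

```python
import math


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in a]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def area_factor(L):
    """J_k L = sqrt(det(L^* L)), for k = number of columns <= number of rows."""
    return math.sqrt(max(det(matmul(transpose(L), L)), 0.0))


def coarea_factor(L):
    """C_d L = sqrt(det(L L^*)), for d = number of rows <= number of columns."""
    return math.sqrt(max(det(matmul(L, transpose(L))), 0.0))


print(area_factor([[3.0], [4.0]]))    # a line R -> R^2 with velocity (3, 4)
print(coarea_factor([[3.0, 4.0]]))    # a linear functional R^2 -> R
```

For a square invertible matrix both factors reduce to $|\det L|$, which is the Jacobian of the usual change of variables formula.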

Proposition 2 (Area formula, cf. [2, Thm. 2.71] and [6, Thm. 3.2.3])

Let $k,d$ be integers such that $k\leq d$ and $f:\mathbb{R}^k\to\mathbb{R}^d$ a Lipschitz function. For any Lebesgue measurable subset $E$ of $\mathbb{R}^k$ and $\mathcal{L}^k$-integrable function $u$, the function $y\mapsto\sum_{x\in E\cap f^{-1}(y)}u(x)$ on $\mathbb{R}^d$ is $\mathcal{H}^k$-measurable, and

\[ \int_E u(x)\,J_kf(x)\,\mathrm{d}\mathcal{L}^k(x) = \int_{\mathbb{R}^d}\sum_{x\in E\cap f^{-1}(y)}u(x)\,\mathrm{d}\mathcal{H}^k(y). \tag{6} \]

This implies in particular that, if $f$ is injective on $E$, then $f(E)$ is $\mathcal{H}^k$-measurable and $\mathcal{H}^k(f(E))=\int_E J_kf(x)\,\mathrm{d}\mathcal{L}^k(x)$.

Example 1 (Curves)

Let $\varphi:[0,1]\to\mathbb{R}^d$, $t\mapsto(\varphi_1(t),\ldots,\varphi_d(t))$, be an injective $C^1$-curve; recall that a $C^1$-map defined on a compact set is Lipschitz. Then $d\varphi(t)=(\varphi_1'(t),\ldots,\varphi_d'(t))$, and $J_1\varphi=\sqrt{(d\varphi)^*(d\varphi)}=\|d\varphi\|_2$. So we obtain the standard formula for the length of a curve,

\[ \mathcal{H}^1(\varphi([0,1])) = \int_0^1\|d\varphi(t)\|_2\,\mathrm{d}t. \tag{7} \]

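Formula (7) can be checked numerically on curves whose length is known in closed form. The sketch below uses a midpoint rule and two example curves of our own choosing, one turn of a helix and an arc of a parabola, both injective on $[0,1]$; the function names and the number of quadrature steps are assumptions made for the illustration.

```python
import math


def curve_length(dphi, n_steps=10_000):
    """Midpoint-rule approximation of int_0^1 ||dphi(t)||_2 dt, cf. (7)."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * h
        total += math.sqrt(sum(c * c for c in dphi(t))) * h
    return total


def helix_derivative(t):
    """Derivative of phi(t) = (cos 2 pi t, sin 2 pi t, t)."""
    w = 2.0 * math.pi
    return (-w * math.sin(w * t), w * math.cos(w * t), 1.0)


def parabola_derivative(t):
    """Derivative of phi(t) = (t, t^2)."""
    return (1.0, 2.0 * t)


helix_exact = math.sqrt(4.0 * math.pi ** 2 + 1.0)
parabola_exact = (2.0 * math.sqrt(5.0) + math.asinh(2.0)) / 4.0
print(curve_length(helix_derivative), helix_exact)
print(curve_length(parabola_derivative), parabola_exact)
```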
Example 2 (Surfaces)

Let $\varphi=(\varphi_1,\varphi_2,\varphi_3):V\subset\mathbb{R}^2\to\mathbb{R}^3$ be a Lipschitz, differentiable, and injective map defining a surface. In this case, $d\varphi(u,v)=\begin{pmatrix}\varphi_x(u,v)&\varphi_y(u,v)\end{pmatrix}$, where $\varphi_x(u,v)$ is the column vector

\[ (\partial_x\varphi_1(u,v),\,\partial_x\varphi_2(u,v),\,\partial_x\varphi_3(u,v)) \]

and similarly for $\varphi_y(u,v)$. Then, if $\theta$ denotes the angle between $\varphi_x(u,v)$ and $\varphi_y(u,v)$,

\[ J_2\varphi = \sqrt{\det\begin{pmatrix}\|\varphi_x\|^2 & \varphi_x\bullet\varphi_y\\ \varphi_x\bullet\varphi_y & \|\varphi_y\|^2\end{pmatrix}} = \|\varphi_x\|\,\|\varphi_y\|\sqrt{1-\cos^2\theta} = \|\varphi_x\times\varphi_y\|. \tag{8} \]

Therefore,

\[ \mathcal{H}^2(\varphi(V)) = \int_V \|\varphi_x\times\varphi_y\|(u,v)\,\mathrm{d}u\,\mathrm{d}v, \tag{9} \]

which is again the classical formula for the area of a parametric surface.
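The identity between the Gram determinant and the cross product in (8), and the area formula (9), can be tested on the unit sphere, whose area $4\pi$ is known. The parametrization by spherical angles, the grid size, and the function names in this sketch are our own choices; the partial derivatives are written `phi_u` and `phi_v` after the variables $(u,v)$ and play the role of $\varphi_x$ and $\varphi_y$.

```python
import math


def gram_jacobian(a, b):
    """J_2 = sqrt(|a|^2 |b|^2 - (a.b)^2) for the two columns a, b of d(phi)."""
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return math.sqrt(max(aa * bb - ab * ab, 0.0))


def cross_norm(a, b):
    """Euclidean norm of the cross product a x b in R^3."""
    c = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    return math.sqrt(sum(x * x for x in c))


def sphere_partials(u, v):
    """Partials of phi(u, v) = (sin u cos v, sin u sin v, cos u)."""
    phi_u = (math.cos(u) * math.cos(v), math.cos(u) * math.sin(v), -math.sin(u))
    phi_v = (-math.sin(u) * math.sin(v), math.sin(u) * math.cos(v), 0.0)
    return phi_u, phi_v


def sphere_area(n=200):
    """Midpoint rule for (9) on V = (0, pi) x (0, 2 pi)."""
    hu, hv = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hu
        for j in range(n):
            v = (j + 0.5) * hv
            total += gram_jacobian(*sphere_partials(u, v)) * hu * hv
    return total


print(sphere_area(), 4.0 * math.pi)
```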

In many situations, it is useful to compute an integral over a countably $(\mathcal{H}^m,m)$-rectifiable set $E\subset\mathbb{R}^k$ as an iterated integral, first over the level sets $E\cap\{\,x \,:\, f(x)=t\,\}$ of a Lipschitz function $f:\mathbb{R}^k\to\mathbb{R}^d$ with $d\leq k$, and then over $t\in\mathbb{R}^d$. In particular, if $E=\mathbb{R}^k$ and $f$ is a projection onto the space generated by some vectors of the canonical basis of $\mathbb{R}^k$, this procedure corresponds to Fubini’s theorem. Its generalization to the rectifiable case is the coarea formula.

Proposition 3 (Coarea formula, [2, Thm. 2.93])

Let $f:\mathbb{R}^k\to\mathbb{R}^d$ be a Lipschitz function, $E$ a countably $(\mathcal{H}^m,m)$-rectifiable subset of $\mathbb{R}^k$ (with $m\geq d$) and $g:\mathbb{R}^k\to[0,\infty]$ a Borel function. Then, the set $E\cap f^{-1}(t)$ is countably $(\mathcal{H}^{m-d},m-d)$-rectifiable and $\mathcal{H}^{m-d}$-measurable, the function $t\mapsto\int_{E\cap f^{-1}(t)}g(y)\,\mathrm{d}\mathcal{H}^{m-d}(y)$ is $\mathcal{L}^d$-measurable on $\mathbb{R}^d$, and

\[ \int_E g(x)\,C_dd^Ef_x\,\mathrm{d}\mathcal{H}^m(x) = \int_{\mathbb{R}^d}\left(\int_{E\cap f^{-1}(t)}g(y)\,\mathrm{d}\mathcal{H}^{m-d}(y)\right)\mathrm{d}t. \tag{10} \]

Here $d^Ef_x$ is the tangential differential [2, Def. 2.89], the restriction of $df_x$ to the approximate tangent space to $E$ at $x$. What matters here is not the precise computation of this function but the fact that $C_dd^Ef_x>0$ $\mathcal{H}^m$-almost surely, hence $\mathcal{H}^m|_E$ has an $(f,\mathcal{L}^d)$-disintegration given by the measures $\{(C_dd^Ef)^{-1}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}\}_{t\in\mathbb{R}^d}$, which are well-defined $\mathcal{L}^d$-almost surely.²

Footnote 2: Given a measure $\mu$ on a $\sigma$-algebra $\mathfrak{B}$ and $B\in\mathfrak{B}$, $\mu|_B$ denotes the restricted measure $A\mapsto\mu|_B(A):=\mu(A\cap B)$.

Remark 1 (On disintegrations)

Disintegrations are an even broader generalization of Fubini’s theorem.

Let $T:(E,\mathfrak{B})\to(E_T,\mathfrak{B}_T)$ be a measurable map and let $\nu$ and $\xi$ be $\sigma$-finite measures on $(E,\mathfrak{B})$ and $(E_T,\mathfrak{B}_T)$ respectively. The measure $\nu$ has a $(T,\xi)$-disintegration $\{\nu_t\}_{t\in E_T}$ if

  1. $\nu_t$ is a $\sigma$-finite measure on $\mathfrak{B}$ such that $\nu_t(T\neq t)=0$ for $\xi$-almost every $t$;

  2. for each measurable nonnegative function $f:E\to\mathbb{R}$, the map $t\mapsto\int_E f\,\mathrm{d}\nu_t$ is measurable, and $\int_E f\,\mathrm{d}\nu=\int_{E_T}\left(\int_E f(x)\,\mathrm{d}\nu_t(x)\right)\mathrm{d}\xi(t)$.

In case such a disintegration exists, any probability measure $\rho=r\cdot\nu$ has a $(T,T_*\rho)$-disintegration $\{\rho_t\}_{t\in E_T}$ such that each $\rho_t$ is a probability measure with density $r/\int_E r\,\mathrm{d}\nu_t$ w.r.t. $\nu_t$, and the following chain rule holds [12, Prop. 3]:

\[ H_{\nu}(\rho) = H_{\xi}(T_*\rho) + \int_{E_T} H_{\nu_t}(\rho_t)\,\mathrm{d}T_*\rho(t). \tag{11} \]
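In the simplest setting, the chain rule (11) is the familiar identity $H(X)=H(T)+H(X|T)$. The sketch below checks it numerically under assumptions of our own: $E$ is a finite set, $\nu$ and $\xi$ are counting measures, and $\nu_t$ is the counting measure on the fiber $T^{-1}(t)$, which is a $(T,\xi)$-disintegration of $\nu$. The masses in `r` and the function names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(masses):
    """Entropy of a p.m.f. with respect to the counting measure, in nats."""
    return -sum(q * math.log(q) for q in masses if q > 0)


def chain_rule_terms(r, T):
    """Return (H_nu(rho), H_xi(T_* rho), conditional term) for rho = r . nu.

    r maps each point x of E to rho({x}); T maps x to its label t in E_T.
    """
    fibers = defaultdict(dict)
    for x, mass in r.items():
        fibers[T(x)][x] = mass
    pushforward = {t: sum(f.values()) for t, f in fibers.items()}
    total = entropy(r.values())
    marginal = entropy(pushforward.values())
    # rho_t has density r / (integral of r over the fiber) w.r.t. nu_t.
    conditional = sum(
        pushforward[t] * entropy([m / pushforward[t] for m in f.values()])
        for t, f in fibers.items() if pushforward[t] > 0
    )
    return total, marginal, conditional


r = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.15, (2, 0): 0.25}
total, marginal, conditional = chain_rule_terms(r, T=lambda x: x[0])
print(total, marginal + conditional)
```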

3 Rectifiable measures and their entropy

Let $\rho$ be a locally finite measure on $\mathbb{R}^d$ and $s$ a nonnegative real number. Marstrand proved that if the limiting density $\Theta_s(\rho,x):=\lim_{r\downarrow 0}\rho(B(x,r))/(w_sr^s)$ exists and is strictly positive and finite for $\rho$-almost every $x$, then $s$ is an integer not greater than $d$. Later Preiss proved that such a measure is also $s$-rectifiable in the sense of the following definition. For details, see e.g. [5].

Definition 2 ([8, Def. 16.6])

A Radon outer measure $\nu$ on $\mathbb{R}^d$ is called $m$-rectifiable if $\nu\ll\mathcal{H}^m$ and there exists a countably $(\mathcal{H}^m,m)$-rectifiable Borel set $E$ such that $\nu(\mathbb{R}^d\setminus E)=0$.

The study of these measures from the viewpoint of information theory, particularly the properties of the entropy $H_{\mathcal{H}^m}(\rho)$ of an $m$-rectifiable probability measure $\rho$, was carried out relatively recently by Koliander, Pichler, Riegler, and Hlawatsch in [7]. We provide here an idiosyncratic summary of some of their results.

First, remark that by virtue of (5), an $m$-rectifiable measure $\nu$ is absolutely continuous with respect to the restricted measure $\mathcal{H}^m|_{E^*}$, where $E^*$ is countably $m$-rectifiable and has the form $\bigcup_{i\in\mathbb{N}}f_i(K_i)$ with $f_i$ injective and $K_i$ Borel and bounded; we call such a set $E^*$ a carrier of $\nu$. (A refinement of this construction gives a similar set such that, additionally, the density of $\nu$ is strictly positive [7, App. A].) Although the product of an $(\mathcal{H}^{m_1},m_1)$-rectifiable set and an $(\mathcal{H}^{m_2},m_2)$-rectifiable set need not be $(\mathcal{H}^{m_1+m_2},m_1+m_2)$-rectifiable (see [6, 3.2.24]), the carriers behave better.

Lemma 2 (See [7, Lem. 27] and [13, Lem. 6])

If $S_i$ is a carrier of an $m_i$-rectifiable measure $\nu_i$ (for $i=1,2$), then $S_1\times S_2$ is a carrier of $\nu_1\otimes\nu_2$, of Hausdorff dimension $m_1+m_2$. Additionally, the Hausdorff measure $\mathcal{H}^{m_1+m_2}|_{S_1\times S_2}$ equals $\mathcal{H}^{m_1}|_{S_1}\otimes\mathcal{H}^{m_2}|_{S_2}$.

Let $\rho$ be an $m$-rectifiable probability measure, with carrier $E$ (we drop the $*$ hereon). It holds that $\rho\ll\mathcal{H}^m|_E$ and $\mathcal{H}^m|_E$ is $\sigma$-finite. If moreover $H_{\mathcal{H}^m|_E}(\rho)<\infty$, Proposition 1 gives estimates for $(\mathcal{H}^m|_E)^{\otimes n}(W^{(n)}_{\delta}(\rho;\mathcal{H}^m|_E))$. Lemma 2 tells us that $E^n$ is $mn$-rectifiable and that $(\mathcal{H}^m|_E)^{\otimes n}=\mathcal{H}^{mn}|_{E^n}$, which is desirable because the Hausdorff dimension of $E^n$ is $mn$ and $\mathcal{H}^{mn}$ is the only nontrivial Hausdorff measure on it as well as on $W^{(n)}_{\delta}$, which as a subset of $E^n$ is $mn$-rectifiable too.

To apply the AEP we need to compute $H_{\mathcal{H}^m|_E}(\rho)$. In some cases, one can use the area formula (Proposition 2) to “change variables” and express $H_{\mathcal{H}^m|_E}(\cdot)$ in terms of the usual differential entropy. For instance, suppose $A$ is a bounded Borel subset of $\mathbb{R}^k$ of nontrivial $\mathcal{L}^k$-measure and $f:A\to\mathbb{R}^d$ is an injective Lipschitz function. The set $f(A)$ is $k$-rectifiable. Moreover, if $\rho$ is a probability measure such that $\rho\ll\mathcal{L}^k|_A$ with density $r$, then the area formula applied to $u=r/J_kf$ and $E=f^{-1}(B)$, for an arbitrary Borel subset $B$ of $\mathbb{R}^d$, shows that $f_*\rho\ll\mathcal{H}^k|_{f(A)}$ with density $(r/J_kf)\circ f^{-1}$, which is well-defined $(\mathcal{H}^k|_{f(A)})$-almost surely. A simple computation yields

\[ H_{\mathcal{H}^k|_{f(A)}}(f_*\rho) = H_{\mathcal{L}^k}(\rho) + \mathbb{E}_{\rho}\left(\ln J_kf\right). \tag{12} \]
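As a sanity check of (12), here is a worked example under assumptions that are ours (the circle and the uniform law are chosen only for illustration): take $k=1$, $d=2$, $A=[0,1)$, $f(t)=R(\cos 2\pi t,\sin 2\pi t)$ for a fixed radius $R>0$, and $\rho$ the uniform law on $A$, so that $r=1$ on $A$.

```latex
\begin{align*}
J_1 f(t) &= \|f'(t)\|_2 = 2\pi R, \\
H_{\mathcal{L}^1}(\rho) &= -\int_0^1 \ln(1)\,\mathrm{d}t = 0, \\
\mathbb{E}_{\rho}(\ln J_1 f) &= \ln(2\pi R), \\
\frac{\mathrm{d}f_*\rho}{\mathrm{d}\mathcal{H}^1|_{f(A)}}
  &= \frac{r}{J_1 f}\circ f^{-1} = \frac{1}{2\pi R}, \\
H_{\mathcal{H}^1|_{f(A)}}(f_*\rho)
  &= -\ln\frac{1}{2\pi R} = \ln(2\pi R)
   = H_{\mathcal{L}^1}(\rho) + \mathbb{E}_{\rho}(\ln J_1 f).
\end{align*}
```

Both sides of (12) equal $\ln(2\pi R)$, the logarithm of the length of the circle, as one expects for a uniform law on a set of finite $\mathcal{H}^1$-measure.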

There is a more general formula of this kind when $A$ is a rectifiable subset of $\mathbb{R}^d$.

Finally, we deduce the chain rule for the entropy of rectifiable measures as a consequence of our general theorem for disintegrations (Remark 1). Let $E$ be a countably $(\mathcal{H}^m,m)$-rectifiable subset of $\mathbb{R}^k$, $f:\mathbb{R}^k\to\mathbb{R}^d$ a Lipschitz function (with $d\leq m$), and $\rho$ a probability measure such that $\rho\ll\mathcal{H}^m|_E$. Because $\mathcal{H}^m|_E$ has an $(f,\mathcal{L}^d)$-disintegration $\{F^{-1}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}\}_{t\in\mathbb{R}^d}$, with $F=C_dd^Ef$, we have $f_*\rho\ll\mathcal{L}^d$ and

\[ H_{\mathcal{H}^m|_E}(\rho) = H_{\mathcal{L}^d}(f_*\rho) + \int_{\mathbb{R}^d} H_{F^{-1}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}(\rho_t)\,\mathrm{d}f_*\rho(t). \tag{13} \]

The probabilities $\rho_t$ are described in Remark 1. If one insists on using only the Hausdorff measures as reference measures, one must rewrite the integrand in (13) using the chain rule for the Radon-Nikodym derivative:

\begin{align*}
H_{F^{-1}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}(\rho_t)
&= \mathbb{E}_{\rho_t}\left(-\ln\left[\frac{\mathrm{d}\rho_t}{\mathrm{d}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}\,\frac{\mathrm{d}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}{\mathrm{d}F^{-1}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}\right]\right) \\
&= \mathbb{E}_{\rho_t}\left(-\ln\frac{\mathrm{d}\rho_t}{\mathrm{d}\mathcal{H}^{m-d}|_{E\cap f^{-1}(t)}}\right) - \mathbb{E}_{\rho_t}\left(\ln F\right).
\end{align*}

One recovers in this way the formula (50) in [7].

4 Stratified measures

Definition 3 ($k$-stratified measure)

A measure $\nu$ on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ is $k$-stratified, for $k\in\mathbb{N}^*$, if there are integers $(m_i)_{i=1}^k$ such that $0\leq m_1<m_2<\cdots<m_k\leq d$ and $\nu$ can be expressed as a sum $\sum_{i=1}^k\nu_i$, where each $\nu_i$ is a nonzero $m_i$-rectifiable measure.

Thus $1$-stratified measures are rectifiable measures. If $\nu$ is $k$-stratified for some $k$, we simply say that $\nu$ is a stratified measure.

A fundamental nontrivial example to bear in mind is a discrete-continuous mixture, which corresponds to $k=2$, with $\nu_1$ carried by a countable set $E_1$ and $\nu_2$ carried by $E_2=\mathbb{R}^d$. More generally, a stratified measure has a Lebesgue decomposition with a singular continuous part provided some $m_i$ is strictly between $0$ and $d$.

Let $\rho$ be a probability measure that is stratified in the sense above. We can always put it in the standard form $\rho=\sum_{i=1}^kq_i\rho_i$, where each $\rho_i$ is a rectifiable probability measure with carrier $E_i$ of dimension $m_i$ (so that $\rho_i=\rho_i|_{E_i}$), the carriers $(E_i)_{i=1}^k$ are disjoint, $0\leq m_1<\cdots<m_k\leq d$, and $(q_1,\ldots,q_k)$ is a probability vector with strictly positive entries. The carriers can be taken to be disjoint because if $E$ has Hausdorff dimension $m$, then $\mathcal{H}^s(E)=0$ for $s>m$, hence one can prove [13, Sec. IV-B] that $E_i\setminus(\bigcup_{j=1}^{i-1}E_j)$ is a carrier for $\rho_i$, for $i=2,\ldots,k$.

We can regard $\rho$ as the law of a random variable $X$ valued in $E_X:=\bigcup_{i=1}^kE_i$ and the vector $(q_1,\ldots,q_k)$ as the law $\pi_*\rho$ of the discrete random variable $Y$ induced by the projection $\pi$ from $E_X$ to $E_Y:=\{1,\ldots,k\}$ that maps $x\in E_i$ to $i$. We denote by $D$ the random variable $\dim_HE_Y$, with expectation $\mathbb{E}(D)=\sum_{i=1}^km_iq_i$.

The measure $\rho$ is absolutely continuous with respect to $\mu=\sum_{i=1}^k\mu_i$, where $\mu_i=\mathcal{H}^{m_i}|_{E_i}$, so it makes sense to consider the entropy $H_{\mu}(\rho)$; it has a concrete probabilistic meaning in the sense of Proposition 1. Moreover, one can prove that $\frac{\mathrm{d}\rho}{\mathrm{d}\mu}=\sum_{i=1}^kq_i\frac{\mathrm{d}\rho_i}{\mathrm{d}\mu_i}\mathbf{1}_{E_i}$ [13, Lem. 3] and therefore

\[ H_{\mu}(\rho) = H(q_1,\ldots,q_k) + \sum_{i=1}^kq_iH_{\mu_i}(\rho_i) \tag{14} \]

holds [13, Lem. 4]. This formula also follows from the chain rule for general disintegrations (Remark 1), because $\{\rho_i\}_{i\in E_Y}$ is a $(\pi,\pi_*\rho)$-disintegration of $\rho$.
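Formula (14) can be checked on a discrete-continuous mixture. The sketch below assumes, purely for illustration, that $\rho=q\,\delta_0+(1-q)\,\mathrm{Exp}(\lambda)$ on $\mathbb{R}$, with carriers $E_1=\{0\}$ and $E_2=(0,\infty)$, so that $\mu=\mathcal{H}^0|_{E_1}+\mathcal{H}^1|_{E_2}$; it compares the right-hand side of (14) with a Monte Carlo estimate of $\mathbb{E}_{\rho}(-\ln\frac{\mathrm{d}\rho}{\mathrm{d}\mu})$. The parameter values and function names are hypothetical.

```python
import math
import random


def discrete_entropy(q):
    return -sum(p * math.log(p) for p in q if p > 0)


def mixture_entropy(q_atom, rate):
    """Right-hand side of (14) for rho = q_atom * delta_0 + (1 - q_atom) * Exp(rate)."""
    q = [q_atom, 1.0 - q_atom]
    h_atom = 0.0                   # a Dirac mass has zero entropy w.r.t. H^0
    h_exp = 1.0 - math.log(rate)   # differential entropy of Exp(rate), in nats
    return discrete_entropy(q) + q[0] * h_atom + q[1] * h_exp


def log_density(x, q_atom, rate):
    """ln(d rho / d mu) for mu = H^0 restricted to {0} plus H^1 restricted to (0, inf)."""
    if x == 0.0:
        return math.log(q_atom)
    return math.log(1.0 - q_atom) + math.log(rate) - rate * x


def monte_carlo_entropy(q_atom, rate, n_samples=100_000, seed=0):
    """Sample X ~ rho and average -ln(d rho / d mu)(X)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = 0.0 if rng.random() < q_atom else rng.expovariate(rate)
        total -= log_density(x, q_atom, rate)
    return total / n_samples


exact = mixture_entropy(0.3, 2.0)
estimate = monte_carlo_entropy(0.3, 2.0)
print(exact, estimate)
```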

The powers of $\rho$ are also stratified. In fact, remark that

\[ \rho^{\otimes n} = \sum_{\mathbf{y}=(y_1,\ldots,y_n)\in E_Y^n} q_1^{N(1;\mathbf{y})}\cdots q_k^{N(k;\mathbf{y})}\,\rho_{y_1}\otimes\cdots\otimes\rho_{y_n}, \tag{15} \]

where $N(a;\mathbf{y})$ counts the appearances of the symbol $a\in E_Y$ in the word $\mathbf{y}$. Each measure $\rho_{\mathbf{y}}:=\rho_{y_1}\otimes\cdots\otimes\rho_{y_n}$ is absolutely continuous with respect to $\mu_{\mathbf{y}}:=\mu_{y_1}\otimes\cdots\otimes\mu_{y_n}$. It follows from Lemma 2 that for any $\mathbf{y}\in E_Y^n$, the stratum $\Sigma_{\mathbf{y}}:=E_{y_1}\times\cdots\times E_{y_n}$ is also a carrier, of dimension $m(\mathbf{y}):=\sum_{j=1}^n\dim_HE_{y_j}$, and the product measure $\mu_{\mathbf{y}}$ equals $\mathcal{H}^{m(\mathbf{y})}|_{\Sigma_{\mathbf{y}}}$. Therefore each measure $\rho_{\mathbf{y}}$ is rectifiable. We can group together the $\rho_{\mathbf{y}}$ of the same dimension to put $\rho^{\otimes n}$ as in Definition 3.
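The grouping of the terms of (15) by dimension can be made explicit for small $n$. The following sketch enumerates all words $\mathbf{y}\in E_Y^n$ and accumulates the weight $q_1^{N(1;\mathbf{y})}\cdots q_k^{N(k;\mathbf{y})}$ of each stratum according to $m(\mathbf{y})$; symbols are indexed from 0 in the code, and the weights `q` and dimensions `dims` are assumed values for a discrete-continuous mixture on $\mathbb{R}$.

```python
import math
from collections import defaultdict
from itertools import product


def stratum_weights(q, dims, n):
    """Total weight in (15) of the strata of each dimension m(y).

    q[i] is the weight of the i-th component and dims[i] its dimension m_i.
    Returns a dict {m: sum of the weights of the words y with m(y) = m}.
    """
    weights = defaultdict(float)
    for y in product(range(len(q)), repeat=n):
        weight = math.prod(q[a] for a in y)   # q_1^{N(1;y)} ... q_k^{N(k;y)}
        weights[sum(dims[a] for a in y)] += weight
    return dict(weights)


w = stratum_weights(q=[0.25, 0.75], dims=[0, 1], n=4)
mean_dimension = sum(m * weight for m, weight in w.items())
print(sorted(w.items()), mean_dimension)
```

In this example $\rho^{\otimes 4}$ is 5-stratified, with dimensions $0,\ldots,4$, and the mean dimension equals $n\,\mathbb{E}(D)=3$.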

By Proposition 1, one might approximate $\rho^{\otimes n}$ with an arbitrary level of accuracy by its restriction to the weakly typical realizations of $\rho$, provided $n$ is big enough. In order to get additional control on the dimensions appearing in this approximation, we restrict it further, retaining only the strata that correspond to strongly typical realizations of the random variable $Y$.

Let us denote by $Q$ the probability mass function (p.m.f.) of $\pi_*\rho$. Recall that $\mathbf{y}\in E_Y^n$ induces a probability law $\tau_{\mathbf{y}}$ on $E_Y$, known as the empirical distribution, given by $\tau_{\mathbf{y}}(\{a\})=N(a;\mathbf{y})/n$. Csiszár and Körner [4, Ch. 2] define $\mathbf{y}\in E_Y^n$ to be strongly $(Q,\eta)$-typical if $\tau_{\mathbf{y}}$, with p.m.f. $P$, is such that $\tau_{\mathbf{y}}\ll\pi_*\rho$ and, for all $a\in E_Y$, $|P(a)-Q(a)|<\eta$. We denote by $A_{\delta'_n}^{(n)}$ the set of these sequences when $\eta=\eta_n:=n^{-1/2+\xi}$, for a fixed $\xi\in(0,1/2)$. By virtue of the union bound and Hoeffding’s inequality, $(\pi_*\rho)^{\otimes n}(A_{\delta'_n}^{(n)})\geq 1-\varepsilon_n$, where $\varepsilon_n=2|E_Y|\mathrm{e}^{-2n\eta_n^2}$; the choice of $\eta_n$ ensures that $\varepsilon_n\to 0$ as $n\to\infty$. Moreover, the continuity of the discrete entropy in the total-variation distance implies that $A^{(n)}_{\delta'_n}$ is a subset of $W^{(n)}_{\delta'_n}$ with $\delta'_n=-|E_Y|\eta_n\ln\eta_n$, which explains our notation. See [13, Sec. III-D].
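The bound $\varepsilon_n=2|E_Y|\mathrm{e}^{-2n\eta_n^2}$ and the definition of strong typicality can be explored numerically. The sketch below is a minimal simulation under assumptions of our own: a three-symbol p.m.f. `q` with strictly positive entries (so that $\tau_{\mathbf{y}}\ll\pi_*\rho$ holds automatically), symbols indexed from 0, and $\xi=1/4$. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def hoeffding_epsilon(k, n, xi):
    """epsilon_n = 2 |E_Y| exp(-2 n eta_n^2) with eta_n = n^(-1/2 + xi)."""
    eta = n ** (-0.5 + xi)
    return 2 * k * math.exp(-2 * n * eta ** 2)


def is_strongly_typical(y, q, eta):
    """|N(a; y)/n - Q(a)| < eta for every symbol a (all q[a] > 0 is assumed)."""
    counts = Counter(y)
    n = len(y)
    return all(abs(counts[a] / n - q[a]) < eta for a in range(len(q)))


def typical_probability(q, n, xi, trials=300, seed=0):
    """Monte Carlo estimate of the (pi_* rho)^{(x)n}-probability of A^{(n)}."""
    rng = random.Random(seed)
    eta = n ** (-0.5 + xi)
    symbols = range(len(q))
    hits = sum(
        is_strongly_typical(rng.choices(symbols, weights=q, k=n), q, eta)
        for _ in range(trials)
    )
    return hits / trials


q = [0.2, 0.5, 0.3]
eps = hoeffding_epsilon(len(q), n=1000, xi=0.25)
estimate = typical_probability(q, n=1000, xi=0.25)
print(eps, estimate)
```

Since $2n\eta_n^2=2n^{2\xi}$, the bound decays like a stretched exponential in $n$ for every admissible $\xi$.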

We introduce the set $T^{(n)}_{\delta,\delta'_n}=(\pi^{\times n})^{-1}(A_{\delta'_n}^{(n)})\cap W_{\delta}^{(n)}(\rho)$ of doubly typical sequences in $E_X^n$, and call $T^{(n)}_{\delta,\delta'_n}(\mathbf{y})=T^{(n)}_{\delta,\delta'_n}\cap(\pi^{\times n})^{-1}(\mathbf{y})$ a doubly typical stratum for any $\mathbf{y}\in A_{\delta'_n}^{(n)}$.

The main result of [13] is a refined version of the AEP for stratified measures that gives an interpretation for the conditional term in the chain rule (14).

Theorem 4.1

(Setting introduced above) For any $\varepsilon>0$ there exists an $n_0\in\mathbb{N}$ such that for any $n\geq n_0$ the restriction of $\rho^{\otimes n}$ to $T^{(n)}_{\delta,\delta'_n}$, $\rho^{(n)}=\sum_{\mathbf{y}\in A_{\delta'_n}^{(n)}}\rho^{\otimes n}|_{T^{(n)}_{\delta,\delta'_n}(\mathbf{y})}$, satisfies $d_{TV}(\rho^{\otimes n},\rho^{(n)})<\varepsilon$. Moreover, the measure $\rho^{(n)}$ equals a sum of $m$-rectifiable measures for $m\in[n\mathbb{E}(D)-n^{1/2+\xi},\,n\mathbb{E}(D)+n^{1/2+\xi}]$. The conditional entropy $H(X|Y):=\sum_{i=1}^kq_iH_{\mu_i}(\rho_i)$ quantifies the volume growth of most doubly typical fibers in the following sense:

  1. For any $\mathbf{y}\in A_{\delta'_n}^{(n)}$, one has $n^{-1}\ln\mathcal{H}^{m(\mathbf{y})}(T^{(n)}_{\delta,\delta'_n}(\mathbf{y}))\leq H(X|Y)+(\delta+\delta'_n)$.

  2. For any $\varepsilon>0$, the set $B_{\varepsilon}^{(n)}$ of $\mathbf{y}\in A_{\delta'_n}^{(n)}$ such that

    \[ \frac{1}{n}\ln\mathcal{H}^{m(\mathbf{y})}(T^{(n)}_{\delta,\delta'_n}(\mathbf{y})) > H(X|Y)-\varepsilon+(\delta+\delta'_n) \]

    satisfies

    \[ \limsup_{\|(\delta,\delta'_n)\|\to 0}\,\limsup_{n\to\infty}\frac{1}{n}\ln|B^{(n)}_{\varepsilon}| = H(Y) = \limsup_{\|(\delta,\delta'_n)\|\to 0}\,\limsup_{n\to\infty}\frac{1}{n}\ln|A^{(n)}_{\delta'_n}|. \]

This gives a geometric interpretation to the possibly noninteger dimension $\mathbb{E}(D)=\sum_{i=1}^kq_im_i$, which under suitable hypotheses is the information dimension of $\rho$ [13, Sec. V], thus answering an old question posed by Rényi in [10, p. 209].

References

  • [1] G. Alberti, H. Bölcskei, C. De Lellis, G. Koliander, and E. Riegler. Lossless analog compression. IEEE Transactions on Information Theory, 65(11):7480–7513, 2019.
  • [2] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Science Publications. Clarendon Press, 2000.
  • [3] T. M. Cover and J. A. Thomas. Elements of Information Theory. A Wiley-Interscience publication. Wiley, 2006.
  • [4] I. Csiszár and J. Körner. Information theory: Coding theorems for discrete memoryless systems. Probability and mathematical statistics. Academic Press, 1981.
  • [5] C. De Lellis. Rectifiable Sets, Densities and Tangent Measures. Zurich lectures in advanced mathematics. European Mathematical Society, 2008.
  • [6] H. Federer. Geometric Measure Theory. Classics in Mathematics. Springer Berlin Heidelberg, 1969.
  • [7] G. Koliander, G. Pichler, E. Riegler, and F. Hlawatsch. Entropy and source coding for integer-dimensional singular random variables. IEEE Transactions on Information Theory, 62(11):6124–6154, Nov. 2016.
  • [8] P. Mattila. Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1995.
  • [9] F. Morgan. Geometric Measure Theory: A Beginner’s Guide. Elsevier Science, 2008.
  • [10] A. Rényi. On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica, 10(1):193–215, 1959.
  • [11] J. P. Vigneaux. Topology of Statistical Systems: A Cohomological Approach to Information Theory. PhD thesis, Université de Paris, 2019.
  • [12] J. P. Vigneaux. Entropy under disintegrations. In F. Nielsen and F. Barbaresco, editors, GSI 2021: Geometric Science of Information, volume 12829 of Lecture Notes in Computer Science, pages 340–349. Springer, 2021.
  • [13] J. P. Vigneaux. Typicality for stratified measures. arXiv preprint arXiv:2212.10809, 2022.