Functional strong law of large numbers for Betti numbers in the tail
Abstract.
The objective of this paper is to investigate the layered structure of topological complexity in the tail of a probability distribution. We establish the functional strong law of large numbers for Betti numbers, a basic quantifier of algebraic topology, of a geometric complex outside an open ball of radius , such that as the sample size increases. The nature of the obtained law of large numbers is determined by the decay rate of a probability density. It especially depends on whether the tail of a density decays at a regularly varying rate or an exponentially decaying rate. The nature of the limit theorem depends also on how rapidly diverges. In particular, if diverges sufficiently slowly, the limiting function in the law of large numbers is crucially affected by the emergence of arbitrarily large connected components supporting topological cycles in the limit.
Key words and phrases:
Functional strong law of large numbers, extreme value theory, Betti number, topological crackle.2010 Mathematics Subject Classification:
Primary 60G70, 60F15. Secondary 55U10, 60D05, 60F17.1. Introduction
The main theme of this paper is a phenomenon called topological crackle, which refers to a layered structure of homological elements of different orders. In a practical setting, topological crackle appears in a manifold learning problem as in Figure 1. Our aim in Figure 1 is to recover the topology of the circle from the union of balls centered around uniformly distributed points. In Figure 1 (c), Gaussian noise is added to the uniform random sample. Then, the union is similar in shape to the circle and the recovery of its topology is possible. However, if the noise has a “heavy tailed” Cauchy distribution as in Figure 1 (d), the extraneous homological elements (i.e., two distinct components and a one-dimensional cycle) away from the center of , will render the recovery of its topology difficult (or even impossible). This example indicates that a small number of “topologically essential” components and cycles may drastically change topology at a global level.

The current work is partially motivated by extreme value theory (EVT), in light of the fact that topological crackle is associated with a “rare event” occurring in the tail of probability distributions. Beyond the standard literature on EVT (see, e.g., [19, 8, 7]), there have been many attempts so far to understand the geometric and topological features of extreme sample clouds [4, 21, 6, 1, 16]. This paper contributes to the existing literature by examining crackle phenomena in a higher dimensional setting. To make our implication more transparent, we consider the power-law density
(1.1) |
for some and a normalizing constant . Let Ann be an annulus with inner radius and outer radius . We then divide into the layers of annuli at different radii,
(1.2) |
where
(1.3) |
Figure 2 visualizes the annuli (1.2) as a layer of Betti numbers of all feasible dimensions. The Betti number is a quantifier of topological complexity in view of homological elements such as components and cycles in Figure 1 (d). For every , the th Betti number represents the number of -dimensional cycles (henceforth we call it a -cycle) as a boundary of a -dimensional body. In Section 2, we provide a more rigorous description of Betti numbers under a more precise setup. Suppose denotes independently and identically distributed () random points in , , drawn from (1.1). Let be an open ball of radius around ( is the Euclidean norm), and
be the union of open balls around the points in . Given a growing sequence and a positive integer , the th Betti number in Figure 2 is defined as
(1.4) |
Then, counts the number of -cycles in the outside of an expanding ball . Note also that (1.4) is seen as a stochastic process of right continuous sample paths with left limits.

After the pioneering paper of [1], the layered structure in Figure 2 has been intensively studied via the behavior of various topological invariants [16, 14, 15, 17, 23]. In particular, from the viewpoints of (1.4), we have in an asymptotic sense,
-
•
Outside Ann there are finitely many -cycles (accordingly, the first order Betti number is approximated by a Poisson distribution), but none of the cycles of dimensions .
-
•
Inside Ann there are infinitely many -cycles and finitely many -cycles, but none of the cycles of dimensions .
In general, for every ,
-
•
Inside Ann there are infinitely many cycles of dimensions and finitely many -cycles, but none of the cycles of dimensions ,
and finally,
-
•
Inside Ann there are infinitely many cycles of all dimensions .
In the literature (e.g., [14]), the innermost ball is called a weak core. A weak core is a centered ball, in which random points are highly densely distributed and the homology of the union of balls becomes nearly trivial as , i.e., almost all cycles of any dimension inside of a weak core will be asymptotically filled in. See Section 2 for a formal definition of a weak core.
As expected from Figure 2, it is not surprising that the stochastic features of (1.4) drastically vary, depending on how rapidly diverges. Among many of the related studies [16, 14, 15, 17, 23], the authors of [16] explored the case , so that there appear at most finitely many -cycles outside of as . Consequently, (1.4) converges weakly to a Poisson distribution. If diverges more slowly, such that as , then, infinitely many -cycles would appear in Ann. In this case, (1.4) obeys a functional central limit theorem (CLT) (see [15]). However, the spatial distribution of -cycles is still sparse. As a result, all the -cycles contributing to the limiting Gaussian process will be of the minimal size (i.e., they all consist of points). If diverges even more slowly, so that , then becomes highly connected in the area close to a boundary of a weak core. Although (1.4) still follows a CLT under an appropriate normalization, the components supporting the limiting -cycles will become arbitrarily large [15]. In other words, the limiting -cycles can be supported not only on points, but also on points for all .
The main objective of this paper is to establish the functional strong law of large numbers (SLLN) for (1.4) in the space of right continuous functions defined on with left limits. This study is relevant only when the behavior of (1.4) is governed by a CLT. In this sense, the current study is viewed as a natural continuation of the work in [15]. We will see that the nature of the functional SLLN for (1.4), including the scaling constants and the properties of the limiting functions, differs according to the growth rate of . In Section 3.1, we demonstrate two distinct functional SLLNs, depending on whether or , when the density has a regularly varying tail as in (1.1).
In the literature, the previous studies [10, 24, 22, 11] also proved the SLLNs for topological invariants such as Betti numbers and the Euler characteristic. However, all of these studies treated topology of an entire space . Then, the topological invariants are crucially impacted by densely scattered random points closer to the origin, especially those inside of a weak core. As a consequence, the resulting SLLN is robust to the choice of the density , and the topological invariants grow proportionally to the sample size. In the context of topological crackle, however, the limit theorems for topological invariants heavily depend on the decay rate of a probability density. For instance, according to [1, 16, 17], the layered structure in Figure 2 appears only when the distribution for has a tail at least as heavy as that of an exponential distribution. Therefore, if is drawn from a Gaussian distribution, for each , in (1.4) simply vanishes as .
From this point of view, the other main thrust of this paper, which was never studied in [15], is to explore how crackle occurs when the density of has a (sub)exponential tail. One of the simplest cases of such densities is
(1.5) |
for some and a normalizing constant . Then, instead of (1.2) we obtain the “logarithmic scale” annuli structure
where
In Section 3.2, we establish the functional SLLN for (1.4) in the space when the density has an exponentially decaying tail as in (1.5). The asymptotics of (1.4) once again depends on the growth rate of : a phase transition occurs between the cases and .
From an analytic viewpoint, the main challenge of the current study is that the scaling constants for the Betti number may grow logarithmically, whereas in the previous studies [10, 24, 22, 11], the scaling constant was equal to the sample size up to multiplicative constants. More concretely, if the density has an (sub)exponential tail as in (1.5), the radii of annuli are all logarithmic. Accordingly, the scaler for the Betti number will be logarithmic as well; see Example 3.5 for more detailed analyses. In the case of a power-law density, the radii of annuli are polynomial as shown in (1.3). Nevertheless, the scaler for the Betti numbers can still be logarithmic, especially when and have the same regular variation exponent (i.e., ), and the difference between and is at most logarithmic. Example 3.3 below provides more detailed analyses on this point. If the scaler is logarithmic, a direct application of the Borel-Cantelli lemma, together with the lower-order moment calculations, does not help to establish the required SLLNs. To overcome this difficulty, we shall utilize the concentration inequalities in [2], which themselves were developed for analyzing Poisson -statistics of the geometric configuration of a point cloud, such as subgraph counts of a random geometric graph. For the application of these concentration bounds, one needs to detect appropriate subsequential upper and lower bounds for various quantities that are used to approximate Betti numbers. The latter approach is a standard technique for the theory of geometric graphs; see, e.g., Chapter 3 of the monograph [18].
This paper is organized as follows. In Section 2, we provide a precise setup of the Betti number in (1.4) via the notion of a Čech complex. In Section 3, we develop the functional SLLNs in two distinct scenarios: one where the distribution has a regularly varying tail and the other where the distribution has an exponentially decaying tail. For each of the distributional contexts, the behavior of (1.4) splits into two additional distinct cases, according to how rapidly diverges. All the proofs are deferred to Section 4 and the Appendix.
As a final remark, our proposed methods are applicable to other geometric complexes beyond a Čech complex. In particular, for a Vietoris-Rips complex, all the results can be carried over by a simple modification of the scaler of the SLLN.
2. Setup
We begin with introducing fundamental concepts towards proving the main functional SLLNs. Let be a set of -valued random variables with probability density . For the rigorous description of the Betti number in (1.4), we need to introduce a higher-dimensional notion of graphs, called the geometric complex. Among many varieties of geometric complexes (see [9]), we especially focus on the Čech complex.
Definition 2.1.
Given a set of points in and a positive number , the Čech complex is defined as follows.
-
(1)
The -simplices are the points in .
-
(2)
For each , forms an -simplex if .
The main advantage of the Čech complex is its homotopy equivalence to the union of balls . This fundamental result is known as the Nerve lemma (e.g., Theorem 10.7 in [5]).
The objective of this paper is to study the “extreme-value behavior” of the Čech complex generated by points far away from the origin. More concretely, for any sequence of positive numbers with , and a positive number , we consider a Čech complex built over random points in lying outside of . Then, the collection of Čech complexes
(2.1) |
induces a filtration in parameter ; that is, the Čech complexes under consideration are non-decreasing,
for all . Given a positive integer that remains fixed henceforth, the th Betti number of (2.1) is denoted as
(2.2) |
where the second equality is a consequence of the Nerve lemma. Then, (2.2) can be viewed as a non-negative integer-valued stochastic process in parameter that has right continuous sample paths with left limits.
To derive the required functional SLLN, we need a more explicit expression of . For this purpose, we express in the same way as the previous studies in [12, 15]. Define for , , , and ,
(2.3) |
and
(2.4) |
with
For any subset consisting of finitely many -dimensional real vectors, we define
and also,
(2.5) |
and
(2.6) |
Then, we can express as
(2.7) |
Before concluding this section, we shall formally define the notion of a weak core.
Definition 2.2.
Let be a spherically symmetric probability density in . A weak core is a centered ball such that as , for every (equivalently, some) , where is the -dimensional unit sphere in
3. Functional strong law of large numbers
We develop two main theorems in this section. In the below, we denote by the collection of regularly varying functions or sequences (at infinity) of exponent .
3.1. Regularly varying tail case
First, we explore the case that the density is spherically symmetric and has a regularly varying tail (at infinity) of exponent with . Namely, we assume that
(3.1) |
for every and . With spherical symmetry of , one can denote , for any fixed . Then, (3.1) can be written as .
From the literature of topological crackle [1, 16, 15], it is known that
where is a finite and positive constant and
Since the main theme of this paper is a (functional) SLLN for , we treat only the case . Under this assumption, Theorem 3.1 below splits the behavior of into two distinct regimes:
Clearly, in case diverges more rapidly than that in case . In case , the Čech complex outside of is so sparse that all the -cycles remaining in the limit of the SLLN will be the simplest one formed by vertices. In contrast, if is determined by condition , the ball agrees with the weak core (see Definition 2.2) up to a proportionality constant. Then, the random points are highly connected to one another in the area sufficiently close to the weak core, and the limiting function in the SLLN will be much more complicated, because of many connected components formed on vertices for any . In the special case when is given by (1.1), condition together with is equivalent to where and are defined in (1.3). Furthermore, condition implies that is equal to up to multiplicative factors.
Before stating the main theorem, for and we define
(3.2) |
where is surface area of the -dimensional unit sphere and , and
is the union of open balls of radius around the points in . Moreover, vol is the volume of a subset , and is the volume of the unit ball in . In the special case , and , we can simplify (3.2) as
In the below, for two sequences and , we write if there exists a positive constant such that for all .
Theorem 3.1.
Suppose that is a regularly varying sequence of positive exponent, such that as and
(3.3) |
Then, we have, as ,
Suppose that as , for some . Then, as ,
(3.4) |
where
Remark 3.2.
In Theorem 3.1 above, there is a technical assumption that . As shown in Section 4.2, however, the SLLN holds for every , in the case of the truncated Betti number
(3.5) |
In fact, the upper bound condition for is applied only when we show that the difference between (2.7) and (3.5) vanishes a.s. when they are scaled by .
Example 3.3.
We consider the power-law density
for some and a normalizing constant . In this example, we consider three distinct sequences of growing radii,
Note that in cases and satisfies as , where and are defined in (1.3). In case , is equal to up to multiplicative factors.
Since in case grows fastest, the occurrence of -cycles outside is the least likely of the three regimes. In particular, and are “close” in size, in the sense that both the sequences have the same regular variation exponent. Then, grows only logarithmically. In fact, one can readily check that
and Theorem 3.1 yields that
(3.6) |
If diverges more slowly as in case , then, far more -cycles would occur in Ann, and begins to grow polynomially. More concretely, we have
and Theorem 3.1 gives that
(3.7) |
3.2. Exponentially decaying tail case
The aim of this section is to explore the limiting behavior of (2.7) when has an exponentially decaying tail. As a generalization of (1.5), we consider the density
(3.9) |
where is a normalizing constant and is a regularly varying function (at infinity) with index . Because of the spherical symmetry of (3.9), we can define , for any fixed .
Before continuing, we need to put additional assumptions on . First, it is assumed to be twice differentiable such that for all , and is eventually non-increasing; that is, there is a so that is non-increasing for all . Let , . It then follows from Proposition 2.5 in [20] that . In [16] it was shown that topological crackle occurs if and only if
(3.10) |
From this viewpoint, the case is irrelevant for the current study. Indeed, if , it holds that as and topological crackle does not occur. Moreover, if , then (3.10) is automatic with . We thus need to assume (3.10) only when .
Under this setup, it was shown in the literature [1, 16, 17] that
where is a finite and positive constant and
By the same reasoning as in Section 3.1, we focus only on the case as . As before, the asymptotics of is divided into two different regimes:
For the analyses of case , we shall restrict to a slightly narrower class,
(3.11) |
where , , is the (left continuous) inverse of . Note that condition requires that a constant in (3.11) must be positive. Moreover, if one takes , then no longer diverges as ; hence, we need to restrict the range of as in (3.11).
Finally, for the description of the limiting function in the SLLN, we need the following functions: for and ,
(3.12) | ||||
where and is the Euclidean inner product. Further, is the Jacobian
Theorem 3.4.
Suppose that for some . Then, as ,
Suppose that as , for some . If , we restrict the range of to . Then, as ,
where
Example 3.5.
We consider the density
for some and a normalizing constant . Then, and for . We first set
for some . It is then straightforward to calculate that
where . Thus, Theorem 3.4 shows that, as ,
(3.13) |
4. Proofs
4.1. Preliminaries
Before commencing the proof, we need to introduce additional functions and objects pertaining to the indicator (2.3). For , , and a connected simplicial complex that has vertices,
where denotes isomorphism between simplicial complexes. With this notation we can interpret as
(4.1) |
where the sum is taken over all connected simplicial complexes on vertices. From (4.1), can be decomposed as
where
(4.2) | ||||
(4.3) |
In the above expressions, the inner sums in (4.2) and (4.3) are taken over all connected complexes built over points that contain as its proper subcomplex. Since there are at most finitely many isomorphism classes of such complexes, these sums consist of at most a finite number of terms. Consequently, for any and , (4.2) and (4.3) are bounded functions. Notice further that (4.2) and (4.3) are non-decreasing functions in . Namely, for all and ,
(4.4) |
In the special case and , it is easy to check that for all ,
and
In addition to the monotonicity (4.4), the following properties of are important for our analyses.
-
•
are shift-invariant, i.e.,
(4.5) -
•
are locally determined, i.e., there exists (depending only on ) such that
(4.6) where diam.
Finally, we define various functions analogous to those defined at (2.4), (2.5), (2.6), (3.2), (3.12), respectively.
(4.7) | ||||
(4.8) | ||||
(4.9) | ||||
(4.10) | ||||
One of the main ideas for proving the SLLN for (2.7) is to detect appropriate subsequential upper and lower bounds for the quantities that are used to approximate the Betti number (2.7). After detecting such bounds, we apply the Borel-Cantelli lemma to the obtained bounds. This is a standard approach for proving the SLLN for graph related statistics, such as subgraph and component counts, in random geometric graphs; see, e.g., Theorems 3.18 and 3.19 in [18].
4.2. Proof of Theorem 3.1
For ease of the description, we first prove Part and then proceed to the proof of Part . The discussion for Part is more delicate, requiring to make use of the concentration bounds in [2], which is stated in Proposition 5.3 below. All the technical results necessary for our proof are provided in Sections 5.1 and 5.2 of the Appendix.
Proof of Theorem 3.1 .
Note first that (3.4) is implied by
We introduce the truncated version of (2.7); that is, for every ,
(4.16) |
Analogously, we also define by the same truncation as above:
Then, it is easy to see that, for every ,
From this it is sufficient to show that
(4.17) | |||
(4.18) | |||
(4.19) |
Of the three requirements above, we first deal with (4.17). Since is finite, it is enough to demonstrate that, for every and ,
Clearly, this convergence is obtained by
(4.20) | |||
where and are defined in (4.8) and (4.9) respectively. We prove only the former convergence. One can actually handle the latter in the same way. For this purpose, we extend (4.7), (4.8), and (4.9) by adding an extra time parameter. Namely, for , a finite subset of points in , , and ,
(4.21) | ||||
(4.22) |
Clearly, we have that , , and . It immediately follows from (4.4) that (4.21) and (4.22) are non-decreasing in and non-increasing in . Moreover, (4.22) is a continuous function in . Hence, according to Lemma 5.1 in the Appendix, (4.20) follows from the pointwise SLLN,
(4.23) |
for every .
To show (4.23), note that for every , there exists a unique such that , where is defined in (4.11). Using and in (4.12) as well as , we define
(4.24) |
(4.25) |
Since and for all , one can immediately derive that
for all . It follows from Lemma 5.5 that
and
From these bounds, we now need to show that as ,
(4.26) | |||
(4.27) |
For (4.26), Chebyshev’s inequality and Lemma 5.5 yield that, for every ,
From the assumption that as , one can readily check that ; therefore,
(4.28) |
and we thus conclude that . Now, the Borel-Cantelli lemma ensures (4.26). The proof of (4.27) is analogous by virtue of Lemma 5.5 . Now, (4.26) and (4.27) complete the proof of (4.23).
Next we turn to condition (4.18). Since the th Betti number of any simplicial complex on vertices is bounded above by the number of -simplices, which is further bounded by , we have that
(4.29) | ||||
By construction, is a non-decreasing function in . Thus,
and we now need to show that
By Lemma 5.5 , it is sufficient to demonstrate that for every ,
(4.30) |
For every , it follows from Lemma 5.5 and (4.28) that
Thus, the Borel-Cantelli lemma ensures (4.30). Finally, we assert that the argument similar to (or even easier than) that for (4.18) can establish (4.19). Now, the entire proof has been completed. ∎
Proof of Theorem 3.1 .
Our main idea is to justify that in (2.7) can be approximated by
(4.31) |
This implies that the Čech complexes are asymptotically distributed sparsely with many separate connected components, with each consisting of only points. Note that “” is the minimum required number of points for a non-trivial th Betti number of a Čech complex.
As in the proof of Theorem 3.1 , our goal is to show that
This clearly follows if one can establish that
(4.32) | |||
(4.33) |
For (4.32), we define
(4.34) |
and demonstrate that
(4.35) |
where is defined at (4.9). In the below, we show the result of the “+” case only. One can immediately show that
which in turn indicates that is a continuous and increasing function in . Moreover, is also a non-decreasing function in ; hence, by Lemma 5.1 , the first convergence in (4.35) is obtained from the pointwise SLLN,
for every . As before, our next task is to detect the subsequential upper and lower bounds for . Let be a sequence defined in (4.11). Recall that for every , there exists a unique such that . Next we define
(4.36) | ||||
(4.37) |
where and are given in (4.12). Furthermore, and denote Poisson processes in with intensity measures and , respectively. In other words, one can denote , where are i.i.d random points with density , and is Poisson distributed with mean , independently of .
Since , , and (see (4.13) for the definition of ), one can bound in a way that
(4.38) | ||||
(4.39) | |||
Now, by Lemmas 5.4 and 5.7 as well as (4.39),
(4.40) | ||||
As for the lower bound of , noting that , , and (see (4.13) for the definition of ), we obtain that
Repeating the same argument as in (4.40) and using Lemmas 5.4, 5.7, and 5.8,
(4.41) |
From (4.40) and (4.41), it now suffices to demonstrate that, for every ,
According to the Borel-Cantelli lemma, we need to show that
(4.42) | |||
(4.43) |
For the proof of (4.42), note that is a Poisson -statistics of order , satisfying (5.5) and (5.6), for which an underlying Poisson point process has a finite intensity measure . From this observation, one can appeal to Proposition 5.3 to get that
By Lemmas 5.4 and 5.7, and (3.3),
In conclusion,
so that as desired.
We proceed to the proof of (4.43). Applying the second concentration inequality in Proposition 5.3, it holds that
Because of Lemmas 5.4 and 5.7, as well as (3.3),
We now conclude that
so that . This completes the proof of (4.43), and thus, (4.32) follows.
The next step is to establish (4.33). By virtue of Lemma 5.2, we obtain that
(4.44) |
where
One can further bound the right hand side in (4.44) as follows:
Observing that , and is a non-decreasing function in ,
Since , we have, for every ,
We now define
(4.45) |
This is a Poisson -statistics of order , such that fulfills conditions (5.5) and (5.6). Then, Lemmas 5.4 and 5.8 guarantee that the proof of (4.33) will be complete if one can verify that
(4.46) |
By the Borel-Cantelli lemma, (4.46) is implied by
for every . Lemmas 5.4 and 5.7 yield that as ,
(4.47) | ||||
where the last equality comes from as . Now, there exists , such that for all ,
The last inequality above is a direct result of Proposition 5.3. It follows from (4.47) that
This, together with Lemma 5.4 and (3.3), gives that
Now we get that
and the Borel-Cantelli lemma concludes (4.46), as desired. ∎
4.3. Proof of Theorem 3.4
This section is divided into two parts. For ease of the description, the first part is devoted to proving Theorem 3.4 , while the second part treats Theorem 3.4 . Here we exploit the results in Sections 5.1 and 5.3. In particular, the concentration bounds in Proposition 5.3 will play a key role in the proof of Theorem 3.4 . As our proof is similar in nature to that of Theorem 3.1, we occasionally skip detailed arguments.
Proof of Theorem 3.4 .
We first truncate in the same way as (4.16) and define also by the same truncation. With the same reasoning as in the proof of Theorem 3.1 , the required functional SLLN can be obtained as a result of the following statements.
(4.48) | |||
(4.49) | |||
(4.50) |
For (4.48), it suffices to prove that for every and ,
which itself is implied by
(4.51) | |||
where and are defined in (4.8) and (4.10) respectively. As before, we focus only on the asymptotics of the “” part. For this purpose, we again extend above to in the same way as (4.21). Additionally, we also define
(4.52) | ||||
Owing to the monotonicity in (4.4), one can see that (4.52) is non-decreasing in and non-increasing in . Additionally, (4.52) is a continuous function on . Thus, by Lemma 5.1 , showing
(4.53) |
for every , will suffice for (4.51).
To show (4.53), we take a constant and define , as in (4.11). Recall that the range of is restricted to when , so one can always take such . For every , there exists a unique such that . Additionally, let , , , and be defined as in (4.12) and (4.14) respectively. Defining and as in (4.24) and (4.25), it is now easy to see that
for all . By Lemmas 5.9 and 5.10 ,
(4.54) | ||||
and similarly,
(4.55) |
For the application of the Borel-Cantelli lemma to the rightmost term in (4.54), we have, for every ,
where Lemma 5.10 is applied for the second inequality. Due to the constraint , one can choose , , so that
Observing that , we have for all . Moreover, it follows from (see Proposition 2.6 in [20]) and (5.38) that
Therefore,
so that
(4.56) |
Now, the Borel-Cantelli lemma verifies that
and hence,
Applying the similar argument to (4.55), we also get that
hence, the proof of (4.48) has been completed.
Our next task is to verify (4.49). Repeating the same analysis as in (4.29), while using the same notation , we obtain that
Since is a non-decreasing function in , we now need to show that
By Lemmas 5.9 and 5.10 , it is enough to prove that for every ,
To apply the Borel-Cantelli lemma, notice from Lemma 5.10 that for every ,
The series above is convergent as shown in (4.56), and thus, (4.49) follows as desired. Finally, as in the proof of Theorem 3.1 , we will skip the proof of (4.50). ∎
Proof of Theorem 3.4 .
By the same reasoning as in the proof of Theorem 3.1 , it is sufficient to demonstrate that
(4.57) | |||
(4.58) |
where is defined in (4.31). For (4.57), we decompose and as and (see (4.34) and (4.10) respectively). We discuss only the asymptotics of the “” part. We then recall that and are both non-decreasing in , and is a continuous function in . Now, because of Lemma 5.1 , what needs to be shown is
for every . Let be a sequence at (4.11). Then, for every , there exists a unique such that . Using and in (4.36) and (4.37), as well as the associated Poisson processes , , we shall derive the subsequential upper and lower bounds for . For the upper bound, as an analogue of (4.38) we obtain that
where and are defined in (4.12) and (4.15) respectively. Repeating the calculations similar to those in (4.39) and (4.40), while appealing to Lemmas 5.9, 5.12, and 5.13, we get that
Similarly, exploiting the same lemmas, we have that
Now, according to the Borel-Cantelli lemma, what need to be shown are the following: for every ,
(4.59) | |||
(4.60) |
Here, we claim that
(4.61) |
for some . If this is proven, one can establish (4.59) and (4.60) by repeating the same arguments as those for the proofs of (4.42) and (4.43), with the aid of Lemma 5.9, Lemma 5.12, and Proposition 5.3. To show (4.61), choose such that
Notice that
Since and , we see that is a regularly varying function (at infinity) of exponent . Therefore, for all ,
From this, we get that . Now, (4.61) has been obtained by setting , and the proof of (4.59) and (4.60) has been completed.
5. Appendix
In the Appendix, we provide a series of lemmas and propositions that will be used for the proofs of Theorems 3.1 and 3.4. As in the last section, denotes a generic and positive constant independent of .
5.1. Technical results commonly used for the proof of Theorems 3.1 and 3.4
The result below allows us to develop a functional SLLN from its pointwise version.
Lemma 5.1.
Let be a sequence of random elements of with non-decreasing sample paths. Suppose is a continuous and non-decreasing function. Suppose that
for every , then
where is endowed with the uniform topology.
Let be a sequence of random elements, such that for each , has right continuous sample paths with left limits in each of the coordinates. Assume further that for every , is non-decreasing in and non-increasing in . Suppose is a real-valued, continuous function on , which has the same monotonicity as in each of the coordinates. If we have that
(5.1) |
for every , then, as ,
where is equipped with the uniform topology.
Proof.
Part is proven in Proposition 4.2 of [22]. For Part , it is clear that is uniformly continuous on . Given , we can choose such that for all , ,
(5.2) |
Then, we see that
In the above, the second inequality follows from the monotonicity of and , and we have used (5.2) for the third inequality. By virtue of (5.1), the last expression converges to almost surely as . Since is arbitrary, the proof is complete. ∎
The next lemma provides the upper and lower bounds for in (2.7). We will make use of these bounds when estimating the difference between and (see (4.31)) in the proof of Theorem 3.1 and Theorem 3.4 .
Lemma 5.2.
For all ,
(5.3) |
where
Proof.
The inequality on the left hand side in (5.3) is obvious due to (2.7). Owing to (2.7) again, the remaining inequality is equivalent to
By the definition of , the left hand side above is equal to
(5.4) |
Note that is bounded by the number of -simplices of . Suppose that for some and with , is a connected component of with . Then, there exists with such that is a connected subcomplex of . Every time one finds such a connected subcomplex on points, it can increase the -simplex counts of by at most . In conclusion,
Substituting this bound back into (5.4),
∎
The next result we introduce here is the concentration bound derived in [2] for a Poisson -statistics. Let us rephrase the setup and assumptions of [2] in a way suitable for the current study. Let denote a Poisson point process in with finite intensity measure of no atoms. Let be a symmetric indicator function of order with the following properties.
There exists such that
(5.5) |
There is a constant such that
(5.6) |
Finally, we define a Poisson -statistics of order by
Proposition 5.3.
[Theorem 3.1 in [2]] Under the above conditions, there is a constant , depending only on , and , such that for all ,
5.2. Technical lemmas for the proof of Theorem 3.1
In this section, we offer a series of technical lemmas for the proof of Theorem 3.1. The first result below deals with asymptotic ratios of the regularly varying sequences as a function of , , , , and in (4.11), (4.12), and (4.13).
Lemma 5.4.
Proof.
By the definition of these five sequences, it is evident that
for all . Note that
(5.8) |
As , the rightmost term goes to as .
For the proof of the second statement in (5.7), let be a regular variation exponent of (In the case of Theorem 3.1 , we can take ). One can then rewrite the ratio as
Since as , we have that for sufficiently large . By the uniform convergence of regularly varying sequences (see, e.g., Proposition 2.4 in [20]),
Now, the proof is complete. Finally, since is also a regularly varying sequence, the third statement in (5.7) can be shown in the same way as above. ∎
Lemma 5.5.
Proof.
For the proof of (5.9), by the conditioning on we have
(5.16) | ||||
where
Here, we consider the case in which a point set is contained in . We treat only this case, but the other cases (i.e., ) can be handled in the same way. Changing the variables , , and using shift invariance condition (4.5), we have that
(5.17) | |||
where . The polar coordinate transform with , , gives that
(5.18) | ||||
Furthermore, an additional change of variable yields that
(5.19) | ||||
By Lemma 5.4, we have
By the regular variation assumption in (3.1),
for all , , and . As for the remaining term of the integrand in (5.19), we get that
(5.20) | |||
Now, (3.1) and Lemma 5.4 ensure that the last term in (5.20) converges to
Appealing to all of these convergence results and assuming temporarily that the dominated convergence theorem is applicable, we can obtain (5.9).
It now remains to find an integrable upper bound for the terms under the integral sign in (5.19). First it is evident that
Using Potter’s bounds (see Proposition 2.6 in [20]), for every , we have, for sufficiently large ,
(5.21) |
and
(5.22) |
for all , and such that . Combining all the bounds derived above, together with , we can obtain an integrable upper bound, as desired. The proof of (5.10) is similar, so we skip it here.
For the proof of (5.11),
From this, can be partitioned as , where
(5.23) |
For ,
(5.24) | ||||
where the last inequality comes from Lemma 5.6 below. This implies that . In order to treat in (5.23), we derive an upper bound for by
(5.25) |
where the second inequality is obtained from an obvious relation as well as the conditioning on as in (5.16). Although we here consider the case when all the points in belong to , the other cases (i.e., ) can be treated in the same manner. By the calculation similar to the above, we have
(5.26) | ||||
By (5.25) and (5.26), we get that , where
Furthermore, can be decomposed as , where
Since whenever , it is straightforward to check that
(5.27) |
by the same arguments as in (5.16), (5.19), and (5.20). Moreover, by Lemma 5.6 ,
(5.28) | |||
It thus follows that , and hence (5.11) has been established. Since the proof of (5.12) is very similar to that of (5.11), we will omit it.
Next, turning to (5.13) we apply Fubini’s theorem to obtain that
Taking so small that , Lemma 5.6 demonstrates that
(5.29) |
where is a positive constant independent of and . By Stirling’s formula for sufficiently large , (5.29) is further bounded by , which is finite due to the constraint . By Lemma 5.6 and the dominated convergence theorem, one can obtain (5.13) as required.
The next lemma is complementary to the proof of Lemma 5.5.
Lemma 5.6.
Proof.
We first prove (5.31). By the change of variables , for , which is followed by an additional change of variables as in (5.18) and (5.19), we have that
(5.32) | |||
Employing Potter’s bound as in (5.21) and (5.22) with , there exists , such that for all ,
(5.33) |
By the definition of above and Lemma 5.4, one can find , such that for all , . Thus, for all ,
The last inequality above follows from an elementary fact that there exist spanning trees on a set of points.
All the lemmas below will apply to the proof of Theorem 3.1 . We provide the asymptotic moments of , , and ; see (4.36), (4.37), and (4.45).
Lemma 5.7.
Proof.
The proof here is mostly the same as those for Lemma 5.5, so we provide only the sketch of proof for the first statement. Appealing to Palm theory for Poisson processes (see, e.g., Theorem 1.6 in [18]),
By the same change of variables as in (5.19) with and ,
The rest of our discussion is completely the same as the argument after (5.19). ∎
The next result justifies that with a proper scaling, the asymptotic behaviors of and will remain unchanged, even if the Poisson point process is replaced with the corresponding binomial process.
Lemma 5.8.
Proof.
The proofs of these statements are essentially the same, so we show only the first result. By the Borel-Cantelli lemma and Markov’s inequality, it suffices to demonstrate that
(5.34) |
Recall that (i.e., the cardinality of ) is Poisson distributed with parameter . By the conditioning on the values of , we get that
(5.35) | |||
where are i.i.d random variables with density . Proceeding as in the proof of Lemma 5.5, we can derive that
Referring this back into (5.35), the left hand side in (5.34) is now bounded by a constant multiple of
(5.36) | |||
where the last relation is due to the Cauchy-Schwarz inequality. It is elementary to check that there are constants , , with , such that
Therefore, the last expression in (5.36) can be written as
(5.37) |
for some constants , (note that has disappeared here). Finally, (5.37) is further bounded by
∎
5.3. Technical lemmas for the proof of Theorem 3.4
In the below we deduce various auxiliary results for the proof of Theorem 3.4, all of which are analogous to the corresponding lemmas in Section 5.2. Among them, Lemma 5.9 below analyzes asymptotic ratios of the sequences as a function of , , , , , , and defined in (4.11), (4.12), (4.14), and (4.15). Moreover, Lemmas 5.10 and 5.12 are used for calculating the asymptotic moments of various quantities appearing in the proof. Before stating these lemmas, recall the notations , , , , , and , which are defined respectively at (4.24), (4.25), (4.29), (4.36), (4.37), and (4.45).
Lemma 5.9.
Proof.
Because of Proposition 2.6 in [20], we see that . Under the setup of Theorem 3.4 , we have as . In the case of Theorem 3.4 , we have , , which again implies as . In both cases, by the uniform convergence of regularly varying sequences (see, e.g., Proposition 2.4 in [20]),
Since , we conclude that
(5.38) |
We are now ready to show the first statement. By the uniform convergence of regularly varying sequences,
where the last convergence is obtained from as (see (5.8)).
The following two lemmas will be applied for the proof of Theorem 3.4 .
Lemma 5.10.
Under the setup of Theorem 3.4 , for every , , and , we have as ,
(5.39) | |||
Moreover,
(5.40) | |||
Furthermore, for every and , we have as ,
(5.41) |
where
(5.42) | ||||
and also,
Proof.
Here, we show only (5.39), (5.40), and (5.41). For (5.39), by the same change of variables as in (5.17) after conditioning on as in (5.16), we get that
(5.43) | |||
where . Note that (5.43) completely agrees with (5.17). Moreover, by the polar coordinate transform with , , it turns out that above becomes the right hand side of (5.18). Next, we make an additional change of variable to get that
(5.44) | ||||
Lemma 5.9 gives that
In the following, we calculate the limit of each of the terms under the integral sign of (5.44). By Proposition 2.5 in [20], we have , which implies as , and hence,
(5.45) |
for every . Furthermore, (5.45) is bounded by for sufficiently large . For the untreated term in the second line of (5.44), we have
(5.46) | ||||
Since , the uniform convergence of regularly varying sequences, together with , , indicates that
(5.47) |
for every . It thus follows that for every ,
(5.48) |
In order to find an upper bound of (5.46), we define an array by
which is equivalent to . According to Lemma 5.2 in [3], for , there exists such that for all and . Since is increasing, the upper bound of (5.46) is given by
(5.49) | |||
Subsequently, we calculate the limit of the term in the third line of (5.44). By an expansion of for , we can define
(5.50) | ||||
If , it then holds that
(5.51) |
uniformly for all , , and with ( is given in (4.6)). Let
where
(5.52) |
Then, from (5.50) and (5.52) we have
(5.53) | |||
From (3.10) and (5.51) it follows that, for every ,
uniformly for all , , and with . Thus, by (5.47), we have as ,
(5.54) | |||
Notice further that (5.53) is bounded by . For the remaining term in (5.44), we use (5.54) and Lemma 5.9 to ensure that
Multiplying all of the upper bounds derived thus far, one can bound the triple integral in (5.44) by
Since the last term is finite due to , one can apply the dominated convergence theorem to obtain (5.39).
For the variance asymptotics in (5.40), we write , where and are defined in (5.23). Our argument here is nearly the same as that for the proof of (5.11). More concretely, by virtue of Lemma 5.11 below, one can replace (5.24), (5.27), and (5.28) respectively by
and
and
Since the rest of the argument is totally the same as that for (5.11), we now conclude that
as desired.
The result below is analogous to Lemma 5.6 when the density has an exponentially decaying tail.
Lemma 5.11.
Proof.
We first show (5.55). Performing the same change of variables as in (5.43) and (5.44), we have
(5.56) | |||
By Lemma 5.9, there exists so that for all , we have . Employing the bounds derived in the proof of (5.39), there exists such that for all , one can bound (5.56) by
where is determined in (5.49) and is a constant independent of and .
The last two lemmas below will be used for the proof of Theorem 3.4 . The proof of these lemmas are considerably similar to those of Lemmas 5.7 and 5.8, so we skip their proofs.
Lemma 5.12.
Lemma 5.13.
References
- [1] R. J. Adler, O. Bobrowski, and S. Weinberger. Crackle: The homology of noise. Discrete & Computational Geometry, 52:680–704, 2014.
- [2] S. Bachmann and M. Reitzner. Concentration for Poisson -statistics: subgraph counts in random geometric graphs. Stochastic Processes and their Applications, 128:3327–3352, 2018.
- [3] G. Balkema and P. Embrechts. Multivariate excess distributions. www.math.ethz.ch/ embrecht/ftp/guuspe08Jun04.pdf, 2004.
- [4] G. Balkema and P. Embrechts. High Risk Scenarios and Extremes: A Geometric Approach. European Mathematical Society, 2007.
- [5] A. Björner. Topological methods. In Handbook of Combinatorics. Elsevier, Amsterdam, 1995.
- [6] L. Decreusefond, M. Schulte, and C. Thäle. Functional Poisson approximation in Kantorovich-Rubinstein distance with applications to -statistics and stochastic geometry. The Annals of Probability, 44:2147–2197, 2016.
- [7] L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer, New York, 2006.
- [8] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events: for Insurance and Finance. Springer, New York, 1997.
- [9] R. Ghrist. Elementary Applied Topology. Createspace, 2014.
- [10] A. Goel, K. T. Duy, and K. Tsunoda. Strong law of large numbers for Betti numbers in the thermodynamic regime. Journal of Statistical Physics, 174, 2019.
- [11] Y. Hiraoka, T. Shirai, and K. D. Trinh. Limit theorems for persistence diagrams. The Annals of Applied Probability, 28:2740–2780, 2018.
- [12] M. Kahle and E. Meckes. Limit theorems for Betti numbers of random simplicial complexes. Homology, Homotopy and Applications, 15:343–374, 2013.
- [13] P. Niyogi, S. Smale, and S. Weinberger. A topological view of unsupervised learning from noisy data. SIAM Journal on Computing, 40:646–663, 2011.
- [14] T. Owada. Functional central limit theorem for subgraph counting processes. Electronic Journal of Probability, 22, 2017.
- [15] T. Owada. Limit theorems for Betti numbers of extreme sample clouds with application to persistence barcodes. The Annals of Applied Probability, 28:2814–2854, 2018.
- [16] T. Owada and R. J. Adler. Limit theorems for point processes under geometric constraints (and topological crackle). The Annals of Probability, 45:2004–2055, 2017.
- [17] T. Owada and O. Bobrowski. Convergence of persistence diagrams for topological crackle. Bernoulli, 26:2275–2310, 2020.
- [18] M. Penrose. Random Geometric Graphs, Oxford Studies in Probability 5. Oxford University Press, Oxford, 2003.
- [19] S. Resnick. Extreme Values, Regular Variation and Point Processes. Springer-Verlag, New York, 1987.
- [20] S. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, New York, 2007.
- [21] M. Schulte and C. Thäle. The scaling limit of Poisson-driven order statistics with applications in geometric probability. Stochastic Processes and their Applications, 122:4096–4120, 2012.
- [22] A. M. Thomas and T. Owada. Functional limit theorems for the Euler characteristic process in the critical regime. Advances in Applied Probability, 53:57–80, 2021.
- [23] A. M. Thomas and T. Owada. Functional strong law of large numbers for Euler characteristic processes of extreme sample clouds. To appear in Extremes, 2021.
- [24] D. Yogeshwaran, E. Subag, and R. J. Adler. Random geometric complexes in the thermodynamic regime. Probability Theory and Related Fields, 167:107–142, 2017.