Convergence of persistence diagrams
for discrete time stationary processes
Abstract.
In this article we establish two fundamental results for the sublevel set persistent homology for stationary processes indexed by the positive integers. The first is a strong law of large numbers for the persistence diagram (treated as a measure “above the diagonal” in the extended plane) evaluated on a large class of sets and functions, beyond just continuous functions with compact support. We prove this result subject only to the mild conditions that the sequence is ergodic and the tails of the marginals are not too heavy. The second result is a central limit theorem for the persistence diagram evaluated on the class of all step functions; this result holds as long as a $\rho$-mixing criterion is satisfied and the distributions of the partial maxima do not decay too slowly. Our results greatly expand those extant in the literature to allow for more fruitful use in statistical applications, beyond idealized settings. Examples of distributions and functions for which the limit theory holds are provided throughout.
1. Introduction
Understanding the persistent homology of large samples from various probability distributions is of increasing utility in goodness-of-fit testing (Biscio et al., 2020; Krebs and Hirsch, 2022). For goodness-of-fit testing in the “geometric” setting there are a number of results to choose from, as much attention has been focused on the limiting stochastic behavior of Čech and Vietoris-Rips persistent homology of (Euclidean) point clouds (ibid. as well as Hiraoka et al., 2018; Divol and Polonik, 2019; Krebs and Polonik, 2019; Owada and Bobrowski, 2020; Krebs, 2021; Owada, 2022; Bobrowski and Skraba, 2024). However, less attention has been focused on the asymptotics of the entire sublevel (or superlevel) set persistent homology of stochastic processes and random fields—with a few notable exceptions (Chazal and Divol, 2018; Baryshnikov, 2019; Miyanaga, 2023; Perez, 2023; Kanazawa et al., 2024).
In recent years, summaries of sublevel set persistent homology of time series, such as those we establish limit theory for below, have been applied to the problems of heart rate variability analysis (Chung et al., 2021; Graff et al., 2021), eating behavior detection (Chung et al., 2022), and sleep stage scoring using respiratory signals (Chung et al., 2024). Thus, a comprehensive treatment of the asymptotic properties of sublevel set persistent homology of stochastic processes is needed for rigorous statistical approaches to the aforementioned problems. In this article we greatly extend the existing limit theory for persistence diagrams derived from sublevel set filtrations of discrete time stochastic processes. As a result, we understand the behavior of certain real-valued summaries of these random persistence diagrams (so-called persistence statistics) that are particularly relevant to machine learning and goodness-of-fit testing.
Work pertaining to the topology of sub/superlevel sets of random functions has its most prominent originator in Rice (1944). Current work in the area of establishing results about the sublevel set () persistent homology of stochastic processes has focused on almost surely continuous processes, such as investigations into the expected persistence diagrams of Brownian motion (Chazal and Divol, 2018); expected persistence diagrams of Brownian motion with drift (Baryshnikov, 2019); and expectations for the number of barcodes and persistent Betti numbers of continuous semimartingales (Perez, 2023). The formulas in Perez (2023), save for the expected number of barcodes with lifetime greater than , follow asymptotic formulas with tending to 0 or .
Though not overlapping entirely with our setting, some results for cubical persistent homology are applicable here. Notable results include the strong law of large numbers for persistence diagrams (Kanazawa et al., 2024) of random cubical sets (with the quality of the strong law being vague convergence) and central limit theorems for persistent Betti numbers of sublevel sets of i.i.d. sequences found in Miyanaga (2023). In this article, we establish the most general strong law of large numbers yet for functionals of persistence diagrams. We do so by normalizing the persistence diagrams so they become probability measures and by leveraging the tools of weak convergence. We also prove a central limit theorem for persistence diagrams evaluated on step functions using recent results for weakly dependent and potentially nonstationary triangular arrays, subject to standard dependence decay conditions on the underlying stationary sequence.
The quality of most strong laws of large numbers for persistence diagrams has been vague convergence, with Hiraoka et al. (2018), Krebs (2021), and Owada (2022) tackling the geometric (i.e. Čech and Vietoris-Rips persistent homology) setting, and Kanazawa et al. (2024) addressing the cubical setting. Recently however, the authors of Bobrowski and Skraba (2024) have employed the weak convergence ideas that we use here to prove a strong law of large numbers for the probability measure defined by death/birth ratios in a persistence diagram, for the geometric setting. In Divol and Polonik (2019)—again in the geometric setting—the authors extend the set functions for which the strong law of Hiraoka et al. (2018) holds to a class of unbounded functions.
In Section 3.1, we accomplish this extension as well in the setting of sublevel set persistent homology. We extend the strong law of large numbers (SLLN) of Kanazawa et al. (2024) (that which pertains to the 1-dimensional setting) from continuous functions with compact support to a large class of unbounded functions. We achieve this based solely on minor conditions such as ergodicity and restrictions on the heaviness of the tails of the marginal distributions of our underlying stochastic process. We also remove the need for any local dependence condition, such as that of Kanazawa et al. (2024). In doing so, we answer an open question of Chung et al. (2021) about the limiting empirical distribution of persistence diagram lifetimes for sublevel sets of discrete time stationary processes. For this specific setting, we also derive an explicit representation of the strong limit of our sublevel set persistent Betti numbers in Proposition 3.3, answering a query set forth in the conclusion to Hiraoka and Tsunoda (2018). Finally, we extend the current state-of-the-art central limit theorem (CLT) for persistent Betti numbers of sublevel set filtrations of 1-dimensional processes (Theorem 1.2.3 in Miyanaga, 2023) to finite-dimensional convergence and beyond the realm of i.i.d. observations.
This article proceeds in Section 2 with a treatment of persistent homology specialized to our setting, as well as details of our probabilistic setup. In Section 3 the strong law of large numbers is stated and proved (Theorems 3.1 and 3.8) and examples for which it holds are given for specific unbounded functionals of persistence diagrams in Corollary 3.10. Beyond this, we derive some satisfying results in the case of i.i.d. stochastic processes in Corollary 3.5 and state a Glivenko-Cantelli theorem for persistence lifetimes in Corollary 3.7. Finally in Section 4, we state the setting and results of our central limit theorem for persistence diagrams (Theorem 4.6). We conclude with a brief discussion about the potential improvements and extensions of this work in Section 5. The proof of the central limit theorem is deferred to Section 6.
2. Background
We begin by discussing the necessary notions in topological data analysis—specifically zero-dimensional sublevel set persistent homology. From there, we detail crucial results for the representation of zero-dimensional sublevel set persistent homology for stochastic processes.
Before continuing, let us make a brief note about notation. For a real numbers we define , , and . We set and . If is a set in some topological space, we denote the interior (i.e. largest open subset) of and its boundary. We denote to be the open Euclidean ball of radius centered at . If for a real sequence and a positive sequence we have as , we write ; if there exists a such that for large enough, we write .
2.1. Homology
Recall that an (abstract) simplicial complex is a collection of subsets of a set with the property that it is closed under inclusion. Let be the graph (i.e. a special case of a simplicial complex) with vertex set and edge set
For a fixed function that satisfies , we define . It is clear that for we have and thus defines a filtration of graphs. For any we can assess the connectivity information of by calculating its -dimensional homology group . We do so by initially forming two vector spaces and of all formal linear combinations of the vertices
and
where is the field of two elements . The elements of and are called 0-chains and 1-chains, respectively. Addition of -chains in is done componentwise. To calculate we need to specify the boundary map , which is defined by
We can extend this to an arbitrary by
By analogy to the construction above, each vertex in gets sent to 0 by so . Defining (the image of ), we define the homology group as the quotient vector space,
A more general setup of homology with coefficients can be seen in Chapter 4 of Edelsbrunner and Harer (2010).
2.2. Persistent homology and representations
The vector spaces (conventionally called groups, as coefficients may lie in $\mathbb{Z}$, for example) capture intuitive connectivity information: the elements of are the equivalence classes of vertices that satisfy . More simply put, elements of are vertices connected by a chain of edges. The information in gives us useful information on the function , but being able to assess how connected components (i.e. elements of ) appear and merge as we vary would be better. We can do so by introducing the notion of persistent homology.

Given the inclusion maps , for there exist linear maps between all homology groups
which are induced by . The persistent homology groups of the filtration are the quotient vector spaces
whose elements represent the cycles that are “born” in or before and that “die” after . The dimensions of these vector spaces are the persistent Betti numbers . Heuristically, a connected component is born at if it appears for the first time in —formally, , for . The component dies entering if it merges with an older class (born before ) entering . The persistent homology of , denoted , is the collection of homology groups and maps , for . All of the information in the persistent homology groups is contained in a multiset in called the persistence diagram (Edelsbrunner and Harer, 2010). The persistence diagram of , denoted , consists of the points with multiplicity equal to the number of the classes that are born at and die entering . Often, the diagonal is added to this diagram, but we need not consider this here. Formally, we have
where is a multiset. Each point in can also be represented as a barcode, or interval (cf. Carlsson and Vejdemo-Johansson, 2021). As such, we may represent as a measure
on . See Figure 1 for an illustration of a persistence diagram associated to a sublevel set filtration of a given stochastic process.
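To make the construction concrete, the following is a minimal Python sketch (not the authors' implementation) of zero-dimensional sublevel set persistence for a finite sequence. It assumes that the graph is the path on vertices $0, \dots, n-1$ with edges between consecutive indices, that an edge enters the filtration at the larger of its two endpoint values, and that merges are resolved by the elder rule. The function name `sublevel_persistence` is hypothetical.

```python
import math


def sublevel_persistence(values):
    """0-dim sublevel set persistence of a sequence on a path graph.

    Assumption: vertices 0..n-1, edges {i, i+1}, and an edge appears at
    the larger of its two endpoint values. Returns (birth, death) pairs;
    the oldest class never dies and is recorded with death = inf.
    """
    n = len(values)
    parent = [-1] * n      # -1 marks a vertex not yet in the filtration
    birth = [0.0] * n      # birth[root] = birth value of that component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    diagram = []
    # Add vertices in increasing order of their values.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (larger birth) dies.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:   # skip zero-lifetime pairs
                    diagram.append((birth[young], values[i]))
                parent[young] = old
    if n:
        diagram.append((min(values), math.inf))
    return diagram


example = sublevel_persistence([3, 1, 4, 1.5, 5, 0.5, 2])
```

For the seven-term example, the sketch pairs the local minimum 1.5 with the local maximum 4, the local minimum 1 with the local maximum 5, and leaves the global minimum 0.5 as the single class with infinite death time.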
2.3. Probability and persistence
Throughout the paper, let us fix a probability space . For random variables we write to convey that converges weakly to , i.e. for all bounded, continuous . We write to convey that converges in probability to . We say an event occurs “a.s.” (almost surely), if . We use the term stationary throughout this work to refer to strict stationarity, i.e. the invariance of finite-dimensional distributions under shifts. A stationary sequence of random variables is said to be ergodic if any a.s. shift-invariant event satisfies either or .
As we are interested in studying the stochastic behavior of persistence diagrams, we want to associate to each vertex a random variable for each . Consider a stationary sequence of random variables and define . We then define for the filtration
where for and otherwise. Furthermore, set . Crucially, we can show that
(1) |
We now formalize (1) into a proposition and present a proof.
Proposition 2.1.
The formula (1) holds.
Proof.
Take two vertices . These vertices are equivalent if and only if
i.e. if they can be connected by edges lying in . Hence, and must lie in the same connected component in . Thus there is a one-to-one correspondence between the number of connected components in (which contain a vertex from ) and the number of equivalence classes present in . Hence, these same classes form a spanning set. Let denote the equivalence class of a chain . Now take the vertices that constitute (note that ). Then,
if and only if
where the terms lie in . Suppose without loss of generality that
As lies in a different connected component from the rest of the vertices, any -chain of edges in including an edge that is a part of, must have a boundary containing a point not equal to and also not equal to . Hence , and induction furnishes the other cases. Hence, (1) holds. ∎
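As an illustration of the counting interpretation used in the proof, the sketch below computes a persistent Betti number directly from the data. It assumes the standard reading in which the number for levels $s \le t$ counts the connected components of the sublevel set at level $t$ that contain at least one vertex with value at most $s$; on a path graph these components are maximal runs of consecutive indices. The function name `persistent_betti_0` is hypothetical, and the code is a sketch under these assumptions rather than a transcription of formula (1).

```python
def persistent_betti_0(values, s, t):
    """Count maximal runs of consecutive indices with values <= t that
    contain at least one value <= s (requires s <= t)."""
    if s > t:
        raise ValueError("need s <= t")
    count = 0
    in_run = False     # currently inside a run of values <= t
    counted = False    # the current run already contains a value <= s
    for x in values:
        if x <= t:
            if not in_run:
                in_run, counted = True, False
            if x <= s and not counted:
                count += 1
                counted = True
        else:
            in_run = False
    return count


sequence = [3, 1, 4, 1.5, 5, 0.5, 2]
b_low = persistent_betti_0(sequence, 1.5, 3)    # three runs, each hit
b_high = persistent_betti_0(sequence, 1, 4.5)   # two runs, each hit
```

A single pass suffices because each run only needs to be counted once, at its first value not exceeding $s$.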
Having brought forth the representation of persistent Betti numbers that will prove crucial to the results herein, we turn our attention to persistence diagrams. Let be the measure on associated to the persistence diagram of the filtration . Note that
If we let
for , then
(2) |
due to the so-called “Fundamental Lemma of Persistent Homology” (Edelsbrunner and Harer, 2010). If has the above representation, we will say that are the coordinates of . We define the class of sets by
An important result holds for the class .
Lemma 2.2.
is a convergence-determining class for weak convergence on equipped with the Borel $\sigma$-algebra, . Namely, if and are probability measures on and
for all such that , then
Furthermore, for each probability measure on there is a countable convergence-determining class for .
Proof.
We will adapt the proof of Theorem A.2 from Hiraoka et al. (2018). First, it is clear that is closed under finite intersections, so we have satisfied the first condition of Theorem 2.4 in Billingsley (1999) (i.e. that is a $\pi$-system). It is also evident that is separable. Now, for any if we denote
then the class of boundaries contains uncountably many disjoint sets, regardless of if or , where (in the former case ). Thus is a convergence-determining class by Theorem 2.4 of Billingsley (1999).
For the final part of the proof, let us fix a probability measure and choose an open set . Note that for every , there is an such that . By the first part of this proof, for each of these there exists a set such that and hence we have
and is the union of sets with -null boundaries. By separable, there exists a countable subcover of . Also, there exists a countable basis of . Hence, if we denote then
Let be the class of finite intersections of the sets . As the boundary of an intersection is a subset of the union of the boundaries, each element of has a -null boundary. Furthermore, every open set in is the countable union of elements of . Hence, we apply Theorem 2.2 in Billingsley (1999) and the result holds.
∎
An important result holds for the measure . Namely, the value is equal to the number of local minima of .
Proposition 2.3.
Suppose that is a stationary sequence of random variables with . Then
Proof.
The case when is trivial, so suppose that . As the underlying stochastic process is stationary and then every value is distinct with probability 1. Let be the order statistics of —which are distinct with probability 1—and let be the associated vertices (see above). If we define
with , then and contains all the simplices of along with the 0-simplex and any edges containing it. If then there are points at if and only if —see p. 152 in Edelsbrunner and Harer (2010). By Proposition 2.1, we have that
Now, so by cancelling sums—and the fact that implies that cannot happen—we have that
(3) |
because the only way the maximum and minimum of a collection of random variables are identical is if they are constant, which is only possible if as the are almost surely distinct. The desired formula follows from applying this same uniqueness to (3).
∎
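The local-minimum count appearing in Proposition 2.3 is easy to compute directly. The sketch below assumes that endpoints are compared only with their single neighbour, a boundary convention chosen here for illustration and which may differ from the one used in the proposition. For i.i.d. observations from a continuous distribution, an interior index is a local minimum with probability $1/3$, since each of three consecutive values is equally likely to be the smallest. The simulation illustrates this with uniform noise, and the name `count_local_minima` is hypothetical.

```python
import random


def count_local_minima(values):
    """Count indices whose value is strictly below every existing
    neighbour. Endpoints use their single neighbour (assumed convention)."""
    n = len(values)
    count = 0
    for i, x in enumerate(values):
        left_ok = i == 0 or x < values[i - 1]
        right_ok = i == n - 1 or x < values[i + 1]
        if left_ok and right_ok:
            count += 1
    return count


rng = random.Random(0)                      # fixed seed for repeatability
sample = [rng.random() for _ in range(100_000)]
fraction = count_local_minima(sample) / len(sample)
```

For the i.i.d. uniform sample, `fraction` should be close to $1/3$.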
To finish this section, we must introduce the restricted measure on the set (equipped with the usual Borel sub-$\sigma$-algebra ) defined by
Note that as is a Borel subset of , we have that . To reduce notational clutter, we will mostly write in place of from here on out, unless otherwise noted.
3. Strong law of large numbers
In this section we establish our strong law of large numbers for sublevel set persistence diagrams for a very broad class of sets and functions. We do this for the class of bounded, continuous functions initially via a weak convergence argument, and proceed to extend our result to a class of unbounded functions which are of great practical use in topological data analysis. Along the way, we give an explicit representation for the limiting persistent Betti number for i.i.d. sequences.
Theorem 3.1.
Consider a stationary and ergodic sequence where each has distribution and density such that . For the random probability measure induced by there exists a probability measure on such that
Additionally, if we define on then
Proof.
We will begin by establishing the almost sure convergence of the persistent Betti numbers for . Recall that
Define for the indicator random variable
(4) |
with the indicators defined as with the second subscript dropped. If we fix , we have for that
which yields
Similarly, we see that
because
It is readily observed for fixed that are indicator random variables and form a stationary and ergodic sequence, owing to Theorem 7.1.3 in Durrett (2010), for example. Thus, Birkhoff’s ergodic theorem implies that for any we have
The monotone convergence theorem then implies that
To establish the convergence of , it suffices to recall that from Proposition 2.3 the total number of points in the persistence diagram is equal to the number of local minima of . Therefore, the ergodic theorem once again implies that converges a.s. to and
(5) |
(By our assumptions we must have that ). Define a set function by
which can likewise be defined on in a straightforward manner, by (2). It is clear that the convergence in (5) holds for any set in as well. As is a semiring which generates the Borel $\sigma$-algebra on (as is separable), extends uniquely to a probability measure on , provided that is countably additive on . By Lemma 2.2, there is a countable convergence-determining class for . We have shown thus far that
so convergence for all sets in with -null boundary follows (with probability 1). It remains to demonstrate that is countably additive on . Let
where are disjoint. Then, almost surely,
by the monotone convergence theorem.
To finish the proof, note that it is the case that , as they both tend to infinity and differ by 1 (this fact implies that is supported on ). Also, we have that for any set (which is also a Borel subset of ), if , then almost surely
As for , the proof is finished. ∎
Remark 3.2.
In Theorem 3.1 we assumed that in our stationary sequence, to ensure consecutive points are distinct, as stated in Proposition 2.3. It seems straightforward to generalize this result to the situation where consecutive points can be identical, by accounting for this in the proof of Proposition 2.3, and ensuring that the number of points in tends to infinity.
Before seeing an example of the strong law in action, we will establish a result that will provide us an explicit representation of the limiting measure. Let us define the quantity
which represents the probability that there is some index such that and all other random variables are less than or equal to . In the setup with all i.i.d. with distribution function we have
and
(6) |
We will assume that , as if then and if then . Dividing (6) by we can see that
as the other terms are finite or tend to zero upon dividing by . Let us make the substitution and consider a general with . Thus,
(7) |
by differentiating with respect to . We have the following pleasing result for the limiting expectation for the persistent Betti number in this simplified i.i.d. case.
Proposition 3.3.
For i.i.d. having distribution , we have that
for any with and otherwise.
Proof.
Dividing by and taking the limit in (7) for the two cases and gives
and
Simplifying the above two expressions yields the ultimate result. ∎
Example 3.4.
If the stationary and ergodic sequence in Theorem 3.1 is i.i.d., Proposition 3.3 shows we can characterize the limiting probability measure quite nicely. We note that
for all as . Therefore, admits a probability density
This density facilitates the simulation of random variables according to the limiting persistence distribution in the case that corresponds to i.i.d. noise. After a Monte Carlo random sample is generated from this distribution, we may test for “significant” points in the diagram , based on what we would expect from .
Of particular importance to us is the partial derivative
(8) |
If we set , then (8) evaluates to
Define for . As a result of the above discussion, we have the following corollary.
Corollary 3.5.
For i.i.d. with distribution function satisfying the conditions of Theorem 3.1, we have that
where .
Example 3.6.
Corollary 3.5 implies that for uniform on we have for that
This is rather interesting, given that there is no a priori reason that uniform noise should also produce asymptotically uniformly distributed persistence lifetimes.
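The uniformity of lifetimes for uniform noise can be examined numerically. The following sketch simulates i.i.d. uniform observations on $[0,1]$, computes the lifetimes of the finite points of the zero-dimensional sublevel set persistence diagram, and measures the Kolmogorov distance between their empirical distribution function and the uniform distribution function. The path-graph filtration, the elder rule, the exclusion of the single infinite class, and the names `finite_lifetimes` and `ks_distance_to_uniform` are all assumptions of this illustration, which is a sanity check rather than a proof.

```python
import random


def finite_lifetimes(values):
    """Lifetimes d - b of the finite points of the 0-dim sublevel set
    persistence diagram of a sequence (path graph, elder rule)."""
    n = len(values)
    parent = [-1] * n      # -1 marks a vertex not yet in the filtration
    birth = [0.0] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    lifetimes = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i], birth[i] = i, values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    lifetimes.append(values[i] - birth[young])
                parent[young] = old
    return lifetimes


def ks_distance_to_uniform(sample):
    """Sup-distance between the empirical CDF and the U(0,1) CDF."""
    xs = sorted(sample)
    m = len(xs)
    return max(max((k + 1) / m - x, x - k / m) for k, x in enumerate(xs))


rng = random.Random(1)                      # fixed seed for repeatability
noise = [rng.random() for _ in range(30_000)]
lifetimes = finite_lifetimes(noise)
distance = ks_distance_to_uniform(lifetimes)
```

Roughly a third of the observations should give rise to finite points, and `distance` should be small for a sample of this size. The sketch says nothing about the rate at which it shrinks.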
Before addressing strong laws for unbounded functions, we conclude with a corollary of Theorem 3.1, establishing a Glivenko-Cantelli result for persistence lifetimes. We omit the proof of Corollary 3.7 as it is proved in exactly the same manner as the Glivenko-Cantelli theorem—see Theorem 1.3 in Dudley (2014).
Corollary 3.7.
Suppose the conditions on the sequence stated in Theorem 3.1 hold. Then we have
3.1. SLLN for unbounded functions
At this point, we have established almost surely that
for any bounded, continuous real-valued function on , when is induced by a stationary and ergodic sequence of random variables (similarly for ). In general, if is a continuous, nonnegative function and is the function that equals when , then almost surely
for all . Following this line of inquiry, we establish a result which yields convergence results for a large class of persistence statistics often seen in practice, including many of the functions for which convergence holds for geometric complexes in Divol and Polonik (2019), though we make no requirements on the behavior near the diagonal nor do we require polynomial growth. Prior to stating the result, it is necessary to define the notion of largely nondecreasing. We say that an unbounded function is largely nondecreasing if there exists an such that is non-empty and is nondecreasing on where . Furthermore, recall that the function is coercive if as .
Theorem 3.8.
Assume the conditions of Theorem 3.1 and suppose that and is a continuous, coercive, and largely nondecreasing function with for some . If , then
Proof.
Before beginning, fix any such that is nondecreasing on . We will focus our proof on the case where the marginal distribution can take negative and positive values, but the proofs follow from a simplified version of the argument below when the support of is restricted to a half-line. To show that
for as in the statement of the theorem, it will suffice to first bound the quantity
(9) |
Recall that . In this situation, we have that the unnormalized form of (9) equals
(10) |
because of the fact when , and we have . Furthermore,
This occurs as if . With a similar argument for the term, we can see that (10) is bounded above by
By a similar argument to Proposition 2.3, occurs at if and only if is a local maximum. Birkhoff’s ergodic theorem then implies that
as . Hölder’s inequality then implies that for and ,
By assumption, for some , so that coercivity of entails we may choose large enough such that
Therefore, for such an we have
A similar argument holds for the term
so the additivity of furnishes that
By Theorem 3.1 and the triangle inequality, it remains to show that
for some , which follows from . ∎
If all are nonnegative, we have an easy corollary to Theorem 3.8. We omit the proof as it follows directly from the one above.
Corollary 3.9.
If for all then Theorem 3.8 holds for .
The utility of Theorem 3.8 can be seen in the following section.
3.2. Strong law of large numbers: two examples
Strong laws of large numbers can be established from Theorem 3.8 for various quantities used in topological data science called persistence statistics. For instance, we have a strong law of large numbers for degree- total persistence (see Cohen-Steiner et al. (2010) for a definition and Divol and Polonik (2019) for the geometric complex result), provided that
A more difficult example is persistent entropy (Merelli et al., 2015; Atienza et al., 2020). Persistent entropy has been used as part of a suite of statistics in the studies of Chung et al. (2021, 2022, 2024) and Thomas et al. (2024), as well as to detect activation in the immune system (Rucco et al., 2016), and to detect structure in nanoparticle images (Thomas et al., 2023; Crozier et al., 2024). The definition (excluding the longest barcode) is
where . We may represent as
Another nontrivial statistic of interest is the ALPS statistic, defined in Thomas et al. (2023) and utilized in Thomas et al. (2023), Crozier et al. (2024), and Thomas et al. (2024). Its representation is
and we define a truncation of the ALPS statistic as Before continuing, let us define and . Both and are continuous, coercive, and largely nondecreasing in .
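For readers who wish to compute two of these persistence statistics from a list of finite lifetimes, a minimal sketch follows. It assumes the usual forms found in the cited literature: total persistence of degree $p$ as the sum of the $p$-th powers of the lifetimes, and persistent entropy as the Shannon entropy of the lifetimes normalized to sum to one, with the infinite class excluded beforehand. The function names are hypothetical, and the ALPS statistic is not covered.

```python
import math


def total_persistence(lifetimes, p=1.0):
    """Sum of the p-th powers of the lifetimes."""
    return sum(l ** p for l in lifetimes)


def persistent_entropy(lifetimes):
    """Shannon entropy of the lifetimes normalised to sum to one.
    Assumes the infinite class has already been removed."""
    total = sum(lifetimes)
    if total <= 0:
        raise ValueError("need at least one positive lifetime")
    return -sum((l / total) * math.log(l / total)
                for l in lifetimes if l > 0)


finite = [2.5, 4.0]          # e.g. lifetimes of points (1.5, 4) and (1, 5)
tp2 = total_persistence(finite, p=2)
entropy = persistent_entropy(finite)
```

Persistent entropy is maximal, equal to the logarithm of the number of points, when all lifetimes coincide, and it vanishes when there is a single finite point.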
Corollary 3.10.
Proof.
The proof follows fairly simply from Theorem 3.8. We know that
Subtracting and applying Theorem 3.8 yields a limit of
which finishes the proof, as a probability measure. For the ALPS statistic, we see that
If we fix a positive , Corollary 3.7 implies that for ( depending on the sample point ), we have
for all . Therefore, the bounded convergence assumption holds for all such that convergence holds. Hence, our result follows almost surely. ∎
Having demonstrated our strong law of large numbers for persistence diagrams, and its ramifications, we now turn our attention to the central limit theorem.
4. Central limit theorem
In this section, we prove a central limit theorem for the integral , where is a step function. This follows from proving a CLT for linear combinations of persistent Betti numbers using the Lindeberg method for weakly dependent triangular arrays given in Neumann (2013). The desired result will follow as a consequence of demonstrating
obeys a central limit theorem when obeys weak dependence conditions (to be specified below) and are arbitrary real numbers. The reason for this is that if then
The Cramér-Wold device also provides us with finite-dimensional weak convergence as an added benefit.
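The reduction from step functions to linear combinations can be sketched in code: integrating a step function against a persistence diagram amounts to a weighted count of diagram points in each rectangle. The snippet below assumes rectangles of the half-open form $(s_1, s_2] \times (t_1, t_2]$, a convention chosen here for illustration, and represents the diagram as a list of (birth, death) pairs. The function name `integrate_step_function` is hypothetical.

```python
import math


def integrate_step_function(diagram, rectangles):
    """Integrate sum_i a_i * 1{R_i} against a persistence diagram.

    diagram: iterable of (birth, death) pairs.
    rectangles: list of (a, (s1, s2, t1, t2)), with R = (s1, s2] x (t1, t2]
    (an assumed convention).
    """
    total = 0.0
    for a, (s1, s2, t1, t2) in rectangles:
        mass = sum(1 for b, d in diagram
                   if s1 < b <= s2 and t1 < d <= t2)
        total += a * mass
    return total


diagram = [(1.5, 4), (1, 5), (0.5, math.inf)]
step = [(2.0, (0, 1, 4.5, math.inf)),    # contains (1, 5) and (0.5, inf)
        (-1.0, (1, 2, 3, 4.5))]          # contains (1.5, 4)
value = integrate_step_function(diagram, step)
```

Because the integral is linear in the coefficients, a limit theorem for every such linear combination of rectangle counts yields the corresponding result for step functions.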
As for the aforementioned notions of weak dependence, the one we employ is that of $\rho$-mixing. To begin, note that for any two sub-$\sigma$-algebras we define
where (resp. ) is the space of square-integrable -measurable (resp. -measurable) random variables (for random variables the value ). Furthermore, we define
so that the stochastic process is said to be $\rho$-mixing if as . For our limit theorems, we will require that , which implies $\rho$-mixing. More details on $\rho$-mixing and other mixing conditions can be seen in Bradley (2005). Another particularly important condition for our proofs is that the probability distributions of the partial maxima of our stationary process decay sufficiently quickly. This serves to limit any percolation-esque phenomena that would preclude a central limit theorem.
Definition 4.1.
A stationary stochastic process with marginal distribution function is said to be max-root summable if for all with we have
Before stating our main theorem, we will establish conditions on the stochastic process that guarantee max-root summability.
Proposition 4.2.
Suppose that is a stationary stochastic process. If there is some s.t.
for all with , then is max-root summable.
Proof.
If the condition above holds there is some such that
the right-hand side of which is clearly summable. ∎
Example 4.3.
Example 4.4.
Suppose that is stationary and -dependent, i.e. for all . Then we have
Because establishes max-root summability trivially, we take . Then as for any and large enough, then the condition in Proposition 4.2 is established.
To establish our CLT (Theorem 4.6 below), we first need to assess the limiting behavior of the covariance.
Proposition 4.5.
Let be a stationary stochastic process that is max-root summable and satisfies . Assume further that the marginal distribution of is continuous with distribution . Suppose that for with and .
where the terms are defined at (4) respectively.
With this all at hand, we may finally state the central limit theorem.
Theorem 4.6.
Let be a stationary stochastic process that is max-root summable and satisfies . Assume further that the marginal distribution of is continuous with distribution . Then for any function with and , , if the corners of the rectangles satisfy and we have:
and if each of the coordinates of lie in for then
as , where is a nonnegative constant depending on .
We defer the proof to Section 6.
5. Discussion
In this paper, we have demonstrated a strong law of large numbers for a large class of integrals with respect to the random measure induced by the sublevel set persistent homology of general stationary and ergodic processes. We also proved a central limit theorem for the same random measure for a large class of step functions. As the SLLNs, by consideration of the negated process , also pertain to superlevel sets, it would be interesting to consider the limiting behavior of the persistent homology of the extremes of a stationary stochastic process; the reason is the natural connection between the superlevel set value (number of connected components above levels , ) and the clusters of exceedances seen in the extreme value theory literature (see chapter 6 of Kulik and Soulier, 2020).
Two potential improvements for this paper seem to lie in the weakening of conditions and the augmentation of the class of functions for which the central limit theorem holds (Theorem 4.6). There are likely only improvements to be made in the latter case, as the condition is only slightly stronger than the slowest mixing rate of for a conventional CLT to hold for a stationary sequence (Bradley, 1987). The improvement of the second objective seemingly depends on a more precise treatment of the covariance in Proposition 4.5, which is rather tedious as it stands. Nonetheless, such improvements would see utility as the class of functions of persistence diagrams used in practice is large, which is what motivated Section 3.1 (and this paper) to begin with. Expanding the CLT results to a functional CLT for the persistent Betti numbers (as in Krebs and Hirsch, 2022) may yield some progress towards this end, but we leave all the pursuits mentioned in these last two paragraphs for future work.
6. Central limit theorem proof
For the proof of our central limit theorem, we will employ Theorem 2.1 from Neumann (2013), which establishes a CLT for potentially nonstationary weakly dependent triangular arrays. As mentioned at the beginning of Section 4, it is sufficient to show that
(11) |
converges to a Gaussian distribution for each , to establish our desired convergence. Recall that at (4) we defined the indicator (Bernoulli) random variable and on the following line we noticed that
so that (11) is equal to
For the proof of the CLT it is convenient for us to establish first a CLT for a truncated version of the persistent Betti numbers, as was done in the proof of the Betti number CLT for the critical regime in the geometric setting, in Theorem 4.1 of Owada and Thomas (2020). Define first
Therefore, if we define
establishing Theorem 4.6 amounts to establishing a CLT for for each then showing that the difference between and disappears in probability. We will now quote the theorem which we will use to establish this.
Theorem 6.1 (Theorem 2.1 in Neumann, 2013).
Suppose that with is a triangular array of random variables with for all and for some . Suppose further that
(12) |
for some , and that for every we have
(13) |
Furthermore, assume that there exists a summable sequence of , , such that for all and indices , the following upper bounds for covariances hold true:
(14) |
and
(15) |
for all measurable with . Then
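For orientation, the hypotheses and conclusion of Theorem 6.1 can be sketched in LaTeX as follows. The notation here is an assumption chosen for illustration ($X_{n,k}$ for the array, $v_0$, $\sigma^2$, $\epsilon$, $\theta_r$, $g$, and the indices $s_i$, $t_i$), and the precise form of each condition should be checked against Neumann (2013).

```latex
% Hedged restatement; all symbols below are assumed notation.
% Setup: a triangular array (X_{n,k})_{k=1,\dots,n}, n \in \mathbb{N}, with
\begin{align*}
  \mathbb{E} X_{n,k} &= 0, \qquad
  \sum_{k=1}^{n} \mathbb{E} X_{n,k}^{2} \le v_0 < \infty
  \quad \text{for all } n. \\
\intertext{Condition (12), convergence of the variance of the row sums:}
  \sigma_n^{2} := \operatorname{Var}\bigl(X_{n,1} + \cdots + X_{n,n}\bigr)
  &\to \sigma^{2} \in [0,\infty). \\
\intertext{Condition (13), a Lindeberg condition, for every $\epsilon > 0$:}
  \sum_{k=1}^{n} \mathbb{E}\bigl[ X_{n,k}^{2}\,
  \mathbf{1}\{ |X_{n,k}| > \epsilon \} \bigr] &\to 0. \\
\intertext{Conditions (14) and (15), for indices
  $1 \le s_1 < \cdots < s_u < s_u + r = t_1 \le t_2 \le n$ and
  measurable $g$ with $\|g\|_\infty \le 1$:}
  \bigl| \operatorname{Cov}\bigl( g(X_{n,s_1},\dots,X_{n,s_u})\, X_{n,s_u},\,
  X_{n,t_1} \bigr) \bigr|
  &\le \theta_r \bigl( \mathbb{E} X_{n,s_u}^{2} + \mathbb{E} X_{n,t_1}^{2}
  + n^{-1} \bigr), \\
  \bigl| \operatorname{Cov}\bigl( g(X_{n,s_1},\dots,X_{n,s_u}),\,
  X_{n,t_1} X_{n,t_2} \bigr) \bigr|
  &\le \theta_r \bigl( \mathbb{E} X_{n,t_1}^{2} + \mathbb{E} X_{n,t_2}^{2}
  + n^{-1} \bigr). \\
\intertext{Conclusion:}
  X_{n,1} + \cdots + X_{n,n} &\xrightarrow{d} N(0, \sigma^{2}).
\end{align*}
```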
Proof of Theorem 4.6.
The finite-dimensional CLT proof for follows by checking that the conditions of Theorem 6.1 hold for our setup. First, we notice that
so that
which is bounded above by
for by the inequalities and . Thus, . If we note that
then converges to some limit via arguments analogous to and much simpler than those of Proposition 4.5. Thus, (12) is satisfied. If we use the triangle inequality, we can see that for each
using the trivial indicator random variable bound . Therefore when we have that so that (13) holds as well. To finish the proof, we must show that (14) and (15) hold in Theorem 6.1 above. For both situations, we can ignore the case for , as we can set arbitrarily large in this case to get the required bounds in this case. Therefore, suppose that , so that only depends on indices up to and only depends on indices starting at .
We will only demonstrate (14), as (15) follows by a similar, simpler argument. For a fixed set of indices and fixed let us denote . By the bilinearity of covariance, it will suffice to establish the required bounds in (14) for a single summand in
(16) | . |
provided that such a bound is uniform in . It can be shown that the covariance term in (16) is equal to
(17) |
and the absolute value of (17) can be bounded above by
Because , for measurable and , the required bound will follow provided we find a suitable bound for the quantity . By definition of ρ-mixing and the trivial bound we have
By assumption, so that (14) is established. As alluded to earlier, the proof for (15) follows in exactly the same way, hence
(18) |
for all . As the dominated convergence assumption holds true in Proposition 4.5, it is straightforward to see that as , where is the limiting variance of (11). Hence as as well (using Lévy’s continuity theorem, for example). Theorem 3.2 in Billingsley (1999) will yield the rest if we can show that
where is the sum of persistent Betti numbers in (11) and is the -truncated version on the left-hand side of (18). An application of Chebyshev’s inequality and the covariance inequality yields
(19) |
The quantity converges to a limit defined by the terms (24), (25), and (26) below with the restriction that . As each of the sums in (24), (25), and (26) is absolutely convergent, their restriction with tends to 0 as , and the CLT follows.
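The approximation step just used has the following general shape, written here as a LaTeX sketch. The symbols $Y_n$ (the untruncated statistic), $X_{M,n}$ (its truncated version at level $M$), $Z_M$, $X$, and the metric $d$ are assumed notation for illustration, and the Chebyshev line assumes that both statistics are centered.

```latex
% Hedged sketch of the truncation argument; notation is assumed.
% Theorem 3.2 of Billingsley (1999): for random elements of a metric
% space (S, d) defined on a common probability space,
\begin{align*}
  & X_{M,n} \Rightarrow Z_M \ \ (n \to \infty) \ \text{for each } M,
  \qquad Z_M \Rightarrow X \ \ (M \to \infty), \\
  & \lim_{M \to \infty} \limsup_{n \to \infty}
    \mathbb{P}\bigl( d(X_{M,n}, Y_n) \ge \epsilon \bigr) = 0
    \ \text{ for each } \epsilon > 0 \\
  & \Longrightarrow \quad Y_n \Rightarrow X \ \ (n \to \infty).
\end{align*}
% For real-valued, centered statistics the third condition can be
% checked with Chebyshev's inequality:
\[
  \mathbb{P}\bigl( |Y_n - X_{M,n}| \ge \epsilon \bigr)
  \le \frac{\operatorname{Var}(Y_n - X_{M,n})}{\epsilon^{2}} .
\]
```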
Finally, for any
so that if each coordinate of is in , then and the result is proved for the restricted persistence diagram as well. ∎
We finish this section with a proof of the limiting covariance seen in Proposition 4.5, which we will break into a few lemmas.
Proof of Proposition 4.5. Let us define
where is analogously defined for the entire sequence . Therefore we have
Thus, it follows that
(20) |
where with , and . We may then break (20) into
(21) |
For now, we will exclude the boundary terms from each sum—which use and . We will treat the boundary terms later. The nonboundary terms of the expression (21) can thus be simplified based on the assumed stationarity of to be
(22) |
Dividing by , we may express the first term in (22) as
(23) |
Assuming we can show that
where we drop the superscript as mentioned at the start of Section 4, then (23) will converge to
(24) |
Similarly, we will get limits of
(25) |
and
(26) |
for the second and third terms in (22), provided the dominated convergence assumption holds for each of these cases. In fact, these three sums comprise the limit of the covariance. However, to establish that, we must ensure that the “boundary terms” vanish, which we do in Lemma 6.3. A useful fact will aid in the proof of the covariance limit above and the lemma below.
Lemma 6.2.
Fix . Suppose that and , then for any values of we have
Analogously, if and , then for any values of we have
Proof.
Note that if and , then it must be the case that there exist indices such that if then
a contradiction because and cannot simultaneously hold—even if . The proof for the second case follows by the same argument. ∎
Lemma 6.3.
If is a ρ-mixing stationary stochastic process that is max-root summable then the boundary terms in (21) are as .
Proof.
The boundary terms in (21) comprise those terms in the first sum that satisfy or , the terms in the second sum satisfying or , and the terms in the third sum satisfying , or . Thus, the boundary terms can be represented as
(27) |
We may bound the absolute value of the first sum in (27) by
and thus —where we use the inequalities , , and the fact that is max-root summable. We now will finish the proof by showing that the second sum in (27) is as well. That the third sum in (27) is follows by an essentially symmetric proof. We may bound the absolute value of the second sum in (27) by
(28) |
The bound on the first sum in (28) follows from the definition of ρ-mixing and the fact that and . Dividing the aforementioned first sum by we see that
which tends to as by max-root summability and the fact that . The second sum in (28) is a little more delicate. Before continuing, note that when both terms are at least 1. Hence, Lemma 6.2 implies that the second sum in (28) equals
(29) | |||
We may bound the first term in (29)
by the max-root summability condition. Furthermore, we can bound the second sum in (29) by
using the covariance inequality , and again using the max-root summability condition. Finally, we bound the third sum in (29) by
by a final application of the max-root summability condition. ∎
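The bounds in the proof above repeatedly combine covariance inequalities for indicator random variables with a mixing bound. As a hedged summary, the standard inequalities of this kind are sketched below in LaTeX. The events $A$, $B$, the gap $k$, and the coefficient $\rho(k)$ are assumed notation, and the last line presumes that the mixing coefficient is the maximal correlation ($\rho$-mixing) coefficient surveyed in Bradley (2005).

```latex
% Hedged sketch; A, B, k, \rho(k) are assumed notation.
\begin{align*}
  % Cauchy-Schwarz for square-integrable X, Y:
  |\operatorname{Cov}(X, Y)|
    &\le \sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}, \\
  % Trivial bound for indicators, since both P(A \cap B) and P(A)P(B)
  % lie in [0, \min\{P(A), P(B)\}]:
  |\operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B)|
    = |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)|
    &\le \min\{ \mathbb{P}(A), \mathbb{P}(B) \}, \\
  % Mixing bound, for A measurable with respect to the process up to
  % time j and B measurable with respect to the process from time j + k:
  |\operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B)|
    \le \rho(k) \sqrt{\operatorname{Var}(\mathbf{1}_A)
      \operatorname{Var}(\mathbf{1}_B)}
    &\le \rho(k) \sqrt{\mathbb{P}(A)\,\mathbb{P}(B)} .
\end{align*}
```

The last step uses $\operatorname{Var}(\mathbf{1}_A) = \mathbb{P}(A)(1 - \mathbb{P}(A)) \le \mathbb{P}(A)$.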
Having shown that the boundary terms vanish under our conditions, it will suffice to show the dominated convergence condition for the terms in (22) divided by , which will then tend to the sums of (24), (25), and (26) respectively. First, we divide each term by and see that the first covariance term with absolute summands is bounded above (using again the usual covariance inequalities) by
by applying max-root summability for each sum. We now prove the dominated convergence assumption for the second sum (divided by ) in (22), as the third sum follows an analogous proof. This procedure yields an upper bound of
(30) |
The first sum in (30) we may bound by
by max-root summability of . The second sum in (30) is bounded above by
by assumption.
References
- Atienza et al. (2020) Nieves Atienza, Rocio Gonzalez-Díaz, and Manuel Soriano-Trigueros. On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognition, 107:107509, 2020.
- Baryshnikov (2019) Yuliy Baryshnikov. Time series, persistent homology and chirality. arXiv preprint arXiv:1909.09846, 2019.
- Billingsley (1999) Patrick Billingsley. Convergence of probability measures. John Wiley & Sons, Inc., 2nd edition, 1999. ISBN 0-471-19745-9.
- Biscio et al. (2020) Christophe A. N. Biscio, Nicolas Chenavier, Christian Hirsch, and Anne Marie Svane. Testing goodness of fit for point processes via topological data analysis. Electronic Journal of Statistics, 14(1):1024–1074, 2020. ISSN 1935-7524. doi: 10.1214/20-EJS1683.
- Bobrowski and Skraba (2024) Omer Bobrowski and Primoz Skraba. Weak universality in random persistent homology and scale-invariant functionals. arXiv preprint arXiv:2406.05553, 2024.
- Bradley (1987) Richard C. Bradley. The central limit question under ρ-mixing. The Rocky Mountain Journal of Mathematics, pages 95–114, 1987.
- Bradley (2005) Richard C. Bradley. Basic Properties of Strong Mixing Conditions. A Survey and Some Open Questions. Probability Surveys, 2:107–144, 2005. doi: 10.1214/154957805100000104.
- Carlsson and Vejdemo-Johansson (2021) Gunnar Carlsson and Mikael Vejdemo-Johansson. Topological Data Analysis with Applications. Cambridge University Press, 2021.
- Chazal and Divol (2018) Frédéric Chazal and Vincent Divol. The density of expected persistence diagrams and its kernel based estimation. In 34th International Symposium on Computational Geometry (SoCG 2018). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2018.
- Chung et al. (2021) Yu-Min Chung, Chuan-Shen Hu, Yu-Lun Lo, and Hau-Tieng Wu. A persistent homology approach to heart rate variability analysis with an application to sleep-wake classification. Frontiers in Physiology, 12:637684, 2021.
- Chung et al. (2022) Yu-Min Chung, Amir Nikooienejad, and Bo Zhang. Automatic eating behavior detection from wrist motion sensor using Bayesian, gradient boosting, and topological persistence methods. In 2022 IEEE International Conference on Big Data (Big Data), pages 1809–1815, 2022. doi: 10.1109/BigData55660.2022.10021031.
- Chung et al. (2024) Yu-Min Chung, Whitney K. Huang, and Hau-Tieng Wu. Topological data analysis assisted automated sleep stage scoring using airflow signals. Biomedical Signal Processing and Control, 89:105760, 2024. ISSN 1746-8094. doi: 10.1016/j.bspc.2023.105760.
- Cohen-Steiner et al. (2010) David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz functions have L_p-stable persistence. Foundations of Computational Mathematics, 10(2):127–139, 2010.
- Crozier et al. (2024) Peter A. Crozier, Matan Leibovich, Piyush Haluai, Mai Tai, Andrew M. Thomas, Joshua Vincent, David M. Matteson, Yifan Wang, and Carlos Fernandez-Granda. Atomic resolution observations of nanoparticle surface dynamics and instabilities enabled by artificial intelligence. 2024. Submitted.
- Divol and Polonik (2019) Vincent Divol and Wolfgang Polonik. On the choice of weight functions for linear representations of persistence diagrams. Journal of Applied and Computational Topology, 3(3):249–283, September 2019. ISSN 2367-1734. doi: 10.1007/s41468-019-00032-z.
- Dudley (2014) Richard M. Dudley. Uniform central limit theorems, volume 142. Cambridge University Press, 2014.
- Durrett (2010) Rick Durrett. Probability: theory and examples. Cambridge University Press, 4th edition, 2010.
- Edelsbrunner and Harer (2010) Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. American Mathematical Soc., 2010.
- Graff et al. (2021) Grzegorz Graff, Beata Graff, Paweł Pilarczyk, Grzegorz Jabłoński, Dariusz Gąsecki, and Krzysztof Narkiewicz. Persistent homology as a new method of the assessment of heart rate variability. PLoS ONE, 16(7):e0253851, 2021.
- Hiraoka and Tsunoda (2018) Yasuaki Hiraoka and Kenkichi Tsunoda. Limit theorems for random cubical homology. Discrete & Computational Geometry, 60:665–687, 2018.
- Hiraoka et al. (2018) Yasuaki Hiraoka, Tomoyuki Shirai, and Khanh Duy Trinh. Limit theorems for persistence diagrams. The Annals of Applied Probability, 28(5):2740–2780, 2018.
- Kanazawa et al. (2024) Shu Kanazawa, Yasuaki Hiraoka, Jun Miyanaga, and Kenkichi Tsunoda. Large deviation principle for persistence diagrams of random cubical filtrations. Journal of Applied and Computational Topology, pages 1–52, 2024.
- Krebs (2021) Johannes Krebs. On limit theorems for persistent Betti numbers from dependent data. Stochastic Processes and their Applications, 139:139–174, 2021. ISSN 0304-4149. doi: 10.1016/j.spa.2021.04.013.
- Krebs and Hirsch (2022) Johannes Krebs and Christian Hirsch. Functional central limit theorems for persistent Betti numbers on cylindrical networks. Scandinavian Journal of Statistics, 49(1):427–454, 2022.
- Krebs and Polonik (2019) Johannes Krebs and Wolfgang Polonik. On the asymptotic normality of persistent Betti numbers. arXiv preprint arXiv:1903.03280, 2019.
- Kulik and Soulier (2020) Rafal Kulik and Philippe Soulier. Heavy-tailed time series. Springer, 2020.
- Merelli et al. (2015) Emanuela Merelli, Matteo Rucco, Peter Sloot, and Luca Tesei. Topological characterization of complex systems: Using persistent entropy. Entropy, 17(10):6872–6892, 2015.
- Meyn and Tweedie (2009) S. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Cambridge Mathematical Library. Cambridge University Press, 2009. ISBN 9780521731829.
- Miyanaga (2023) Jun Miyanaga. Limit theorems of persistence diagrams for random cubical filtrations. PhD thesis, Kyoto University, 2023.
- Neumann (2013) Michael H Neumann. A central limit theorem for triangular arrays of weakly dependent random variables, with applications in statistics. ESAIM: Probability and Statistics, 17:120–134, 2013.
- Owada (2022) Takashi Owada. Convergence of persistence diagram in the sparse regime. The Annals of Applied Probability, 32(6):4706–4736, 2022.
- Owada and Bobrowski (2020) Takashi Owada and Omer Bobrowski. Convergence of persistence diagrams for topological crackle. Bernoulli, 26(3):2275–2310, August 2020. ISSN 1350-7265. doi: 10.3150/20-BEJ1193.
- Owada and Thomas (2020) Takashi Owada and Andrew M. Thomas. Limit theorems for process-level Betti numbers for sparse and critical regimes. Advances in Applied Probability, 52(1):1–31, 2020.
- Perez (2023) Daniel Perez. On the persistent homology of almost surely C^0 stochastic processes. Journal of Applied and Computational Topology, 7(4):879–906, 2023.
- Rice (1944) S. O. Rice. Mathematical analysis of random noise. The Bell System Technical Journal, 23(3):282–332, 1944. doi: 10.1002/j.1538-7305.1944.tb00874.x.
- Rucco et al. (2016) Matteo Rucco, Filippo Castiglione, Emanuela Merelli, and Marco Pettini. Characterisation of the idiotypic immune network through persistent entropy. In Proceedings of ECCS 2014: European Conference on Complex Systems, pages 117–128. Springer, 2016.
- Thomas et al. (2023) Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, and David S. Matteson. Feature detection and hypothesis testing for extremely noisy nanoparticle images using topological data analysis. Technometrics, 65(4):590–603, 2023. doi: 10.1080/00401706.2023.2203744.
- Thomas et al. (2024) Andrew M. Thomas, Michael Jauch, and David S. Matteson. Bayesian changepoint detection via logistic regression and the topological analysis of image series. arXiv preprint arXiv:2401.02917, 2024.