
The Shadow Knows: Empirical Distributions of Minimum Spanning Acycles and Persistence Diagrams of Random Complexes

Nicolas Fraiman. Supported by UNC's Junior Faculty Award funded by IBM and R.J. Reynolds Industries, and NSF grant DMS-2134107.

Sayan Mukherjee. Supported by funding from NSF DEB-1840223, NIH R01 DK116187-01, HFSP RGP0051/2017, NSF DMS 17-13012, NSF CCF-1934964, and the Alexander von Humboldt Foundation. High-performance computing is partially supported by grant 2016-IDG-1013 from the North Carolina Biotechnology Center. Sayan Mukherjee would also like to acknowledge the German Federal Ministry of Education and Research within the project Competence Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig (BMBF 01IS18026B).

Gugan Thoppe. Supported partially by IISc Start Up Grants SG/MHRD-19-0054 and SR/MHRD-19-0040, partially by DST SERB's Core Research Grant CRG/2021/008330, and partially by the Pratiksha Trust Young Investigator Award.
Abstract

In 1985, Frieze showed that the expected sum of the edge weights of the minimum spanning tree (MST) in a uniformly weighted complete graph converges to $\zeta(3)$. Recently, Hino and Kanazawa extended this result to a uniformly weighted simplicial complex, where the role of the MST is played by its higher-dimensional analog, the Minimum Spanning Acycle (MSA). Our work goes further and describes the histogram of all the weights in this random MST and random MSA. Specifically, we show that their empirical distributions converge to a measure based on a concept called the shadow. The shadow of a graph is the set of all the missing transitive edges; for a simplicial complex, it is a related topological generalization. As a corollary, we obtain a similar claim for the death times in the persistence diagram corresponding to the above weighted complex, a result of interest in applied topology.

title = The Shadow Knows: Empirical Distributions of Minimum Spanning Acycles and Persistence Diagrams of Random Complexes
author = Nicolas Fraiman, Sayan Mukherjee, and Gugan Thoppe
runningtitle = Empirical Distributions of Minimum Spanning Acycles and Persistence Diagrams
keywords = minimum spanning tree, minimum spanning acycle, persistence diagram, shadow, empirical measure, random graph, random complex, histogram
year = 2023, number = 2
received = 13 January 2021, published = 3 May 2023
doi = 10.19086/da.73323

1 Introduction

Random graphs, as models for binary relations, deeply impact discrete mathematics, computer science, engineering, and statistics. However, modern-day data analysis also involves studying models with higher-order relations such as random simplicial complexes [13]. Our contributions here are vital from both these perspectives. Specifically, we consider a mean-field model for random graphs and random simplicial complexes and provide a complete description of the distribution of weights in the Minimum Spanning Tree (MST) and its higher dimensional analog—the Minimum Spanning Acycle (MSA). As a corollary, we also obtain the distribution of the death times in the associated persistence diagram.

To help judge our contributions, we refer the reader to Robert Adler’s 2014 article [1]. There he summed up the state-of-the-art in random topology as follows: while a lot is known about the asymptotic behaviors of the sums of MST and MSA weights and also their associated death times, almost nothing is known about the individual values. The work in [12] addressed this gap partially. There, the distributions of the extremal MSA weights and the extremal death times were studied. In this work, we go beyond and describe the behavior in the bulk. We emphasize that our result is new, even in the graph case.

1.1 Overview of Key Contributions

We now provide a summary of our main result along with its visual illustration. The necessary background is given in parallel.

A simplicial complex $\mathcal{K}$ on a vertex set $V$ is a collection of subsets of $V$ that is closed under the subset operation. Trivially, every graph is a simplicial complex. However, this closure is what distinguishes a simplicial complex from a hypergraph, the other standard model for studying higher-order relations. In particular, all simplicial complexes are hypergraphs, but the reverse is not true. We refer to an element of $\mathcal{K}$ with cardinality $k+1$ as a $k$-dimensional face or simply a $k$-face. Similarly, the $k$-skeleton of $\mathcal{K}$, denoted $\mathcal{K}^{k}$, refers to the subset of faces with dimension less than or equal to $k$.

Let $d\geq 1$ and let $(\mathcal{K}_{n},w_{u})$ be the weighted simplicial complex, where i.) $\mathcal{K}_{n}$ is the complete $d$-skeleton on $n$ vertices, i.e., $\mathcal{K}_{n}$ is the set of all subsets that have cardinality less than or equal to $d+1$; and ii.) $w_{u}$ is a weight function such that $w_{u}(\sigma)$ is an independent uniform $[0,1]$ random variable if $\sigma$ is a $d$-face, and zero otherwise. Let $M_{n}$ be a $d$-dimensional MSA¹ (or $d$-MSA in short) in $(\mathcal{K}_{n},w_{u})$. Then our main result can be informally stated as follows (see Section 2 for the details).

¹We provide the formal definition of an MSA in Section 2, but the reader unfamiliar with this term can set $d=1$ and replace MSA with MST to read ahead.

Theorem.

(Informal summary of our Main Result) As $n\to\infty$, the empirical measure related to $\{nw_{u}(\sigma):\sigma\in M_{n}\}$ converges to a deterministic measure $\mu$ related to the asymptotic shadow density of the $d$-dimensional Linial-Meshulam complex $Y(n,c/n)$, $c\geq 0$; see (6) for the exact definition of $\mu$.

Intuitively, our result says that the normalized count of the $M_{n}$ face weights, after being scaled by $n$, that lie in any subset of the real line asymptotically converges to that subset's measure under $\mu$. The next couple of paragraphs briefly explain what this measure $\mu$ is.

Recall that the Erdős-Rényi graph, denoted by $G(n,p)$, is a random graph on $n$ vertices where each edge is present independently with probability $p$. The $d$-dimensional Linial-Meshulam complex $Y_{d}(n,p)$, or simply $Y(n,p)$, is an analog of this model in higher dimensions. This random complex has $n$ vertices and the complete $(d-1)$-skeleton; further, the $d$-faces are included independently with probability $p$. It is easy to see that $Y_{1}(n,p)$ and $G(n,p)$ are equivalent.

The shadow of a graph $G$ is the set of all the missing transitive edges. That is, it includes a vertex pair $\{u,v\}$ if $G$'s edge set excludes this pair but contains a path from $u$ to $v$ via other edges. Clearly, each vertex pair in $G$'s shadow has both endpoints in a single component of $G$. A simplicial complex's shadow is a related topological generalization; see Definition 3 for details.
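In the graph case, the shadow is easy to compute: track the connected components with a disjoint-set structure and collect the absent vertex pairs whose endpoints lie in the same component. The sketch below is our own illustration of this definition (the names `UnionFind` and `shadow` are ours, not the paper's).

```python
from itertools import combinations

class UnionFind:
    """Disjoint-set structure tracking connected components."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def shadow(n, edges):
    """Missing edges whose endpoints already lie in the same component."""
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    present = {frozenset(e) for e in edges}
    return {frozenset({u, v}) for u, v in combinations(range(n), 2)
            if frozenset({u, v}) not in present and uf.find(u) == uf.find(v)}

# A path 0-1-2 plus an isolated vertex 3: the only shadow edge is {0, 2}.
print(shadow(4, [(0, 1), (1, 2)]))
```

For example, adding the pair $\{0,2\}$ to the path $0$-$1$-$2$ creates a cycle, so it is in the shadow, while any pair involving the isolated vertex $3$ is not.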

The shadow of the random complex $Y(n,c/n)$ was investigated in [10], and its cardinality, after a suitable normalization, was shown to converge to a deterministic constant $s(c)$. Now, $\mu$ is the measure whose density is, loosely, one minus the function built from these constants for the different values of $c$.

Figure 1: Normalized histogram (yellow) of $\{nw_{u}(\sigma):\sigma\in M_{n}\}$ and the density (red) of the shadow-based measure $\mu$ given in (6). Panel (a): $n=300$, $d=1$; panel (b): $n=75$, $d=2$.

Figure 1 illustrates our main result pictorially. The two scenarios shown there correspond to two different values of the pair $(n,d)$. The yellow plots show the normalized histogram of the set $\{nw_{u}(\sigma):\sigma\in M_{n}\}$, while the red curves show the density of the shadow-based limiting measure $\mu$. Observe that the red curves and the tops of the yellow plots roughly resemble each other. Our main result loosely states that the difference between them decays to zero as $n\to\infty$.
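For the $d=1$ case, the histogram data can be reproduced with a short simulation. The sketch below (our own illustration, not the paper's code; the name `mst_weights` is ours) builds the MST of a uniformly weighted complete graph with Kruskal's algorithm and collects the selected edge weights; scaling them by $n$ gives the values plotted in yellow.

```python
import random

def mst_weights(n, seed=0):
    """Kruskal's algorithm on the complete graph with i.i.d. uniform weights."""
    rng = random.Random(seed)
    edges = sorted((rng.random(), u, v)
                   for u in range(n) for v in range(u + 1, n))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    weights = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:               # adding this edge creates no cycle
            parent[ru] = rv
            weights.append(w)
    return weights

ws = mst_weights(300)
assert len(ws) == 299              # a spanning tree has n - 1 edges
scaled = [300 * w for w in ws]     # histogram data, as in Figure 1(a)
```

Consistent with Frieze's theorem quoted in Section 1.2, the unscaled weights should sum to roughly $\zeta(3)\approx 1.2$ for moderately large $n$.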

The weighted simplicial complex $(\mathcal{K}_{n},w_{u})$ can also be viewed as the evolving simplicial complex $(\mathcal{L}_{n}(s))_{s\in[0,1]}$, where $\mathcal{L}_{n}(s):=\{\sigma\in\mathcal{K}_{n}:w_{u}(\sigma)\leq s\}$. In this evolution, $\mathcal{L}_{n}(s)$ monotonically grows from a simplicial complex with an (almost surely) empty $d$-skeleton (when $s=0$) to one that has all the $d$-faces (when $s=1$). There is a natural persistence diagram associated with this process, which records the birth and death times of the different topological holes that appear and disappear as the process evolves. In [12, Theorem 3], it was shown that the set of death times in the $(d-1)$-th dimension of this diagram exactly equals the set of weights in the $d$-MSA of $(\mathcal{K}_{n},w_{u})$. Consequently, the above result can also be stated in terms of the death times in the persistence diagram related to $(\mathcal{K}_{n},w_{u})$.

1.2 Related Work

Our work lies at the intersection of two broad strands of research: one concerning component sizes, shadow densities, and homologies of random graphs and random complexes, and the other dealing with the statistics of weights in random MSTs and MSAs. In this section, we look at a few of the historical milestones in these two strands.

Erdős and Rényi initiated the first strand with their work in [2]. There they showed that $p=\log n/n$ is a sharp asymptotic threshold for connectivity in $G(n,p)$. They also showed that, if $p=(\log n+c)/n$ for $c\in\mathbb{R}$ and $n\to\infty$, then i.) almost all the vertices in $G(n,p)$ lie in one single component, and ii.) the vertices outside this component are all isolated, and their number has a Poisson distribution with mean $e^{-c}$. This result was subsequently refined in [3], and the new statement included the following additional facts: i.) the asymptotic order of the largest component jumps from logarithmic to linear around $p=1/n$, and ii.) for $c>1$, the largest component in $G(n,c/n)$, denoted $L_{n}(c)$, satisfies $|L_{n}(c)|/n\to 1-t(c)$, where $|S|$ denotes the cardinality of a set $S$, and $t(c)$ is the unique root in $(0,1)$ of the equation

$t=e^{-c(1-t)}.$ (1)

A graph is a one-dimensional simplicial complex, so one can ask if phenomena like the ones above also occur in random complexes. Since connectivity is related to the vanishing of the zeroth homology, it is natural to look at the higher-order Betti numbers to answer this question. Such a study was done in [8, 11], and it was found that the $(d-1)$-th Betti number of $Y(n,p)$ indeed shows a non-vanishing to vanishing phase transition at $p=d\log n/n$. A separate study [6] also showed that this Betti number converges to a Poisson random variable with mean $e^{-c}$ when $p=(d\log n-\log(d!)+c)/n$ for $c\in\mathbb{R}$.

The result on component sizes, in contrast, was not so easy to generalize. The challenge was in coming up with a higher-dimensional analog of a component in a simplicial complex. The breakthrough came in [9] with the introduction of the shadow. The underlying motivation was that, in a sparse graph, a giant component exists if and only if the shadow includes a positive fraction of all the possible edges. With this in mind, the behavior of $\textnormal{Sh}(Y(n,c/n))$, the $d$-dimensional shadow (or $d$-shadow) of $Y(n,c/n)$, was investigated in [10], with two key findings. One, $|\textnormal{Sh}(Y(n,p))|$ changes from $o(n^{d+1})$ to $\Theta(n^{d+1})$ at $p=c_{*}/n$, where $c_{*}$ equals $1$ when $d=1$ but is strictly greater than $1$ for $d\geq 2$. Two, for $c>c_{*}$, $|\textnormal{Sh}(Y(n,c/n))|/\binom{n}{d+1}\to(1-t(c))^{d+1}$, where $t(c)$ is the smallest root in $(0,1)$ of the equation

$t=e^{-c(1-t)^{d}}.$ (2)

Note that (2) matches (1) when $d=1$. Also, for $c>1$, $|\textnormal{Sh}(Y_{1}(n,c/n))|=\Theta(n^{2})$, while $|L_{n}(c)|=\Theta(n)$.
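Equations (1) and (2) are fixed-point equations and can be solved numerically. A minimal sketch (the name `t_of_c` is ours): starting the iteration at $t=0$ yields a monotonically increasing sequence that converges to the smallest fixed point in $(0,1]$, because the map $t\mapsto e^{-c(1-t)^{d}}$ is increasing in $t$.

```python
import math

def t_of_c(c, d=1, iters=200):
    """Smallest root in (0, 1] of t = exp(-c (1 - t)^d), computed by
    monotone fixed-point iteration started at t = 0."""
    t = 0.0
    for _ in range(iters):
        t = math.exp(-c * (1.0 - t) ** d)
    return t
```

For $d=1$ and $c=2$, this gives $t(2)\approx 0.203$, so the giant component of $G(n,2/n)$ occupies a fraction $1-t(2)\approx 0.797$ of the vertices; for $c\leq 1$ the iteration converges to $1$, matching the absence of a giant component.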

The pioneering work in the second strand was done by Frieze [4]. He showed that, given a complete graph on $n$ vertices with uniform $[0,1]$ weights on each edge, the expected sum of weights in the MST converges to the constant $\zeta(3)\approx 1.2$ as $n\to\infty$. Note that this graph is the $d=1$ case of our $(\mathcal{K}_{n},w_{u})$ model. Recently, [5] showed that a similar result exists even for the $d$-MSA of $(\mathcal{K}_{n},w_{u})$ for $d\geq 2$.

Figure 2: Plot of $\{nw_{u}(\sigma)-d\log(n)+\log(d!):\sigma\in M_{n}\}$. Panel (a): $n=500$, $d=1$; panel (b): $n=100$, $d=2$.

Separately, [12] studied the extremal weights in the $d$-MSA of $(\mathcal{K}_{n},w_{u})$. An illustration of a main result there is given in Figure 2. It shows two distinct $(n,d)$ scenarios. In both, a plot of the values in $\{nw_{u}(\sigma)-d\log(n)+\log(d!):\sigma\in M_{n}\}$ is given. This choice of scaling gives more prominence to the extremal weights, which in the two plots are the values on the extreme right. The result in [12] states that these extremal values converge to a Poisson point process, while the rest go to $-\infty$.

1.3 Proof Outline

We now briefly describe how we combine the different facts stated above to prove our main result. Let $\sigma$ be an arbitrary $d$-face in $(\mathcal{K}_{n},w_{u})$ and suppose $w_{u}(\sigma)=s>0$. Let $\mathcal{L}_{n}(s^{-}):=\{\sigma'\in\mathcal{K}_{n}:w_{u}(\sigma')<s\}$. Clearly, the distributions of $\mathcal{L}_{n}(s^{-})$ and $Y(n,s)$, and hence of their shadows, are identical. Now, [12, Lemma 32] states that $\sigma$ belongs to the $d$-MSA of $(\mathcal{K}_{n},w_{u})$ if and only if it does not lie in the shadow² of $\mathcal{L}_{n}(s^{-})$. Given that $\sigma$ could potentially have been any of the $d$-faces outside of $\mathcal{L}_{n}(s^{-})$, the probability that it belongs to the $d$-MSA must then equal one minus the shadow density of $\mathcal{L}_{n}(s^{-})$. By shadow density, we mean the shadow's size divided by the total number of potential $d$-faces. Based on this and the shadow density results from [10], the desired result is then easy to see.

²The word 'shadow' is not used in [12], but the result can be interpreted in terms of the shadow in a straightforward fashion.

2 Main Result

We give our main result here along with all the definitions that were skipped in Section 1. Throughout, we presume that the reader is familiar with the basics of simplicial homology. If not, the necessary background can be found in [12, Section 2] and the references therein.

For a simplicial complex $\mathcal{K}$, we use $\mathcal{F}^{k}(\mathcal{K})$ and $\beta_{k}(\mathcal{K})$ to denote the set of $k$-faces and the $k$-th reduced Betti number³, respectively. We drop $\mathcal{K}$ from these notations when the underlying simplicial complex is clear. Separately, for a random variable $X$, we use $\left\|X\right\|_{q}=\mathbf{E}(|X|^{q})^{1/q}$ to denote its $L^{q}$-norm.

³Throughout we assume the Betti numbers are defined using real coefficients, as in [10].

Our first aim is to define spanning acycles [7] and MSAs, the main objects of our study. These are topological generalizations of spanning trees and MSTs, respectively. Recall that a spanning tree in a connected graph is a subset of edges that connects all the vertices without creating any cycles. In the same spirit, a $d$-spanning acycle in a simplicial complex $\mathcal{K}$ is a subset of $d$-faces which, when added to $\mathcal{K}^{d-1}$, i.e., the $(d-1)$-skeleton of $\mathcal{K}$, kills the $(d-1)$-th Betti number of $\mathcal{K}^{d-1}$ but does not create any new $d$-cycles. Finally, an MSA in a weighted simplicial complex is simply the spanning acycle with the smallest possible weight. As shown in [12, Lemma 23], a spanning acycle and, hence, a $d$-MSA exist in $\mathcal{K}$ if and only if $\beta_{d-1}(\mathcal{K})=0$.

Definition 1 (Spanning Acycle).

Let $\mathcal{K}$ be a simplicial complex with $\beta_{d-1}(\mathcal{K})=0$. Then, a $d$-spanning acycle of $\mathcal{K}$ is a set $S$ of $d$-faces in $\mathcal{K}$ such that $\beta_{d-1}(\mathcal{K}^{d-1}\sqcup S)=\beta_{d}(\mathcal{K}^{d-1}\sqcup S)=0$.

Definition 2 (Minimum Spanning Acycle).

Let $(\mathcal{K},w)$ be a weighted $d$-complex with $\beta_{d-1}(\mathcal{K})=0$. Then, a $d$-MSA of $(\mathcal{K},w)$ is an element of $\arg\min_{S}\{w(S)\}$, where the minimum is taken over all the $d$-spanning acycles and $w(S)$ denotes the sum of the weights of the faces in $S$.

We remark that an MSA is unique when the $d$-faces in $\mathcal{K}$ have distinct weights; see, e.g., [12].

Next, we discuss the concept of a shadow. Introduced in [9], the shadow of a graph is the set of all those missing edges which, when added, create a cycle. A simplicial complex's shadow is the topological generalization of this concept. To state the formal definition, we need the notion of the dimension of a simplicial complex: this is the largest $k$ for which $\mathcal{F}^{k}\neq\emptyset$.

Definition 3 (Shadow).

For a $d$-dimensional simplicial complex $\mathcal{K}$ with vertex set $V$ and having the complete $(d-1)$-skeleton, its $d$-shadow is given by

$\textnormal{Sh}(\mathcal{K})=\left\{\sigma\in\tbinom{V}{d+1}\setminus\mathcal{F}^{d}:\beta_{d}(\mathcal{K}\sqcup\sigma)=\beta_{d}(\mathcal{K})+1\right\},$

where $\tbinom{V}{d+1}$ denotes the set of all $(d+1)$-sized subsets of $V$.

In analogy with the corresponding terms for the spectrum of random matrices, we next define the empirical measures that we study in this work. Let $D_{n}$ be the set of death times in the $(d-1)$-th persistence diagram of $\left(\mathcal{L}_{n}(s)\right)_{s\in[0,1]}$, the evolving simplicial complex associated with $(\mathcal{K}_{n},w_{u})$. Also, let $\delta$ denote the Dirac measure.

Definition 4 (Empirical Measure).

The empirical measures corresponding to the face weights in $M_{n}$ and the death times in $D_{n}$ are the random measures respectively given by

$\mu_{n}:=\frac{1}{\binom{n-1}{d}}\sum_{\sigma\in M_{n}}\delta_{nw(\sigma)}\quad\text{ and }\quad\tilde{\mu}_{n}:=\frac{1}{\binom{n-1}{d}}\sum_{\Delta\in D_{n}}\delta_{n\Delta}.$ (3)

Finally, we define the deterministic measure $\mu$ which serves as the limit for both $\mu_{n}$ and $\tilde{\mu}_{n}$. Let $t_{*}$ be the smallest root in $(0,1]$ of the equation

$(d+1)(1-t)+(1+dt)\log t=0.$ (4)

It can be checked that $t_{*}=1$ for $d=1$ and $t_{*}<1$ for $d\geq 2$. Next, let $c_{*}=\lim_{t\to t_{*}^{-}}\psi(t)$, where $\psi(t):=-\log t/(1-t)^{d}$ for $t\in(0,1)$. That is, let

$c_{*}=\begin{cases}1,&d=1,\\ -\dfrac{\log t_{*}}{(1-t_{*})^{d}},&d\geq 2.\end{cases}$ (5)

Now, for $c\geq c_{*}$, let $t(c)$ be as in Section 1.2, i.e., the smallest root in $(0,1]$ of (2). Clearly, $t_{*}=t(c_{*})$, i.e., $t_{*}=e^{-c_{*}(1-t_{*})^{d}}$, for any $d\geq 1$. Building upon these terms, let $\mu$ be the measure whose density is given by

$\textnormal{d}\mu(x)=\frac{1-s(x)}{d+1}\,\textnormal{d}x,\quad x\geq 0,$ (6)

where

$s(x)=\begin{cases}0,&x\leq c_{*},\\ \big(1-t(x)\big)^{d+1},&x>c_{*}.\end{cases}$ (7)

We remark that this $c_{*}$ is the same constant that we came across in Section 1.2. Separately, it follows from [10, Theorem 1.4] that $s(c)$ for $c>c_{*}$ represents the asymptotic density of the shadow of $Y(n,c/n)$, i.e.,

$s(c)=\lim_{n\to\infty}\frac{|\textnormal{Sh}(Y(n,c/n))|}{\binom{n}{d+1}}.$

We now state our main result. Define $K(\rho,\rho^{\prime})$ to be the Kolmogorov metric between two measures $\rho$ and $\rho^{\prime}$, i.e., let

$K(\rho,\rho^{\prime})=\sup_{x\in\mathbb{R}}\big|\rho(-\infty,x]-\rho^{\prime}(-\infty,x]\big|.$
Theorem 1 (Main Result).

Let $q\geq 1$. Then the random measures $\mu_{n}$ and $\tilde{\mu}_{n}$ converge in the Kolmogorov metric to $\mu$ in $L^{q}$. Moreover, if $f:\mathbb{R}\to\mathbb{R}$ is a continuous function with $\mu|f|<\infty$ and

$\lim_{b\to\infty}\sup_{n\geq 0}\left\|\mu_{n}|f|-\mu_{n}(|f|\wedge b)\right\|_{q}=0,$ (8)

then both $\mu_{n}f$ and $\tilde{\mu}_{n}f$ converge to $\mu f$ in $L^{q}$.

Corollary 2.

For the unbounded function $f(x)=x^{\alpha}$, $\alpha>0$, the hypothesis of Theorem 1 holds for any $q\geq 1$. That is, $\mu|f|<\infty$ and (8) holds.

Remark 3.

In Section 4, we discuss three extensions of the above result. The first two relate to the cases where the $d$-face weights have a generic distribution and an additional noisy perturbation, respectively. The third one concerns the weighted $Y(n,p)$ complex, wherein not all the potential $d$-faces may be present.

Remark 4.

Theorem 1 readily applies when $f$ is bounded, since hypothesis (8) is then trivially satisfied. Hence, an immediate consequence of our result is that $\mu_{n}$ converges to $\mu$ weakly in $L^{q}$. However, our result also applies to unbounded functions such as $f(x)=x^{\alpha}$ for $\alpha>0$. This is an important example since Frieze's result [4] concerning the sum of weights in the MST and its recent generalization to higher dimensions by Hino and Kanazawa [5, Theorem 4.11] then become special consequences of Theorem 1. Moreover, the limiting constant $I_{d-1}^{(\alpha)}$ in [5] can now be interpreted as the $\alpha$-th moment of the measure $\mu$.

Remark 5.

A variant of the second part of Theorem 1, obtained by replacing condition (8) with

$\lim_{b\to\infty}\sup_{n\geq 0}\mathbf{P}\{\mu_{n}|f|-\mu_{n}(|f|\wedge b)\geq\epsilon\}=0$ (9)

and convergence in $L^{q}$ with convergence in probability, also holds. Notice that, while the condition on $f$ that we require here is weaker, the conclusion is also weaker. The details are given in Proposition 8.

3 Proofs

We derive here all the results stated in Section 2 and also Proposition 8, which concretizes the claim in Remark 5. We emphasize once again that both Theorem 1 and Proposition 8 hold for a class of unbounded functions, examples of which are given in Corollary 2.

We begin with two propositions.

Proposition 6.

$\mu$ is a probability measure with the density function given in (6).

Proposition 7.

Let $c\geq 0$ and $q\geq 1$. Then, $\lim_{n\to\infty}\left\|\mu_{n}(c,\infty)-\mu(c,\infty)\right\|_{q}=0$.

The latter proposition concerns a particular case of Theorem 1, where $f=\mathds{1}(c,\infty)$ for some $c\geq 0$. Notice that Proposition 6 and the boundedness of $f$ imply that the condition in (8) and $\mu|f|<\infty$ hold trivially. The proofs of these two results are postponed to the end of this section.

Proof of Theorem 1.

Due to [12, Theorem 3], it suffices to derive the results only for $\mu_{n}$. We first show that $K(\mu_{n},\mu)\to 0$ in $L^{q}$. Let $G_{n}(x)=\mu_{n}(-\infty,x]$ and $G(x)=\mu(-\infty,x]$. Fix $m$ and pick $c_{0},c_{1},\ldots,c_{m}$ such that $c_{0}=0$, $c_{m}=\infty$, and $G(c_{i+1})-G(c_{i})=1/m$, which can be done due to Proposition 6.

For $x\in[c_{i},c_{i+1}]$, we get

$|G_{n}(x)-G(x)| \leq \max\big\{G(c_{i+1})-G_{n}(c_{i}),\; G_{n}(c_{i+1})-G(c_{i})\big\}$
$\leq \max\left\{G(c_{i})+\frac{1}{m}-G_{n}(c_{i}),\; G_{n}(c_{i+1})-G(c_{i+1})+\frac{1}{m}\right\}$
$\leq \sum_{j=1}^{m}|G_{n}(c_{j})-G(c_{j})|+\frac{1}{m}.$

Since the bound is independent of $x$, it then follows that

$K(\mu_{n},\mu)=\sup_{x\in\mathbb{R}}|G_{n}(x)-G(x)|\leq\sum_{j=1}^{m}|G_{n}(c_{j})-G(c_{j})|+\frac{1}{m}.$ (10)

Therefore,

$\left\|K(\mu_{n},\mu)\right\|_{q}\leq \sum_{j=1}^{m}\left\|G_{n}(c_{j})-G(c_{j})\right\|_{q}+\frac{1}{m}.$

Clearly, $G(x)=1-\mu(x,\infty)$ and $G_{n}(x)=1-\mu_{n}(x,\infty)$. Thus, by Proposition 7, $\left\|G_{n}(c_{j})-G(c_{j})\right\|_{q}\to 0$ for all $j$. Now, since $m$ is arbitrary, $\left\|K(\mu_{n},\mu)\right\|_{q}\to 0$, as desired.

To extend the convergence to unbounded functions satisfying the given conditions, we begin by assuming that $f$ is non-negative. Let $\epsilon>0$ be arbitrary. For any $b>0$, the triangle inequality shows

$\left\|\mu_{n}f-\mu f\right\|_{q}\leq\left\|\mu_{n}f-\mu_{n}(f\wedge b)\right\|_{q}+\left\|\mu_{n}(f\wedge b)-\mu(f\wedge b)\right\|_{q}+\left\|\mu f-\mu(f\wedge b)\right\|_{q}.$

Now pick a large enough $b>0$ so that

$\sup_{n\geq 0}\big\|\mu_{n}f-\mu_{n}(f\wedge b)\big\|_{q}\leq\epsilon\quad\text{ and }\quad\left\|\mu f-\mu(f\wedge b)\right\|_{q}=\mu f-\mu(f\wedge b)\leq\epsilon.$

Such a $b$ indeed exists due to (8) and the fact that $\mu f<\infty$. Therefore,

$\left\|\mu_{n}f-\mu f\right\|_{q}\leq 2\epsilon+\left\|\mu_{n}(f\wedge b)-\mu(f\wedge b)\right\|_{q}.$

However, $f\wedge b$ is a bounded continuous function. Also, the Kolmogorov metric dominates the Lévy metric, which metrizes weak convergence. Consequently, $\limsup_{n\to\infty}\left\|\mu_{n}f-\mu f\right\|_{q}\leq 2\epsilon$. Since $\epsilon>0$ is arbitrary, it follows that $\mu_{n}f$ converges to $\mu f$ in $L^{q}$.

It remains to deal with the case of general $f$. Clearly, $f=f^{+}-f^{-}$, where $f^{+}=\max\{f,0\}$ and $f^{-}=\max\{-f,0\}$. Furthermore, both $(f^{+}-b)\,\mathds{1}[f^{+}\geq b]$ and $(f^{-}-b)\,\mathds{1}[f^{-}\geq b]$ are bounded from above by $(|f|-b)\,\mathds{1}[|f|\geq b]$, whence it follows that $f^{+}-(f^{+}\wedge b)$ and $f^{-}-(f^{-}\wedge b)$ are bounded from above by $|f|-|f|\wedge b$. Therefore, by repeating the above arguments for both $f^{+}$ and $f^{-}$ individually, it is easy to see that the result holds for general $f$ as well. ∎

We next discuss the claim in Remark 5. Formally, it can be stated as follows.

Proposition 8.

$K(\mu_{n},\mu)\to 0$ in probability. Moreover, if $f:\mathbb{R}\to\mathbb{R}$ is a continuous function such that (9) holds for every $\epsilon>0$, then $\mu_{n}f$ converges to $\mu f$ in probability.

Proof.

From Proposition 7 and Markov's inequality, we have that $\mu_{n}(c,\infty)$ converges to $\mu(c,\infty)$ in probability for any $c\geq 0$. This, along with (10), then shows that $K(\mu_{n},\mu)\to 0$ in probability, as desired. The fact that the Kolmogorov metric dominates the Lévy metric then immediately shows that $\mu_{n}f$ converges to $\mu f$ in probability whenever $f:\mathbb{R}\to\mathbb{R}$ is a bounded continuous function.

We now discuss the case of unbounded functions. Again, as in the proof of Theorem 1, it suffices to derive the result assuming $f$ to be non-negative. Let $\epsilon,\eta>0$. Now pick a $b>0$ so that

$\sup_{n\geq 0}\mathbf{P}\{|\mu_{n}f-\mu_{n}(f\wedge b)|>\epsilon\}\leq\eta\quad\text{ and }\quad|\mu f-\mu(f\wedge b)|\leq\epsilon.$

This can be done on account of (9) and the fact that $\mu f<\infty$. We then have

$\{|\mu_{n}f-\mu f|>3\epsilon\}\subseteq\{|\mu_{n}f-\mu_{n}(f\wedge b)|>\epsilon\}\cup\{|\mu_{n}(f\wedge b)-\mu(f\wedge b)|>\epsilon\},$

whence it follows that

$\mathbf{P}\{|\mu_{n}f-\mu f|>3\epsilon\}\leq\eta+\mathbf{P}\{|\mu_{n}(f\wedge b)-\mu(f\wedge b)|>\epsilon\}.$

However, $f\wedge b$ is bounded and continuous and, hence, $\mu_{n}(f\wedge b)$ converges to $\mu(f\wedge b)$ in probability, as shown above. Consequently, we have that $\limsup_{n\to\infty}\mathbf{P}\{|\mu_{n}f-\mu f|>3\epsilon\}\leq\eta$. Now, since $\epsilon,\eta$ are arbitrary, the desired result is easy to see. ∎

We now exploit a recent bound on Betti numbers [5] to prove Corollary 2 and thereby show that unbounded functions such as polynomials indeed satisfy the hypothesis of Theorem 1. Since (9) is implied by (8), such functions also satisfy the assumptions of Proposition 8. A technical result is also needed for deriving Corollary 2. We state it here, but prove it afterwards.

Lemma 9.

$t(c)$ is continuous over $(c_{*},\infty)$ and $\lim_{c\to c_{*}^{+}}t(c)=t_{*}$. Moreover, $t(c)=O(e^{-c/2^{d}})$ as $c\to\infty$.

Proof of Corollary 2.

For any $\alpha>0$, Lemma 9, along with the fact that $0\leq t(x)\leq 1$ for $x\geq c_{*}$, shows that $x^{\alpha}[1-(1-t(x))^{d+1}]=O(x^{\alpha}t(x))=O(x^{\alpha}e^{-x/2^{d}})$ as $x\to\infty$. It is then straightforward to see from (6) that $\mu|f|<\infty$ for $f(x)=x^{\alpha}$, as desired.

We next show that $f(x)=x^{\alpha}$ satisfies condition (8) as well. Recall that $\beta_{d-1}(\mathcal{K})$ is the $(d-1)$-th Betti number of the simplicial complex $\mathcal{K}$. Since the set of $d$-MSA weights in a weighted⁴ complex equals the set of death times in the associated $(d-1)$-th persistence diagram [12, Theorem 3], it follows by arguing as in the proof of [5, Proposition 4.9] that

$\mu_{n}f-\mu_{n}(f\wedge b) = \frac{n^{\alpha}}{\binom{n-1}{d}}\sum_{\sigma\in M_{n}}\left[w(\sigma)^{\alpha}-w(\sigma)^{\alpha}\wedge\frac{b}{n^{\alpha}}\right] = \frac{\alpha n^{\alpha}}{\binom{n-1}{d}}\int_{b^{1/\alpha}/n}^{1}\beta_{d-1}(\mathcal{L}_{n}(s))\,s^{\alpha-1}\,\textnormal{d}s.$

⁴This result requires that the weights be monotone, i.e., $w(\sigma)\leq w(\sigma^{\prime})$ whenever $\sigma\subseteq\sigma^{\prime}$.

Now substituting $r=sn$ shows that

$\mu_{n}f-\mu_{n}(f\wedge b) = \frac{\alpha}{\binom{n-1}{d}}\int_{b^{1/\alpha}}^{n}r^{\alpha-1}\beta_{d-1}(\mathcal{L}_{n}(r/n))\,\textnormal{d}r = \frac{\alpha n^{d}}{\binom{n-1}{d}}\int_{b^{1/\alpha}}^{n}r^{\alpha-1}\frac{\beta_{d-1}(\mathcal{L}_{n}(r/n))}{n^{d}}\,\textnormal{d}r.$

By using Minkowski's inequality applied to integrals, we then get

$\left\|\mu_{n}f-\mu_{n}(f\wedge b)\right\|_{q}\leq\frac{\alpha n^{d}}{\binom{n-1}{d}}\int_{b^{1/\alpha}}^{n}r^{\alpha-1}\left\|\frac{\beta_{d-1}(\mathcal{L}_{n}(r/n))}{n^{d}}\right\|_{q}\textnormal{d}r.$

Finally, since $\mathcal{L}_{n}(s)$ has the same distribution as $Y(n,s)$ for $s\in[0,1]$, it follows that

$\left\|\mu_{n}f-\mu_{n}(f\wedge b)\right\|_{q}\leq\frac{\alpha n^{d}}{\binom{n-1}{d}}\int_{b^{1/\alpha}}^{n}r^{\alpha-1}\left\|\frac{\beta_{d-1}(Y(n,r/n))}{n^{d}}\right\|_{q}\textnormal{d}r.$ (11)

Now, $0\leq\beta_{d-1}(Y(n,r/n))\leq\binom{n}{d}$, whence

$0\leq\frac{\beta_{d-1}(Y(n,r/n))}{n^{d}}\leq\frac{1}{d!}.$

Moreover, (4.14) from [5] shows that there is an $\ell>1\vee q\alpha$ such that

\mathbf{E}\left[\frac{\beta_{d-1}(Y(n,r/n))}{n^{d}}\right]\leq 1\wedge\frac{C}{r^{\ell}}.

By writing $q=(q-1)+1$ and then using the above two inequalities, we get

\left\|\frac{\beta_{d-1}(Y(n,r/n))}{n^{d}}\right\|_{q}\leq\left(\frac{1}{d!}\right)^{(q-1)/q}\left(1\wedge\frac{C^{1/q}}{r^{\ell/q}}\right)

for all $r\in[0,n]$. Now, applying this bound in (11), it follows that there exists a constant $K\geq 0$ such that

\left\|\mu_{n}f-\mu_{n}(f\wedge b)\right\|_{q}\leq\frac{K}{\ell/q-\alpha}\,b^{-(\ell/q-\alpha)/\alpha}

for all sufficiently large $b$ (so that $C^{1/q}/r^{\ell/q}\leq 1$ for all $r\geq b^{1/\alpha}$). This then shows that the condition in (8) holds, as desired. ∎
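For completeness, the final bound comes from the elementary integral estimate below; the constant $K$ absorbs the remaining bounded factors, such as $\alpha n^{d}/\binom{n-1}{d}$ and $(1/d!)^{(q-1)/q}$:

```latex
\int_{b^{1/\alpha}}^{n} r^{\alpha-1}\cdot\frac{C^{1/q}}{r^{\ell/q}}\,\mathrm{d}r
\;\leq\; C^{1/q}\int_{b^{1/\alpha}}^{\infty} r^{\alpha-1-\ell/q}\,\mathrm{d}r
\;=\; \frac{C^{1/q}}{\ell/q-\alpha}\, b^{-(\ell/q-\alpha)/\alpha},
```

where the improper integral converges since $\ell>q\alpha$ makes the exponent $\alpha-1-\ell/q<-1$.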

It remains to prove Lemma 9, and Propositions 6 and 7.

Proof of Lemma 9.

In [10, Appendix B], it is shown that $\psi(t)$ (defined above (5)) is continuous and monotonically decreasing for $0<t<t(c_{*})$. The monotonicity implies that i.) $t(c)=\psi^{-1}(c)$ is continuous over $(c_{*},\infty)$, ii.) $t_{*}=t(c_{*})=\lim_{c\to c_{*}^{+}}t(c)$, and iii.) $\lim_{c\to\infty}t(c)=0$. The last fact in turn shows that $1-t(c)\geq 1/2$ (say) whenever $c$ is large. This, combined with (2), implies $t(c)=O(e^{-c/2^{d}})$, as desired. ∎

To prove Propositions 6 and 7, we need two additional technical results. Let

g(c)=\begin{cases}0,&0\leq c\leq c_{*},\\ ct(c)\big(1-t(c)\big)^{d}+\dfrac{c}{d+1}\big(1-t(c)\big)^{d+1}-\big(1-t(c)\big),&c>c_{*},\end{cases} \qquad (12)

and $h(c)=g(c)+1-c/(d+1)$.

Lemma 10.

For $c\geq 0$,

\lim_{n\to\infty}\left\|\frac{\beta_{d-1}(Y(n,c/n))}{n^{d}}\mathds{1}_{[0,n]}(c)-\frac{h(c)}{d!}\right\|_{q}=0.
Proof.

This is shown in the proof of [5, Theorem 4.11]. ∎

Lemma 11.

$\mu$ is a well-defined measure. Moreover, for any $c\geq 0$, we have $\mu(c,\infty)=h(c)$.

Proof.

Lemma 9 shows that the density function $s(x)$ is continuous everywhere for $d=1$, and discontinuous only at $c_{*}$ for $d\geq 2$. This implies that the measure $\mu$ is well-defined.

With regards to the other claim, we first show that $\mu(c,\infty)$ is finite for all $c\geq 0$. Clearly,

\mu(c,\infty)\leq\frac{2}{d+1}\max\left\{\int_{0}^{c_{*}}(1-s(x))\,\mathrm{d}x,\ \int_{c_{*}}^{\infty}(1-s(x))\,\mathrm{d}x\right\}.

Separately, since $0\leq t(x)\leq 1$ for all $x\geq c_{*}$, it follows from (7) and Lemma 9 that $1-s(x)=O(t(x))=O(e^{-x/2^{d}})$. From these observations, it is then easy to see that $\mu(c,\infty)$ is finite, as desired.

Next, we claim that $h$ is continuous over $[0,\infty)$. For $c\neq c_{*}$, this follows from $h$'s definition and Lemma 9. At $c_{*}$, this holds due to (4) and (5) and since $\lim_{c\to c_{*}^{+}}t(c)=t_{*}$, which together imply $\lim_{c\to c_{*}^{+}}g(c)=0$.

Our final claim is that $\lim_{c\to\infty}h(c)=\lim_{c\to\infty}\mu(c,\infty)=0$ and $h^{\prime}(c)=\frac{\mathrm{d}}{\mathrm{d}c}\mu(c,\infty)$ for all $c\neq c_{*}$. This and the continuity of $h(c)$ and $\mu(c,\infty)$ at $c_{*}$ will then show that $h(\cdot)=\mu(\cdot,\infty)$, as desired.

Since $\mu(c,\infty)$, $c\geq 0$, is finite, we have $\lim_{c\to\infty}\mu(c,\infty)=0$. In contrast, $t(c)\in(0,1]$ implies $h(c)=O(ct(c))$ as $c\to\infty$; this, when combined with Lemma 9, then shows that $\lim_{c\to\infty}h(c)=0$.

Regarding the statement on the derivatives, we begin by showing that the two functions are differentiable for all $c\neq c_{*}$. This statement holds for $h$ on account of (12) and [10, Claim 5.3], while it is true for $\mu(\cdot,\infty)$ simply by definition. From [10, Claim 5.3], we also have that

h^{\prime}(c)=\begin{cases}-\dfrac{1}{d+1},&c<c_{*},\\ -\dfrac{1}{d+1}\big[1-\big(1-t(c)\big)^{d+1}\big],&c>c_{*}.\end{cases}

From this, it is then easy to see that $h^{\prime}(c)=\frac{\mathrm{d}}{\mathrm{d}c}\mu(c,\infty)$ for all $c\neq c_{*}$, as desired. ∎

Proof of Proposition 6.

This follows from Lemma 11, which shows that $\mu(0,\infty)=h(0)=1$. ∎

Proof of Proposition 7.

Let $\{D_{i}\}$ be the set of death times in the $(d-1)$-th persistence diagram of the filtration $(\mathcal{L}_{n}(s))_{s\in[0,1]}$. Then, for all $n\geq c$, we have

\begin{align*}
\binom{n-1}{d}\mu_{n}(c,\infty)=\sum_{\sigma\in M_{n}}\delta_{nw(\sigma)}(c,\infty) &= \left|\left\{\sigma\in M_{n}:w(\sigma)>\frac{c}{n}\right\}\right|\\
&= \left|\left\{i:D_{i}>\frac{c}{n}\right\}\right| \qquad (13)\\
&= \beta_{d-1}\left(\mathcal{L}_{n}\left(\frac{c}{n}\right)\right)\\
&\overset{d}{=} \beta_{d-1}\left(Y\left(n,\frac{c}{n}\right)\right),
\end{align*}

where (13) follows due to [12, Theorem 3], while the last relation holds since $\mathcal{L}_{n}(c/n)$ has the same distribution as $Y(n,c/n)$. From Lemma 10 and the fact that $\binom{n-1}{d}/n^{d}\to 1/d!$, it then follows that $\lim_{n\to\infty}\left\|\mu_{n}(c,\infty)-h(c)\right\|_{q}=0$. The desired result now holds due to Lemma 11. ∎
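For the special case $d=1$, the identity behind (13), namely that the multiset of MSA weights equals the multiset of $(d-1)$-th death times, can be checked numerically. The sketch below is our own illustrative code (all function names are ours): it computes the $H_0$ death times of the weight filtration of a complete graph by a threshold sweep, and compares them with the Kruskal MST weights.

```python
import random
from itertools import combinations

def mst_weights(n, weights):
    # Kruskal's algorithm with union-find; returns the MST edge weights.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    out = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            out.append(w)
    return out

def h0_death_times(n, weights):
    # Sweep the filtration: a death occurs at weight w whenever adding
    # the edges of weight <= w reduces the number of components.
    def n_components(edges):
        adj = {i: [] for i in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, comps = set(), 0
        for s in range(n):
            if s not in seen:
                comps += 1
                seen.add(s)
                stack = [s]
                while stack:
                    for y in adj[stack.pop()]:
                        if y not in seen:
                            seen.add(y)
                            stack.append(y)
        return comps
    deaths, included, prev = [], [], n
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        included.append((u, v))
        cur = n_components(included)
        deaths.extend([w] * (prev - cur))
        prev = cur
    return deaths

random.seed(0)
n = 30
weights = {e: random.random() for e in combinations(range(n), 2)}
# The two multisets coincide, as [12, Theorem 3] asserts for d = 1.
assert sorted(mst_weights(n, weights)) == sorted(h0_death_times(n, weights))
```

The sweep recomputes components from scratch for clarity; the equality holds because each Kruskal acceptance is exactly a component merge, i.e., an $H_0$ death.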

4 Extensions

We discuss three different extensions of Theorem 1 here. The first one relates to the case where the $d$-face weights come from some generic distribution. The next one concerns the robustness of our result to noisy perturbations in the $d$-face weights. The third and final one is about the randomly weighted $Y(n,p)$ complex, wherein the set of $d$-faces is random and may not include all the potential ones.

4.1 Generic distribution for $d$-face weights

Let $(\mathcal{K}_{n},w_{g})$ be the weighted simplicial complex, where $\mathcal{K}_{n}$ is as before, while the weight function $w_{g}$ is such that i.) $\{w_{g}(\sigma):\sigma\in\mathcal{F}^{d}\}$ are real-valued i.i.d. random variables with some generic distribution $F$, and ii.) $w_{g}(\sigma)=-\infty$ whenever $|\sigma|\leq d$. We claim that a version of our result also holds in this setup. Let

\mu_{n}^{g}=\frac{1}{\binom{n-1}{d}}\sum_{\sigma\in M_{n}^{g}}\delta_{nF(w_{g}(\sigma))}\quad\text{ and }\quad\tilde{\mu}_{n}^{g}:=\frac{1}{\binom{n-1}{d}}\sum_{\Delta\in D_{n}^{g}}\delta_{nF(\Delta)}, \qquad (14)

where $M_{n}^{g}$ is the $d$-MSA in $(\mathcal{K}_{n},w_{g})$, and $D_{n}^{g}$ is the set of death times in the associated persistence diagram.

Corollary 12.

Suppose $F$ is continuous. Then, Theorem 1 holds for $\mu_{n}^{g}$ and $\tilde{\mu}_{n}^{g}$.

Proof.

From $(\mathcal{K}_{n},w_{g})$, we construct a new weighted complex $(\mathcal{K}_{n},w_{F})$, where $w_{F}(\sigma)=F(w_{g}(\sigma))$ for all $\sigma\in\mathcal{K}_{n}$. Since $F$ is continuous, $\{F(w_{g}(\sigma)):\sigma\in\mathcal{F}^{d}\}$ is a set of i.i.d. $U[0,1]$ random variables. Also, $F(w_{g}(\sigma))=0$ whenever $|\sigma|\leq d$. Hence, Theorem 1 readily applies to $(\mathcal{K}_{n},w_{F})$. Furthermore, when viewed in the context of $(\mathcal{K}_{n},w_{F})$, $\mu_{n}^{g}$ and $\tilde{\mu}_{n}^{g}$ resemble the measures given in (3). The only thing that remains to be checked is whether $M_{n}^{g}$ is a $d$-MSA in $(\mathcal{K}_{n},w_{F})$. However, since any distribution function is non-decreasing, this is trivially true. The desired claim is now easy to see. ∎
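The two facts used in this proof, that the probability integral transform produces $U[0,1]$ weights and that a monotone transform leaves the MSA unchanged, can be illustrated for $d=1$ with a small simulation. In this sketch (our own code; $\mathrm{Exp}(1)$ weights play the role of $w_{g}$ and $F$ is its CDF), Kruskal's algorithm only ever compares weights, so the spanning tree is invariant under $F$:

```python
import math
import random
from itertools import combinations

def mst_edges(n, weight_of):
    # Kruskal's algorithm; it depends only on the ORDER of edge weights.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = set()
    for e in sorted(combinations(range(n), 2), key=weight_of):
        ru, rv = find(e[0]), find(e[1])
        if ru != rv:
            parent[ru] = rv
            mst.add(e)
    return mst

random.seed(1)
n = 40
w_g = {e: random.expovariate(1.0) for e in combinations(range(n), 2)}
F = lambda x: 1.0 - math.exp(-x)      # CDF of the Exp(1) distribution
w_F = {e: F(w) for e, w in w_g.items()}

# F is increasing, so both weightings give the same spanning tree.
assert mst_edges(n, lambda e: w_g[e]) == mst_edges(n, lambda e: w_F[e])

# The transformed weights F(w_g) all lie in [0,1] (they are U[0,1] samples).
assert all(0.0 <= w <= 1.0 for w in w_F.values())
```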

4.2 Noisy perturbations in $d$-face weights

We establish here the robustness of our result to additional noisy perturbations in the $d$-face weights.

Consider the weighted complex $(\mathcal{K}_{n},w_{g}^{\prime})$, where $\mathcal{K}_{n}$ is as before, and

w_{g}^{\prime}(\sigma)=\begin{cases}-\infty&\text{if }|\sigma|\leq d,\\ w_{g}(\sigma)+\epsilon_{n}(\sigma)&\text{if }\sigma\in\mathcal{F}^{d}.\end{cases}

Here, $\{w_{g}(\sigma):\sigma\in\mathcal{F}^{d}\}$ are real-valued i.i.d. random variables with some generic distribution $F$, while $\{\epsilon_{n}(\sigma):\sigma\in\mathcal{F}^{d}\}$ is a separate set of random variables denoting noisy perturbations in the $d$-face weights. Note that these latter variables need be neither independent nor identically distributed. Let

\mu_{n}^{\prime}=\frac{1}{\binom{n-1}{d}}\sum_{\sigma\in M_{n}^{\prime}}\delta_{nF(w_{g}^{\prime}(\sigma))}\quad\text{ and }\quad\tilde{\mu}_{n}^{\prime}:=\frac{1}{\binom{n-1}{d}}\sum_{\Delta\in D_{n}^{\prime}}\delta_{nF(\Delta)},

where $M_{n}^{\prime}$ is the $d$-MSA in $(\mathcal{K}_{n},w_{g}^{\prime})$, and $D_{n}^{\prime}$ is the set of death times in the associated persistence diagram.

Corollary 13.

Suppose $F$ is Lipschitz continuous with Lipschitz constant $\zeta>0$, and $n\|\epsilon_{n}\|_{\infty}\to 0$ in probability, where $\|\epsilon_{n}\|_{\infty}:=\max|\epsilon_{n}(\sigma)|$. Then, Theorem 1 holds for $\mu_{n}^{\prime}$ and $\tilde{\mu}_{n}^{\prime}$.

Proof.

We only discuss the $\mu_{n}^{\prime}$ result since the $\tilde{\mu}_{n}^{\prime}$ case can be dealt with similarly. For the $\mu_{n}^{\prime}$ case, it suffices to show that an analog of Proposition 7 holds. This is because the arguments from the proof of Theorem 1 can then again be used to get the actual result.

Let $(\mathcal{K}_{n},w_{g})$ be the weighted complex defined in Section 4.1, but this time we couple $w_{g}$ to the one in the definition of $w_{g}^{\prime}$. Also, let $M_{n}^{g}$ denote the $d$-MSA in $(\mathcal{K}_{n},w_{g})$, and let $\mu_{n}^{g}$ be as in (14).

From [12, Lemma 38] (its statement has a typo: in the second and third displays there, $\gamma(D_{i})$ should be $\gamma(D_{i}^{\prime})$ and $\gamma(\phi(\sigma_{i}))$ must be $\gamma(\phi^{\prime}(\sigma^{\prime}_{i}))$, as follows from Theorem 4 in ibid.), we have

\inf_{\gamma}\max_{\sigma\in\mathcal{F}^{d}}|w_{g}(\sigma)-\gamma(w_{g}(\sigma))|\leq\|\epsilon_{n}\|_{\infty},

where the infimum is over all the bijections $\gamma:\{w_{g}(\sigma):\sigma\in M_{n}^{g}\}\to\{w_{g}^{\prime}(\sigma^{\prime}):\sigma^{\prime}\in M_{n}^{\prime}\}$. Hence, for any measurable set $K\subseteq\mathbb{R}$, it follows that

\mu_{n}^{g}(K^{-\xi_{n}})\leq\mu_{n}^{\prime}(K)\leq\mu_{n}^{g}(K^{\xi_{n}}), \qquad (15)

where $\xi_{n}:=\zeta n\|\epsilon_{n}\|_{\infty}$, and, for $\xi>0$,

K^{\xi}:=\{x\in\mathbb{R}:|x-y|\leq\xi\text{ for some }y\in K\}\quad\text{ and }\quad K^{-\xi}:=\{x\in K:|x-y|>\xi\text{ for all }y\not\in K\}.

Let $\xi>0$ be arbitrary. Then,

\begin{align*}
|\mu_{n}^{\prime}(c,\infty)-\mu_{n}^{g}(c,\infty)| &\leq \max\left\{|\mu_{n}^{g}(c-\xi_{n},\infty)-\mu_{n}^{g}(c,\infty)|,\ |\mu_{n}^{g}(c,\infty)-\mu_{n}^{g}(c+\xi_{n},\infty)|\right\} \qquad (16)\\
&\leq \mu_{n}^{g}(c-\xi_{n},c+\xi_{n}]\\
&\leq \mu_{n}^{g}(c-\xi,c+\xi]\,\mathds{1}[\xi_{n}\leq\xi]+\mu_{n}^{g}(c-\xi_{n},c+\xi_{n}]\,\mathds{1}[\xi_{n}>\xi]\\
&\leq \mu_{n}^{g}(c-\xi,c+\xi]+\mathds{1}[\xi_{n}>\xi] \qquad (17)\\
&\leq |\mu_{n}^{g}(c-\xi,c+\xi]-\mu(c-\xi,c+\xi]|+\mu(c-\xi,c+\xi]+\mathds{1}[\xi_{n}>\xi],
\end{align*}

where (16) follows from (15), while the second term in (17) is obtained by using the fact that $\mu_{n}^{g}(c-\xi_{n},c+\xi_{n}]\leq 1$, which itself holds since $\mu_{n}^{g}$ is a probability measure.

Now, Proposition 7 applies to $\mu_{n}^{g}$. This, along with the given condition on $\|\epsilon_{n}\|_{\infty}$ and the fact that $\xi$ is arbitrary, then shows that $|\mu_{n}^{\prime}(c,\infty)-\mu_{n}^{g}(c,\infty)|\to 0$ in $L^{q}$, as desired. ∎
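For $d=1$, the stability fact underlying this argument (cf. the stability results in [12]) says that the sorted vector of MST weights moves by at most $\|\epsilon_{n}\|_{\infty}$ in sup-norm when every edge weight is perturbed by at most $\|\epsilon_{n}\|_{\infty}$. A small numerical check, with our own illustrative code and arbitrarily chosen noise level:

```python
import random
from itertools import combinations

def mst_weights(n, weights):
    # Kruskal's algorithm with union-find; returns sorted MST edge weights.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    out = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            out.append(w)
    return sorted(out)

random.seed(2)
n = 60
base = {e: random.random() for e in combinations(range(n), 2)}
noise = {e: random.uniform(-1e-3, 1e-3) for e in base}
pert = {e: base[e] + noise[e] for e in base}
eps = max(abs(x) for x in noise.values())

a, b = mst_weights(n, base), mst_weights(n, pert)
# Stability: the k-th smallest MST weight moves by at most the sup-norm
# of the perturbation, so the sorted vectors stay eps-close.
assert max(abs(x - y) for x, y in zip(a, b)) <= eps
```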

4.3 Weighted $Y(n,p)$ complex

Consider the weighted simplicial complex $(Y,w_{g})$, where $Y$ is a sample of $Y(n,p)$, and $w_{g}$ is such that $\{w_{g}(\sigma):\sigma\in\mathcal{F}^{d}(Y)\}$ is a set of i.i.d. random variables with some generic distribution $F$, while $w_{g}(\sigma)=-\infty$ whenever $|\sigma|\leq d$. This complex differs from $(\mathcal{K}_{n},w_{g})$ since not all the potential $d$-faces may be present here. Despite this, we now show that a version of our result holds for this complex as well. Let $M^{g}_{n,p}$ be a $d$-MSA in $(Y,w_{g})$ and $D^{g}_{n,p}$ the set of death times in the associated persistence diagram, whenever $\beta_{d-1}(Y)=0$ (so that $M^{g}_{n,p}$ exists). Further, let

\mu_{n,p}^{g}=\frac{1}{\binom{n-1}{d}}\sum_{\sigma\in M^{g}_{n,p}}\delta_{npF(w_{g}(\sigma))}\quad\text{ and }\quad\tilde{\mu}_{n,p}^{g}:=\frac{1}{\binom{n-1}{d}}\sum_{\Delta\in D^{g}_{n,p}}\delta_{npF(\Delta)},

whenever $\beta_{d-1}(Y)=0$, and some arbitrary probability measure otherwise.

Corollary 14.

Suppose $p=(d\log n+\omega_{n})/n$ for any function $\omega_{n}$ that tends to infinity. Then, Theorem 1 holds for $\mu_{n,p}^{g}$ and $\tilde{\mu}_{n,p}^{g}$.

Proof.

As in the proof of Corollary 13, it suffices to show that an analog of Proposition 7 holds for $\mu_{n,p}^{g}$.

From $(Y,w_{g})$, we construct a coupled weighted complex $(\mathcal{K}_{n},w_{Y})$, where $\mathcal{K}_{n}\supseteq Y$ is the complete $d$-skeleton on $n$ vertices (as before), and $w_{Y}:\mathcal{K}_{n}\to[0,1]$ is the weight function given by

w_{Y}(\sigma)=\begin{cases}U[p,1]&\text{if }\sigma\in\mathcal{F}^{d}(\mathcal{K}_{n})\setminus\mathcal{F}^{d}(Y),\\ pF(w_{g}(\sigma))&\text{if }\sigma\in\mathcal{F}^{d}(Y)\text{ or }|\sigma|\leq d.\end{cases} \qquad (18)

In the above expression, $U[p,1]$ denotes an independent uniform random variable on the interval $[p,1]$. It is easy to see that $\{w_{Y}(\sigma):\sigma\in\mathcal{F}^{d}(\mathcal{K}_{n})\}$ is a set of i.i.d. $U[0,1]$ random variables. Also, $w_{Y}(\sigma)=0$ whenever $|\sigma|\leq d$. Therefore, Theorem 1 readily applies to $(\mathcal{K}_{n},w_{Y})$.

Let $\mu_{n}$ be as in (3), but defined in the context of $(\mathcal{K}_{n},w_{Y})$. Similarly, let $M_{n}$ denote a $d$-MSA in $(\mathcal{K}_{n},w_{Y})$. Note that a $d$-MSA always exists in this complex, since $\mathcal{K}_{n}$ is a complete $d$-skeleton.

Now, suppose the event $\{\beta_{d-1}(Y)=0\}$ holds. Then, from [12, Lemma 23], a $d$-MSA $M^{g}_{n,p}$ exists in $(Y,w_{g})$. Further, from (18), the $d$-faces of $\mathcal{K}_{n}$ that are present in $Y$ have weights smaller than those that are not: the former have weights at most $p$, while the latter have weights at least $p$. Combining this with the fact that a distribution function is always monotone, it follows that $M^{g}_{n,p}$ is also a $d$-MSA in $(\mathcal{K}_{n},w_{Y})$.

Therefore,

\begin{align*}
|\mu_{n,p}^{g}(c,\infty)-\mu_{n}(c,\infty)| &\leq |\mu_{n,p}^{g}(c,\infty)-\mu_{n}(c,\infty)|\,\mathds{1}[\beta_{d-1}(Y)=0]+|\mu_{n,p}^{g}(c,\infty)-\mu_{n}(c,\infty)|\,\mathds{1}[\beta_{d-1}(Y)\neq 0]\\
&\leq |\mu_{n,p}^{g}(c,\infty)-\mu_{n}(c,\infty)|\,\mathds{1}[\beta_{d-1}(Y)\neq 0]\\
&\leq 2\,\mathds{1}[\beta_{d-1}(Y)\neq 0],
\end{align*}

where the second relation makes use of the fact that $\mu_{n,p}^{g}=\mu_{n}$ on the event $\{\beta_{d-1}(Y)=0\}$, while the last holds since $\mu_{n,p}^{g}$ and $\mu_{n}$ are probability measures.

Now, due to the condition on $p$, it follows from [8, Theorem 1.1] and [11, Theorem 1.1] that $\lim_{n\to\infty}\mathbf{P}\{\beta_{d-1}(Y)=0\}=1$. The desired result is now easy to see. ∎

Figure 3 illustrates the above result for the case where $F$ is the uniform distribution on $[0,1]$ and $p$ is constant. We consider two different cases for the $(n,d)$ pair. In each case, we also consider three different values of $p$ and look at the $d$-MSA $M_{n,p}$ in one sample each of the resultant $Y(n,p)$ complex; we resample if the MSA does not exist. Notice that all the panels have two distinct plots: the blue one is the histogram corresponding to $\{nw(\sigma):\sigma\in M_{n,p}\}$, while the yellow one corresponds to $\{npw(\sigma):\sigma\in M_{n,p}\}$. Clearly, unlike the blue plots, the yellow ones look similar irrespective of the $p$ values.
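The $p$-invariance of the yellow histograms can be reproduced for $d=1$ in a few lines. The sketch below (our own code; the parameter values $n=200$, $p\in\{0.3,0.8\}$ are arbitrary choices) samples $G(n,p)$ with i.i.d. $U[0,1]$ edge weights, extracts the MST via Kruskal, and compares the mean of $\{npw(\sigma)\}$ across the two values of $p$:

```python
import random
from itertools import combinations

def mst_weights(n, weights):
    # Kruskal with union-find; `weights` maps the PRESENT edges to weights.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    out = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            out.append(w)
    return out

def scaled_mean(n, p, rng):
    # Sample G(n, p) with i.i.d. U[0,1] edge weights and return the mean
    # of n*p*w over the MST edges. (We skip resampling: for these
    # parameters the graph is connected with overwhelming probability.)
    weights = {e: rng.random() for e in combinations(range(n), 2)
               if rng.random() < p}
    mst = mst_weights(n, weights)
    assert len(mst) == n - 1          # a spanning tree was found
    return n * p * sum(mst) / len(mst)

rng = random.Random(3)
m1, m2 = scaled_mean(200, 0.3, rng), scaled_mean(200, 0.8, rng)
# The scaled means sit near the same limit irrespective of p.
assert abs(m1 - m2) < 0.5
```

For $d=1$ both means hover around $\zeta(3)\approx 1.2$, mirroring the yellow histograms' insensitivity to $p$.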

Figure 3: Normalized histograms of $\{npw(\sigma):\sigma\in M_{n,p}\}$ and $\{nw(\sigma):\sigma\in M_{n,p}\}$, depicted in yellow and blue, respectively, for (a) $n=500$, $d=1$ and (b) $n=50$, $d=2$. In the last panel, the two histograms coincide since $p=1$.

5 Summary and Future Directions

Our work went beyond the sum of the MSA weights and studied the distribution of the individual weights in a random $d$-MSA. A key contribution here is an explicit connection between the $d$-MSA weights and the $d$-shadow.

In the future, one of the scenarios that we are interested in exploring is the streaming setup. Here, the $d$-faces along with their weights are revealed one by one, and the $d$-MSA estimate is then incrementally updated by either including the new face or discarding it altogether. Note that we do not assume the faces are revealed in any particular order. The goal then is to understand how the distribution of the weights in the $d$-MSA changes with time.

Our preliminary conjecture on how the sum of these weights will change in the uniformly weighted setup is given in Figure 4. In this figure, we consider two scenarios: i.) $n=100$ and $d=1$, and ii.) $n=30$ and $d=2$. In both scenarios, the weights of the $d$-faces are i.i.d. uniform $[0,1]$ random variables which are revealed one at a time in an arbitrary order. That is, the weight of each $d$-face is initially presumed to be $1$, which then gets replaced with the actual value when it gets revealed. Accordingly, we begin with an arbitrary $d$-MSA and then sequentially update it as the weight of each new face gets revealed.

Let $M_{n}(k)$ denote the updated $d$-MSA after the weight of the $k$-th face is revealed. Also, let $c_{1}=n/\binom{n-1}{d}$ and $c_{2}=\binom{n}{d+1}\mu f$ for $f(x)=x$. The blue and red curves in the figure show the values of $c_{1}w(M_{n}(k))$ and $c_{2}/k$, respectively. The red curve is our guess for $c_{1}w(M_{n}(k))$ based on the results in this paper. Finally, the black curve shows the constant value $\mu f$. Clearly, the blue and red curves closely match, and they both get close to the black one as the value of $k$ increases.
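For $d=1$, the streaming update described above reduces to the classical cycle rule for minimum spanning trees. The sketch below is our own illustrative code: unrevealed edges are presumed to have weight $1$, and on each reveal the tree either absorbs the cheaper edge (swapping out the maximum edge on the created cycle) or discards it. The final tree is checked against a batch Kruskal run.

```python
import random
from itertools import combinations

def tree_path(adj, u, v):
    # BFS in the current tree to recover the unique u-v path (as edges).
    prev, queue = {u: None}, [u]
    while queue:
        x = queue.pop(0)
        if x == v:
            break
        for y in adj[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path, x = [], v
    while prev[x] is not None:
        path.append((min(x, prev[x]), max(x, prev[x])))
        x = prev[x]
    return path

def kruskal(n, weights):
    # Batch MST weights for comparison.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    out = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        if find(u) != find(v):
            parent[find(u)] = find(v)
            out.append(w)
    return sorted(out)

random.seed(4)
n = 40
tree = {(i, i + 1): 1.0 for i in range(n - 1)}   # arbitrary initial MSA
adj = {i: set() for i in range(n)}
for (u, v) in tree:
    adj[u].add(v); adj[v].add(u)

order = list(combinations(range(n), 2))
random.shuffle(order)                  # faces revealed in arbitrary order
revealed = {}
for (u, v) in order:
    w = random.random()
    revealed[(u, v)] = w
    if (u, v) in tree:
        tree[(u, v)] = w               # a tree edge only got cheaper
        continue
    path = tree_path(adj, u, v)
    worst = max(path, key=lambda e: tree[e])
    if tree[worst] > w:                # cycle rule: swap out the max edge
        del tree[worst]
        adj[worst[0]].discard(worst[1]); adj[worst[1]].discard(worst[0])
        tree[(u, v)] = w
        adj[u].add(v); adj[v].add(u)

# The incrementally maintained tree matches the batch MST exactly.
assert sorted(tree.values()) == kruskal(n, revealed)
```

Each reveal only decreases one edge weight, so the single cycle-rule swap is enough to keep the tree optimal at every step.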

Figure 4: Comparison of the scaled weight (blue) of the actual $d$-MSA with our conjecture (red), for (a) $n=100$, $d=1$ and (b) $n=30$, $d=2$. The values of $c_{1}$ and $c_{2}$ are $n/\binom{n-1}{d}$ and $\binom{n}{d+1}\mu x$, respectively. The black curve denotes the value of $\mu x$. Recall that $\mu x$ is $\zeta(3)\approx 1.2$ for $d=1$ and $1.56$ for $d=2$.

Some of the other directions that we wish to pursue in the future are as follows.

  1. Geometric random graphs and complexes: Extending our results from the Erdős-Rényi and Linial-Meshulam models to geometric random graphs and geometric complexes, and understanding the differences between the two scenarios.

  2. Large deviation results and central limit theorems: Obtaining general large deviation results for the Kolmogorov distance between $\mu_{n}$ and $\mu$, as well as central limit theorems characterizing the variance of the weight distribution.

  3. Rates of convergence: Deriving the rate at which $\mu_{n}$ converges to $\mu$.

Acknowledgments

The authors would like to thank Matthew Kahle, Omer Bobrowski, Primoz Skraba, Ron Rosenthal, Robert Adler, Christina Goldschmidt, D. Yogeshwaran, and Takashi Owada for useful suggestions. A portion of this work was done when Gugan Thoppe was a postdoc with Sayan Mukherjee at Duke University.

References

  • [1] Robert Adler. TOPOS: Pinsky was wrong, Euler was right. https://imstat.org/2014/11/18/topos-pinsky-was-wrong-euler-was-right/, 2014. [Online; accessed 14-Nov.-2020].
  • [2] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
  • [3] Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17–61, 1960.
  • [4] Alan M. Frieze. On the value of a random minimum spanning tree problem. Discrete Applied Mathematics, 10(1):47–56, 1985.
  • [5] Masanori Hino and Shu Kanazawa. Asymptotic behavior of lifetime sums for random simplicial complex processes. Journal of the Mathematical Society of Japan, 2019.
  • [6] Matthew Kahle and Boris Pittel. Inside the critical window for cohomology of random k-complexes. Random Structures & Algorithms, 48(1):102–124, 2016.
  • [7] Gil Kalai. Enumeration of Q-acyclic simplicial complexes. Israel Journal of Mathematics, 45(4):337–351, 1983.
  • [8] Nathan Linial and Roy Meshulam. Homological connectivity of random 2-complexes. Combinatorica, 26(4):475–487, 2006.
  • [9] Nathan Linial, Ilan Newman, Yuval Peled, and Yuri Rabinovich. Extremal problems on shadows and hypercuts in simplicial complexes. arXiv preprint arXiv:1408.0602, 2014.
  • [10] Nathan Linial and Yuval Peled. On the phase transition in random simplicial complexes. Annals of Mathematics, pages 745–773, 2016.
  • [11] Roy Meshulam and Nathan Wallach. Homological connectivity of random k-dimensional complexes. Random Structures & Algorithms, 34(3):408–417, 2009.
  • [12] Primoz Skraba, Gugan Thoppe, and D. Yogeshwaran. Randomly weighted $d$-complexes: Minimal spanning acycles and persistence diagrams. Electronic Journal of Combinatorics, 27(2), 2020.
  • [13] Matthew Kahle. Topology of random simplicial complexes: a survey. AMS Contemporary Mathematics, 620:201–222, 2014.
Nicolas Fraiman
Assistant Professor
University of North Carolina at Chapel Hill
Chapel Hill, NC, United States
fraiman@unc.edu
https://fraiman.web.unc.edu/

Sayan Mukherjee
Professor
Duke University, Durham, NC, United States
Center for Scalable Data Analytics and Artificial Intelligence, Universität Leipzig
Max Planck Institute for Mathematics in the Sciences, Leipzig, Saxony, Germany
sayan.mukherjee@mis.mpg.de
https://sayanmuk.github.io/

Gugan Thoppe
Assistant Professor
Indian Institute of Science, Bengaluru, India
gthoppe@iisc.ac.in
https://sites.google.com/site/gugancth/