This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Maximal persistence in random clique complexes

Ayat Ababneh Department of Mathematics, The University of Jordan, Amman 11942, Jordan  and  Matthew Kahle Department of Mathematics, The Ohio State University, Columbus, OH, USA
Abstract.

We study the persistent homology of an Erdős–Rényi random clique complex filtration on nn vertices. Here, each edge ee appears at a time pe[0,1]p_{e}\in[0,1] chosen uniform randomly in the interval, and the persistence of a cycle σ\sigma is defined as p2/p1p_{2}/p_{1}, where p1p_{1} and p2p_{2} are the birth and death times of the cycle respectively. We show that for fixed k1k\geq 1, with high probability the maximal persistence of a kk-cycle is of order roughly n1/k(k+1)n^{1/k(k+1)}. These results are in sharp contrast with the random geometric setting where earlier work by Bobrowski, Kahle, and Skraba shows that for random Čech and Vietoris–Rips filtrations, the maximal persistence of a kk-cycle is much smaller, of order (logn/loglogn)1/k\left(\log n/\log\log n\right)^{1/k}.

Both authors gratefully acknowledge the support of NSF-DMS #2005630.

1. Introduction

Recently, the topology of random simplicial complexes has been an active area of study — see, for example, the surveys [14] and [3]. This study has had applications in topological data analysis, including in neuroscience [11]. We will assume that the reader is familiar with the notions of persistent homology and a persistence diagram [9]. In topological inference, one sometimes considers points far from the diagonal in the persistence diagram to be representing “signal” and points near the diagonal as representing “noise”.

With this in mind, Bobrowski, Kahle, and Skraba studied maximally persistent cycles in random geometric complexes in [4]. They showed that with high probability the maximal persistence of a kk-dimensional cycle in a random geometric complex in d\mathbb{R}^{d} is (logn/loglogn)1/k\asymp\left(\log n/\log\log n\right)^{1/k}. We write fgf\asymp g if ff and gg grow at the same rate in the sense that there exist constants c1,c2>0c_{1},c_{2}>0 such that c1f(n)g(n)c2f(n)c_{1}f(n)\leq g(n)\leq c_{2}f(n) for all large enough nn. Both the Vietoris–Rips and Čech filtrations have an underlying parameter rr. Persistence of a cycle is measured multiplicatively as r2/r1r_{2}/r_{1} where r1r_{1} and r2r_{2} are the birth and death radius.

Our main result is that for fixed k1k\geq 1, the maximal persistence of kk-cycles in a random clique complex filtration is of order roughly n1/k(k+1)n^{1/k(k+1)}. A precise statement is given in Theorem 4.1. Here we measure the persistence of a cycle as p2/p1p_{2}/p_{1}, where p1p_{1} and p2p_{2} are the birth and death edge probabilities, respectively.

The comparison between the Erdős–Rényi and random geometric settings may be more apparent if we renormalize so that the persistence of the associated filtrations can be measured on the same scale. One natural way to try to do this is to reconsider maximal persistence in random geometric complexes, using instead birth and death edge probability rather than radius.

The edge probability PP in a random geometric complex is of order PrdP\asymp r^{d}. So if

r2/r1(logn/loglogn)1/k,r_{2}/r_{1}\asymp\left(\log n/\log\log n\right)^{1/k},

then

P2/P1(logn/loglogn)d/k.P_{2}/P_{1}\asymp\left(\log n/\log\log n\right)^{d/k}.

As long as dd is fixed, (logn/loglogn)d/k\left(\log n/\log\log n\right)^{d/k} is still much smaller than the maximal persistence of cycles in the random clique complex. However, this parameterization makes it clear that if dd grows, we expect cycles to persist for longer. It is known that random geometric graphs in dimension growing quickly enough converge in total variation distance to Erdős-Rényi random graphs, and this connection has been further explored and quantified in a number of recent papers — see, for example [7, 6, 18]. From this point of view, our main result can be seen as a “curse of dimensionality” for topological inference—as the ambient dimension gets bigger, noisy cycles persist for much longer.

2. Topology of random clique complexes

In this section, we review the definition of the random clique complex, and briefly survey the literature on topology of random clique complexes.

The following random graph is sometimes called the Erdős–Rényi model.

Definition 2.1.

For n1n\geq 1 and p[0,1]p\in[0,1], G(n,p)G(n,p) is the probability space of all simplicial graphs on nn vertices where every edge is included with probability pp, jointly independently.

We use the notation GG(n,p)G\sim G(n,p) to indicate that a graph GG is chosen according to this distribution. We say that GG has a given property with high probability (w.h.p.) if the probability that GG has the property tends to one as nn\to\infty.

We define X(n,p)X(n,p) to be the clique complex of the Erdős–Rényi random graph G(n,p)G(n,p). We write XX(n,p)X\sim X(n,p) to indicate that XX is a random simplicial complex chosen according to this distribution.

In  [12], Kahle studied the topology of random clique complexes, and the following theorem roughly identified the threshold for homology to appear.

Theorem 2.2.

Let XX(n,p)X\sim X(n,p) be the random clique complex, and assume k1k\geq 1 is fixed.

  1. (1)

    If pnαp\leq n^{-\alpha} with α>1k\alpha>\frac{1}{k}, then w.h.p. Hk(X)=0H_{k}(X)=0. On the other hand,

  2. (2)

    If n1kpn1k+1n^{-\frac{1}{k}}\ll p\ll n^{-\frac{1}{k+1}} then w.h.p. Hk(X)0H_{k}(X)\neq 0.

We use the notation fgf\ll g to indicate limnf/g=0\lim_{n\to\infty}f/g=0. Theorem 2.2 shows that the threshold for the appearance of kkth homology is roughly n1/kn^{-1/k}. The following is the main result of a later paper, [13], showing that the threshold for the vanishing of kkth homology is roughly n1/(k+1)n^{-1/(k+1)}.

Theorem 2.3.

Let k1k\geq 1 and ϵ>0\epsilon>0 be fixed, and XX(n,p)X\sim X(n,p). If

p((k2+1+ϵ)lognn)1/(k+1)p\geq\left(\frac{(\frac{k}{2}+1+\epsilon)\log n}{n}\right)^{1/(k+1)}

then w.h.p. Hk(X,)=0H^{k}(X,\mathbb{Q})=0.

Theorem 2.3 describes a sharp threshold for cohomology to vanish, in the same spirit as in Linial and Meshulam’s work [15]. These theorems all describe high-dimensional cohomological generalizations of the Erdős–Rényi theorem on the threshold for connectivity of G(n,p)G(n,p) [10]. By universal coefficients, since we are working over a field, the results hold for homology with \mathbb{Q} coefficients as well as cohomology.

Malen gave a topological strengthening of part (1) of Theorem 2.2 in [16].

Theorem 2.4 (Malen, 2019).

Let k1k\geq 1 be fixed and XX(n,p)X\sim X(n,p). If pnαp\leq n^{-\alpha} with α>1k\alpha>\frac{1}{k}, then w.h.p. XX collapses onto a subcomplex of dimension at most k1k-1.

This implies, in particular, that Hk1(X)H_{k-1}(X) is torsion-free, so this represents an important step toward the “bouquet-of-spheres conjecture” described in [12] and [13].

Newman recently refined Malen’s collapsing argument to give a probabilistic refinement [17].

Theorem 2.5 (Newman, 2021).

Let k1k\geq 1 be fixed and XX(n,p)X\sim X(n,p). If

pn1/kp\ll n^{-1/k}

then w.h.p. XX collapses onto a subcomplex of dimension at most k1k-1.

In summary, earlier results show that there is one threshold where homology is born for the first time, when pn1/kp\approx n^{-1/k}, and another where homology dies for the last time, when pn1/(k+1)p\approx n^{-1/(k+1)}. Our main result is that there exist cycles that persist for nearly the entire interval of nontrivial homology.

3. The second moment method

We briefly review the second moment method, i.e. the use of Chebyshev’s inequality, which is our main probabilistic tool.

Theorem 3.1 (Chebyshev’s Inequality).

For any λ>0\lambda>0,

(|Xμ|λσ)1λ2.\mathbb{P}\left(\lvert X-\mu\rvert\geq\lambda\sigma\right)\leq\frac{1}{\lambda^{2}}.

Where μ\mu is the expectation and σ2\sigma^{2} is the variance.

The variance is defined by

σ2=Var(X)=𝔼(X𝔼(X)2).\sigma^{2}=\text{Var}(X)=\mathbb{E}\left(X-\mathbb{E}(X)^{2}\right).

If XX can be written as a sum of indicator random variables X=iXiX=\displaystyle\sum_{i}X_{i}, then the following is easy to derive and its proof appears, for example, in Chapter 4 of Alon and Spencer’s book [1].

Var(X)=iVar(Xi)+ijCov(Xi,Xj)𝔼(X)+ijCov(Xi,Xj).\begin{array}[]{lll}\text{Var}(X)&=&\displaystyle\sum_{i}\text{Var}(X_{i})+\sum_{i\neq j}\text{Cov}(X_{i},X_{j})\\ \\ &\leq&\mathbb{E}(X)+\displaystyle\sum_{i\neq j}\text{Cov}(X_{i},X_{j}).\end{array}

It follows from Chebyshev’s inequality 3.1 that if 𝔼(X)\mathbb{E}(X)\rightarrow\infty and

ijCov(Xi,Xj)=o(𝔼(X)2),\displaystyle\sum_{i\neq j}\mbox{Cov}(X_{i},X_{j})=o\left(\mathbb{E}(X)^{2}\right),

then X>0X>0 w.h.p. In fact, X𝔼(X)X\sim\mathbb{E}(X) w.h.p., meaning that X/𝔼(X)1X/\mathbb{E}(X)\to 1 in probability.

Finally, we note that if are Xi,XjX_{i},X_{j} are indicator random variables for events Ai,AjA_{i},A_{j}, we have that

Cov(Xi,Xj)=𝔼(XiXj)𝔼(Xi)(Xj)=(AiAj)(Ai)(Aj).\text{Cov}(X_{i},X_{j})=\mathbb{E}(X_{i}X_{j})-\mathbb{E}(X_{i})(X_{j})=\mathbb{P}(A_{i}\wedge A_{j})-\mathbb{P}(A_{i})\mathbb{P}(A_{j}).

Here AiAjA_{i}\wedge A_{j} denotes the event that both AiA_{i} and AjA_{j} occur.

4. Main result and proof

We consider the random graph G(n,p)G(n,p) as a stochastic process, as follows. Consider the random filtration of the complete graph KnK_{n} where each edge ee appears at time pep_{e}, chosen uniform randomly in the interval [0,1][0,1]. Similarly, the random clique complex X(n,p)X(n,p) is a random filtration of the simplex on nn vertices Δn\Delta_{n}.

We assume the reader is familiar with persistent diagrams [9, 8]. A point (x,y)(x,y) in the persistence diagram for HkH_{k} with \mathbb{Q} coefficients and k1k\geq 1 represents a kk-dimensional cycle with birth time xx and death time yy. We measure the persistence of that cycle multiplicatively, as y/xy/x. Define

Mk(n)=max{y/x},M_{k}(n)=\max\{y/x\},

where the maximum is taken over all points in the persistence diagram for homology in degree kk.

An equivalent definition is the following. Consider the natural inclusion map i:X(n,p1)X(n,p2)i:X(n,p_{1})\hookrightarrow X(n,p_{2}), where 0p1p210\leq p_{1}\leq p_{2}\leq 1. For every k1k\geq 1, there is an induced map on homology i:Hk(X(n,p1))Hk(X(n,p2))i_{*}:H_{k}(X(n,p_{1}))\to H_{k}(X(n,p_{2})). Then

Mk(n)=max{p2/p1i:Hk(X(n,p1))Hk(X(n,p2)) is nontrivial},M_{k}(n)=\max\left\{p_{2}/p_{1}\mid i_{*}:H_{k}(X(n,p_{1}))\to H_{k}(X(n,p_{2}))\mbox{ is nontrivial}\right\},

where the maximum is taken as p1p_{1} and p2p_{2} range over all values with 0p1p210\leq p_{1}\leq p_{2}\leq 1.


Our main result is the following.

Theorem 4.1.

For fixed k1k\geq 1 and ϵ>0\epsilon>0,

n1/k(k+1)ϵMk(n)n1/k(k+1)+ϵ,n^{-1/k(k+1)-\epsilon}\leq M_{k}(n)\leq n^{-1/k(k+1)+\epsilon},

with high probability.

Equivalently, if

M~k(n)=logMk(n)logn,\widetilde{M}_{k}(n)=\frac{\log{M_{k}(n)}}{\log{n}},

then M~k(n)\widetilde{M}_{k}(n) converges in probability to 1/k(k+1).1/k(k+1).

Our results are actually slightly sharper then this. A slightly sharper statement is the following.

Suppose that

Lk(n)n1/k(k+1)L_{k}(n)\ll n^{1/k(k+1)}

and

Uk(n)n1/k(k+1)(logn)1/(k+1).U_{k}(n)\gg n^{1/k(k+1)}(\log n)^{1/(k+1)}.

Then w.h.p.

Lk(n)Mk(n)Uk(n).L_{k}(n)\leq M_{k}(n)\leq U_{k}(n).

So then our results are sharp, up to a small power of logn\log n.

Since Theorem 2.3 is only known to hold for \mathbb{Q} coefficients (or any field of characteristic zero), our upper bounds on maximal persistence only apply for persistent homology with \mathbb{Q} coefficients. For lower bounds, we use the second moment method to prove the existence of cycles that persist nearly as long as possible, and these lower bounds hold for arbitrary field coefficients.

Proof.

First we prove an upper bound on Mk(n)M_{k}(n).

Suppose that p1=o(n1/k)p_{1}=o\left(n^{-1/k}\right). By Theorem 2.5, we have that Hk(X)=0H_{k}(X)=0 w.h.p. Now let ϵ>0\epsilon>0, and suppose that

p2((k2+1+ϵ)lognn)1/(k+1).p_{2}\geq\left(\frac{(\frac{k}{2}+1+\epsilon)\log n}{n}\right)^{1/(k+1)}.

By Theorem 2.3, we have that Hk(X,Q)=0H_{k}(X,Q)=0. So we have that

Mk(n)p2/p1.M_{k}(n)\leq p_{2}/p_{1}.

Set

fk(n)=n1/k(k+1)(logn)1/(k+1),f_{k}(n)=n^{1/k(k+1)}\left(\log{n}\right)^{1/(k+1)},

and let Uk(n)U_{k}(n) be any function such that Uk(n)fk(n)U_{k}(n)\gg f_{k}(n). We have showed that

Mk(n)Uk(n).M_{k}(n)\leq U_{k}(n).

Most of our work is in proving a lower bound for Mk(n)M_{k}(n). We focus our attention on a particular type of nontrivial kk-cycle, namely simplicial spheres which are combinatorially isomorphic to cross-polytope boundaries.

In the following, let YY and ZZ denote distinct subsets of 2k+22k+2 vertices. That is, we write [n]:={1,2,,n}[n]:=\{1,2,\dots,n\} and suppose that Y,Z[n]Y,Z\subseteq[n] with |Y|=|Z|=2k+2|Y|=|Z|=2k+2. A notation we can use for this is

Y,Z([n]2k+2).Y,Z\in\binom{[n]}{2k+2}.

Suppose that Y={u1,uk+1}{v1,,vk+1}Y=\{u_{1},\dots u_{k+1}\}\cup\{v_{1},\dots,v_{k+1}\}, where u1<uk+1<v1<vk+1u_{1}<\dots u_{k+1}<v_{1}\dots<v_{k+1}. Recall that every vertex is an element of [n][n], so they come with a natural ordering. We use xyx\sim y and x≁yx\not\sim y to denote adjacency and non-adjacency of vertices xx and yy. For any choice of 0p1p210\leq p_{1}\leq p_{2}\leq 1, we say that YY is a (p1,p2)(p_{1},p_{2}) special persistent cycle in the random clique complex filtration if

  1. (1)

    uiuju_{i}\sim u_{j}, vivjv_{i}\sim v_{j}, and uivju_{i}\sim v_{j} for every iji\neq j at time p1p_{1},

  2. (2)

    ui≁viu_{i}\not\sim v_{i} for every ii at time p2p_{2}, and

  3. (3)

    {u1,uk+1}\{u_{1},\dots u_{k+1}\} have no common neighbors outside of vertex set YY at time p2p_{2}.

Condition (1) implies that YY spans a kk-dimensional cycle at time p1p_{1}, namely a cycle that is combinatorially equivalent to the boundary of k+1k+1-dimensional cross-polytope. Conditions (2) and (3) imply that YY is still not a boundary of anything, even at time p2p_{2}. So then it is not only a nontrivial cycle at time p1p_{1}, but it persists at least until time p2p_{2}. Note that condition (2) already implies that {u1,uk+1}\{u_{1},\dots u_{k+1}\} have no common neighbor within vertex set YY. So condition (3) implies that they have no common neighbor at all, and then {u1,uk+1}\{u_{1},\dots u_{k+1}\} is a maximal kk-dimensional face.

Let Nk=Nk(p1,p2)N_{k}=N_{k}(p_{1},p_{2}) be the number of (p1,p2)(p_{1},p_{2}) special persistent cycles. We want to show that (Nk>0)1\mathbb{P}\left(N_{k}>0\right)\rightarrow 1, which in turn will imply that Mk(n)>p2/p1M_{k}(n)>p_{2}/p_{1} with high probability. In the following, we will assume whenever necessary that

n1/kp1p2n1/(k+1).n^{-1/k}\ll p_{1}\leq p_{2}\ll n^{-1/(k+1)}.

In particular, we assume that np1knp_{1}^{k}\to\infty and np2k+10np_{2}^{k+1}\to 0.

Let AYA_{Y} be the event that the set of vertices in YY form a (p1,p2)(p_{1},p_{2}) special persistent cycle, and let IYI_{Y} be its indicator random variable for this event. Then we can write

Nk=Y([n]2k+2)IY,N_{k}=\displaystyle\sum_{Y\in\binom{[n]}{2k+2}}I_{Y},

where the sum is taken over all subsets Y[n]Y\subseteq[n] of size |Y|=2k+2|Y|=2k+2.

By edge independence, the probability of condition (1) is p12k(k+1)p_{1}^{2k(k+1)}, and the probability of condition (2) is (1p2)k+1\left(1-p_{2}\right)^{k+1}, the probability of condition (3) is (1p2k+1)n2k2\left(1-p_{2}^{k+1}\right)^{n-2k-2}. Moreover, these events are independent since they involve disjoint sets of edges. So we have

𝔼(IY)=(AY)=p12k(k+1)(1p2)k+1(1p2k+1)n2k2.\mathbb{E}(I_{Y})=\mathbb{P}(A_{Y})=p_{1}^{2k(k+1)}\left(1-p_{2}\right)^{k+1}\left(1-p_{2}^{k+1}\right)^{n-2k-2}.

By linearity of expectation,

𝔼(Nk)=𝔼(IY)=(n2k+2)p12k(k+1)(1p2)k+1(1p2k+1)n2k2=n2k+2(2k+2)!p12k(k+1)(1o(1)),\begin{split}\mathbb{E}(N_{k})&=\sum\mathbb{E}(I_{Y})\\ &=\displaystyle\binom{n}{2k+2}p_{1}^{2k(k+1)}\left(1-p_{2}\right)^{k+1}\left(1-p_{2}^{k+1}\right)^{n-2k-2}\\ &=\displaystyle\frac{n^{2k+2}}{\left(2k+2\right)!}p_{1}^{2k(k+1)}\left(1-o(1)\right),\end{split}

since np2k+10np_{2}^{k+1}\to 0. Since we also assume that np1knp_{1}^{k}\to\infty, we have 𝔼(Nk)\mathbb{E}(N_{k})\rightarrow\infty. By Chebyshev’s inequality, if we show that Var(Nk)=o(𝔼(Nk)2)\text{Var}(N_{k})=o(\mathbb{E}(N_{k})^{2}), then Nk>0N_{k}>0 w.h.p.

We have the standard inequality

Var(Nk)𝔼(Nk)+YZCov(IY,IZ).\text{Var}(N_{k})\leq\mathbb{E}(N_{k})+\sum_{Y\neq Z}\text{Cov}(I_{Y},I_{Z}).

We recall that

Cov(IY,IZ)=(AYAZ)(AY)(AZ).\text{Cov}(I_{Y},I_{Z})=\mathbb{P}(A_{Y}\wedge A_{Z})-\mathbb{P}(A_{Y})\mathbb{P}(A_{Z}).

We always have

(AY)(AZ)=p14k(k+1)(1p2)2k+2(1p2k+1)2(n2k2),\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=p_{1}^{4k(k+1)}\left(1-p_{2}\right)^{2k+2}\left(1-p_{2}^{k+1}\right)^{2(n-2k-2)},

and we note the simpler estimate

(AY)(AZ)=p14k(k+1)(1o(1))\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=p_{1}^{4k(k+1)}\left(1-o(1)\right)

since k1k\geq 1 is fixed and np2k+10np_{2}^{k+1}\to 0.

Let m:=|YZ|m:=\rvert Y\cap Z\lvert. In estimating (AYAZ)\mathbb{P}(A_{Y}\wedge A_{Z}), we consider cases depending on the value of mm.


Case I:

First, consider m=0m=0. It might be tempting to believe that in the case that YZ=Y\cap Z=\emptyset, AYA_{Y} and AZA_{Z} are independent sets, so the covariance is zero, but unfortunately this is not the case. Conditions (1) and (2) for a (p1,p2)(p_{1},p_{2}) special persistent cycle only depend on adjacency between vertices within the (2k+2)(2k+2)-set, but condition (3) depends on connections with the rest of the graph and these are not independent.

Nevertheless, we still have in this case

(AYAZ)=p14k(k+1)(1o(1)),\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)=p_{1}^{4k(k+1)}\left(1-o(1)\right),

as follows.

The term p14k(k+1)p_{1}^{4k(k+1)} is the probability of condition (1) holding for both vertex sets YY and ZZ. So this is also an upper bound on the probability of conditions (1), (2), and (3) holding for both vertex sets. For a lower bound on (AYAZ)\mathbb{P}\left(A_{Y}\wedge A_{Z}\right), we consider a slightly smaller event, slightly simpler but whose probability is of the same order of magnitude.

Let Y={u1,uk+1}{v1,,vk+1}Y=\{u_{1},\dots u_{k+1}\}\cup\{v_{1},\dots,v_{k+1}\}, where u1<uk+1<v1<vk+1u_{1}<\dots u_{k+1}<v_{1}\dots<v_{k+1}, as before. Similarly, let Z={u1,uk+1}{v1,,vk+1}Z=\{u^{\prime}_{1},\dots u^{\prime}_{k+1}\}\cup\{v^{\prime}_{1},\dots,v^{\prime}_{k+1}\}, where u1<uk+1<v1<vk+1u^{\prime}_{1}<\dots u^{\prime}_{k+1}<v^{\prime}_{1}\dots<v^{\prime}_{k+1}.

The event AYZA^{*}_{YZ} is defined as follows.

  1. (1)

    We have uiuju_{i}\sim u_{j}, vivjv_{i}\sim v_{j}, and uivju_{i}\sim v_{j}, uiuju^{\prime}_{i}\sim u^{\prime}_{j}, vivjv^{\prime}_{i}\sim v^{\prime}_{j}, and uivju^{\prime}_{i}\sim v^{\prime}_{j} for every iji\neq j at time p1p_{1}. That is, condition (1) holds for both YY and ZZ. Some edges may be listed more than once if YY and ZZ overlap. This does not happen when m=0m=0 but these are the cases we consider below.

  2. (2)

    Besides the edges that appear in the previous condition other edges occur between vertices in vertex set YZY\cup Z, at time p2p_{2}. This happens with probability 1𝒪(p2)=1o(1)1-\mathcal{O}(p_{2})=1-o(1).

  3. (3)

    Neither {u1,uk+1}\{u_{1},\dots u_{k+1}\} nor {u1,uk+1}\{u^{\prime}_{1},\dots u^{\prime}_{k+1}\} has any mutual neighbors outside of vertex set YZY\cup Z, at time p2p_{2}. The probability of this condition being satisfied can be bounded below by a union bound by 12np2k+11-2np_{2}^{k+1}, which is again 1o(1)1-o(1) since np2k+10np_{2}^{k+1}\to 0.

It is clear that AYZA^{*}_{YZ} imples AYAZA_{Y}\wedge A_{Z}. Indeed, condition (1) is the same, condition (2) of AYZA^{*}_{YZ} implies condition (2) of AYAZA_{Y}\wedge A_{Z}, and conditions (2) and (3) of AYZA^{*}_{YZ} together imply condition (3) of AYAZA_{Y}\wedge A_{Z}. So putting it all together, we have that (AYZ)p14k(k+1)(1o(1))\mathbb{P}(A^{*}_{YZ})\geq p_{1}^{4k(k+1)}\left(1-o(1)\right).

Then

p14k(k+1)(AYAZ)(AYZ)p14k(k+1)(1o(1)),p_{1}^{4k(k+1)}\geq\mathbb{P}(A_{Y}\wedge A_{Z})\geq\mathbb{P}(A^{*}_{YZ})\geq p_{1}^{4k(k+1)}\left(1-o(1)\right),

and

(AYAZ)=p14k(k+1)(1o(1)),\mathbb{P}(A_{Y}\wedge A_{Z})=p_{1}^{4k(k+1)}\left(1-o(1)\right),

as desired.

So then

(AYAZ)(AY)(AZ)=o(p14k(k+1)).\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)-\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=o\left(p_{1}^{4k(k+1)}\right).

Since the number of pairs Y,ZY,Z is bounded by n4k+4n^{4k+4} we have that the total contribution to the variance, S0S_{0}, is bounded by

S0=o(n4k+4p14k(k+1)).S_{0}=o\left(n^{4k+4}p_{1}^{4k(k+1)}\right).

Comparing this to

𝔼(Nk)2=(n2k+2)2p14k(k+1)(1o(1))\mathbb{E}(N_{k})^{2}=\displaystyle\binom{n}{2k+2}^{2}p_{1}^{4k(k+1)}\left(1-o(1)\right)

we see that

S0=o(𝔼(Nk)2).S_{0}=o\left(\mathbb{E}(N_{k})^{2}\right).

Case II:

An essentially identical calculation shows that when m=1m=1, we have

(AYAZ)=p14k(k+1)(1o(1)).\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)=p_{1}^{4k(k+1)}\left(1-o(1)\right).

So in this case we have again

(AYAZ)(AY)(AZ)=o(p14k(k+1)).\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)-\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=o\left(p_{1}^{4k(k+1)}\right).

Hence, the total contribution to the variance, S1S_{1}, is

S1=o(n4k+3p14k(k+1))S_{1}=o\left(n^{4k+3}p_{1}^{4k(k+1)}\right)

and then,

S1/𝔼(Nk)2=o(n1)S_{1}/\mathbb{E}(N_{k})^{2}=o\left(n^{-1}\right)

and in particular S1=o(𝔼(Nk)2)S_{1}=o\left(\mathbb{E}(N_{k})^{2}\right).

Case III:

When 2m2k+12\leq m\leq 2k+1, we consider two sub-cases. The first subcase is that events AYA_{Y} and AZA_{Z} are not compatible in the sense that they cannot both occur due to the ways in which YY and ZZ overlap. This happens if for a certain pair of vertices u,vYZu,v\in Y\cap Z, u,vu,v are required to be adjacent in one of Y,ZY,Z and non-adjacent in the other. In this subcase, we have

(AYAZ)=0,\mathbb{P}(A_{Y}\wedge A_{Z})=0,

so

(AYAZ)(AY)(AZ)=p4k(k+1)(1o(1))0.\mathbb{P}(A_{Y}\wedge A_{Z})-\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=-p^{4k(k+1)}\left(1-o(1)\right)\leq 0.

The second subcase is that the events AYA_{Y} and AZA_{Z} are compatible, in the sense that they could possibly both happen. In this case, let jj denote the number of pairs in YZY\cap Z that are forced to be non-adjacent in AYAZA_{Y}\cap A_{Z}. Then the same argument as in Case I shows that

(AYAZ)(AYZ)=p14k(k+1)(m2)+j(1o(1)).\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)\geq\mathbb{P}\left(A^{*}_{YZ}\right)=p_{1}^{4k(k+1)-{\binom{m}{2}+j}}\left(1-o(1)\right).

So

(AY)(AZ)(AYAZ)p1(m2)j(1o(1))\begin{array}[]{lll}\displaystyle\frac{\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})}{\mathbb{P}(A_{Y}\wedge A_{Z})}&\leq&p_{1}^{{\binom{m}{2}}-j}\left(1-o(1)\right)\end{array}

Since np1k,np_{1}^{k}\rightarrow\infty, as nn\rightarrow\infty, we get

(AY)(AZ)(AYAZ)0.\displaystyle\frac{\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})}{\mathbb{P}(A_{Y}\wedge A_{Z})}\rightarrow 0.

So,

(AYAZ)(AY)(AZ)=(1o(1))(AYAZ).\mathbb{P}\left(A_{Y}\wedge A_{Z}\right)-\mathbb{P}(A_{Y})\mathbb{P}(A_{Z})=\left(1-o(1)\right)\mathbb{P}\left(A_{Y}\wedge A_{Z}\right).

The total contribution SmS_{m} of a pair of events AYA_{Y} and AZA_{Z} with YZ=mY\cap Z=m to the variance is then bounded by

Smn4k+4mp14k(k+1)(m2)+j(1+o(1)).S_{m}\leq n^{4k+4-m}p_{1}^{4k(k+1)-{\binom{m}{2}}+j}\left(1+o(1)\right).

Comparing this to

𝔼(Nk)2=(n2k+2)2p14k(k+1)(1o(1)).\mathbb{E}(N_{k})^{2}=\displaystyle\binom{n}{2k+2}^{2}p_{1}^{4k(k+1)}\left(1-o(1)\right).

we get

Sm/𝔼(Nk)2=O(nmp1(m2)+j).S_{m}/\mathbb{E}(N_{k})^{2}=O\left(n^{-m}p_{1}^{-{\binom{m}{2}}+j}\right).

We have

nmp1(m2)+j=(np1m12)mp1j.n^{-m}p_{1}^{-{\binom{m}{2}}+j}=\left(np_{1}^{\frac{m-1}{2}}\right)^{-m}p_{1}^{j}.

We are assuming that np1knp_{1}^{k}\rightarrow\infty. Since m2k+1m\leq 2k+1, we have k(m1)/2k\geq(m-1)/2. Then

(np1m12)m0,\left(np_{1}^{\frac{m-1}{2}}\right)^{-m}\rightarrow 0,

p1j0p_{1}^{j}\to 0, and Sm=o(𝔼(Nk)2)S_{m}=o\left(\mathbb{E}(N_{k})^{2}\right).

Summing the inequalities from the different cases, we conclude that

YZCov(IY,IZ)=m=02k+1Sm=o(𝔼(Nk)2),\sum_{Y\neq Z}\text{Cov}(I_{Y},I_{Z})=\sum_{m=0}^{2k+1}S_{m}=o\left(\mathbb{E}(N_{k})^{2}\right),

since Sm=o(𝔼(Nk)2)S_{m}=o\left(\mathbb{E}(N_{k})^{2}\right) for each mm and kk is fixed.

We conclude that as long as

n1/kp1p2n1/(k+1),n^{-1/k}\ll p_{1}\leq p_{2}\ll n^{-1/(k+1)},

then Nk>0N_{k}>0 with high probability.

It follows that if Lk(n)n1/k(k+1)L_{k}(n)\ll n^{1/k(k+1)} then w.h.p. Mk(n)Lk(n)M_{k}(n)\geq L_{k}(n), as desired.

5. Future directions

Recall that we earlier defined

fk(n)=n1/k(k+1)(logn)1/(k+1).f_{k}(n)=n^{1/k(k+1)}\left(\log{n}\right)^{1/(k+1)}.

We believe that the Mk(n)M_{k}(n) is likely of order roughly fk(n)f_{k}(n), in the following sense.

Let ω(n)\omega(n) be any function that tends to infinity with nn. We showed in the proof of Theorem 4.1 that

Mk(n)fk(n)ω(n).M_{k}(n)\leq f_{k}(n)\omega(n).

We believe that an analogous lower bound should hold.

Conjecture 5.1.

Let Mk(n)M_{k}(n) denote the maximal persistence over all kk-dimensional cycles in X(n,p)X(n,p). Then

fk(n)ω(n)Mk(n)\frac{f_{k}(n)}{\omega(n)}\leq M_{k}(n)

with high probability.

The following kind of limit theorem would provide precise answers to questions like, “Given a prior of this kind of distribution, what is the probability P(λ)P(\lambda) that there exists a cycle of persistence greater than λ\lambda?”

Conjecture 5.2.

Let Mk(n)M_{k}(n) denote the maximal persistence over all kk-dimensional cycles in X(n,p)X(n,p). Then

Mk(n)fk(n)\frac{M_{k}(n)}{f_{k}(n)}

converges in law to a limiting distribution supported on an interval [λk,)[\lambda_{k},\infty) for some λk>0\lambda_{k}>0.

Refer to caption
Refer to caption
Figure 1. Maximally persistent 11-cycles. On the left, a histogram for logM1(n)/logn\log M_{1}(n)/\log n. We prove that this converges to 1/21/2 as nn\to\infty. On the right, a histogram for M1(n)/f1(n)M_{1}(n)/f_{1}(n). Both these figures are based on 1000 samples on n=250n=250 vertices.
Refer to caption
Refer to caption
Figure 2. Maximally persistent 22-cycles. On the left, a histogram for logM2(n)/logn\log M_{2}(n)/\log n. We prove that this converges to 1/61/6 as nn\to\infty. On the right, a histogram for M2(n)/f2(n)M_{2}(n)/f_{2}(n). Both these figures are based on 1000 samples on n=150n=150 vertices.

See Figures 1 and 2 for some numerical experiments illustrating these conjectures. These experiments were computed with the aid of Ulrich Bauer’s software Ripser [2].

It also seems natural to study more about the “rank invariant” of a random clique complex filtration. That is, given k1k\geq 1, p1p_{1}, and p2p_{2}, how large do we expect the rank of the map i:Hk(X(n,p1))Hk(X(n,p2))i_{*}:H_{k}\left(X(n,p_{1})\right)\to H_{k}\left(X(n,p_{2})\right) to be?

Conjecture 5.3.

Suppose that k1k\geq 1 is fixed, and

n1/kp1p2n1/(k+1).n^{-1/k}\ll p_{1}\leq p_{2}\ll n^{-1/(k+1)}.

If i:X(n,p1)X(n,p2)i:X(n,p_{1})\to X(n,p_{2}) is the inclusion map, and

i:Hk(X(n,p1))Hk(X(n,p2))i_{*}:H_{k}\left(X\left(n,p_{1}\right)\right)\to H_{k}\left(X\left(n,p_{2}\right)\right)

is the induced map on homology, then

rank(i)=(1o(1))(nk+1)p1(k+12).\mbox{rank}(i_{*})=\left(1-o(1)\right)\binom{n}{k+1}p_{1}^{\binom{k+1}{2}}.

In [12], it is shown that

dimHk(X(n,p1),)=(1o(1))(nk+1)p1(k+12),\dim H_{k}(X(n,p_{1}),\mathbb{Q})=\left(1-o(1)\right)\binom{n}{k+1}p_{1}^{\binom{k+1}{2}},

so this conjecture is that almost all of the homology persists for as long as possible.

Bobrowski and Skraba study limiting distributions for maximal persistence in their recent preprint [5]. They describe experimental evidence that there is a universal distribution for maximal persistence over a wide class of models, including random Čech and Vietoris–Rips complexes. It is not clear whether we should expect the random clique complex filtration studied here to be in the same conjectural universality class.

References

  • [1] Noga Alon and Joel H. Spencer. The probabilistic method. Wiley Series in Discrete Mathematics and Optimization. John Wiley & Sons, Inc., Hoboken, NJ, fourth edition, 2016.
  • [2] Ulrich Bauer. Ripser: efficient computation of Vietoris-Rips persistence barcodes. J. Appl. Comput. Topol., 5(3):391–423, 2021.
  • [3] Omer Bobrowski and Matthew Kahle. Topology of random geometric complexes: a survey. J. Appl. Comput. Topol., 1(3-4):331–364, 2018.
  • [4] Omer Bobrowski, Matthew Kahle, and Primoz Skraba. Maximally persistent cycles in random geometric complexes. Ann. Appl. Probab., 27(4):2032–2060, 2017.
  • [5] Omer Bobrowski and Primoz Skraba. On the universality of random persistence diagrams. (submitted), arXiv:2207.03926, 2022.
  • [6] Matthew Brennan, Guy Bresler, and Dheeraj Nagaraj. Phase transitions for detecting latent geometry in random graphs. Probab. Theory Related Fields, 178(3-4):1215–1289, 2020.
  • [7] Sébastien Bubeck, Jian Ding, Ronen Eldan, and Miklós Z. Rácz. Testing for high-dimensional geometry in random graphs. Random Structures Algorithms, 49(3):503–532, 2016.
  • [8] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete Comput. Geom., 37(1):103–120, 2007.
  • [9] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete Comput. Geom., 28(4):511–533, 2002.
  • [10] P. Erdős and A. Rényi. On random graphs. I. Publ. Math. Debrecen, 6:290–297, 1959.
  • [11] Chad Giusti, Eva Pastalkova, Carina Curto, and Vladimir Itskov. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences, 112(44):13455–13460, 2015.
  • [12] Matthew Kahle. Topology of random clique complexes. Discrete Math., 309(6):1658–1671, 2009.
  • [13] Matthew Kahle. Sharp vanishing thresholds for cohomology of random flag complexes. Ann. of Math. (2), 179(3):1085–1107, 2014.
  • [14] Matthew Kahle. Random simplicial complexes. In Handbook of Discrete and Computational Geometry, pages 581–603. Chapman and Hall/CRC, 2017.
  • [15] Nathan Linial and Roy Meshulam. Homological connectivity of random 2-complexes. Combinatorica, 26(4):475–487, 2006.
  • [16] Greg Malen. Collapsibility of random clique complexes. arXiv:1903.05055, 2019.
  • [17] Andrew Newman. One-sided sharp thresholds for homology of random flag complexes. arXiv:2108.04299, 2021.
  • [18] Elliot Paquette and Andrew V. Werf. Random geometric graphs and the spherical Wishart matrix. arXiv:2110.10785, 2021.