Modularity of preferential attachment graphs
Abstract.
Modularity is a graph parameter measuring how clearly the set of graph vertices may be partitioned into subsets of high edge density. It indicates the presence of community structure in the graph. We study its value for a random preferential attachment model introduced by Barabási and Albert in 1999. A graph is created from some finite starting graph by adding new vertices one by one. A new vertex always connects to already existing vertices, and those are chosen with probability proportional to their current degrees. We prove that the modularity of the resulting graph is with high probability upper bounded by a function tending to 0 as the number of edges added per step tends to infinity. This resolves a conjecture of Prokhorenkova, Prałat and Raigorodskii from 2016.
As a byproduct we obtain novel concentration results for the volume and the edge density parameters of vertex subsets of this model.
Key words and phrases:
Modularity, preferential attachment model, expansion
1. Introduction
Identifying communities in a complex network is a task of great importance. It has a number of practical applications, ranging from identifying groups of common interests in social networks, through classifying fake news or spam, searching for articles on a related topic, identifying proteins with the same biological functions and optimizing large technological infrastructures, up to network visualization [19, 27].
The notion of a graph featuring a community structure may be understood in a variety of ways. Therefore, some measures of the presence of community structure in a network had to be introduced. One of them is modularity, proposed by Newman and Girvan in 2004 [27, 28]. The vertices of a graph with high modularity may be partitioned into subsets containing many more internal edges than we would expect by chance (see Definition 1). Nowadays, modularity is widely used not only as a quality function judging the performance of community detection algorithms [19], but also as a central ingredient of such algorithms, as in the Louvain algorithm [5], the Leiden algorithm [33] or the Tel-Aviv algorithm [13].
Early theoretical results on modularity were given for trees [2] and regular graphs [24]. For a summary of results for various families of graphs consult the appendix of [26] by McDiarmid and Skerman from 2020. More recent discoveries include [9] by Chellig, Fountoulakis and Skerman (for random graphs on the hyperbolic plane), [20] by Lasoń and Sulkowska (for minor-free graphs) or [21] by Lichev and Mitsche (for 3-regular random graphs and graphs with a given degree sequence).
Despite being so widely used in practice, modularity still suffers from a narrow theoretical study in the families of random graphs devoted to modeling real-life networks. The first results for the basic and most studied model, the binomial random graph, were given by McDiarmid and Skerman only in 2020 [26]. It is commonly known that this model does not reflect well most real-world networks [19]. So-called preferential attachment models usually perform much better here. The first in this family were random recursive trees, see for example [22, 32]. However, the in-depth study of preferential attachment models was initiated in 1999 by the work of Barabási and Albert [3], who indicated the applications of such graphs in network modeling. The preferential attachment model was subsequently formally defined and analyzed by Bollobás, Riordan, Spencer, and Tusnády [8] and by Bollobás and Riordan in [6, 7]. It relies on two mechanisms: growth (the graph grows over time, gaining a new vertex and a bunch of edges at each time step) and preferential attachment (an arriving vertex is more likely to attach to vertices of high degree than to those of low degree); for a precise definition see Section 2. Its degree distribution as well as its diameter often match those observed in reality [15, 27]. Nevertheless, experimental study shows that its modularity, unlike in real networks, is low. Prokhorenkova, Prałat and Raigorodskii opened the preliminary study of modularity of a basic preferential attachment graph in [29, 30]. They obtained non-trivial upper and lower bounds; however, the gap to close remained big. They conjectured that the modularity of such a graph with high probability tends to 0 as the number of edges added per step tends to infinity (see Conjecture 3). In this paper we prove their conjecture, confirming the supposition that a basic preferential attachment model might have too small a modularity to mirror well the behavior of real networks. As a byproduct we obtain new interesting concentration results for the volume and the edge density parameters of a given subset of vertices in such a graph. These results are interesting in their own right and may have future applications to other problems involving the preferential attachment model.
In the following section we give the formal definition of the preferential attachment model and state the main result. Section 3 is devoted to presenting the results regarding the volume and the edge density parameters of subsets of vertices in this model. Section 4 is technical; it contains several facts and auxiliary lemmas used in the later parts of the paper. In Section 5 we derive the concentration results stated in Section 3. These results are used in Section 6 to prove the main theorem about vanishing modularity in a basic preferential attachment graph. Section 7 contains concluding remarks.
2. Model and main result
Let denote the set of natural numbers, . For let . For the functions and we write if . We say that an event occurs with high probability (whp) if the probability depends on a certain number and tends to 1 as that number tends to infinity.
All the graphs considered in this paper are finite and undirected; loops and multiple edges are allowed. Thus a graph is a pair , where is a finite set of vertices and is a finite multiset of elements from with being the set of all -element subsets of . Let and for set and . The degree of a vertex in , denoted by , is the number of edges to which it belongs, with loops counted twice, i.e., . We define the volume of in by . By the volume of the graph, , we understand . Whenever the context is clear we write instead of , instead of , instead of and instead of .
We focus on a particular random graph model, called here simply a preferential attachment graph (consult [3, 6, 15]). Given we construct a preferential attachment graph in two phases. In the first phase we sample a particular random tree ; its vertices are called mini-vertices. (We call it a tree, although loops, i.e. single-vertex edges, are allowed in it.) Next, the appropriate mini-vertices of are grouped to form the vertices of . Let us describe this procedure in detail.
Phase 1. We start the whole process with , which is a graph consisting of a single mini-vertex with a single loop (thus the degree of this vertex is 2). For , the graph is built upon by adding a mini-vertex and joining it by an edge to a mini-vertex chosen according to the following probability distribution:
Note that we allow a newly arrived mini-vertex to connect to itself. We continue the process until we get the random tree .
Phase 2. A random multigraph is obtained from by merging each set of mini-vertices into a single vertex for , keeping loops and multiple edges.
Note that if then , and . Since we will refer very often to the number of edges of , it will also be denoted by , i.e., . Given , by , where , we understand the subgraph of induced by the set of vertices .
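To make the two-phase construction concrete, the following sketch (in Python) samples a graph by the procedure described above. The function name and the parameters n (number of vertices) and m (number of mini-vertices merged per vertex, i.e. edges added per step) are our illustrative choices, and the attachment rule is the standard degree-proportional one with loops allowed.

```python
import random

def preferential_attachment_graph(n, m, seed=None):
    """Sample a preferential attachment multigraph on n vertices with m
    edges added per step, via the two-phase construction: first grow a
    random tree on n*m mini-vertices, then merge consecutive blocks of m
    mini-vertices into single vertices (keeping loops and multiple edges)."""
    rng = random.Random(seed)
    # Phase 1: mini-vertex 0 starts with a single loop, hence degree 2.
    # endpoint_list stores one entry per half-edge, so a uniform draw from it
    # picks a mini-vertex with probability proportional to its current degree.
    endpoint_list = [0, 0]
    target = [0]                       # target[v] = mini-vertex that v attached to
    for v in range(1, n * m):
        endpoint_list.append(v)        # v's own half-edge: allows a loop at v
        u = rng.choice(endpoint_list)  # degree-proportional choice
        target.append(u)
        endpoint_list.append(u)
    # Phase 2: mini-vertices i*m, ..., (i+1)*m - 1 form vertex i.
    return [(v // m, target[v] // m) for v in range(n * m)]
```

For instance, preferential_attachment_graph(1000, 5, seed=1) returns a list of 5000 edges (loops and multiplicities included) on the vertex set {0, …, 999}.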
Our main goal is to upper bound the graph parameter called modularity for . Its formal definition is given just below.
Definition 1 (Modularity, [28]).
Let be a graph with at least one edge. For a partition of define a modularity score of as
Modularity of is given by
where the maximum runs over all partitions of the set .
Conventionally, a graph with no edges has modularity equal to 0. A single summand of the modularity score is the difference between the fraction of edges within and the expected fraction of edges within if we considered a certain random multigraph on with the expected degree sequence given by (see, e.g., [18]). It is easy to check that .
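For illustration, the modularity score of Definition 1 can be computed directly from an edge list as in the following sketch (Python, continuing the representation used in the construction sketch above; the helper name is ours). The modularity of the graph is then the maximum of this score over all partitions of the vertex set.

```python
def modularity_score(edges, partition):
    """Modularity score of a partition: for each part, the fraction of edges
    with both endpoints in the part minus the squared fraction of the total
    volume (degree sum) lying in the part. A loop counts as one edge inside
    its part and contributes two units of degree."""
    m = len(edges)                                  # total number of edges
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    inside = [0.0] * len(partition)                 # edges within each part
    volume = [0.0] * len(partition)                 # degree sum of each part
    for u, w in edges:
        volume[part_of[u]] += 1
        volume[part_of[w]] += 1
        if part_of[u] == part_of[w]:
            inside[part_of[u]] += 1
    return sum(inside[i] / m - (volume[i] / (2 * m)) ** 2
               for i in range(len(partition)))
```

In particular, the score of the trivial partition into a single part equals 1 − 1 = 0, in agreement with the remark above.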
The non-trivial lower and upper bounds for the modularity of a preferential attachment graph obtained by Prokhorenkova, Prałat and Raigorodskii in [29, 30] are the following.
Theorem 2 ([30], Theorem 4.2, Section 4.2).
Let be a preferential attachment graph. Then whp
and whp
where is the edge expansion of .
Applying the results for the edge expansion of the model by Mihail, Papadimitriou and Saberi from [MiPaSa06_journalv] to the upper bound, one obtains that whp . Indeed, the gap between the upper and the lower bound remained big. The authors stated the following two conjectures suggesting that the upper bound could be improved.
Conjecture 3 ([29]).
Let be a preferential attachment graph. Then whp .
Conjecture 4 ([29]).
Let be a preferential attachment graph. Then whp .
In this paper we present a much better upper bound for the modularity of the model than the one from Theorem 2 when the number of edges added per step is large, resolving Conjecture 3 in the positive. Conjecture 4 still remains open. The main result of the paper may be presented as follows.
Theorem 5.
Let be a preferential attachment graph. Then for every whp
where
with
Remark.
Note that as thus as . The value of drops below for .
Corollary 6.
Let be a preferential attachment graph. Then whp
Remark.
The value drops below for .
Remark.
Some new results showing that the modularity is whp also bounded below for small values of can be found in [23].
3. Volume and edge density
When talking about we will very often refer to its corresponding random tree . Recall that , and . For the corresponding set of its mini-vertices in will be denoted by , thus . For and let , in particular . Analogously, for and set , in particular . Note that for we have and .
When working with modularity we need to have control over and , where . These values depend heavily on the arrival times of the vertices from . To capture this phenomenon we define a special measure , where stands for the set of all subsets of .
Definition 7 (Measure ).
Let be a preferential attachment graph and its corresponding random tree. Let thus is the set of its corresponding mini-vertices. Associate with the set of indicator functions
where (whenever the context is clear we write instead of ). Define a function as follows:
with for and .
Remark.
Let be a preferential attachment graph, and . Note that
We use the measure to express the following novel concentration results for , and , where is an arbitrary subset of .
Theorem 8.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where .
Theorem 9.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where
with
Theorem 10.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where
To grasp the intuition hidden behind the above concentration results it is helpful to know that there is a relation between the structure of the graph and the structure of a random graph on the vertex set in which every edge (for ) is present with probability , independently of the other possible edges (consult Section 4 in [7] by Bollobás and Riordan).
Let . The number of inner edges of in , , satisfies
while the number of edges between and , , fulfills
Since one also gets
Therefore the measure is constructed in such a way that mimics the behavior of in . We will see later (consult Lemma 14) that the value of is asymptotically close to . Thus in particular for and . Therefore we may expect that in we will get , and .
4. Auxiliary lemmas
The current section gathers all the technical lemmas needed in the later parts of the paper.
The concentration results presented in Section 3 and proved in Section 5 are based on two variants of the Azuma-Hoeffding martingale inequality. The first one is basic. We state it as it appears in [16] by Janson, Łuczak and Ruciński.
Lemma 11 (Azuma-Hoeffding inequality, [1, 14]).
If is a martingale and there exist such that for each , then, for every ,
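For the reader's convenience, in its standard form the inequality reads: if $X_0,X_1,\dots,X_n$ is a martingale and there exist constants $c_1,\dots,c_n$ such that $|X_i-X_{i-1}|\le c_i$ for each $i\in\{1,\dots,n\}$, then for every $t>0$
\[
\mathbb{P}\bigl(|X_n-X_0|\ge t\bigr)\le 2\exp\!\left(-\frac{t^2}{2\sum_{i=1}^{n} c_i^2}\right).
\]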
The second one is Freedman’s inequality. We state it below in a form very similar to the one presented in Lemma 22 of [34] by Warnke (one may also consult [4] by Bennett and Dudek).
Lemma 12 (Freedman’s inequality, [11]).
Let be a martingale with respect to a filtration . Set and . Then for every and we have
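In a standard formulation (the exact form and constants used in [34] may differ slightly): if $(X_i)_{i\ge 0}$ is a martingale with respect to $(\mathcal{F}_i)_{i\ge 0}$ with $X_i-X_{i-1}\le R$ for all $i$, and $V_n=\sum_{i=1}^{n}\mathbb{E}\bigl[(X_i-X_{i-1})^2\mid\mathcal{F}_{i-1}\bigr]$, then for every $t,v>0$
\[
\mathbb{P}\bigl(X_n\ge X_0+t \text{ and } V_n\le v \text{ for some } n\bigr)\le\exp\!\left(-\frac{t^2}{2(v+Rt/3)}\right).
\]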
Next, we present bounds for the values of and an upper bound for the function , both introduced in Definition 7. These results will be referred to very often later on. They are derived with the use of Stirling’s approximation.
Lemma 13 (Stirling’s approximation, [31]).
Let . Then
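Robbins' form of the approximation [31], presumably the one intended here, states that for every $n\in\mathbb{N}$
\[
\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n+1}}\;\le\; n!\;\le\;\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n}}.
\]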
Lemma 14.
For let . Then
Proof.
Lemma 15.
Let be a preferential attachment graph and its corresponding random tree. Let also and . Then
Proof.
Lemmas 17, 18 and 19 are auxiliary calculations for expressions involving the volumes of subsets of (however, the reader might not notice the connection with volumes at this point). They will be used directly in Section 5 in the proof of Theorem 9, stating the result on the concentration of for . Their proofs utilize the following well-known approximation.
Lemma 16 (See [17]).
Let and . Then
where is known as the Euler-Mascheroni constant and .
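In its classical form (cf. [17]) the estimate reads: for every $n\in\mathbb{N}$
\[
\sum_{i=1}^{n}\frac{1}{i}=\ln n+\gamma+\varepsilon_n,\qquad 0<\varepsilon_n\le\frac{1}{2n},
\]
where $\gamma\approx 0.5772$ is the Euler-Mascheroni constant; we take this as the approximation referred to above.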
Lemma 17.
Let be a preferential attachment graph and its corresponding random tree. Fix and let be such that . Then
Proof.
Lemma 18.
Let be a preferential attachment graph and its corresponding random tree. Fix and let be such that . Then
Proof.
Lemma 19.
Let be a preferential attachment graph and its corresponding random tree. Fix and let . Then for any constant
Proof.
By the fact that we immediately get
∎
5. Edge density and volume results for
In this section we use martingale techniques to prove Theorems 8, 9 and 10 stated in Section 3, i.e., we derive concentration results for , , and for an arbitrary subset in . A series of results will lead us to Corollary 22, which implies, as a special case, Theorem 8. We start by analyzing the volumes.
Lemma 20.
Let be a preferential attachment graph. Consider the process of constructing its corresponding random tree . Fix and for let . Set
(recall that and were introduced in Definition 7). Let be a -algebra associated with all the events that happened till time . Then is a martingale with respect to the filtration . Moreover, for
Remark.
The first term of the martingale , i.e., , was inspired by the martingale constructed in Lemma 4 of [12] by Frieze, Pérez-Giménez, Prałat and Reiniger.
Proof.
Let . Recall that when mini-vertex arrives it may also connect to itself. Therefore, conditioned on ,
(3) |
Additionally, since , we get
thus is a martingale with respect to the filtration . Next, since
Note that . Moreover, the volume of in may be at most thus which implies also . Now use Lemma 14 and the fact that for non-negative to get
By the fact that is a martingale with respect to the filtration , and by (3) we also have
where the last inequalities follow from the fact that and Lemma 14, respectively. ∎
Theorem 21.
Let be a preferential attachment graph and its corresponding random tree. Fix and let . Then for every , for sufficiently large and for sufficiently large we get
where .
(Recall that was introduced in Definition 7).
Proof.
Throughout the proof we refer to the process of constructing the random tree . For let be a -algebra associated with all the events that happened till time . Fix , set and for consider
where and ’s and ’s are as in Definition 7. By Lemma 20 we know that is a martingale with respect to the filtration such that and . Therefore
and, by Lemma 16,
Applying Freedman’s inequality (Lemma 12) to with , and , where , we get
where the last inequality holds for sufficiently large and . One can verify that
thus for sufficiently large and we get
(4) |
Let us now analyze the complementary event . It is equivalent to
which is
By the definition of , Lemma 14 and Lemma 15 we have
(5) |
Note that , therefore by Lemma 14
(6) |
and, again by Lemma 14,
(7) |
Thus by (5), (6) and (7) we get that the event implies
which, by (4), for sufficiently large and , gives (recall that )
(8) |
To get the opposite bound we repeat the reasoning for the martingale . We apply again Freedman’s inequality (Lemma 12) with , and , where (as before) . For sufficiently large and we get
which for sufficiently large and implies
(9) |
Now, the complementary event is equivalent to
This time by the definition of and Lemma 14 we have
(10) |
and by the definition of , Lemma 14 and Lemma 15
(11) |
Hence, by (10), (11) and (7), the event implies
Therefore by (9) for sufficiently large and we get
(12) |
Corollary 22.
Let be a preferential attachment graph and its corresponding random tree. Then for every we have
where .
Proof.
Fix . Recall that . For and define the event as follows
For , by Theorem 21 and the union bound, for sufficiently large we have
Indeed, note that iterates over the vertices of and at time there are possible configurations for , thus also for . Next, again by the union bound, for sufficiently large we get
which implies
∎
Note that considering only in Corollary 22 we get the statement of Theorem 8, which finishes its proof.
Now we will again use martingale inequalities, which together with the concentration results for volumes from Corollary 22 will lead to the proof of Theorem 9. Consider the process of constructing the random tree . Let and . The result stated in Corollary 22 gives the concentration of the volumes of at time only for , where . In particular, it says nothing about the concentration of the volumes of the sets , where , at time . Such intermediate concentrations will be needed to prove Theorem 9, i.e., to draw conclusions about the concentration of the number of edges within . We derive those intermediate concentrations in Lemma 23.
Lemma 23.
Let be a preferential attachment graph and its corresponding random tree. Then for every we have
where .
Proof.
Fix . Define the events and as follows
By Corollary 22 we know that thus it is enough to show that the event implies the event .
Assume that holds. For and let . Consider all where and let . By the fact that , , Lemma 14 and Lemma 15 we get
Therefore, if holds, then for all where , for all , and for all , for sufficiently large on one hand,
and on the other hand
Thus for sufficiently large the event implies the event , therefore implies and the proof is finished. ∎
Now, we move on to the concentration of the number of edges within subsets of .
Lemma 24.
Let be a preferential attachment graph. Consider the process of constructing its corresponding random tree . Fix and for let and be a -algebra associated with all the events that happened till time . For set and define
Then is a martingale with respect to the filtration . Moreover, for
Proof.
Let . Note that
thus is a martingale with respect to the filtration .
Now, for let . Recall that when the mini-vertex arrives it may also connect to itself thus for we have
Therefore
where we used the fact that , thus and for non-negative .
∎
Lemma 25.
Let be a preferential attachment graph and its corresponding random tree. For and let . Then for divisible by and for every we have
where .
Proof.
Throughout the proof we again refer to the process of constructing the random tree . For let be a -algebra associated with all the events that happened till time . For let and . Fix and for and consider
By Lemma 24 we know that is a martingale with respect to the filtration such that . Moreover
Thus applying Azuma-Hoeffding inequality (Lemma 11) to with and , where we get
(13) |
Let us now analyze the event . It is equivalent to
which is
and, by the definition of (check the proof of Lemma 24),
Thus, by (13), we get
(14) |
Acting analogously for the martingale we get
which implies
(15) |
By (14) and (15), for a fixed we get
(16) |
Now, for , let the event be defined as
By (16) and the union bound we get (recall that and )
∎
We are ready to prove Theorem 9.
Proof of Theorem 9.
Throughout the proof we again refer to the process of constructing the random tree . Fix . For and let and . For we define the events , and as follows
where . By Lemmas 23 and 25 we know that . Thus it is enough to show that the event implies the event .
Assume that holds. By Lemmas 17, 18, and 19, for any , as , we can write
where the last inequality follows from Lemma 14, the fact that , and the fact that
Analogously, again by Lemmas 17, 18, and 19, for any we can get
Note that for any we have and recall that . Thus for all , for sufficiently large we may write (recall that holds)
where the term vanishes after the second inequality as the factor from is replaced by . Analogously we get
This means that for sufficiently large the event implies the event thus implies and the proof is finished. ∎
6. Modularity of vanishes with
Recall that the main result of the paper (Theorem 5) states that the modularity of a preferential attachment graph is with high probability upper bounded by a function tending to 0 as the number of edges added per step tends to infinity. The whole of the current section is devoted to its proof.
The first step in the proof of Theorem 5 follows from an interesting general result on modularity by Dinh and Thai.
Lemma 26 ([10], Lemma 1).
Let be a graph with at least one edge and let . Then
In particular,
Corollary 27.
Let be a graph with at least one edge. Then
Proof.
First, note that for such that we have . Next, consider -element partitions of . We have
The conclusion follows by Lemma 26. ∎
The above corollary frees us from considering all the partitions of when analyzing modularity of . We can simply concentrate on upper bounding the values of over all . To do so, we use the concentration results for and obtained in Section 5.
Proof of Theorem 5.
We finish by proving Corollary 6.
7. Concluding remarks
We showed that the modularity of a preferential attachment graph is with high probability upper bounded by a function of the order . This proves Conjecture 3, but it means that Conjecture 4, saying that the modularity of is with high probability of the order , still remains open. Note that the term can also be seen as , where stands for the average vertex degree in . Such behavior of modularity has already been reported for other random graphs. For being a random -regular graph it is known that whp (see [25]). For the binomial random graph it is known that when and is bounded below (see [26]). These observations might be seen as premises supporting Conjecture 4.
To the best of our knowledge, this paper gives the first concentration results for , and in , where can be an arbitrary subset of . The analogous results obtained so far, e.g. in [7], [12] or [29, 30], always addressed “compact” subsets of vertices, i.e., sets of the form . (In Lemma 4 of [12] the authors investigate the volume of any set of size at time , but in fact in their proof the volume of is upper bounded by the volume of the “oldest” vertices in , i.e., the volume of the set .) In this paper a more accurate analysis was possible thanks to the indicator functions introduced in Definition 7. We believe that the proof methods leading to the results obtained in Section 5 might serve in the future to obtain better bounds than those known today for the edge expansion or conductance of .
References
- [1] K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967.
- [2] J.P. Bagrow. Communities and bottlenecks: Trees and treelike networks have high modularity. Physical Review E, 85:066118, 2012.
- [3] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- [4] P. Bennett and A. Dudek. A gentle introduction to the differential equation method and dynamic concentration. Discrete Mathematics, 345(12):113071, 2022.
- [5] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
- [6] B. Bollobás and O. Riordan. Mathematical results on scale-free random graphs. In Handbook of Graphs and Networks: From the Genome to the Internet, pages 1–34. Wiley-VCH, 2003.
- [7] B. Bollobás and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5–34, 2004.
- [8] B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale-free random graph process. Random Structures & Algorithms, 18(3):279–290, 2001.
- [9] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1):cnab051, 2021.
- [10] T.N. Dinh and M.T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pages 888–891, 2011.
- [11] D. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100–118, 1975.
- [12] A. Frieze, X. Pérez-Giménez, P. Prałat, and B. Reiniger. Perfect matchings and hamiltonian cycles in the preferential attachment model. Random Structures & Algorithms, 54(2):258–288, 2019.
- [13] G. Gilad and R. Sharan. From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. PNAS Nexus, 2(6):pgad180, 2023.
- [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
- [15] R. van der Hofstad. Random Graphs and Complex Networks. Vol.1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017.
- [16] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. John Wiley & Sons, Inc., 2000.
- [17] R. P. Boas Jr. and J. W. Wrench Jr. Partial sums of the harmonic series. The American Mathematical Monthly, 78(8):864–870, 1971.
- [18] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Théberge. Clustering via hypergraph modularity. Plos One, 14:e0224307, Feb 2019.
- [19] B. Kamiński, P. Prałat, and F. Théberge. Mining Complex Networks. Chapman and Hall/CRC, 2021.
- [20] M. Lasoń and M. Sulkowska. Modularity of minor-free graphs. Journal of Graph Theory, 102(4):728–736, 2023.
- [21] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4):754–802, 2022.
- [22] H.M. Mahmoud, R.T. Smythe, and J. Szymański. On the structure of random plane-oriented recursive trees and their branches. Random Structures & Algorithms, 4(2):151–176, 1993.
- [23] C. McDiarmid, K. Rybarczyk, F. Skerman, and M. Sulkowska. Expansion and modularity in preferential attachment graph, 2025. Preprint.
- [24] C. McDiarmid and F. Skerman. Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics, 43:431–437, 2013.
- [25] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4):596–619, 2018.
- [26] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1):211–243, 2020.
- [27] M.E.J. Newman. Networks: An introduction. Oxford University Press, 2010.
- [28] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
- [29] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity of complex networks models. In A. Bonato, F. Chung Graham, and P. Prałat, editors, Algorithms and Models for the Web Graph - 13th International Workshop, WAW 2016, Montreal, QC, Canada, December 14-15, 2016, Proceedings, volume 10088 of Lecture Notes in Computer Science, pages 115–126, 2016.
- [30] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity in several random graph models. Electronic Notes in Discrete Mathematics, 61:947–953, 2017.
- [31] H. Robbins. A remark on Stirling’s formula. The American Mathematical Monthly, 62(1):26–29, 1955.
- [32] J. Szymański. On a nonuniform random recursive tree. In A. Barlotti, M. Biliotti, A. Cossu, G. Korchmaros, and G. Tallini, editors, Annals of Discrete Mathematics (33), volume 144 of North-Holland Mathematics Studies, pages 297–306. North-Holland, 1987.
- [33] V.A. Traag, L. Waltman, and N.J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(5233), 2019.
- [34] L. Warnke. On the method of typical bounded differences. Combinatorics, Probability & Computing, 25:269–299, 2016.