This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Modularity of preferential attachment graphs

Katarzyna Rybarczyk*{}^{\textrm{*}}111*{}^{\textrm{*}}[email protected], Adam Mickiewicz University, Poznań, Poland  and  Małgorzata Sulkowska{}^{\textrm{$\star$}}222{}^{\textrm{$\star$}}[email protected], Wrocław University of Science and Technology, Department of Fundamentals of Computer Science, Poland
Abstract.

Modularity is a graph parameter measuring how clearly the set of graph vertices may be partitioned into subsets of high edge density. It indicates the presence of community structure in the graph. We study its value for a random preferential attachment model GnhG_{n}^{h} introduced by Barabási and Albert in 1999. A graph GnhG_{n}^{h} is created from some finite starting graph by adding new vertices one by one. A new vertex always connects to h1h\geq 1 already existing vertices and those are chosen with probability proportional to their current degrees. We prove that modularity of GnhG_{n}^{h} is with high probability upper bounded by a function tending to 0 with hh tending to infinity. This resolves the conjecture of Prokhorenkova, Prałat and Raigorodskii from 2016.

As a byproduct we obtain novel concentration results for the volume and the edge density parameters of subsets of GnhG_{n}^{h}.

Key words and phrases:
Modularity, preferential attachment model, expansion
This research was funded in whole or in part by National Science Centre, Poland, grant OPUS-25 no 2023/49/B/ST6/02517.

1. Introduction

Identifying communities in a complex network is a task of a great importance. It has a number of practical applications, varying from identifying the groups of common interests in social networks, through classifying fake news or spam, searching articles on related topic, identifying proteins with the same biological functions, optimizing large technological infrastructures, up to networks visualization [19, 27].

The notion of a graph featuring a community structure may be understood in a variety of ways. Therefore, some measures of the presence of community structure in the network had to be introduced. One of them is modularity proposed by Newman and Girvan in 2004 [27, 28]. The vertices of a graph with high modularity may be partitioned into subsets in which there are much more internal edges than we would expect by chance (see Definition 1). Nowadays, modularity is widely used not only as a quality function judging the performance of community detection algorithms [19], but also as a central ingredient of such algorithms, like in Louvain algorithm [5], Leiden algorithm [33] or Tel-Aviv algorithm [13].

Early theoretical results on modularity were given for trees [2] and regular graphs [24]. For a summary of results for various families of graphs check the appendix of [26] by McDiarmid and Skerman from 20202020. More recent discoveries include [9] by Chellig, Fountoulakis and Skerman (for random graphs on the hyperbolic plane), [20] by Lasoń and Sulkowska (for minor-free graphs) or [21] by Lichev and Mitsche (for 33-regular graphs and graphs with given degree sequence).

Despite being so widely used in practice, modularity still suffers from a narrow theoretical study in the families of random graphs devoted to modeling real-life networks. The first results for the basic and most studied random graph, binomial G(n,p)G(n,p), were given by McDiarmid and Skerman just in 2020 [26]. It is commonly known that G(n,p)G(n,p) does not reflect well most of real-world networks [19]. So-called preferential attachment models usually perform here much better. The first in this family were random recursive trees, see for example [22, 32]. However, the in-depth study of preferential attachment models was initiated in 1999 by the work of Barabási and Albert [3], who indicated the applications of such graphs in network modeling. The preferential attachment model was subsequently formally defined and analyzed by Bollobás, Riordan, Spencer, and Tusnády [8] and Bolloás and Riordan in [6, 7]. It relies on two mechanisms: growth (the graph is growing over time, gaining a new vertex and a bunch of h1h\geq 1 edges at each time step) and preferential attachment (arriving vertex is more likely to attach to other vertices with high degree rather than with low degree), for a precise definition check Section 2. Its degree distribution as well as diameter often fit in with the ones spotted in reality [15, 27]. Nevertheless, experimental study shows that its modularity, unlike in real networks, is low. Prokhorenkova, Prałat and Raigorodskii opened the preliminary study on modularity of a basic preferential attachment graph in [29, 30]. They obtained non-trivial upper and lower bounds, however, the gap to close remained big. They conjectured that modularity of such graph with high probability tends to 0 with hh (the number of edges added per step) tending to infinity (see Conjecture 3). In this paper we prove their conjecture, confirming the supposition that a basic preferential attachment model might have too small modularity to mirror well the behavior of real networks. As a byproduct we obtain new interesting concentration results for the volume and the edge density parameters of a given subset of vertices in such graph. These results are interesting in their own right and may have future applications to other problems involving the preferential attachment model.

In the following section we give the formal definition of the preferential attachment model and state the main result. Section 3 is devoted to presenting the results regarding the volume and the edge density parameters of subsets of vertices in GnhG_{n}^{h}. Section 4 is technical, it contains several facts and auxiliary lemmas used in the latter parts of the paper. In Section 5 we derive concentration results stated in Section 3. These results are used in Section 6 to prove the main theorem about vanishing modularity in a basic preferential attachment graph. Section 7 contains concluding remarks.

2. Model and main result

Let \mathbb{N} denote the set of natural numbers, ={1,2,3,}\mathbb{N}=\{1,2,3,\ldots\}. For nn\in\mathbb{N} let [n]={1,2,,n}[n]=\{1,2,\ldots,n\}. For the functions f(n)f(n) and g(n)g(n) we write f(n)g(n)f(n)\sim g(n) if limnf(n)/g(n)=1\lim_{n\rightarrow\infty}f(n)/g(n)=1. We say that an event \mathscr{E} occurs with high probability (whp) if the probability []\mathbb{P}[\mathscr{E}] depends on a certain number nn and tends to 11 as nn tends to infinity.

All the graphs considered in this paper are finite, undirected and loops and multiple edges are allowed. Thus a graph is a pair G=(V,E)G=(V,E), where VV is a finite set of vertices and EE is a finite multiset of elements from V(1)V(2)V^{(1)}\cup V^{(2)} with V(k)V^{(k)} being a set of all kk-element subsets of VV. Let e(G)=|E|e(G)=|E| and for S,UVS,U\subseteq V set eG(S)=|{eE(S(1)S(2))|e_{G}(S)=|\{e\in E\cap(S^{(1)}\cup S^{(2)})| and eG(S,U)=|{eE:eSeU}|e_{G}(S,U)=|\{e\in E:e\cap S\neq\emptyset\wedge e\cap U\neq\emptyset\}|. The degree of a vertex vVv\in V in GG, denoted by degG(v)\deg_{G}(v), is the number of edges to which vv belongs but loops are counted twice, i.e., degG(v)=2|{eE:veeV(1)}|+|{eE:veeV(2)}|\deg_{G}(v)=2|\{e\in E:v\in e\wedge e\in V^{(1)}\}|+|\{e\in E:v\in e\wedge e\in V^{(2)}\}|. We define the volume of SVS\subseteq V in GG by volG(S)=vSdegG(v)\operatorname{vol}_{G}(S)=\sum_{v\in S}\deg_{G}(v). By the volume of the graph, vol(G)\operatorname{vol}(G), we understand volG(V)\operatorname{vol}_{G}(V). Whenever the context is clear we write e(S)e(S) instead of eG(S)e_{G}(S), e(S,U)e(S,U) instead of eG(S,U)e_{G}(S,U), deg(v)\deg(v) instead of degG(v)\deg_{G}(v) and vol(S)\operatorname{vol}(S) instead of volG(S)\operatorname{vol}_{G}(S).

We focus on a particular random graph model, called here simply a preferential attachment graph (consult [3, 6, 15]). Given h,nh,n\in\mathbb{N} we construct a preferential attachment graph GnhG_{n}^{h} in two phases. In the first phase we sample a particular random tree ThnT_{hn}, its vertices are called mini-vertices. (We call ThnT_{hn} a tree, however loops, i.e. single-vertex edges, are allowed in ThnT_{hn}.) Next, the appropriate mini-vertices of ThnT_{hn} are grouped to form vertices of GnhG_{n}^{h}. Let us describe this procedure in details.
Phase 1. We start the whole process with T1T_{1} which is a graph consisting of a single mini-vertex 11 with a single loop (thus degree of vertex 11 is 22). For t1t\geq 1 graph Tt+1T_{t+1} is built upon TtT_{t} by adding a mini-vertex (t+1)(t+1) and joining it by an edge with a mini-vertex ii according to the following probability distribution:

(i=s)={degTt(s)2t+1for1st12t+1fors=t+1.\mathbb{P}(i=s)=\begin{cases}\frac{\deg_{T_{t}}(s)}{2t+1}&\textnormal{for}\quad 1\leq s\leq t\\ \frac{1}{2t+1}&\textnormal{for}\quad s=t+1.\end{cases}

Note that we allow a newly arrived vertex to connect to itself. We continue the process until we get the random tree ThnT_{hn}.
Phase 2. A random multigraph GnhG_{n}^{h} is obtained from ThnT_{hn} by merging each set of mini-vertices {h(i1)+1,h(i1)+2,,h(i1)+h}\{h(i-1)+1,h(i-1)+2,\ldots,h(i-1)+h\} to a single vertex ii for i{1,2,,n}i\in\{1,2,\ldots,n\}, keeping loops and multiple edges.

Note that if Gnh=(V,E)G_{n}^{h}=(V,E) then V=[n]V=[n], |V|=n|V|=n and |E|=hn|E|=hn. Since we will refer very often to the number of edges of GnhG_{n}^{h}, it will be also denoted by MM, i.e., M:=hnM:=hn. Given GnhG_{n}^{h}, by GthG_{t}^{h}, where t[n]t\in[n], we understand a subgraph of GnhG_{n}^{h} induced by the set of vertices [t][t].

Our main goal is to upper bound the graph parameter called modularity for GnhG_{n}^{h}. Its formal definition is given just below.

Definition 1 (Modularity, [28]).

Let GG be a graph with at least one edge. For a partition 𝒜\mathscr{A} of VV define a modularity score of GG as

mod𝒜(G)=S𝒜(e(S)e(G)(vol(S)vol(G))2).\operatorname{mod}_{\mathscr{A}}(G)=\sum_{S\in\mathscr{A}}\left(\frac{e(S)}{e(G)}-\left(\frac{\operatorname{vol}(S)}{\operatorname{vol}(G)}\right)^{2}\right).

Modularity of GG is given by

mod(G)=max𝒜mod𝒜(G),\operatorname{mod}(G)=\max_{\mathscr{A}}\operatorname{mod}_{\mathscr{A}}(G),

where maximum runs over all the partitions of the set VV.

Conventionally, a graph with no edges has modularity equal to 0. A single summand of the modularity score is the difference between the fraction of edges within SS and the expected fraction of edges within SS if we considered a certain random multigraph on VV with the expected degree sequence given by GG (see, e.g., [18]). It is easy to check that mod(G)[0,1)\operatorname{mod}(G)\in[0,1).

Non-trivial lower and upper bounds for modularity of GnhG_{n}^{h} obtained by Prokhorenkova, Prałat and Raigorodskii in [29, 30] are the following.

Theorem 2 ([30], Theorem 4.2, Section 4.2).

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph. Then whp

mod(Gnh)=Ω(1/h)\operatorname{mod}(G_{n}^{h})=\Omega(1/\sqrt{h})

and whp

mod(Gnh)1min{δ(Gnh)/(2h),1/16},\operatorname{mod}(G_{n}^{h})\leq 1-\min\{\delta(G_{n}^{h})/(2h),1/16\},

where δ(Gnh)=minSV,1|S||V|/2e(S,VS)|S|\delta(G_{n}^{h})=\min\limits_{\begin{subarray}{c}S\subseteq V,1\leq|S|\leq|V|/2\end{subarray}}\frac{e(S,V\setminus S)}{|S|} is the edge expansion of GnhG_{n}^{h}.

Applying the results for edge expansion of GnhG_{n}^{h} by Mihail, Papadimitriou and Saberi from [MiPaSa06_journalv] to the upper bound one obtains that whp mod(Gnh)1O(1/h)\operatorname{mod}(G_{n}^{h})\leq 1-O(1/h). Indeed, the gap between the upper and the lower bound remained big. The authors stated two following conjectures suggesting that the upper bound could be improved.

Conjecture 3 ([29]).

Let GnhG_{n}^{h} be a preferential attachment graph. Then whp mod(Gnh)h0\operatorname{mod}(G_{n}^{h})\xrightarrow{h\rightarrow\infty}0.

Conjecture 4 ([29]).

Let GnhG_{n}^{h} be a preferential attachment graph. Then whp mod(Gnh)=Θ(1/h)\operatorname{mod}(G_{n}^{h})=\Theta(1/\sqrt{h}).

In this paper we present much better upper bound for modularity of GnhG_{n}^{h} than the one from Theorem 2 when hh is large, resolving, in the positive, Conjecture 3. Conjecture 4 still remains open. The main result of the paper may be presented as follows.

Theorem 5.

Let GnhG_{n}^{h} be a preferential attachment graph. Then for every ε>0\varepsilon>0 whp

mod(Gnh)(1+ε)f(h)h,\operatorname{mod}(G_{n}^{h})\leq\frac{(1+\varepsilon)f(h)}{\sqrt{h}},

where

f(h)=6g𝒱(h)+42ln2g𝒱(h)2/hf(h)=6g_{\mathscr{V}}(h)+4\sqrt{2\ln{2}}-g_{\mathscr{V}}(h)^{2}/\sqrt{h}

with

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.
Remark.

Note that f(h)32ln2lnhf(h)\sim 3\sqrt{2\ln{2}}\sqrt{\ln{h}} as hh\rightarrow\infty thus f(h)/h0f(h)/\sqrt{h}\rightarrow 0 as hh\rightarrow\infty. The value of f(h)/hf(h)/\sqrt{h} drops below 11 for h810h\geq 810.

Corollary 6.

Let GnhG_{n}^{h} be a preferential attachment graph. Then whp

mod(Gnh)3.54lnh+0.62+19.49h.\operatorname{mod}(G_{n}^{h})\leq\frac{3.54\sqrt{\ln{h}+0.62}+19.49}{\sqrt{h}}.
Remark.

The value 3.54lnh+0.62+19.49h\frac{3.54\sqrt{\ln{h}+0.62}+19.49}{\sqrt{h}} drops below 11 for h847h\geq 847.

Remark.

Some new results on the fact that mod(Gnh)\operatorname{mod}(G_{n}^{h}) is whp bounded below 11 also for small values of hh can be found in [23].

3. Volume and edge density

When talking about Gnh=(V,E)G_{n}^{h}=(V,E) we will very often refer to its corresponding random tree Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}). Recall that V=[n]V=[n], V~=[hn]\tilde{V}=[hn] and |E~|=|E|=hn=:M|\tilde{E}|=|E|=hn=:M. For SVS\subseteq V the corresponding set of its mini-vertices in V~\tilde{V} will be denoted by S~\tilde{S}, thus |S~|=h|S||\tilde{S}|=h|S|. For i[n]i\in[n] and SVS\subseteq V let Si=S[i]S_{i}=S\cap[i], in particular Sn=SS_{n}=S. Analogously, for i[M]i\in[M] and S~V~\tilde{S}\subseteq\tilde{V} set S~i=S~[i]\tilde{S}_{i}=\tilde{S}\cap[i], in particular S~M=S~\tilde{S}_{M}=\tilde{S}. Note that for SVS\subseteq V we have volGnh(S)=volThn(S~)\operatorname{vol}_{G_{n}^{h}}(S)=\operatorname{vol}_{T_{hn}}(\tilde{S}) and eGnh(S)=eThn(S~)e_{G_{n}^{h}}(S)=e_{T_{hn}}(\tilde{S}).

When working with modularity we need to have a control over eGnh(S)e_{G_{n}^{h}}(S) and volGnh(S)\operatorname{vol}_{G_{n}^{h}}(S), where SVS\subseteq V. Those values depend a lot on the arrival times of vertices from SS. To capture this phenomenon we define a special measure μ:2V~[0,)\mu:2^{\tilde{V}}\rightarrow[0,\infty), where 2V~2^{\tilde{V}} stands for the set of all subsets of V~\tilde{V}.

Definition 7 (Measure μ\mu).

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Let SVS\subseteq V thus S~V~\tilde{S}\subseteq\tilde{V} is the set of its corresponding mini-vertices. Associate S~\tilde{S} with the set of indicator functions

δiS~={1ifiS~0ifiS~,{\delta}_{i}^{\tilde{S}}=\begin{cases}1&\quad\textnormal{if}\quad i\in\tilde{S}\\ 0&\quad\textnormal{if}\quad i\notin\tilde{S},\end{cases}

where i[M]i\in[M] (whenever the context is clear we write δi{\delta}_{i} instead of δiS~{\delta}_{i}^{\tilde{S}}). Define a function μ:2V~[0,)\mu:2^{\tilde{V}}\rightarrow[0,\infty) as follows:

μ(S~)=π2j=1MδjS~cj1\mu(\tilde{S})=\frac{\sqrt{\pi}}{2}\cdot\sum_{j=1}^{M}{\delta}_{j}^{\tilde{S}}c_{j-1}

with cj=i=1j2i12ic_{j}=\prod_{i=1}^{j}\frac{2i-1}{2i} for j1j\geq 1 and c0=1c_{0}=1.

Remark.

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph, SVS\subseteq V and t[M]t\in[M]. Note that

μ(S~t)=π2j=1tδjS~cj1.\mu(\tilde{S}_{t})=\frac{\sqrt{\pi}}{2}\cdot\sum_{j=1}^{t}{\delta}_{j}^{\tilde{S}}c_{j-1}.

We use the measure μ\mu to express the following novel concentration results for volGnh(S)\operatorname{vol}_{G_{n}^{h}}(S), eGnh(S)e_{G_{n}^{h}}(S) and eGnh(S,VS)e_{G_{n}^{h}}(S,V\setminus S), where SS is an arbitrary subset of VV.

Theorem 8.

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Then for every ε>0\varepsilon>0 whp

SV|vol(S)2Mμ(S~)|(1+ε)g𝒱(h)Mh,\forall S\subseteq V\,\,\left|\operatorname{vol}(S)-2\sqrt{M}\,\mu(\tilde{S})\right|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{M}{\sqrt{h}},

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.

Theorem 9.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Then for every ε>0\varepsilon>0 whp

SV|e(S)μ(S~)2|(1+ε)g(h)Mh,\forall S\subseteq V\,\,\left|e(S)-\mu(\tilde{S})^{2}\right|\leq(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M}{\sqrt{h}},

where

g(h)=g𝒱(h)2+2ln2g_{\mathscr{E}}(h)=\frac{g_{\mathscr{V}}(h)}{2}+\sqrt{2\ln{2}}

with

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.
Theorem 10.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Then for every ε>0\varepsilon>0 whp

SV|e(S,VS)2μ(S~)(Mμ(S~))|(1+ε)(32g𝒱(h)+2ln2)Mh,\forall S\subseteq V\,\left|e(S,V\setminus S)-2\mu(\tilde{S})(\sqrt{M}-\mu(\tilde{S}))\right|\leq(1+\varepsilon)\left(\frac{3}{2}g_{\mathscr{V}}(h)+\sqrt{2\ln{2}}\right)\frac{M}{\sqrt{h}},

where

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.

To grasp the intuition hidden behind the above concentration results it is helpful to know that there is a relation between the structure of the graph ThnT_{hn} and the structure of a random graph G^\hat{G} on the vertex set [M][M] in which every edge {i,j}\{i,j\} (for i,j[M]i,j\in[M]) is present with probability 1/(2ij)1/(2\sqrt{ij}), independently of the other possible edges (consult Section 4 in [7] by Bollobás and Riordan).

Let S~[M]\tilde{S}\subseteq[M]. The number of inner edges of S~\tilde{S} in G^\hat{G}, eG^(S~)e_{\hat{G}}(\tilde{S}), satisfies

𝔼[eG^(S~)]=(iS~12i)2+O(lnM),\mathbb{E}[e_{\hat{G}}(\tilde{S})]=\left(\sum_{i\in\tilde{S}}\frac{1}{2\sqrt{i}}\right)^{2}+O(\ln{M}),

while the number of edges between S~\tilde{S} and [M]S~[M]\setminus\tilde{S}, eG^(S~,[M]S~)e_{\hat{G}}(\tilde{S},[M]\setminus\tilde{S}), fulfills

𝔼[eG^(S~,[M]S~)]=2(iS~12i)(i[M]S~12i).\mathbb{E}[e_{\hat{G}}(\tilde{S},[M]\setminus\tilde{S})]=2\left(\sum_{i\in\tilde{S}}\frac{1}{2\sqrt{i}}\right)\left(\sum_{i\in[M]\setminus\tilde{S}}\frac{1}{2\sqrt{i}}\right).

Since volG^(S~)=2eG^(S~)+eG^(S~,V~S~)\operatorname{vol}_{\hat{G}}(\tilde{S})=2e_{\hat{G}}(\tilde{S})+e_{\hat{G}}(\tilde{S},\tilde{V}\setminus{\tilde{S}}) one also gets

𝔼[volG^(S~)]=2(i[M]12i)(iS~12i)+O(lnM).\mathbb{E}[\operatorname{vol}_{\hat{G}}(\tilde{S})]=2\left(\sum_{i\in[M]}\frac{1}{2\sqrt{i}}\right)\left(\sum_{i\in\tilde{S}}\frac{1}{2\sqrt{i}}\right)+O(\ln{M}).

Therefore the measure μ\mu is constructed in such a way that μ(S~)\mu(\tilde{S}) mimics the behavior of iS~12i\sum_{i\in\tilde{S}}\frac{1}{2\sqrt{i}} in G^\hat{G}. We will see later (consult Lemma 14) that the value of πcj\sqrt{\pi}c_{j} is asymptotically close to 1/j1/\sqrt{j}. Thus in particular μ({i})12i\mu(\{i\})\sim\frac{1}{2\sqrt{i}} for ini\xrightarrow{n\rightarrow\infty}\infty and μ([M])M\mu([M])\sim\sqrt{M}. Therefore we may expect that in ThnT_{hn} we will get e(S~)μ(S~)2e(\tilde{S})\approx\mu(\tilde{S})^{2}, e(S~,VS~)2μ(S~)(Mμ(S~))e(\tilde{S},V\setminus\tilde{S})\approx 2\mu(\tilde{S})(\sqrt{M}-\mu(\tilde{S})) and vol(S~)2Mμ(S~)\operatorname{vol}(\tilde{S})\approx 2\sqrt{M}\mu(\tilde{S}).

4. Auxiliary lemmas

The current section gathers all technical lemmas needed in the latter parts of the paper.

The concentration results presented in Section 3 and proved in Section 5 are based on two variants of Azuma-Hoeffding martingale inequality. The first one is basic. We state it as it appears in [16] by Janson, Łuczak and Ruciński.

Lemma 11 (Azuma-Hoeffding inequality, [1, 14]).

If X0,X1,,XnX_{0},X_{1},\ldots,X_{n} is a martingale and there exist b1,,bnb_{1},\ldots,b_{n} such that |XjXj1|bj|X_{j}-X_{j-1}|\leq b_{j} for each j[n]j\in[n], then, for every x>0x>0,

[Xn𝔼[Xn]+x]exp{x22j=1nbj2}.\mathbb{P}[X_{n}\geq\mathbb{E}[X_{n}]+x]\leq\exp\left\{-\frac{x^{2}}{2\sum_{j=1}^{n}b_{j}^{2}}\right\}.

The second one is Freedman’s inequality. We state it below in the form very similar to the one presented in Lemma 22 of [34] by Warnke (one may consult also [4] by Bennett and Dudek).

Lemma 12 (Freedman’s inequality, [11]).

Let X0,X1,,XnX_{0},X_{1},\ldots,X_{n} be a martingale with respect to a filtration 01n\mathscr{F}_{0}\subseteq\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{n}. Set Ak=maxi[k](XiXi1)A_{k}=\max_{i\in[k]}(X_{i}-X_{i-1}) and Wk=i=1kVar[XiXi1|i1]W_{k}=\sum_{i=1}^{k}\mathrm{Var}[X_{i}-X_{i-1}|\mathscr{F}_{i-1}]. Then for every λ>0\lambda>0 and W,A>0W,A>0 we have

[k[n]XkX0+λ,WkW,AkA]exp{λ22W+2Aλ/3}.\mathbb{P}[\exists k\in[n]\quad X_{k}\geq X_{0}+\lambda,W_{k}\leq W,A_{k}\leq A]\leq\exp\left\{-\frac{\lambda^{2}}{2W+2A\lambda/3}\right\}.

Next, we present bounds for the values of cjc_{j} and an upper bound for the function μ(S~)\mu(\tilde{S}), both introduced in Definition 7. These results will be referred to very often later on. They are derived with the use of Stirling’s approximation.

Lemma 13 (Stirling’s approximation, [31]).

Let nn\in\mathbb{N}. Then

2πn(ne)ne112n+1<n!<2πn(ne)ne112n.\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{\frac{1}{12n+1}}<n!<\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{\frac{1}{12n}}.
Lemma 14.

For j1j\geq 1 let cj=i=1j2i12ic_{j}=\prod_{i=1}^{j}\frac{2i-1}{2i}. Then

e18j14144j21πjcj1πj.e^{-\frac{1}{8j}-\frac{1}{4\cdot 144j^{2}}}\cdot\frac{1}{\sqrt{\pi j}}\leq c_{j}\leq\frac{1}{\sqrt{\pi j}.}
Proof.

By Stirling’s approximation (Lemma 13) we get

cj=i=1j2i12i=(2j)!22j(j!)21πje124j212j+11πj.c_{j}=\prod_{i=1}^{j}\frac{2i-1}{2i}=\frac{(2j)!}{2^{2j}(j!)^{2}}\leq\frac{1}{\sqrt{\pi j}}e^{\frac{1}{24j}-\frac{2}{12j+1}}\leq\frac{1}{\sqrt{\pi j}}.

Analogously, since 1122j+11122j1144(2j)2\frac{1}{12\cdot 2j+1}\geq\frac{1}{12\cdot 2j}-\frac{1}{144(2j)^{2}},

cj1πje124j14144j2212j=1πje18j14144j2.c_{j}\geq\frac{1}{\sqrt{\pi j}}e^{\frac{1}{24j}-\frac{1}{4\cdot 144j^{2}}-\frac{2}{12j}}=\frac{1}{\sqrt{\pi j}}e^{-\frac{1}{8j}-\frac{1}{4\cdot 144j^{2}}}.

Lemma 15.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Let also t[M]t\in[M] and S~V~\tilde{S}\subseteq\tilde{V}. Then

μ(S~t)t+12.\mu(\tilde{S}_{t})\leq\sqrt{t}+\frac{1}{2}.
Proof.

By Lemma 14 we get

μ(S~t)π2j=1tcj1π2+12j=2t1j1π2+12+1t12j𝑑jt+12.\begin{split}\mu(\tilde{S}_{t})&\leq\frac{\sqrt{\pi}}{2}\sum_{j=1}^{t}c_{j-1}\leq\frac{\sqrt{\pi}}{2}+\frac{1}{2}\sum_{j=2}^{t}\frac{1}{\sqrt{j-1}}\\ &\leq\frac{\sqrt{\pi}}{2}+\frac{1}{2}+\int_{1}^{t}\frac{1}{2\sqrt{j}}\,dj\leq\sqrt{t}+\frac{1}{2}.\end{split}

Lemmas 17, 18 and 19 are auxiliary calculations for expressions involving the volumes of the subsets of V~\tilde{V} (however the reader might not notice the connection with volumes at this point). They will be directly used in Section 5 in the proof of Theorem 9 stating the result on the concentration of eGnh(S)e_{G_{n}^{h}}(S) for SVS\subseteq V. Their proofs utilize the following well known approximation.

Lemma 16 (See [17]).

Let nn\in\mathbb{N} and Hn=k=1n1kH_{n}=\sum_{k=1}^{n}\frac{1}{k}. Then

Hn=lnn+γ+12nαn,H_{n}=\ln{n}+\gamma+\frac{1}{2n}-\alpha_{n},

where γ0.5772\gamma\approx 0.5772 is known as Euler-Mascheroni constant and 0αn1/(8n2)0\leq\alpha_{n}\leq 1/(8n^{2}).

Lemma 17.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Fix S~V~\tilde{S}\subseteq\tilde{V} and let t0[M]t_{0}\in[M] be such that t0=t0(n)nt_{0}=t_{0}(n)\xrightarrow{n\rightarrow\infty}\infty. Then

i=t0+1Mδi2i1=π2i=1M(δici1)2+O(lnM+lnt0).\sum_{i=t_{0}+1}^{M}\frac{{\delta}_{i}}{2i-1}=\frac{\pi}{2}\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}+O(\ln{M}+\ln{t_{0}}).
Proof.

First note that by Lemma 16

i=1t0δi2i1i=1t012i1=O(lnt0).\sum_{i=1}^{t_{0}}\frac{{\delta}_{i}}{2i-1}\leq\sum_{i=1}^{t_{0}}\frac{1}{2i-1}=O(\ln t_{0}).

Therefore

(1) i=t0+1Mδi2i1=i=1Mδi2i1+O(lnt0).\sum_{i=t_{0}+1}^{M}\frac{{\delta}_{i}}{2i-1}=\sum_{i=1}^{M}\frac{{\delta}_{i}}{2i-1}+O(\ln t_{0}).

For i{2,,M}i\in\{2,\ldots,M\}, by Lemma 14 we have

12i112(i1)π2ci12(1+O(1/i))\frac{1}{2i-1}\leq\frac{1}{2(i-1)}\leq\frac{\pi}{2}c_{i-1}^{2}(1+O(1/i))

and note that the constant hidden in O(1/i)O(1/i) is the same for all ii’s. Therefore, by the fact that δi{0,1}{\delta}_{i}\in\{0,1\} and ci1c_{i}\leq 1,

i=1Mδi2i1δ1+i=2Mπ2δici12+π2i=2MO(1/i)π2i=1M(δici1)2+O(lnM),\begin{split}\sum_{i=1}^{M}\frac{{\delta}_{i}}{2i-1}&\leq\delta_{1}+\sum_{i=2}^{M}\frac{\pi}{2}{\delta}_{i}c_{i-1}^{2}+\frac{\pi}{2}\sum_{i=2}^{M}O(1/i)\\ &\leq\frac{\pi}{2}\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}+O(\ln{M}),\end{split}

which, together with (1), yields the desired upper bound.

On the other hand, for i{2,,M}i\in\{2,\ldots,M\}, by Lemma 14 we have

12(i1)π2ci12\frac{1}{2(i-1)}\geq\frac{\pi}{2}c_{i-1}^{2}

thus

i=1Mδi2i1i=2M2i22i1δi2i2=i=2Mδi2(i1)i=2Mδi(2i1)(2i2)i=1Mπ2δici12O(lnM)=π2i=1M(δici1)2O(lnM),\begin{split}\sum_{i=1}^{M}\frac{{\delta}_{i}}{2i-1}&\geq\sum_{i=2}^{M}\frac{2i-2}{2i-1}\frac{{\delta}_{i}}{2i-2}=\sum_{i=2}^{M}\frac{{\delta}_{i}}{2(i-1)}-\sum_{i=2}^{M}\frac{{\delta}_{i}}{(2i-1)(2i-2)}\\ &\geq\sum_{i=1}^{M}\frac{\pi}{2}{\delta}_{i}c_{i-1}^{2}-O(\ln{M})=\frac{\pi}{2}\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}-O(\ln{M}),\end{split}

which, together with (1), yields the desired lower bound. ∎

Lemma 18.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Fix S~V~\tilde{S}\subseteq\tilde{V} and let t0[M]t_{0}\in[M] be such that t0=t0(n)nt_{0}=t_{0}(n)\xrightarrow{n\rightarrow\infty}\infty. Then

i=t0+1Mδi2i1μ(S~i1)2i1=π2i=1M(δici1j=1i1δjcj1)+O(lnM+t0).\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}=\frac{\pi}{2}\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)+O(\ln{M}+t_{0}).
Proof.

First note that by Lemma 15

i=1t0δi2i1μ(S~i1)2i1i=1t02i1(i1+12)2i12t0.\sum_{i=1}^{t_{0}}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}\leq\sum_{i=1}^{t_{0}}\frac{2\sqrt{i-1}(\sqrt{i-1}+\frac{1}{2})}{2i-1}\leq 2t_{0}.

Therefore

(2) i=t0+1Mδi2i1μ(S~i1)2i1=i=1Mδi2i1μ(S~i1)2i1+O(t0).\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}=\sum_{i=1}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}+O(t_{0}).

Moreover, for i{2,,M}i\in\{2,\ldots,M\}, by Lemma 14 we have

1i1πci1(1+O(1/i))andci11πi1\frac{1}{\sqrt{i-1}}\leq\sqrt{\pi}c_{i-1}(1+O(1/i))\quad\quad\textnormal{and}\quad\quad c_{i-1}\leq\frac{1}{\sqrt{\pi}\sqrt{i-1}}

and note that the constant hidden in O(1/i)O(1/i) is the same for all ii’s. Therefore, by the definition of μ(S~i)\mu(\tilde{S}_{i}), Lemma 15, and the fact that δi{0,1}{\delta}_{i}\in\{0,1\}

i=1Mδi2i1μ(S~i1)2i1=i=2Mδi2i1μ(S~i1)2i1i=2Mδiμ(S~i1)i1i=2Mδiπci1μ(S~i1)(1+O(1/i))i=2M(δiπci1π2j=1i1δjcj1)+i=2Mπi1+12πi1O(1/i)=π2i=1M(δici1j=1i1δjcj1)+O(lnM),\begin{split}\sum_{i=1}^{M}{\delta}_{i}&\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}=\sum_{i=2}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}\leq\sum_{i=2}^{M}{\delta}_{i}\frac{\mu(\tilde{S}_{i-1})}{\sqrt{i-1}}\\ &\leq\sum_{i=2}^{M}{\delta}_{i}\sqrt{\pi}c_{i-1}\mu(\tilde{S}_{i-1})(1+O(1/i))\\ &\leq\sum_{i=2}^{M}\left({\delta}_{i}\sqrt{\pi}c_{i-1}\frac{\sqrt{\pi}}{2}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)+\sum_{i=2}^{M}\sqrt{\pi}\frac{\sqrt{i-1}+\frac{1}{2}}{\sqrt{\pi}\sqrt{i-1}}O(1/i)\\ &=\frac{\pi}{2}\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)+O(\ln{M}),\end{split}

which, together with (2), yields the desired upper bound.

On the other hand, again by Lemma 14, the definition of μ(S~i)\mu(\tilde{S}_{i}), Lemma 15 and the fact that δi{0,1}{\delta}_{i}\in\{0,1\}

i=1Mδi2i1μ(S~i1)2i1=i=2M(112i1)δi2i1μ(S~i1)2i2=i=2Mδiμ(S~i1)i1i=2Mδi2i1μ(S~i1)i1i=2M(δiπci1π2j=1i1δjcj1)i=2M12i1i1+12i1=π2i=1M(δici1j=1i1δjcj1)O(lnM),\begin{split}\sum_{i=1}^{M}{\delta}_{i}&\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}=\sum_{i=2}^{M}\left(1-\frac{1}{2i-1}\right){\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-2}\\ &=\sum_{i=2}^{M}{\delta}_{i}\frac{\mu(\tilde{S}_{i-1})}{\sqrt{i-1}}-\sum_{i=2}^{M}\frac{\delta_{i}}{2i-1}\frac{\mu(\tilde{S}_{i-1})}{\sqrt{i-1}}\\ &\geq\sum_{i=2}^{M}\left({\delta}_{i}\sqrt{\pi}c_{i-1}\frac{\sqrt{\pi}}{2}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)-\sum_{i=2}^{M}\frac{1}{2i-1}\frac{\sqrt{i-1}+\frac{1}{2}}{\sqrt{i-1}}\\ &=\frac{\pi}{2}\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)-O(\ln{M}),\end{split}

which, together with (2), yields the desired lower bound. ∎

Lemma 19.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Fix S~V~\tilde{S}\subseteq\tilde{V} and let t0[M]t_{0}\in[M]. Then for any constant C>0C>0

i=t0+1MδiC(i1)2i1C2(Mt0).\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{C(i-1)}{2i-1}\leq\frac{C}{2}(M-t_{0}).
Proof.

By the fact that δi{0,1}{\delta}_{i}\in\{0,1\} we immediately get

i=t0+1MδiC(i1)2i1C2i=t0+1MδiC2(Mt0).\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{C(i-1)}{2i-1}\leq\frac{C}{2}\sum_{i=t_{0}+1}^{M}{\delta}_{i}\leq\frac{C}{2}(M-t_{0}).

5. Edge density and volume results for 𝑮𝒏𝒉G_{n}^{h}

In this section we use martingale techniques to prove theorems 8, 9, and 10 stated in Section 3, i.e., we derive concentration results for vol(S)\operatorname{vol}(S), e(S)e(S), and e(S,VS)e(S,V\setminus S) for an arbitrary subset SVS\subseteq V in GnhG_{n}^{h}. A series of results will lead us to Corollary 22 which implies, as a special case, Theorem 8. We start with analyzing the volumes.

Lemma 20.

Let GnhG_{n}^{h} be a preferential attachment graph. Consider the process of constructing its corresponding random tree Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}). Fix S~V~\tilde{S}\subseteq\tilde{V} and for t[M]t\in[M] let Zt=volTt(S~t)Z_{t}=\operatorname{vol}_{T_{t}}(\tilde{S}_{t}). Set

Z^t=ctZtj=1tδjcj1\hat{Z}_{t}=c_{t}Z_{t}-\sum_{j=1}^{t}{\delta}_{j}c_{j-1}

(recall that δj{\delta}_{j} and cjc_{j} were introduced in Definition 7). Let t\mathscr{F}_{t} be a σ\sigma-algebra associated with all the events that happened till time tt. Then Z^1,Z^2,,Z^M\hat{Z}_{1},\hat{Z}_{2},\ldots,\hat{Z}_{M} is a martingale with respect to the filtration 1M\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{M}. Moreover, for t[M1]t\in[M-1]

|Z^t+1Z^t|2πtandVar[(Z^t+1Z^t)|t]14π(t+1).|\hat{Z}_{t+1}-\hat{Z}_{t}|\leq\frac{2}{\sqrt{\pi t}}\quad\quad\textnormal{and}\quad\quad\mathrm{Var}[(\hat{Z}_{t+1}-\hat{Z}_{t})|\mathscr{F}_{t}]\leq\frac{1}{4\pi(t+1)}.
Remark.

The first term of the martingale Z^t\hat{Z}_{t}, i.e., ctZt=(j=1t2j12j)Ztc_{t}Z_{t}=\left(\prod_{j=1}^{t}\frac{2j-1}{2j}\right)Z_{t} was inspired by the martingale constructed in Lemma 4 of [12] by Frieze, Prałat, Pérez-Giménez, and Reiniger.

Proof.

Let t[M1]t\in[M-1]. Recall that when mini-vertex (t+1)(t+1) arrives it may also connect to itself. Therefore, conditioned on t\mathscr{F}_{t},

(3) Zt+1={Zt+δt+1+1with probabilityZt+δt+12t+1Zt+δt+1otherwise.Z_{t+1}=\begin{cases}Z_{t}+\delta_{t+1}+1&\quad\textnormal{with probability}\quad\frac{Z_{t}+\delta_{t+1}}{2t+1}\\ Z_{t}+\delta_{t+1}&\quad\textnormal{otherwise}.\end{cases}

Additionally, since ct=ct+12t+22t+1c_{t}=c_{t+1}\cdot\frac{2t+2}{2t+1}, we get

𝔼[Z^t+1|t]=𝔼[ct+1Zt+1j=1t+1δjcj1|t]=ct+1(Zt+δt+1+Zt+δt+12t+1)j=1t+1δjcj1=ct+12t+22t+1(Zt+δt+1)j=1t+1δjcj1=ctZtj=1tδjcj1=Z^t,\begin{split}\mathbb{E}[\hat{Z}_{t+1}|\mathscr{F}_{t}]&=\mathbb{E}\bigg{[}c_{t+1}Z_{t+1}-\sum_{j=1}^{t+1}\delta_{j}c_{j-1}\bigg{|}\mathscr{F}_{t}\bigg{]}\\ &=c_{t+1}\left(Z_{t}+\delta_{t+1}+\frac{Z_{t}+\delta_{t+1}}{2t+1}\right)-\sum_{j=1}^{t+1}\delta_{j}c_{j-1}\\ &=c_{t+1}\cdot\frac{2t+2}{2t+1}(Z_{t}+\delta_{t+1})-\sum_{j=1}^{t+1}\delta_{j}c_{j-1}\\ &=c_{t}Z_{t}-\sum_{j=1}^{t}\delta_{j}c_{j-1}=\hat{Z}_{t},\end{split}

thus Z^1,,Z^M\hat{Z}_{1},\ldots,\hat{Z}_{M} is a martingale with respect to the filtration 1M\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{M}. Next, since ct+1=ct2t+12t+2c_{t+1}=c_{t}\cdot\frac{2t+1}{2t+2}

|Z^t+1Z^t|=|ct+1Zt+1ctZtδt+1ct|=ct|2t+12t+2Zt+1Ztδt+1|=ct|(Zt+1Zt)(Zt+12t+2+δt+1)|.\begin{split}|\hat{Z}_{t+1}-\hat{Z}_{t}|&=|c_{t+1}Z_{t+1}-c_{t}Z_{t}-\delta_{t+1}c_{t}|=c_{t}\left|\frac{2t+1}{2t+2}Z_{t+1}-Z_{t}-\delta_{t+1}\right|\\ &=c_{t}\left|(Z_{t+1}-Z_{t})-\left(\frac{Z_{t+1}}{2t+2}+\delta_{t+1}\right)\right|.\end{split}

Note that (Zt+1Zt){0,1,2}(Z_{t+1}-Z_{t})\in\{0,1,2\}. Moreover, the volume of S~t+1\tilde{S}_{t+1} in Tt+1T_{t+1} may be at most 2t+22t+2 thus Zt+12t+2Z_{t+1}\leq 2t+2 which implies also (Zt+12t+2+δt+1)[0,2]\left(\frac{Z_{t+1}}{2t+2}+\delta_{t+1}\right)\in[0,2]. Now use Lemma 14 and the fact that |ab|max{a,b}|a-b|\leq\max\{a,b\} for non-negative a,ba,b to get

|Z^t+1Z^t|2ct2πt.|\hat{Z}_{t+1}-\hat{Z}_{t}|\leq 2c_{t}\leq\frac{2}{\sqrt{\pi t}}.

By the fact that Z^1,,Z^M\hat{Z}_{1},\ldots,\hat{Z}_{M} is a martingale with respect to the filtration 1M\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{M}, ct=2t+22t+1ct+1c_{t}=\frac{2t+2}{2t+1}\cdot c_{t+1} and by (3) we also have

Var[(Z^t+1Z^t)|t]=𝔼[(Z^t+1Z^t)2|t]=𝔼[(ct+1Zt+1ctZtδt+1ct)2|t]=ct+12𝔼[(Zt+12t+22t+1(Zt+δt+1))2|t]=ct+12((Zt+δt+1+12t+22t+1(Zt+δt+1))2Zt+δt+12t+1+(Zt+δt+12t+22t+1(Zt+δt+1))2(1Zt+δt+12t+1))=ct+12(Zt+δt+12t+1(1Zt+δt+12t+1))ct+12414π(t+1),\begin{split}\mathrm{Var}[(\hat{Z}_{t+1}-\hat{Z}_{t})&|\mathscr{F}_{t}]=\mathbb{E}[(\hat{Z}_{t+1}-\hat{Z}_{t})^{2}|\mathscr{F}_{t}]\\ &=\mathbb{E}[(c_{t+1}Z_{t+1}-c_{t}Z_{t}-{\delta}_{t+1}c_{t})^{2}|\mathscr{F}_{t}]\\ &=c_{t+1}^{2}~{}\mathbb{E}\left[\left(Z_{t+1}-\frac{2t+2}{2t+1}(Z_{t}+{\delta}_{t+1})\right)^{2}\Bigg{|}\mathscr{F}_{t}\right]\\ &=c_{t+1}^{2}\left(\left(Z_{t}+\delta_{t+1}+1-\frac{2t+2}{2t+1}(Z_{t}+{\delta}_{t+1})\right)^{2}\frac{Z_{t}+\delta_{t+1}}{2t+1}\right.\\ &\quad\quad\quad+\left.\left(Z_{t}+\delta_{t+1}-\frac{2t+2}{2t+1}(Z_{t}+{\delta}_{t+1})\right)^{2}\left(1-\frac{Z_{t}+\delta_{t+1}}{2t+1}\right)\right)\\ &=c_{t+1}^{2}\left(\frac{Z_{t}+\delta_{t+1}}{2t+1}\left(1-\frac{Z_{t}+\delta_{t+1}}{2t+1}\right)\right)\leq\frac{c_{t+1}^{2}}{4}\leq\frac{1}{4\pi(t+1)},\end{split}

where the last inequalities follow from the fact that Zt+δt+12t+1[0,1]\frac{Z_{t}+\delta_{t+1}}{2t+1}\in[0,1] and Lemma 14, respectively. ∎

Theorem 21.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Fix S~V~\tilde{S}\subseteq\tilde{V} and let t[M]t\in[M]. Then for every ε>0\varepsilon>0, for sufficiently large tt and for sufficiently large nn we get

[|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th]22(1+ε/2)t/h,\mathbb{P}\left[\left|\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\mu(\tilde{S}_{t})\right|\geq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right]\leq 2\cdot 2^{-(1+{\varepsilon}/2)t/h},

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.
(Recall that μ(S~t)\mu(\tilde{S}_{t}) was introduced in Definition 7).

Proof.

Throughout the proof we refer to the process of constructing the random tree ThnT_{hn}. For t[M]t\in[M] let t\mathscr{F}_{t} be a σ\sigma-algebra associated with all the events that happened till time tt. Fix ε>0\varepsilon>0, set t0=t/ht_{0}=\lfloor t/h\rfloor and for j{t0,t0+1,,t}j\in\{t_{0},t_{0}+1,\ldots,t\} consider

Z^j=cjZji=1jδici1,\hat{Z}_{j}=c_{j}Z_{j}-\sum_{i=1}^{j}{\delta}_{i}c_{i-1},

where Zj=volTj(S~j)Z_{j}=\operatorname{vol}_{T_{j}}(\tilde{S}_{j}) and cic_{i}’s and δi{\delta}_{i}’s are as in Definition 7. By Lemma 20 we know that Z^t0,,Z^t\hat{Z}_{t_{0}},\ldots,\hat{Z}_{t} is a martingale with respect to the filtration t0t\mathscr{F}_{t_{0}}\subseteq\ldots\subseteq\mathscr{F}_{t} such that |Z^jZ^j1|2π(j1)|\hat{Z}_{j}-\hat{Z}_{j-1}|\leq\frac{2}{\sqrt{\pi(j-1)}} and Var[(Z^jZ^j1)|j1]14πj\mathrm{Var}[(\hat{Z}_{j}-\hat{Z}_{j-1})|\mathscr{F}_{j-1}]\leq\frac{1}{4\pi j}. Therefore

maxj{t0+1,,t}(Z^jZ^j1)2π(t01)=2π(t/h)(1+O(1/t))\max_{j\in\{t_{0}+1,\ldots,t\}}(\hat{Z}_{j}-\hat{Z}_{j-1})\leq\frac{2}{\sqrt{\pi(t_{0}-1)}}=\frac{2}{\sqrt{\pi(t/h)}}(1+O(1/t))

and, by Lemma 16,

j=t0+1tVar[(Z^jZ^j1)|j1]j=t0+1t14πj14πln(t/t0)+14π18t/h2=14πlnh+O(1/t).\begin{split}\sum_{j=t_{0}+1}^{t}\mathrm{Var}[(\hat{Z}_{j}&-\hat{Z}_{j-1})|\mathscr{F}_{j-1}]\\ &\leq\sum_{j=t_{0}+1}^{t}\frac{1}{4\pi j}\leq\frac{1}{4\pi}\ln{(t/t_{0})}+\frac{1}{4\pi}\cdot\frac{1}{8\lfloor t/h\rfloor^{2}}=\frac{1}{4\pi}\ln{h}+O(1/t).\end{split}

Applying Freedman’s inequality (Lemma 12) to Z^t0,,Z^t\hat{Z}_{t_{0}},\ldots,\hat{Z}_{t} with A=2π(t/h)(1+O(1/t))A=\frac{2}{\sqrt{\pi(t/h)}}(1+O(1/t)), W=14πlnh+O(1/t)W=\frac{1}{4\pi}\ln{h}+O(1/t) and λ=(1+ε)g¯𝒱(h)πt/h\lambda=\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}, where g¯𝒱(h)=g𝒱(h)2\bar{g}_{\mathscr{V}}(h)=g_{\mathscr{V}}(h)-2, we get

[Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h]exp{(1+ε)2g¯𝒱(h)2πth214πlnh+O(1/t)+232π(1+ε)g¯𝒱(h)(1+O(1/t))}exp{(1+ε/2)2g¯𝒱(h)212lnh+43(1+ε/2)g¯𝒱(h)th},\begin{split}\mathbb{P}\left[\hat{Z}_{t}\vphantom{\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}}\right.&\left.\geq\hat{Z}_{t_{0}}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right]\\ &\leq\exp\left\{-\frac{\frac{(1+\varepsilon)^{2}\bar{g}_{\mathscr{V}}(h)^{2}}{\pi}\frac{t}{h}}{2\cdot\frac{1}{4\pi}\ln{h}+O(1/t)+\frac{2}{3}\cdot\frac{2}{\pi}(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)(1+O(1/t))}\right\}\\ &\leq\exp\left\{-\frac{(1+\varepsilon/2)^{2}\bar{g}_{\mathscr{V}}(h)^{2}}{\frac{1}{2}\ln{h}+\frac{4}{3}(1+\varepsilon/2)\bar{g}_{\mathscr{V}}(h)}\cdot\frac{t}{h}\right\},\end{split}

where the last inequality holds for sufficiently large tt and nn. One can verify that

(1+ε/2)2g¯𝒱(h)212lnh+43(1+ε/2)g¯𝒱(h)(1+ε/2)ln2\frac{(1+\varepsilon/2)^{2}\bar{g}_{\mathscr{V}}(h)^{2}}{\frac{1}{2}\ln{h}+\frac{4}{3}(1+\varepsilon/2)\bar{g}_{\mathscr{V}}(h)}\geq(1+{\varepsilon}/2)\ln{2}

thus for sufficiently large tt and nn we get

(4) [Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h]2(1+ε/2)t/h.\mathbb{P}\left[\hat{Z}_{t}\geq\hat{Z}_{t_{0}}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right]\leq 2^{-(1+{\varepsilon}/2)t/h}.

Let us now analyze the complementary event {Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h}\left\{\hat{Z}_{t}\leq\hat{Z}_{t_{0}}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\}. It is equivalent to

{ctZti=1tδici1ct0Zt0i=1t0δici1+(1+ε)g¯𝒱(h)πt/h}\left\{c_{t}Z_{t}-\sum_{i=1}^{t}{\delta}_{i}c_{i-1}\leq c_{t_{0}}Z_{t_{0}}-\sum_{i=1}^{t_{0}}{\delta}_{i}c_{i-1}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\}

which is

{Zt1cti=1tδici1ct0ctZt01cti=1t0δici1+1ct(1+ε)g¯𝒱(h)πt/h}.\left\{Z_{t}-\frac{1}{c_{t}}\sum_{i=1}^{t}{\delta}_{i}c_{i-1}\leq\frac{c_{t_{0}}}{c_{t}}Z_{t_{0}}-\frac{1}{c_{t}}\sum_{i=1}^{t_{0}}{\delta}_{i}c_{i-1}+\frac{1}{c_{t}}\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\}.

By the definition of μ(S~t)\mu(\tilde{S}_{t}), Lemma 14 and Lemma 15 we have

(5) 1cti=1tδici1=2πctμ(S~t)2tμ(S~t)e18t+14144t2=2tμ(S~t)(1+O(1/t))=2tμ(S~t)+O(1).\begin{split}\frac{1}{c_{t}}\sum_{i=1}^{t}{\delta}_{i}c_{i-1}&=\frac{2}{\sqrt{\pi}c_{t}}\mu(\tilde{S}_{t})\leq 2\sqrt{t}\mu(\tilde{S}_{t})e^{\frac{1}{8t}+\frac{1}{4\cdot 144t^{2}}}\\ &=2\sqrt{t}\mu(\tilde{S}_{t})(1+O(1/t))=2\sqrt{t}\mu(\tilde{S}_{t})+O(1).\end{split}

Note that Zt02t0Z_{t_{0}}\leq 2t_{0}, therefore by Lemma 14

(6) ct0ctZt02tt0(1+O(1/t))=2th+O(1),\frac{c_{t_{0}}}{c_{t}}Z_{t_{0}}\leq 2\sqrt{t}\sqrt{t_{0}}(1+O(1/t))=\frac{2t}{\sqrt{h}}+O(1),

and, again by Lemma 14,

(7) 1ct(1+ε)g¯𝒱(h)πt/h(1+ε)g¯𝒱(h)th(1+O(1/t))=(1+ε)g¯𝒱(h)th+O(1).\begin{split}\frac{1}{c_{t}}\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}&\leq(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}~{}(1+O(1/t))\\ &=(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}+O(1).\end{split}

Thus by (5), (6) and (7) we get that the event {Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h}\left\{\hat{Z}_{t}\leq\hat{Z}_{t_{0}}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\} implies

{Zt2tμ(S~t)2th+(1+ε)g¯𝒱(h)th+O(1)}\left\{Z_{t}-2\sqrt{t}\mu(\tilde{S}_{t})\leq\frac{2t}{\sqrt{h}}+(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}+O(1)\right\}

which, by (4), for sufficiently large tt and nn, gives (recall that g¯𝒱(h)+2=g𝒱(h)\bar{g}_{\mathscr{V}}(h)+2=g_{\mathscr{V}}(h))

(8) [volTt(S~t)2tμ(S~t)(1+ε)g𝒱(h)th]2(1+ε/2)t/h.\mathbb{P}\left[\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\mu(\tilde{S}_{t})\geq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right]\leq 2^{-(1+{\varepsilon}/2)t/h}.

To get the opposite bound we repeat the reasoning for the martingale Z^t0,,Z^t-\hat{Z}_{t_{0}},\ldots,-\hat{Z}_{t}. We apply again Freedman’s inequality (Lemma 12) with A=2π(t/h)(1+O(1/t))A=\frac{2}{\sqrt{\pi(t/h)}}(1+O(1/t)), W=14πlnh+O(1/t)W=\frac{1}{4\pi}\ln{h}+O(1/t) and λ=(1+ε)g¯𝒱(h)πt/h\lambda=\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}, where (as before) g¯𝒱(h)=g𝒱(h)2\bar{g}_{\mathscr{V}}(h)=g_{\mathscr{V}}(h)-2. For sufficiently large tt and nn we get

[Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h]exp{(1+ε/2)2g¯𝒱(h)212lnh+43(1+ε/2)g¯𝒱(h)th},\begin{split}\mathbb{P}\left[-\hat{Z}_{t}\vphantom{\frac{{C}_{3\varepsilon}}{\sqrt{\pi}}}\right.&\left.\geq-\hat{Z}_{t_{0}}+\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right]\leq\exp\left\{-\frac{(1+\varepsilon/2)^{2}\bar{g}_{\mathscr{V}}(h)^{2}}{\frac{1}{2}\ln{h}+\frac{4}{3}(1+\varepsilon/2)\bar{g}_{\mathscr{V}}(h)}\cdot\frac{t}{h}\right\},\end{split}

which for sufficiently large tt and nn implies

(9) [Z^tZ^t0(1+ε)g¯𝒱(h)πt/h]2(1+ε/2)t/h.\mathbb{P}\left[\hat{Z}_{t}\leq\hat{Z}_{t_{0}}-\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right]\leq 2^{-(1+{\varepsilon}/2)t/h}.

Now, the complementary event {Z^tZ^t0(1+ε)g¯𝒱(h)πt/h}\left\{\hat{Z}_{t}\geq\hat{Z}_{t_{0}}-\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\} is equivalent to

{Zt1cti=1tδici1ct0ctZt01cti=1t0δici11ct(1+ε)g¯𝒱(h)πt/h}.\left\{Z_{t}-\frac{1}{c_{t}}\sum_{i=1}^{t}{\delta}_{i}c_{i-1}\geq\frac{c_{t_{0}}}{c_{t}}Z_{t_{0}}-\frac{1}{c_{t}}\sum_{i=1}^{t_{0}}{\delta}_{i}c_{i-1}-\frac{1}{c_{t}}\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\}.

This time by the definition of μ(S~t)\mu(\tilde{S}_{t}) and Lemma 14 we have

(10) 1cti=1tδici12tμ(S~t),\frac{1}{c_{t}}\sum_{i=1}^{t}{\delta}_{i}c_{i-1}\geq 2\sqrt{t}\mu(\tilde{S}_{t}),

and by the definition of μ(S~t)\mu(\tilde{S}_{t}), Lemma 14 and Lemma 15

(11) 1cti=1t0δici12tμ(S~t0)e18t+14144t22th+O(1).\frac{1}{c_{t}}\sum_{i=1}^{t_{0}}{\delta}_{i}c_{i-1}\leq 2\sqrt{t}\mu(\tilde{S}_{t_{0}})e^{\frac{1}{8t}+\frac{1}{4\cdot 144t^{2}}}\leq\frac{2t}{\sqrt{h}}+O(1).

Hence, by (10), (11) and (7), the event {Z^tZ^t0(1+ε)g¯𝒱(h)πt/h}\left\{\hat{Z}_{t}\geq\hat{Z}_{t_{0}}-\frac{(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)}{\sqrt{\pi}}\sqrt{t/h}\right\} implies

{Zt2tμ(S~t)2th(1+ε)g¯𝒱(h)thO(1)}.\left\{Z_{t}-2\sqrt{t}\mu(\tilde{S}_{t})\geq-\frac{2t}{\sqrt{h}}-(1+\varepsilon)\bar{g}_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}-O(1)\right\}.

Therefore by (9) for sufficiently large tt and nn we get

(12) [volTt(S~t)2tμ(S~t)(1+ε)g𝒱(h)th]2(1+ε/2)t/h.\mathbb{P}\left[\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\mu(\tilde{S}_{t})\leq-(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right]\leq 2^{-(1+{\varepsilon}/2)t/h}.

Finally, by (8) and (12), for sufficiently large tt and nn we may write

[|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th]22(1+ε/2)t/h.\mathbb{P}\left[\left|\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\mu(\tilde{S}_{t})\right|\geq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right]\leq 2\cdot 2^{-(1+{\varepsilon/2})t/h}.

Corollary 22.

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Then for every ε>0\varepsilon>0 we have

[i{log2n,,n}SV|volGih(Si)2hiμ(S~hi)|(1+ε)g𝒱(h)hih]=1o(1),\begin{split}\mathbb{P}\left[\vphantom{\frac{C}{\sqrt{h}}}\forall i\right.&\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}\,\forall S\subseteq V\\ &\left.\left|\operatorname{vol}_{G_{i}^{h}}(S_{i})-2\sqrt{hi}\,\mu(\tilde{S}_{hi})\right|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{hi}{\sqrt{h}}\right]=1-o(1),\end{split}

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.

Proof.

Fix ε>0\varepsilon>0. Recall that volGih(Si)=volThi(S~hi)\operatorname{vol}_{G_{i}^{h}}(S_{i})=\operatorname{vol}_{T_{hi}}(\tilde{S}_{hi}). For SVS\subseteq V and i{log2n,,n}i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\} define the event S,i\mathscr{E}_{S,i} as follows

S,i={|volThi(S~hi)2hiμ(S~hi)|(1+ε)g𝒱(h)hih}.\mathscr{E}_{S,i}=\Bigl{\{}\left|\operatorname{vol}_{T_{hi}}(\tilde{S}_{hi})-2\sqrt{hi}\,\mu(\tilde{S}_{hi})\right|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{hi}{\sqrt{h}}\Bigr{\}}.

For i{log2n,,n}i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}, by Theorem 21 and the union bound, for sufficiently large nn we have

[SVS,iC]2i22(1+ε/2)i=22(ε/2)i.\mathbb{P}\left[\exists S\subseteq V\,\,\mathscr{E}_{S,i}^{C}\right]\leq 2^{i}\cdot 2\cdot 2^{-(1+\varepsilon/2)i}=2\cdot 2^{-(\varepsilon/2)i}.

Indeed, note that ii iterates over the vertices of GnhG_{n}^{h} and at time ii there are 2i2^{i} possible configurations for SiS_{i}, thus also for S~hi\tilde{S}_{hi}. Next, again by the union bound, for sufficiently large nn we get

[i{log2n,,n}SVS,iC]i=log2nn22(ε/2)i22εlog2n12ε2(12ε)nε,\begin{split}\mathbb{P}\left[\exists i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}\,\,\exists S\subseteq V\,\,\mathscr{E}_{S,i}^{C}\right]&\leq\sum_{i=\lfloor\log_{2}{n}\rfloor}^{n}2\cdot 2^{-(\varepsilon/2)i}\\ &\leq\frac{2\cdot 2^{-\varepsilon\lfloor\log_{2}{n}\rfloor}}{1-2^{-\varepsilon}}\sim\frac{2}{(1-2^{-\varepsilon})n^{\varepsilon}},\end{split}

which implies

[i{log2n,,n}SVS,i]=1o(1).\mathbb{P}\left[\forall i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}\,\,\forall S\subseteq V\,\,\mathscr{E}_{S,i}\right]=1-o(1).

Note that considering only i=ni=n in Corollary 22 we get the statement of Theorem 8, which finishes its proof.

Now we will again use martingale inequalities which together with concentration results for volumes from Corollary 22 will lead to the proof of Theorem 9. Consider the process of constructing the random tree Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}). Let S~V~\tilde{S}\subseteq\tilde{V} and j{1,,M}j\in\{1,\ldots,M\}. The result stated in Corollary 22 gives the concentration of the volumes of S~j\tilde{S}_{j} at time jj only for j=hij=hi, where i{log2n,,n}i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}. In particular, it says nothing about the concentration of the volumes of the sets S~hi+k\tilde{S}_{hi+k}, where k{1,2,h1}k\in\{1,2,\ldots h-1\}, at time hi+khi+k. Such intermediate concentrations will be needed to prove Theorem 9, i.e., to draw conclusion about the concentration of the number of edges within S~\tilde{S}. We derive those intermediate concentrations in Lemma 23.

Lemma 23.

Let Gnh=(V,E)G_{n}^{h}=(V,E) be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. Then for every ε>0\varepsilon>0 we have

[t{log2nh,,M}SV|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th]=1o(1),\begin{split}\mathbb{P}\left[\vphantom{\frac{C}{\sqrt{h}}}\forall t\right.&\in\{\lfloor\log_{2}{n}\rfloor h,\ldots,M\}\,\forall S\subseteq V\\ &\left.\left|\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\,\mu(\tilde{S}_{t})\right|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right]=1-o(1),\end{split}

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2g_{\mathscr{V}}(h)=\frac{1}{6}\sqrt{2\ln{2}\,(9\ln{h}+8\ln{2})}+(2/3)\ln{2}+2.

Proof.

Fix ε>0\varepsilon>0. Define the events \mathscr{H} and 𝒱\mathscr{V} as follows

={t{log2nh,,M}SV|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th}𝒱={j=ih such that i{log2n,,n}SV|volTj(S~j)2jμ(S~j)|(1+ε/2)g𝒱(h)jh}.\begin{split}\mathscr{H}=\left\{\vphantom{\frac{C}{\sqrt{h}}}\forall t\right.&\in\{\lfloor\log_{2}{n}\rfloor h,\ldots,M\}\,\forall S\subseteq V\\ &\left.\left|\operatorname{vol}_{T_{t}}(\tilde{S}_{t})-2\sqrt{t}\,\mu(\tilde{S}_{t})\right|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right\}\\ \mathscr{V}=\left\{\vphantom{\frac{C}{\sqrt{h}}}\right.\forall j&=ih\textnormal{ such that }i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n\}\,\forall S\subseteq V\\ &\left.\left|\operatorname{vol}_{T_{j}}(\tilde{S}_{j})-2\sqrt{j}\,\mu(\tilde{S}_{j})\right|\leq(1+\varepsilon/2)g_{\mathscr{V}}(h)\frac{j}{\sqrt{h}}\right\}.\end{split}

By Corollary 22 we know that [𝒱]=1o(1)\mathbb{P}[\mathscr{V}]=1-o(1) thus it is enough to show that the event 𝒱\mathscr{V} implies the event \mathscr{H}.

Assume that 𝒱\mathscr{V} holds. For t[M]t\in[M] and SVS\subseteq V let Zt=volTt(S~t)Z_{t}=\operatorname{vol}_{T_{t}}(\tilde{S}_{t}). Consider all j=ihj=ih where i{log2n,,n1}i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n-1\} and let k{1,2,h1}k\in\{1,2,\ldots h-1\}. By the fact that 1+k/j=1+O(1/j)\sqrt{1+k/j}=1+O(1/j), δ{0,1}{\delta}_{\ell}\in\{0,1\}, Lemma 14 and Lemma 15 we get

2j+kμ(S~j+k)=2j1+k/j(μ(S~j)+=j+1j+kδc1)2j(1+O(1/j))(μ(S~j)+=j+1j+k1π(1))=(2j+O(1/j))(μ(S~j)+O(1/j))=2jμ(S~j)+O(1).\begin{split}2\sqrt{j+k}\,\mu(\tilde{S}_{j+k})&=2\sqrt{j}\sqrt{1+k/j}\left(\mu(\tilde{S}_{j})+\sum_{\ell=j+1}^{j+k}{{\delta}_{\ell}c_{\ell-1}}\right)\\ &\leq 2\sqrt{j}\left(1+O(1/j)\right)\left(\mu(\tilde{S}_{j})+\sum_{\ell=j+1}^{j+k}\frac{1}{\sqrt{\pi(\ell-1)}}\right)\\ &=(2\sqrt{j}+O(1/\sqrt{j}))\left(\mu(\tilde{S}_{j})+O(1/\sqrt{j})\right)\\ &=2\sqrt{j}\,\mu(\tilde{S}_{j})+O(1).\end{split}

Therefore, if 𝒱\mathscr{V} holds, then for all j=ihj=ih where i{log2n,,n1}i\in\{\lfloor\log_{2}{n}\rfloor,\ldots,n-1\}, for all SVS\subseteq V, and for all k{1,2,,h1}k\in\{1,2,\ldots,h-1\}, for sufficiently large nn on one hand,

Zj+kZj2jμ(S~j)(1+ε/2)g𝒱(h)jh2j+kμ(S~j+k)O(1)(1+ε/2)g𝒱(h)j+kh2j+kμ(S~j+k)(1+ε)g𝒱(h)j+kh\begin{split}Z_{j+k}&\geq Z_{j}\geq 2\sqrt{j}\,\mu(\tilde{S}_{j})-(1+\varepsilon/2)g_{\mathscr{V}}(h)\frac{j}{\sqrt{h}}\\ &\geq 2\sqrt{j+k}\,\mu(\tilde{S}_{j+k})-O(1)-(1+\varepsilon/2)g_{\mathscr{V}}(h)\frac{j+k}{\sqrt{h}}\\ &\geq 2\sqrt{j+k}\,\mu(\tilde{S}_{j+k})-(1+\varepsilon)g_{\mathscr{V}}(h)\frac{j+k}{\sqrt{h}}\end{split}

and on the other hand

Zj+kZj+2k2jμ(S~j)+(1+ε/2)g𝒱(h)jh+2h2j+kμ(S~j+k)+(1+ε)g𝒱(h)j+kh.\begin{split}Z_{j+k}&\leq Z_{j}+2k\leq 2\sqrt{j}\,\mu(\tilde{S}_{j})+(1+\varepsilon/2)g_{\mathscr{V}}(h)\frac{j}{\sqrt{h}}+2h\\ &\leq 2\sqrt{j+k}\,\mu(\tilde{S}_{j+k})+(1+\varepsilon)g_{\mathscr{V}}(h)\frac{j+k}{\sqrt{h}}.\end{split}

Thus for sufficiently large nn the event 𝒱\mathscr{V} implies the event \mathscr{H}, therefore [𝒱]=1o(1)\mathbb{P}[\mathscr{V}]=1-o(1) implies []=1o(1)\mathbb{P}[\mathscr{H}]=1-o(1) and the proof is finished. ∎

Now, we move on to the concentration of the number of edges within subsets of VV.

Lemma 24.

Let GnhG_{n}^{h} be a preferential attachment graph. Consider the process of constructing its corresponding random tree Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}). Fix S~V~\tilde{S}\subseteq\tilde{V} and for t[M]t\in[M] let Xt=e(S~t)X_{t}=e(\tilde{S}_{t}) and t\mathscr{F}_{t} be a σ\sigma-algebra associated with all the events that happened till time tt. For j{2,,M}j\in\{2,\ldots,M\} set Dj=𝔼[XjXj1|j1]D_{j}=\mathbb{E}[X_{j}-X_{j-1}|\mathscr{F}_{j-1}] and define

X^t=Xtj=2tDj.\hat{X}_{t}=X_{t}-\sum_{j=2}^{t}D_{j}.

Then X^1,X^2,,X^M\hat{X}_{1},\hat{X}_{2},\ldots,\hat{X}_{M} is a martingale with respect to the filtration 1M\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{M}. Moreover, for t{2,,M}t\in\{2,\ldots,M\}

|X^tX^t1|δt.|\hat{X}_{t}-\hat{X}_{t-1}|\leq{\delta}_{t}.
Proof.

Let t{2,,M}t\in\{2,\ldots,M\}. Note that

𝔼[X^tX^t1|t1]=𝔼[XtXt1Dt|t1]=𝔼[XtXt1|t1]𝔼[𝔼[XtXt1|t1]|t1]=𝔼[XtXt1|t1]𝔼[XtXt1|t1]=0,\begin{split}\mathbb{E}[\hat{X}_{t}-\hat{X}_{t-1}|\mathscr{F}_{t-1}]&=\mathbb{E}[{X}_{t}-{X}_{t-1}-D_{t}|\mathscr{F}_{t-1}]\\ &=\mathbb{E}[{X}_{t}-{X}_{t-1}|\mathscr{F}_{t-1}]-\mathbb{E}[\mathbb{E}[X_{t}-X_{t-1}|\mathscr{F}_{t-1}]|\mathscr{F}_{t-1}]\\ &=\mathbb{E}[{X}_{t}-{X}_{t-1}|\mathscr{F}_{t-1}]-\mathbb{E}[{X}_{t}-{X}_{t-1}|\mathscr{F}_{t-1}]=0,\end{split}

thus X^1,,X^M\hat{X}_{1},\ldots,\hat{X}_{M} is a martingale with respect to the filtration 1M\mathscr{F}_{1}\subseteq\ldots\subseteq\mathscr{F}_{M}.

Now, for t[M]t\in[M] let Zt=volTt(S~t)Z_{t}=\operatorname{vol}_{T_{t}}(\tilde{S}_{t}). Recall that when the mini-vertex tt arrives it may also connect to itself thus for t{2,,M}t\in\{2,\ldots,M\} we have

Dt=𝔼[XtXt1|t1]=δtZt1+12t1.D_{t}=\mathbb{E}[X_{t}-X_{t-1}|\mathscr{F}_{t-1}]={\delta}_{t}\frac{Z_{t-1}+1}{2t-1}.

Therefore

|X^tX^t1|=|XtXt1Dt|=|XtXt1δtZt1+12t1|max{XtXt1,δtZt1+12t1}δt,\begin{split}|\hat{X}_{t}-\hat{X}_{t-1}|&=|X_{t}-X_{t-1}-D_{t}|=\left|X_{t}-X_{t-1}-{\delta}_{t}\frac{Z_{t-1}+1}{2t-1}\right|\\ &\leq\max\left\{X_{t}-X_{t-1},{\delta}_{t}\frac{Z_{t-1}+1}{2t-1}\right\}\leq{\delta}_{t},\end{split}

where we used the fact that 0XtXt1δt0\leq X_{t}-X_{t-1}\leq{\delta}_{t}, Zt12t2Z_{t-1}\leq 2t-2 thus 0δtZt12t1δt0\leq{\delta}_{t}\frac{Z_{t-1}}{2t-1}\leq{\delta}_{t} and |ab|max{a,b}|a-b|\leq\max\{a,b\} for non-negative a,ba,b.

Lemma 25.

Let GnhG_{n}^{h} be a preferential attachment graph and Thn=(V~,E~)T_{hn}=(\tilde{V},\tilde{E}) its corresponding random tree. For t[M]t\in[M] and SVS\subseteq V let Zt=volTt(S~t)Z_{t}=\operatorname{vol}_{T_{t}}(\tilde{S}_{t}). Then for t0[M]t_{0}\in[M] divisible by hh and for every ε>0\varepsilon>0 we have

[SV|e(S)(e(St0/h)+i=t0+1MδiZi1+12i1)|BεMh]=1o(1),\mathbb{P}\left[\forall S\subseteq V\,\left|e(S)-\left(e(S_{t_{0}/h})+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\right|\leq{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]=1-o(1),

where Bε=(1+ε)2ln2{B}_{\varepsilon}=\sqrt{(1+\varepsilon)2\ln{2}}.

Proof.

Throughout the proof we again refer to the process of constructing the random tree ThnT_{hn}. For t[M]t\in[M] let t\mathscr{F}_{t} be a σ\sigma-algebra associated with all the events that happened till time tt. For SVS\subseteq V let Xt=eTt(S~t)X_{t}=e_{T_{t}}(\tilde{S}_{t}) and Dt=𝔼[XtXt1|t1]D_{t}=\mathbb{E}[X_{t}-X_{t-1}|\mathscr{F}_{t-1}]. Fix ε>0\varepsilon>0 and for j{t0,t0+1,,M}j\in\{t_{0},t_{0}+1,\ldots,M\} and SVS\subseteq V consider

X^j=Xji=2jDi.\hat{X}_{j}=X_{j}-\sum_{i=2}^{j}D_{i}.

By Lemma 24 we know that X^t0,,X^M\hat{X}_{t_{0}},\ldots,\hat{X}_{M} is a martingale with respect to the filtration t0M\mathscr{F}_{t_{0}}\subseteq\ldots\subseteq\mathscr{F}_{M} such that |X^jX^j1|δj|\hat{X}_{j}-\hat{X}_{j-1}|\leq{\delta}_{j}. Moreover

j=t0Mδj2=j=t0MδjM.\sum_{j=t_{0}}^{M}{\delta}_{j}^{2}=\sum_{j=t_{0}}^{M}{\delta}_{j}\leq M.

Thus applying Azuma-Hoeffding inequality (Lemma 11) to X^t0,,X^M\hat{X}_{t_{0}},\ldots,\hat{X}_{M} with bj=δjb_{j}={\delta}_{j} and x=BεMhx={B}_{\varepsilon}\frac{M}{\sqrt{h}}, where Bε=(1+ε)2ln2{B}_{\varepsilon}=\sqrt{(1+\varepsilon)2\ln{2}} we get

(13) [X^MX^t0+BεMh]exp{(1+ε)2ln2M2/h2M}=2(1+ε)n.\begin{split}\mathbb{P}\left[\hat{X}_{M}\geq\hat{X}_{t_{0}}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]&\leq\exp\left\{-\frac{(1+\varepsilon)2\ln{2}\cdot M^{2}/h}{2M}\right\}=2^{-(1+\varepsilon)n}.\end{split}

Let us now analyze the event {X^MX^t0+BεMh}\left\{\hat{X}_{M}\geq\hat{X}_{t_{0}}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\}. It is equivalent to

{XMi=2MDiXt0i=2t0Di+BεMh}\left\{X_{M}-\sum_{i=2}^{M}D_{i}\geq X_{t_{0}}-\sum_{i=2}^{t_{0}}D_{i}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\}

which is

{XMXt0+i=t0+1MDi+BεMh}\left\{X_{M}\geq X_{t_{0}}+\sum_{i=t_{0}+1}^{M}D_{i}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\}

and, by the definition of DiD_{i} (check the proof of Lemma 24),

{XMXt0+i=t0+1MδiZi1+12i1+BεMh}.\left\{X_{M}\geq X_{t_{0}}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\}.

Thus, by (13), we get

(14) [XM(Xt0+i=t0+1MδiZi1+12i1)BεMh]2(1+ε)n.\mathbb{P}\left[X_{M}-\left(X_{t_{0}}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\geq{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]\leq 2^{-(1+\varepsilon)n}.

Acting analogously for the martingale X^t0,,X^M-\hat{X}_{t_{0}},\ldots,-\hat{X}_{M} we get

[X^MX^t0BεMh]2(1+ε)n,\mathbb{P}\left[\hat{X}_{M}\leq\hat{X}_{t_{0}}-{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]\leq 2^{-(1+\varepsilon)n},

which implies

(15) [XM(Xt0+i=t0+1MδiZi1+12i1)BεMh]2(1+ε)n.\mathbb{P}\left[X_{M}-\left(X_{t_{0}}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\leq-{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]\leq 2^{-(1+\varepsilon)n}.

By (14) and (15), for a fixed SVS\subseteq V we get

(16) [|XM(Xt0+i=t0+1MδiZi1+12i1)|BεMh]22(1+ε)n.\mathbb{P}\left[\left|X_{M}-\left(X_{t_{0}}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\right|\geq{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right]\leq 2\cdot 2^{-(1+\varepsilon)n}.

Now, for SVS\subseteq V, let the event S\mathscr{E}_{S} be defined as

S={|e(S)(e(St0/h)+i=t0+1MδiZi1+12i1)|BεMh}.\mathscr{E}_{S}=\left\{\left|e(S)-\left(e(S_{t_{0}/h})+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\right|\leq{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\}.

By (16) and the union bound we get (recall that XM=e(S)X_{M}=e(S) and Xt0=e(St0/h)X_{t_{0}}=e(S_{t_{0}/h}))

[SVS]=1[SVSC]1SV[SC]12n22(1+ε)n=122εn=1o(1).\begin{split}\mathbb{P}\left[\forall S\subseteq V\,\mathscr{E}_{S}\right]&=1-\mathbb{P}\left[\exists S\subseteq V\,\mathscr{E}_{S}^{C}\right]\geq 1-\sum_{S\subseteq V}\mathbb{P}\left[\mathscr{E}_{S}^{C}\right]\\ &\geq 1-2^{n}\cdot 2\cdot 2^{-(1+\varepsilon)n}=1-2\cdot 2^{-\varepsilon n}=1-o(1).\end{split}

We are ready to prove Theorem 9.

Proof of Theorem 9.

Throughout the proof we again refer to the process of constructing the random tree ThnT_{hn}. Fix ε>0\varepsilon>0. For t[M]t\in[M] and SVS\subseteq V let Xt=eTt(S~t)X_{t}=e_{T_{t}}(\tilde{S}_{t}) and Zt=volTt(S~t)Z_{t}=\operatorname{vol}_{T_{t}}(\tilde{S}_{t}). For t0=hlog2nt_{0}=h\lfloor\log_{2}{n}\rfloor we define the events \mathscr{H}, \mathscr{E} and 𝒱\mathscr{V} as follows

={SV|e(S)μ(S~)2|(1+ε)g(h)Mh},={SV|e(S)(e(St0/h)+i=t0+1MδiZi1+12i1)|BεMh},𝒱={t{t0,,M}SV|Zt2tμ(S~t)|(1+ε)g𝒱(h)th},\begin{split}\mathscr{H}=\left\{\vphantom{\frac{C}{\sqrt{h}}}\forall S\right.&\subseteq V\left.\left|e(S)-\mu(\tilde{S})^{2}\right|\leq(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M}{\sqrt{h}}\right\},\\ \mathscr{E}=\left\{\vphantom{\frac{C}{\sqrt{h}}}\forall S\right.&\subseteq V\left.\left|e(S)-\left(e(S_{t_{0}/h})+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\right)\right|\leq{B}_{\varepsilon}\frac{M}{\sqrt{h}}\right\},\\ \mathscr{V}=\left\{\vphantom{\frac{C}{\sqrt{h}}}\forall t\right.&\in\{t_{0},\ldots,M\}\,\forall S\subseteq V\left.|Z_{t}-2\sqrt{t}\,\mu(\tilde{S}_{t})|\leq(1+\varepsilon)g_{\mathscr{V}}(h)\frac{t}{\sqrt{h}}\right\},\end{split}

where Bε=(1+ε)2ln2{B}_{\varepsilon}=\sqrt{(1+\varepsilon)2\ln{2}}. By lemmas 23 and 25 we know that [𝒱]=1o(1)\mathbb{P}[\mathscr{E}\cap\mathscr{V}]=1-o(1). Thus it is enough to show that the event 𝒱\mathscr{E}\cap\mathscr{V} implies the event \mathscr{H}.

Assume that 𝒱\mathscr{E}\cap\mathscr{V} holds. By lemmas 17, 18, and 19, for any SVS\subseteq V, as t0=O(lnM)t_{0}=O(\ln M), we can write

i=t0+1MδiZi1+12i1i=t0+1Mδi2i1+i=t0+1Mδi2i1μ(S~i1)2i1+i=t0+1Mδi(1+ε)g𝒱(h)i1h2i1π2i=1M(δici1)2+π2i=1M(δici1j=1i1δjcj1)+(1+ε)g𝒱(h)2Mt0h+O(lnM)=(π2)2(i=1M(δici1)2+2i=1M(δici1j=1i1δjcj1))+π4i=1M(δici1)2+(1+ε)g𝒱(h)2Mt0h+O(lnM)μ(S~M)2+π4i=1M1π(i1)+(1+ε)g𝒱(h)2Mt0h+O(lnM)=μ(S~M)2+(1+ε)g𝒱(h)2Mt0h+O(lnM),\begin{split}\sum_{i=t_{0}+1}^{M}&{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\\ \leq&\sum_{i=t_{0}+1}^{M}\frac{{\delta}_{i}}{2i-1}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{(1+\varepsilon)g_{\mathscr{V}}(h)\frac{i-1}{\sqrt{h}}}{2i-1}\\ \leq&~{}\frac{\pi}{2}\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}+\frac{\pi}{2}\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)\\ &\quad\quad\quad\quad+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}+O(\ln{M})\\ =&\left(\frac{\sqrt{\pi}}{2}\right)^{2}\left(\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}+2\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)\right)+\frac{\pi}{4}\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}\\ &+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}+O(\ln{M})\\ \leq&~{}\mu(\tilde{S}_{M})^{2}+\frac{\pi}{4}\sum_{i=1}^{M}\frac{1}{\pi(i-1)}+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}+O(\ln{M})\\ =&~{}\mu(\tilde{S}_{M})^{2}+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}+O(\ln{M}),\end{split}

where the last inequality follows from Lemma 14, the fact that δi{0,1}{\delta}_{i}\in\{0,1\}, and the fact that

μ(S~M)2=(π2i=1Mδici1)2=(π2)2(i=1M(δici1)2+2i=1M(δici1j=1i1δjcj1)).\mu(\tilde{S}_{M})^{2}=\left(\frac{\sqrt{\pi}}{2}\sum_{i=1}^{M}{\delta}_{i}c_{i-1}\right)^{2}=\left(\frac{\sqrt{\pi}}{2}\right)^{2}\left(\sum_{i=1}^{M}({\delta}_{i}c_{i-1})^{2}+2\sum_{i=1}^{M}\left({\delta}_{i}c_{i-1}\sum_{j=1}^{i-1}{\delta}_{j}c_{j-1}\right)\right).

Analogously, again by lemmas 17, 18, and 19, for any SVS\subseteq V we can get

i=t0+1MδiZi1+12i1i=t0+1Mδi2i1+i=t0+1Mδi2i1μ(S~i1)2i1i=t0+1Mδi(1+ε)g𝒱(h)i1h2i1μ(S~M)2(1+ε)g𝒱(h)2Mt0hO(lnM).\begin{split}\sum_{i=t_{0}+1}^{M}&{\delta}_{i}\frac{Z_{i-1}+1}{2i-1}\\ &\geq\sum_{i=t_{0}+1}^{M}\frac{{\delta}_{i}}{2i-1}+\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{2\sqrt{i-1}\mu(\tilde{S}_{i-1})}{2i-1}-\sum_{i=t_{0}+1}^{M}{\delta}_{i}\frac{(1+\varepsilon)g_{\mathscr{V}}(h)\frac{i-1}{\sqrt{h}}}{2i-1}\\ &\geq\mu(\tilde{S}_{M})^{2}-\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}-O(\ln{M}).\end{split}

Note that for any SVS\subseteq V we have e(St0/h)t0e(S_{t_{0}/h})\leq t_{0} and recall that t0=hlog2nt_{0}=h\lfloor\log_{2}{n}\rfloor. Thus for all SVS\subseteq V, for sufficiently large nn we may write (recall that 𝒱\mathscr{E}\cap\mathscr{V} holds)

e(S)t0+BεMh+μ(S~)2+(1+ε)g𝒱(h)2Mt0h+O(lnM)μ(S~)2+(1+ε)2ln2Mh+(1+ε)g𝒱(h)2Mh=μ(S~)2+(1+ε)g(h)Mh,\begin{split}e(S)&\leq t_{0}+{B}_{\varepsilon}\frac{M}{\sqrt{h}}+\mu(\tilde{S})^{2}+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}+O(\ln{M})\\ &\leq\mu(\tilde{S})^{2}+(1+\varepsilon)\sqrt{2\ln{2}}\frac{M}{\sqrt{h}}+\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M}{\sqrt{h}}\\ &=\mu(\tilde{S})^{2}+(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M}{\sqrt{h}},\end{split}

where the term O(lnM)O(\ln{M}) vanishes after the second inequality as the factor 1+ε\sqrt{1+\varepsilon} from BεB_{\varepsilon} is replaced by (1+ε)(1+\varepsilon). Analogously we get

e(S)t0BεMh+μ(S~)2(1+ε)g𝒱(h)2Mt0hO(lnM)μ(S~)2(1+ε)g(h)Mh.\begin{split}e(S)&\geq t_{0}-{B}_{\varepsilon}\frac{M}{\sqrt{h}}+\mu(\tilde{S})^{2}-\frac{(1+\varepsilon)g_{\mathscr{V}}(h)}{2}\frac{M-t_{0}}{\sqrt{h}}-O(\ln{M})\\ &\geq\mu(\tilde{S})^{2}-(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M}{\sqrt{h}}.\end{split}

This means that for sufficiently large nn the event 𝒱\mathscr{E}\cap\mathscr{V} implies the event \mathscr{H} thus [𝒱]=1o(1)\mathbb{P}[\mathscr{E}\cap\mathscr{V}]=1-o(1) implies []=1o(1)\mathbb{P}[\mathscr{H}]=1-o(1) and the proof is finished. ∎

Mimicking the reasoning from the proofs of Lemma 24, Lemma 25 and Theorem 9 one can prove Theorem 10. We leave it without the proof details as this would be very repetitive.

6. Modularity of 𝑮𝒏𝒉G_{n}^{h} vanishes with 𝒉h

Recall that the main result of the paper (Theorem 5) states that the modularity of a preferential attachment graph GnhG_{n}^{h} is with high probability upper bounded by a function tending to 0 with hh tending to infinity. The whole current section is devoted to its proof.

The first step in the proof of Theorem 5 follows from an interesting general result on modularity by Dinh and Thai.

Lemma 26 ([10], Lemma 1).

Let GG be a graph with at least one edge and let k{1}k\in\mathbb{N}\setminus\{1\}. Then

mod(G)kk1max𝒜:|𝒜|kmod𝒜(G).\operatorname{mod}(G)\leq\frac{k}{k-1}\max_{\mathscr{A}:|\mathscr{A}|\leq k}\operatorname{mod}_{\mathscr{A}}(G).

In particular,

mod(G)2max𝒜:|𝒜|2mod𝒜(G).\operatorname{mod}(G)\leq 2\max_{\mathscr{A}:|\mathscr{A}|\leq 2}\operatorname{mod}_{\mathscr{A}}(G).
Corollary 27.

Let G=(V,E)G=(V,E) be a graph with at least one edge. Then

mod(G)max{0,4maxSV(e(S)e(G)vol(S)2vol(G)2)}.\operatorname{mod}(G)\leq\max\left\{0,\quad 4\cdot\max_{S\subseteq V}\left(\frac{e(S)}{e(G)}-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G)^{2}}\right)\right\}.
Proof.

First, note that for 𝒜\mathscr{A} such that |𝒜|=1|\mathscr{A}|=1 we have mod𝒜(G)=0\operatorname{mod}_{\mathscr{A}}(G)=0. Next, consider 22-element partitions of VV. We have

max𝒜:|𝒜|=2mod𝒜(G)=max𝒜={S,VS}(e(S)e(G)vol(S)2vol(G)2+e(VS)e(G)vol(VS)2vol(G)2)2maxSV(e(S)e(G)vol(S)2vol(G)2).\begin{split}\max_{\mathscr{A}:|\mathscr{A}|=2}\operatorname{mod}_{\mathscr{A}}(G)&=\max_{\mathscr{A}=\{S,V\setminus S\}}\left(\frac{e(S)}{e(G)}-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G)^{2}}+\frac{e(V\setminus S)}{e(G)}-\frac{\operatorname{vol}(V\setminus S)^{2}}{\operatorname{vol}(G)^{2}}\right)\\ &\leq 2\cdot\max_{S\subseteq V}\left(\frac{e(S)}{e(G)}-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G)^{2}}\right).\end{split}

The conclusion follows by Lemma 26. ∎

The above corollary frees us from considering all the partitions of VV when analyzing modularity of GnhG_{n}^{h}. We can simply concentrate on upper bounding the values of (e(S)e(Gnh)vol(S)2vol(Gnh)2)\left(\frac{e(S)}{e(G_{n}^{h})}-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G_{n}^{h})^{2}}\right) over all SVS\subseteq V. To do so, we use the concentration results for e(S)e(S) and vol(S)\operatorname{vol}(S) obtained in Section 5.

Proof of Theorem 5.

Fix ε>0\varepsilon>0 and let g(h)=g𝒱(h)2+2ln2g_{\mathscr{E}}(h)=\frac{g_{\mathscr{V}}(h)}{2}+\sqrt{2\ln{2}}. Let us define the events \mathscr{H}, \mathscr{E} and 𝒱\mathscr{V} as follows

={mod(Gnh)(1+ε)f(h)h},\mathscr{H}=\left\{\operatorname{mod}(G_{n}^{h})\leq\frac{(1+\varepsilon)f(h)}{\sqrt{h}}\right\},
={SVe(S)μ(S~)2+(1+ε)g(h)Mh},\mathscr{E}=\left\{\forall S\subseteq V\,e(S)\leq\mu(\tilde{S})^{2}+(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M}{\sqrt{h}}\right\},
𝒱={SVvol(S)2Mμ(S~)(1+ε)g𝒱(h)Mh}.\mathscr{V}=\left\{\forall S\subseteq V\,\operatorname{vol}(S)\geq 2\sqrt{M}\mu(\tilde{S})-(1+\varepsilon)g_{\mathscr{V}}(h)\frac{M}{\sqrt{h}}\right\}.

By Theorem 8 and Theorem 9 we know that [𝒱]=1o(1)\mathbb{P}[\mathscr{E}\cap\mathscr{V}]=1-o(1). Thus it is enough to show that the event 𝒱\mathscr{E}\cap\mathscr{V} implies the event \mathscr{H}.

Assume that 𝒱\mathscr{E}\cap\mathscr{V} holds. Recall that e(Gnh)=Me(G_{n}^{h})=M and vol(Gnh)=2M\operatorname{vol}(G_{n}^{h})=2M. For any SVS\subseteq V we may write

e(S)e(Gnh)vol(S)2vol(Gnh)2=4Me(S)vol(S)24M214M2(4Mμ(S~)2+4(1+ε)g(h)M2h(2Mμ(S~)(1+ε)g𝒱(h)Mh)2)=(1+ε)g(h)h+μ(S~)(1+ε)g𝒱(h)Mh(1+ε)2g𝒱(h)24h(1+ε)(g(h)+g𝒱(h)g𝒱(h)2/(4h))h,\begin{split}\frac{e(S)}{e(G_{n}^{h})}&-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G_{n}^{h})^{2}}=\frac{4Me(S)-\operatorname{vol}(S)^{2}}{4M^{2}}\\ &\leq\frac{1}{4M^{2}}\left(4M\mu(\tilde{S})^{2}+4(1+\varepsilon)g_{\mathscr{E}}(h)\frac{M^{2}}{\sqrt{h}}\right.\\ &\quad\quad\quad\quad\quad\quad\quad\quad\quad\left.-\left(2\sqrt{M}\mu(\tilde{S})-(1+\varepsilon)g_{\mathscr{V}}(h)\frac{M}{\sqrt{h}}\right)^{2}\right)\\ &=\frac{(1+\varepsilon)g_{\mathscr{E}}(h)}{\sqrt{h}}+\frac{\mu(\tilde{S})(1+\varepsilon)g_{\mathscr{V}}(h)}{\sqrt{Mh}}-\frac{(1+\varepsilon)^{2}g_{\mathscr{V}}(h)^{2}}{4h}\\ &\leq\frac{(1+\varepsilon)\left(g_{\mathscr{E}}(h)+g_{\mathscr{V}}(h)-g_{\mathscr{V}}(h)^{2}/(4\sqrt{h})\right)}{\sqrt{h}},\end{split}

where the last inequality follows from Lemma 15 and is valid for sufficiently large nn. By Corollary 27, for sufficiently large nn we obtain

mod(Gnh)4maxSV(e(S)e(Gnh)vol(S)2vol(Gnh)2)(1+ε)(4g(h)+4g𝒱(h)g𝒱(h)2/h)h=(1+ε)f(h)h.\begin{split}\operatorname{mod}{(G_{n}^{h})}&\leq 4\cdot\max_{S\subseteq V}\left(\frac{e(S)}{e(G_{n}^{h})}-\frac{\operatorname{vol}(S)^{2}}{\operatorname{vol}(G_{n}^{h})^{2}}\right)\\ &\leq\frac{(1+\varepsilon)\left(4g_{\mathscr{E}}(h)+4g_{\mathscr{V}}(h)-g_{\mathscr{V}}(h)^{2}/\sqrt{h}\right)}{\sqrt{h}}=\frac{(1+\varepsilon)f(h)}{\sqrt{h}}.\end{split}

We got that for sufficiently large nn the event 𝒱\mathscr{E}\cap\mathscr{V} implies the event \mathscr{H} thus [𝒱]=1o(1)\mathbb{P}[\mathscr{E}\cap\mathscr{V}]=1-o(1) implies []=1o(1)\mathbb{P}[\mathscr{H}]=1-o(1) and the proof is finished. ∎

We finish by proving Corollary 6.

Proof of Corollary 6.

The corollary follows from Theorem 5 by considering f~(h)=6g𝒱(h)+42ln2\tilde{f}(h)=6g_{\mathscr{V}}(h)+4\sqrt{2\ln{2}} instead of f(h)f(h) in the upper bound (note that 32ln23.543\sqrt{2\ln{2}}\leq 3.54, (8/9)ln20.62(8/9)\ln{2}\leq 0.62 and 4ln2+12+42ln219.494\ln{2}+12+4\sqrt{2\ln{2}}\leq 19.49). ∎

7. Concluding remarks

We showed that modularity of a preferential attachment graph GnhG_{n}^{h} is with high probability upper bounded by a function of the order Θ(lnh/h)\Theta(\sqrt{\ln{h}}/\sqrt{h}). This proves Conjecture 3 but means that Conjecture 4, saying that modularity of GnhG_{n}^{h} is with high probability of the order Θ(1/h)\Theta(1/\sqrt{h}), still remains open. Note that the term Θ(1/h)\Theta(1/\sqrt{h}) can also be seen as Θ(1/d¯h)\Theta(1/\sqrt{\bar{d}_{h}}), where d¯h\bar{d}_{h} states for the average vertex degree in GnhG_{n}^{h}. Such behavior of modularity has already been reported in other random graphs. For Gn,rG_{n,r} being a random rr-regular graph it is known that whp mod(Gn,r)=Θ(1/r)\operatorname{mod}(G_{n,r})=\Theta(1/\sqrt{r}) (see [25]). For binomial random graph G(n,p)G(n,p) it is known that mod(G(n,p))=Θ(1/np)\operatorname{mod}(G(n,p))=\Theta(1/\sqrt{np}) when np1np\geq 1 and pp is bounded below 11 (see [26]). These might be premises supporting Conjecture 4.

To the best of our knowledge, this paper gives the first concentration results for vol(S)\operatorname{vol}(S), e(S)e(S) and e(S,VS)e(S,V\setminus S) in GnhG_{n}^{h}, where SS can be an arbitrary subset of VV. The analogous results obtained so far, e.g. in [7], [12] or [29] and [30] always addressed “compact” subsets of vertices, i.e., sets of the form {i,i+1,i+2,,j}\{i,i+1,i+2,\ldots,j\}. (In Lemma 4 of [12] the authors investigate the volume of any set S[t]S\subseteq[t] of size 1kt1\leq k\leq t at time tt but in fact in their proof the volume of SS is upper bounded by the volume of the kk “oldest” vertices in [t][t], i.e., the volume of the set [k][k].) In this paper more accurate analysis was possible thanks to introducing indicator functions δiS~\delta_{i}^{\tilde{S}} in Definition 7. We believe that the proof methods leading to results obtained in Section 5 might serve in the future to get better than known today bounds for edge expansion or conductance of GnhG_{n}^{h}.

References

  • [1] K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967.
  • [2] J.P. Bagrow. Communities and bottlenecks: Trees and treelike networks have high modularity. Physical Review E, 85:066118, 2012.
  • [3] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
  • [4] P. Bennett and A. Dudek. A gentle introduction to the differential equation method and dynamic concentration. Discrete Mathematics, 345(12):113071, 2022.
  • [5] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
  • [6] B. Bollobás and O. Riordan. Handbook of Graphs and Networks: From the Genome to the Internet. Wiley-VCH, 2003. Pages 1–34.
  • [7] B. Bollobás and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5–34, 2004.
  • [8] B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale‐free random graph process. Random Structures & Algorithms, 18(3):279–290, April 2001.
  • [9] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1):cnab051, 2021.
  • [10] T.N. Dinh and M.T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pages 888–891, 2011.
  • [11] D. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100–118, 1975.
  • [12] A. Frieze, X. Pérez-Giménez, P. Prałat, and B. Reiniger. Perfect matchings and hamiltonian cycles in the preferential attachment model. Random Structures & Algorithms, 54(2):258–288, 2019.
  • [13] G. Gilad and R. Sharan. From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. PNAS Nexus, 2(6):pgad180, 2023.
  • [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
  • [15] R. van der Hofstad. Random Graphs and Complex Networks. Vol.1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017.
  • [16] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. John Wiley & Sons, Inc., 2000.
  • [17] R. P. Boas Jr. and J. W. Wrench Jr. Partial sums of the harmonic series. The American Mathematical Monthly, 78(8):864–870, 1971.
  • [18] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Théberge. Clustering via hypergraph modularity. Plos One, 14:e0224307, Feb 2019.
  • [19] B. Kamiński, P. Prałat, and F.Théberge. Mining Complex Networks. Chapman and Hall/CRC, 2021.
  • [20] M. Lasoń and M. Sulkowska. Modularity of minor-free graphs. Journal of Graph Theory, 102(4):728–736, 2023.
  • [21] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4):754–802, 2022.
  • [22] Hosam M. Mahmoud, R. T. Smythe, and J. Szymański. On the structure of random plane-oriented recursive trees and their branches. Random Structures & Algorithms, 4(2):151–176, 1993.
  • [23] C. McDiarmid, K. Rybarczyk, F. Skerman, and M. Sulkowska. Expansion and modularity in preferential attachment graph, 2025. Preprint.
  • [24] C. McDiarmid and F. Skerman. Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics, 43:431–437, 2013.
  • [25] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4):596–619, 2018.
  • [26] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1):211–243, 2020.
  • [27] M.E.J. Newman. Networks: An introduction. Oxford University Press, 2010.
  • [28] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
  • [29] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity of complex networks models. In A. Bonato, F. Chung Graham, and P. Prałat, editors, Algorithms and Models for the Web Graph - 13th International Workshop, WAW 2016, Montreal, QC, Canada, December 14-15, 2016, Proceedings, volume 10088 of Lecture Notes in Computer Science, pages 115–126, 2016.
  • [30] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity in several random graph models. Electronic Notes in Discrete Mathematics, 61:947–953, 2017.
  • [31] H. Robbins. A remark on Stirling’s formula. The American Mathematical Monthly, 62(1):26–29, 1955.
  • [32] J. Szymański. On a nonuniform random recursive tree. In A. Barlotti, M. Biliotti, A. Cossu, G. Korchmaros, and G. Tallini, editors, Annals of Discrete Mathematics (33), volume 144 of North-Holland Mathematics Studies, pages 297–306. North-Holland, 1987.
  • [33] V.A. Traag, L. Waltman, and N.J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(5233), 2019.
  • [34] L. Warnke. On the method of typical bounded differences. Combinatorics, Probability & Computing, 25:269–299, 2022.