Modularity of preferential attachment graphs
Abstract.
Modularity is a graph parameter measuring how clearly the set of graph vertices may be partitioned into subsets of high edge density. It indicates the presence of community structure in the graph. We study its value for a random preferential attachment model introduced by Barabási and Albert in 1999. A graph is created from some finite starting graph by adding new vertices one by one. A new vertex always connects to already existing vertices, and those are chosen with probability proportional to their current degrees. We prove that the modularity of the resulting graph is with high probability upper bounded by a function tending to 0 as the number of edges added per step tends to infinity. This resolves a conjecture of Prokhorenkova, Prałat and Raigorodskii from 2016.
As a byproduct we obtain novel concentration results for the volume and the edge density parameters of vertex subsets of this model.
Key words and phrases:
Modularity, preferential attachment model, expansion
1. Introduction
Identifying communities in a complex network is a task of great importance. It has a number of practical applications, ranging from identifying groups of common interests in social networks, through classifying fake news or spam, searching for articles on a related topic, identifying proteins with the same biological functions and optimizing large technological infrastructures, up to network visualization [19, 27].
The notion of a graph featuring a community structure may be understood in a variety of ways. Therefore, some measures of the presence of community structure in a network had to be introduced. One of them is modularity, proposed by Newman and Girvan in 2004 [27, 28]. The vertices of a graph with high modularity may be partitioned into subsets containing many more internal edges than we would expect by chance (see Definition 1). Nowadays, modularity is widely used not only as a quality function judging the performance of community detection algorithms [19], but also as a central ingredient of such algorithms, as in the Louvain algorithm [5], the Leiden algorithm [33] or the Tel-Aviv algorithm [13].
Early theoretical results on modularity were given for trees [2] and regular graphs [24]. For a summary of results for various families of graphs consult the appendix of [26] by McDiarmid and Skerman from 2020. More recent discoveries include [9] by Chellig, Fountoulakis and Skerman (for random graphs on the hyperbolic plane), [20] by Lasoń and Sulkowska (for minor-free graphs) or [21] by Lichev and Mitsche (for 3-regular random graphs and graphs with a given degree sequence).
Despite being so widely used in practice, modularity still suffers from a narrow theoretical study in the families of random graphs devoted to modeling real-life networks. The first results for the basic and most studied model, the binomial random graph, were given by McDiarmid and Skerman only in 2020 [26]. It is commonly known that this model does not reflect well most real-world networks [19]. So-called preferential attachment models usually perform much better here. The first in this family were random recursive trees, see for example [22, 32]. However, the in-depth study of preferential attachment models was initiated in 1999 by the work of Barabási and Albert [3], who indicated the applications of such graphs in network modeling. The preferential attachment model was subsequently formally defined and analyzed by Bollobás, Riordan, Spencer, and Tusnády [8] and by Bollobás and Riordan in [6, 7]. It relies on two mechanisms: growth (the graph grows over time, gaining a new vertex and a bunch of edges at each time step) and preferential attachment (an arriving vertex is more likely to attach to vertices of high degree than to those of low degree); for a precise definition see Section 2. Its degree distribution as well as its diameter often match those observed in reality [15, 27]. Nevertheless, experimental study shows that its modularity, unlike in real networks, is low. Prokhorenkova, Prałat and Raigorodskii opened the preliminary study of modularity of a basic preferential attachment graph in [29, 30]. They obtained non-trivial upper and lower bounds; however, the gap to close remained big. They conjectured that the modularity of such a graph with high probability tends to 0 as the number of edges added per step tends to infinity (see Conjecture 3). In this paper we prove their conjecture, confirming the supposition that a basic preferential attachment model might have too small a modularity to mirror well the behavior of real networks. As a byproduct we obtain new interesting concentration results for the volume and the edge density parameters of a given subset of vertices in such a graph. These results are interesting in their own right and may have future applications to other problems involving the preferential attachment model.
In the following section we give the formal definition of the preferential attachment model and state the main result. Section 3 is devoted to presenting the results regarding the volume and the edge density parameters of subsets of vertices in this model. Section 4 is technical; it contains several facts and auxiliary lemmas used in the later parts of the paper. In Section 5 we derive the concentration results stated in Section 3. These results are used in Section 6 to prove the main theorem about vanishing modularity in a basic preferential attachment graph. Section 7 contains concluding remarks.
2. Model and main result
Let denote the set of natural numbers, . For let . For the functions and we write if . We say that an event occurs with high probability (whp) if the probability depends on a certain number and tends to 1 as that number tends to infinity.
All the graphs considered in this paper are finite and undirected; loops and multiple edges are allowed. Thus a graph is a pair , where is a finite set of vertices and is a finite multiset of elements from with being the set of all -element subsets of . Let and for set and . The degree of a vertex in , denoted by , is the number of edges to which it belongs, with loops counted twice, i.e., . We define the volume of in by . By the volume of the graph, , we understand . Whenever the context is clear we write instead of , instead of , instead of and instead of .
We focus on a particular random graph model, called here simply a preferential attachment graph (consult [3, 6, 15]). Given we construct a preferential attachment graph in two phases. In the first phase we sample a particular random tree ; its vertices are called mini-vertices. (We call it a tree, although loops, i.e. single-vertex edges, are allowed in it.) Next, the appropriate mini-vertices of are grouped to form the vertices of . Let us describe this procedure in detail.
Phase 1. We start the whole process with , which is a graph consisting of a single mini-vertex with a single loop (thus the degree of this vertex is 2). For , the graph is built upon by adding a mini-vertex and joining it by an edge to a mini-vertex chosen according to the following probability distribution:
Note that we allow a newly arrived mini-vertex to connect to itself. We continue the process until we get the random tree .
Phase 2. A random multigraph is obtained from by merging each set of mini-vertices into a single vertex for , keeping loops and multiple edges.
Note that if then , and . Since we will refer very often to the number of edges of , it will also be denoted by , i.e., . Given , by , where , we understand the subgraph of induced by the set of vertices .
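To make the two-phase construction concrete, the following sketch (in Python) samples a graph by the procedure described above. The function name and the parameters n (number of vertices) and m (number of mini-vertices merged per vertex, i.e. edges added per step) are our illustrative choices, and the attachment rule is the standard degree-proportional one with loops allowed.

```python
import random

def preferential_attachment_graph(n, m, seed=None):
    """Sample a preferential attachment multigraph on n vertices with m
    edges added per step, via the two-phase construction: first grow a
    random tree on n*m mini-vertices, then merge consecutive blocks of m
    mini-vertices into single vertices (keeping loops and multiple edges)."""
    rng = random.Random(seed)
    # Phase 1: mini-vertex 0 starts with a single loop, hence degree 2.
    # endpoint_list stores one entry per half-edge, so a uniform draw from it
    # picks a mini-vertex with probability proportional to its current degree.
    endpoint_list = [0, 0]
    target = [0]                       # target[v] = mini-vertex that v attached to
    for v in range(1, n * m):
        endpoint_list.append(v)        # v's own half-edge: allows a loop at v
        u = rng.choice(endpoint_list)  # degree-proportional choice
        target.append(u)
        endpoint_list.append(u)
    # Phase 2: mini-vertices i*m, ..., (i+1)*m - 1 form vertex i.
    return [(v // m, target[v] // m) for v in range(n * m)]
```

For instance, preferential_attachment_graph(1000, 5, seed=1) returns a list of 5000 edges (loops and multiplicities included) on the vertex set {0, …, 999}.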
Our main goal is to upper bound the graph parameter called modularity for . Its formal definition is given just below.
Definition 1 (Modularity, [28]).
Let be a graph with at least one edge. For a partition of define a modularity score of as
Modularity of is given by
where the maximum runs over all partitions of the set .
Conventionally, a graph with no edges has modularity equal to 0. A single summand of the modularity score is the difference between the fraction of edges within and the expected fraction of edges within if we considered a certain random multigraph on with the expected degree sequence given by (see, e.g., [18]). It is easy to check that .
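For illustration, the modularity score of Definition 1 can be computed directly from an edge list as in the following sketch (Python, continuing the representation used in the construction sketch above; the helper name is ours). The modularity of the graph is then the maximum of this score over all partitions of the vertex set.

```python
def modularity_score(edges, partition):
    """Modularity score of a partition: for each part, the fraction of edges
    with both endpoints in the part minus the squared fraction of the total
    volume (degree sum) lying in the part. A loop counts as one edge inside
    its part and contributes two units of degree."""
    m = len(edges)                                  # total number of edges
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    inside = [0.0] * len(partition)                 # edges within each part
    volume = [0.0] * len(partition)                 # degree sum of each part
    for u, w in edges:
        volume[part_of[u]] += 1
        volume[part_of[w]] += 1
        if part_of[u] == part_of[w]:
            inside[part_of[u]] += 1
    return sum(inside[i] / m - (volume[i] / (2 * m)) ** 2
               for i in range(len(partition)))
```

In particular, the score of the trivial partition into a single part equals 1 − 1 = 0, in agreement with the remark above.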
The non-trivial lower and upper bounds for the modularity of a preferential attachment graph obtained by Prokhorenkova, Prałat and Raigorodskii in [29, 30] are the following.
Theorem 2 ([30], Theorem 4.2, Section 4.2).
Let be a preferential attachment graph. Then whp
and whp
where is the edge expansion of .
Applying the results for the edge expansion of the model by Mihail, Papadimitriou and Saberi from [MiPaSa06_journalv] to the upper bound, one obtains that whp . Indeed, the gap between the upper and the lower bound remained big. The authors stated the following two conjectures suggesting that the upper bound could be improved.
Conjecture 3 ([29]).
Let be a preferential attachment graph. Then whp .
Conjecture 4 ([29]).
Let be a preferential attachment graph. Then whp .
In this paper we present a much better upper bound for the modularity of the model than the one from Theorem 2 when the number of edges added per step is large, resolving Conjecture 3 in the positive. Conjecture 4 still remains open. The main result of the paper may be presented as follows.
Theorem 5.
Let be a preferential attachment graph. Then for every whp
where
with
Remark.
Note that as thus as . The value of drops below for .
Corollary 6.
Let be a preferential attachment graph. Then whp
Remark.
The value drops below for .
Remark.
Some new results showing that the modularity is whp also bounded below for small values of can be found in [23].
3. Volume and edge density
When talking about we will very often refer to its corresponding random tree . Recall that , and . For the corresponding set of its mini-vertices in will be denoted by , thus . For and let , in particular . Analogously, for and set , in particular . Note that for we have and .
When working with modularity we need to have control over and , where . These values depend heavily on the arrival times of the vertices from . To capture this phenomenon we define a special measure , where stands for the set of all subsets of .
Definition 7 (Measure ).
Let be a preferential attachment graph and its corresponding random tree. Let thus is the set of its corresponding mini-vertices. Associate with the set of indicator functions
where (whenever the context is clear we write instead of ). Define a function as follows:
with for and .
Remark.
Let be a preferential attachment graph, and . Note that
We use the measure to express the following novel concentration results for , and , where is an arbitrary subset of .
Theorem 8.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where .
Theorem 9.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where
with
Theorem 10.
Let be a preferential attachment graph and its corresponding random tree. Then for every whp
where
To grasp the intuition hidden behind the above concentration results it is helpful to know that there is a relation between the structure of the graph and the structure of a random graph on the vertex set in which every edge (for ) is present with probability , independently of the other possible edges (consult Section 4 in [7] by Bollobás and Riordan).
Let . The number of inner edges of in , , satisfies
while the number of edges between and , , fulfills
Since one also gets
Therefore the measure is constructed in such a way that mimics the behavior of in . We will see later (consult Lemma 14) that the value of is asymptotically close to . Thus in particular for and . Therefore we may expect that in we will get , and .
4. Auxiliary lemmas
The current section gathers all the technical lemmas needed in the later parts of the paper.
The concentration results presented in Section 3 and proved in Section 5 are based on two variants of the Azuma-Hoeffding martingale inequality. The first one is basic. We state it as it appears in [16] by Janson, Łuczak and Ruciński.
Lemma 11 (Azuma-Hoeffding inequality, [1, 14]).
If is a martingale and there exist such that for each , then, for every ,
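For the reader's convenience, in its standard form the inequality reads: if $X_0,X_1,\dots,X_n$ is a martingale and there exist constants $c_1,\dots,c_n$ such that $|X_i-X_{i-1}|\le c_i$ for each $i\in\{1,\dots,n\}$, then for every $t>0$
\[
\mathbb{P}\bigl(|X_n-X_0|\ge t\bigr)\le 2\exp\!\left(-\frac{t^2}{2\sum_{i=1}^{n} c_i^2}\right).
\]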
The second one is Freedman’s inequality. We state it below in a form very similar to the one presented in Lemma 22 of [34] by Warnke (one may also consult [4] by Bennett and Dudek).
Lemma 12 (Freedman’s inequality, [11]).
Let be a martingale with respect to a filtration . Set and . Then for every and we have
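In a standard formulation (the exact form and constants used in [34] may differ slightly): if $(X_i)_{i\ge 0}$ is a martingale with respect to $(\mathcal{F}_i)_{i\ge 0}$ with $X_i-X_{i-1}\le R$ for all $i$, and $V_n=\sum_{i=1}^{n}\mathbb{E}\bigl[(X_i-X_{i-1})^2\mid\mathcal{F}_{i-1}\bigr]$, then for every $t,v>0$
\[
\mathbb{P}\bigl(X_n\ge X_0+t \text{ and } V_n\le v \text{ for some } n\bigr)\le\exp\!\left(-\frac{t^2}{2(v+Rt/3)}\right).
\]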
Next, we present bounds for the values of and an upper bound for the function , both introduced in Definition 7. These results will be referred to very often later on. They are derived with the use of Stirling’s approximation.
Lemma 13 (Stirling’s approximation, [31]).
Let . Then
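Robbins' form of the approximation [31], presumably the one intended here, states that for every $n\in\mathbb{N}$
\[
\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n+1}}\;\le\; n!\;\le\;\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n}}.
\]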
Lemma 14.
For let . Then
Proof.
Lemma 15.
Let be a preferential attachment graph and its corresponding random tree. Let also and . Then
Proof.
Lemmas 17, 18 and 19 are auxiliary calculations for expressions involving the volumes of subsets of (however, the reader might not notice the connection with volumes at this point). They will be used directly in Section 5 in the proof of Theorem 9, stating the result on the concentration of for . Their proofs utilize the following well-known approximation.
Lemma 16 (See [17]).
Let and . Then
where is known as the Euler-Mascheroni constant and .
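In its classical form (cf. [17]) the estimate reads: for every $n\in\mathbb{N}$
\[
\sum_{i=1}^{n}\frac{1}{i}=\ln n+\gamma+\varepsilon_n,\qquad 0<\varepsilon_n\le\frac{1}{2n},
\]
where $\gamma\approx 0.5772$ is the Euler-Mascheroni constant; we take this as the approximation referred to above.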
Lemma 17.
Let be a preferential attachment graph and its corresponding random tree. Fix and let be such that . Then
Proof.
Lemma 18.
Let be a preferential attachment graph and its corresponding random tree. Fix and let be such that . Then
Proof.
Lemma 19.
Let be a preferential attachment graph and its corresponding random tree. Fix and let . Then for any constant
Proof.
By the fact that we immediately get
∎
5. Edge density and volume results for
In this section we use martingale techniques to prove Theorems 8, 9 and 10 stated in Section 3, i.e., we derive concentration results for , , and for an arbitrary subset in . A series of results will lead us to Corollary 22, which implies, as a special case, Theorem 8. We start by analyzing the volumes.
Lemma 20.
Let be a preferential attachment graph. Consider the process of constructing its corresponding random tree . Fix and for let . Set
(recall that and were introduced in Definition 7). Let be a -algebra associated with all the events that happened till time . Then is a martingale with respect to the filtration . Moreover, for
Remark.
The first term of the martingale , i.e., , was inspired by the martingale constructed in Lemma 4 of [12] by Frieze, Pérez-Giménez, Prałat and Reiniger.
Proof.
Let . Recall that when mini-vertex arrives it may also connect to itself. Therefore, conditioned on ,
(3) |
Additionally, since , we get
thus is a martingale with respect to the filtration . Next, since
Note that . Moreover, the volume of in may be at most thus which implies also . Now use Lemma 14 and the fact that for non-negative to get
By the fact that is a martingale with respect to the filtration , and by (3) we also have
where the last inequalities follow from the fact that and Lemma 14, respectively. ∎
Theorem 21.
Let be a preferential attachment graph and its corresponding random tree. Fix and let . Then for every , for sufficiently large and for sufficiently large we get
where .
(Recall that was introduced in Definition 7).
Proof.
Throughout the proof we refer to the process of constructing the random tree . For let be a -algebra associated with all the events that happened till time . Fix , set and for consider
where and ’s and ’s are as in Definition 7. By Lemma 20 we know that is a martingale with respect to the filtration such that and . Therefore
and, by Lemma 16,
Applying Freedman’s inequality (Lemma 12) to with , and , where , we get
where the last inequality holds for sufficiently large and . One can verify that
thus for sufficiently large and we get
(4) |
Let us now analyze the complementary event . It is equivalent to
which is
By the definition of , Lemma 14 and Lemma 15 we have
(5) |
Note that , therefore by Lemma 14
(6) |
and, again by Lemma 14,
(7) |
Thus by (5), (6) and (7) we get that the event implies
which, by (4), for sufficiently large and , gives (recall that )
(8) |
To get the opposite bound we repeat the reasoning for the martingale . We apply again Freedman’s inequality (Lemma 12) with , and , where (as before) . For sufficiently large and we get
which for sufficiently large and implies
(9) |
Now, the complementary event is equivalent to
This time by the definition of and Lemma 14 we have
(10) |
and by the definition of , Lemma 14 and Lemma 15
(11) |
Hence, by (10), (11) and (7), the event implies
Therefore by (9) for sufficiently large and we get
(12) |
Corollary 22.
Let be a preferential attachment graph and its corresponding random tree. Then for every we have
where .
Proof.
Fix . Recall that . For and define the event as follows
For , by Theorem 21 and the union bound, for sufficiently large we have
Indeed, note that iterates over the vertices of and at time there are possible configurations for , thus also for . Next, again by the union bound, for sufficiently large we get
which implies
∎
Note that considering only in Corollary 22 we get the statement of Theorem 8, which finishes its proof.
Now we will again use martingale inequalities, which together with the concentration results for volumes from Corollary 22 will lead to the proof of Theorem 9. Consider the process of constructing the random tree . Let and . The result stated in Corollary 22 gives the concentration of the volumes of at time only for , where . In particular, it says nothing about the concentration of the volumes of the sets , where , at time . Such intermediate concentrations will be needed to prove Theorem 9, i.e., to draw conclusions about the concentration of the number of edges within . We derive those intermediate concentrations in Lemma 23.
Lemma 23.
Let be a preferential attachment graph and its corresponding random tree. Then for every we have
where .
Proof.
Fix . Define the events and as follows
By Corollary 22 we know that thus it is enough to show that the event implies the event .
Assume that holds. For and let . Consider all where and let . By the fact that , , Lemma 14 and Lemma 15 we get
Therefore, if holds, then for all where , for all , and for all , for sufficiently large on one hand,
and on the other hand
Thus for sufficiently large the event implies the event , therefore implies and the proof is finished. ∎
Now, we move on to the concentration of the number of edges within subsets of .
Lemma 24.
Let be a preferential attachment graph. Consider the process of constructing its corresponding random tree . Fix and for let and be a -algebra associated with all the events that happened till time . For set and define
Then is a martingale with respect to the filtration . Moreover, for
Proof.
Let . Note that
thus is a martingale with respect to the filtration .
Now, for let . Recall that when the mini-vertex arrives it may also connect to itself thus for we have
Therefore
where we used the fact that , thus and for non-negative .
∎
Lemma 25.
Let be a preferential attachment graph and its corresponding random tree. For and let . Then for divisible by and for every we have
where .
Proof.
Throughout the proof we again refer to the process of constructing the random tree . For let be a -algebra associated with all the events that happened till time . For let and . Fix and for and consider
By Lemma 24 we know that is a martingale with respect to the filtration such that . Moreover
Thus applying Azuma-Hoeffding inequality (Lemma 11) to with and , where we get
(13) |
Let us now analyze the event . It is equivalent to
which is
and, by the definition of (check the proof of Lemma 24),
Thus, by (13), we get
(14) |
Acting analogously for the martingale we get
which implies
(15) |
By (14) and (15), for a fixed we get
(16) |
Now, for , let the event be defined as
By (16) and the union bound we get (recall that and )
∎
We are ready to prove Theorem 9.
Proof of Theorem 9.
Throughout the proof we again refer to the process of constructing the random tree . Fix . For and let and . For we define the events , and as follows
where . By Lemmas 23 and 25 we know that . Thus it is enough to show that the event implies the event .
Assume that holds. By Lemmas 17, 18, and 19, for any , as , we can write
where the last inequality follows from Lemma 14, the fact that , and the fact that
Analogously, again by Lemmas 17, 18, and 19, for any we can get
Note that for any we have and recall that . Thus for all , for sufficiently large we may write (recall that holds)
where the term vanishes after the second inequality as the factor from is replaced by . Analogously we get
This means that for sufficiently large the event implies the event thus implies and the proof is finished. ∎
6. Modularity of vanishes with
Recall that the main result of the paper (Theorem 5) states that the modularity of a preferential attachment graph is with high probability upper bounded by a function tending to 0 as the number of edges added per step tends to infinity. The whole of the current section is devoted to its proof.
The first step in the proof of Theorem 5 follows from an interesting general result on modularity by Dinh and Thai.
Lemma 26 ([10], Lemma 1).
Let be a graph with at least one edge and let . Then
In particular,
Corollary 27.
Let be a graph with at least one edge. Then
Proof.
First, note that for such that we have . Next, consider -element partitions of . We have
The conclusion follows by Lemma 26. ∎
The above corollary frees us from considering all the partitions of when analyzing modularity of . We can simply concentrate on upper bounding the values of over all . To do so, we use the concentration results for and obtained in Section 5.
Proof of Theorem 5.
We finish by proving Corollary 6.
7. Concluding remarks
We showed that the modularity of a preferential attachment graph is with high probability upper bounded by a function of the order . This proves Conjecture 3, but it means that Conjecture 4, saying that the modularity of is with high probability of the order , still remains open. Note that the term can also be seen as , where stands for the average vertex degree in . Such behavior of modularity has already been reported for other random graphs. For being a random -regular graph it is known that whp (see [25]). For the binomial random graph it is known that when and is bounded below (see [26]). These observations might be seen as premises supporting Conjecture 4.
To the best of our knowledge, this paper gives the first concentration results for , and in , where can be an arbitrary subset of . The analogous results obtained so far, e.g. in [7], [12] or [29, 30], always addressed “compact” subsets of vertices, i.e., sets of the form . (In Lemma 4 of [12] the authors investigate the volume of any set of size at time , but in fact in their proof the volume of is upper bounded by the volume of the “oldest” vertices in , i.e., the volume of the set .) In this paper a more accurate analysis was possible thanks to the indicator functions introduced in Definition 7. We believe that the proof methods leading to the results obtained in Section 5 might serve in the future to obtain better bounds than those known today for the edge expansion or conductance of .
References
- [1] K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967.
- [2] J.P. Bagrow. Communities and bottlenecks: Trees and treelike networks have high modularity. Physical Review E, 85:066118, 2012.
- [3] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- [4] P. Bennett and A. Dudek. A gentle introduction to the differential equation method and dynamic concentration. Discrete Mathematics, 345(12):113071, 2022.
- [5] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
- [6] B. Bollobás and O. Riordan. Mathematical results on scale-free random graphs. In Handbook of Graphs and Networks: From the Genome to the Internet, pages 1–34. Wiley-VCH, 2003.
- [7] B. Bollobás and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5–34, 2004.
- [8] B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale-free random graph process. Random Structures & Algorithms, 18(3):279–290, 2001.
- [9] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1):cnab051, 2021.
- [10] T.N. Dinh and M.T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pages 888–891, 2011.
- [11] D. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100–118, 1975.
- [12] A. Frieze, X. Pérez-Giménez, P. Prałat, and B. Reiniger. Perfect matchings and hamiltonian cycles in the preferential attachment model. Random Structures & Algorithms, 54(2):258–288, 2019.
- [13] G. Gilad and R. Sharan. From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. PNAS Nexus, 2(6):pgad180, 2023.
- [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
- [15] R. van der Hofstad. Random Graphs and Complex Networks. Vol.1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017.
- [16] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. John Wiley & Sons, Inc., 2000.
- [17] R. P. Boas Jr. and J. W. Wrench Jr. Partial sums of the harmonic series. The American Mathematical Monthly, 78(8):864–870, 1971.
- [18] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Théberge. Clustering via hypergraph modularity. Plos One, 14:e0224307, Feb 2019.
- [19] B. Kamiński, P. Prałat, and F. Théberge. Mining Complex Networks. Chapman and Hall/CRC, 2021.
- [20] M. Lasoń and M. Sulkowska. Modularity of minor-free graphs. Journal of Graph Theory, 102(4):728–736, 2023.
- [21] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4):754–802, 2022.
- [22] H.M. Mahmoud, R.T. Smythe, and J. Szymański. On the structure of random plane-oriented recursive trees and their branches. Random Structures & Algorithms, 4(2):151–176, 1993.
- [23] C. McDiarmid, K. Rybarczyk, F. Skerman, and M. Sulkowska. Expansion and modularity in preferential attachment graph, 2025. Preprint.
- [24] C. McDiarmid and F. Skerman. Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics, 43:431–437, 2013.
- [25] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4):596–619, 2018.
- [26] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1):211–243, 2020.
- [27] M.E.J. Newman. Networks: An introduction. Oxford University Press, 2010.
- [28] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
- [29] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity of complex networks models. In A. Bonato, F. Chung Graham, and P. Prałat, editors, Algorithms and Models for the Web Graph - 13th International Workshop, WAW 2016, Montreal, QC, Canada, December 14-15, 2016, Proceedings, volume 10088 of Lecture Notes in Computer Science, pages 115–126, 2016.
- [30] L. Prokhorenkova, P. Prałat, and A.M. Raigorodskii. Modularity in several random graph models. Electronic Notes in Discrete Mathematics, 61:947–953, 2017.
- [31] H. Robbins. A remark on Stirling’s formula. The American Mathematical Monthly, 62(1):26–29, 1955.
- [32] J. Szymański. On a nonuniform random recursive tree. In A. Barlotti, M. Biliotti, A. Cossu, G. Korchmaros, and G. Tallini, editors, Annals of Discrete Mathematics (33), volume 144 of North-Holland Mathematics Studies, pages 297–306. North-Holland, 1987.
- [33] V.A. Traag, L. Waltman, and N.J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(5233), 2019.
- [34] L. Warnke. On the method of typical bounded differences. Combinatorics, Probability & Computing, 25:269–299, 2016.