Modularity and partially observed graphs.
Abstract
Suppose that there is an unknown underlying graph $G$ on a large vertex set, and we can test only a proportion of the possible edges to check whether they are present in $G$. If $G$ has high modularity, is the observed graph likely to have high modularity? We see that this is indeed the case under a mild condition, in a natural model where we test edges at random. We find that $q^*(G_p) > q^*(G) - \varepsilon$ with probability at least $1 - \varepsilon$, as long as the expected number of edges in the observed graph is large enough. Similarly, $q^*(G_p) < q^*(G) + \varepsilon$ with probability at least $1 - \varepsilon$, under the stronger condition that the expected average degree in the observed graph is large enough. Further, under this stronger condition, finding a good partition for the observed graph helps us to find a good partition for $G$.
We also consider the vertex sampling model for partially observing the underlying graph: we find that for dense underlying graphs we may estimate the modularity by sampling constantly many vertices and observing the corresponding induced subgraph, but this does not hold for underlying graphs with a subquadratic number of edges. Finally we deduce some related results, for example showing that under-sampling tends to lead to overestimation of modularity.
1 Introduction and main results
The modularity of a graph is a measure of the extent to which the graph breaks into separate communities. For a given graph $G$, each partition $\mathcal{A}$ of the vertices has a modularity score $q_{\mathcal{A}}(G)$, with higher values indicating that the partition better captures community structure in $G$. The modularity $q^*(G)$ of the graph is defined to be the maximum over all vertex partitions of the modularity score, and satisfies $0 \le q^*(G) < 1$. See Section 1.1 for definitions and some background.
Suppose that there is an unknown underlying graph $G$ on a large given vertex set, and we can test only a small proportion of the possible edges to check whether they are present in $G$, or perhaps we can test many possible edges but there is a chance that we fail to spot an edge. We assume that there are no false positives, where there is no edge in $G$ but we think there is one, except briefly in the concluding remarks. We consider two questions. (a) If $G$ has high modularity, is the observed graph likely to have high modularity? In other words, if the observed graph has low modularity can we assert with some confidence that $G$ has low modularity? (b) Conversely, if $G$ has low modularity, is the observed graph likely to have low modularity? We find that, in a natural model where we test possible edges independently at random, the answer to (a) is yes, under the condition that the expected number of edges found is large enough; and the answer to (b) is yes, under the stronger condition that the expected average degree in the observed graph is large enough.
To investigate these questions we use the random graph models and described in subsections 1.2 and 1.3 below. First we recall the definition of modularity.
1.1 Modularity : definition and notation
Given a graph $G$, modularity gives a score to each partition of the vertex set $V(G)$; and the (maximum) modularity $q^*(G)$ of $G$ is the maximum of these scores over all vertex partitions. Let $\mathbf{1}_{uv}$ be the indicator that $uv$ is an edge. For a set $A$ of vertices, let $e(A)$ denote the number of edges within $A$, and let $\mathrm{vol}(A)$ denote the sum of the degrees over the vertices in $A$.
Definition ([36], see also [35]).
Let $G$ be a graph with $m \ge 1$ edges. For a vertex partition $\mathcal{A}$ of $V(G)$, the modularity score of $\mathcal{A}$ on $G$ is
$$q_{\mathcal{A}}(G) \;=\; \frac{1}{2m}\sum_{A \in \mathcal{A}} \sum_{u,v \in A} \Big(\mathbf{1}_{uv} - \frac{d_u d_v}{2m}\Big) \;=\; \frac{1}{m}\sum_{A \in \mathcal{A}} e(A) \;-\; \frac{1}{4m^2}\sum_{A \in \mathcal{A}} \mathrm{vol}(A)^2.$$
The modularity of $G$ is $q^*(G) = \max_{\mathcal{A}} q_{\mathcal{A}}(G)$, where the maximum is over all vertex partitions $\mathcal{A}$. (For an empty graph $G$ (i.e. with no edges, so $m = 0$): we set $q_{\mathcal{A}}(G) = 0$ for each vertex partition $\mathcal{A}$, and $q^*(G) = 0$.)
Directly from the definition we have $0 \le q^*(G) < 1$ for all graphs. Note that isolated vertices are irrelevant: they are not counted in the formula for the modularity score. The second expression for $q_{\mathcal{A}}(G)$ expresses it as the difference of the edge contribution or coverage $\frac{1}{m}\sum_{A} e(A)$, and the degree tax $\frac{1}{4m^2}\sum_{A} \mathrm{vol}(A)^2$.
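To make the definition above concrete, here is a short Python sketch computing the modularity score of a given partition directly as coverage minus degree tax (the function and variable names are ours, not the paper's):

```python
from collections import defaultdict

def modularity_score(edges, parts):
    """Modularity score q_A(G): coverage minus degree tax, per the definition."""
    m = len(edges)
    if m == 0:
        return 0.0  # convention for graphs with no edges
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    part_of = {}
    for i, part in enumerate(parts):
        for v in part:
            part_of[v] = i
    internal = [0] * len(parts)   # e(A): number of edges inside each part
    vol = [0] * len(parts)        # vol(A): sum of degrees over the part
    for u, v in edges:
        if part_of[u] == part_of[v]:
            internal[part_of[u]] += 1
    for v, d in deg.items():
        vol[part_of[v]] += d
    coverage = sum(internal) / m
    degree_tax = sum(w * w for w in vol) / (4 * m * m)
    return coverage - degree_tax
```

For instance, for two disjoint triangles the connected-components partition scores $1 - 1/2 = 1/2$, while the trivial one-part partition scores $0$.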
1.2 Modularity of the random graph obtained by edge-sampling
Given a graph $G$ and a probability $p \in (0, 1]$, let $G_p$ be the random subgraph of $G$ on the same vertex set obtained by considering each edge of $G$ independently and keeping it in the graph with probability $p$ (and otherwise deleting it). Thus the binomial or Erdős–Rényi random graph $G(n,p)$ may be written as $(K_n)_p$, where $K_n$ is the $n$-vertex complete graph. Let $e(H)$ denote the number of edges in a graph $H$: thus the expected number of edges in $G_p$ is $p\,e(G)$. The first of our two theorems concerns when we want to have $q^*(G_p) > q^*(G) - \varepsilon$ with probability near 1.
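The edge-sampling model is straightforward to simulate; a minimal sketch (function name and seeding are our choices):

```python
import random

def edge_sample(edges, p, seed=None):
    """Return the edge set of G_p: each edge of G is kept
    independently with probability p, else deleted."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]
```

With `p = 1.0` every edge is kept and with `p = 0.0` none are, so the expected number of kept edges is `p * len(edges)`.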
Theorem 1.1.
Given $\varepsilon > 0$ there exists $c_0 > 0$ such that the following holds. For each graph $G$ and probability $p$ such that $p\,e(G) \ge c_0$, the random graph $G_p$ satisfies $q^*(G_p) > q^*(G) - \varepsilon$ with probability at least $1 - \varepsilon$.
The proof of Theorem 1.1 (in Section 4) will show explicitly how large we may take $c_0$. Following the proof, we give an example which shows a lower bound on how large $c_0$ must be. See Figure 1.1 for simulations illustrating Theorem 1.1 (and Theorem 1.2), and see Figure A.1 for simulations run on a larger underlying graph.
The second of our theorems concerns when we want also to have $q^*(G_p) < q^*(G) + \varepsilon$ with probability near 1, and it shows that this will happen if the expected average degree is sufficiently large. We see also that, in this case, finding a good partition for $G_p$ helps us to find a good partition for $G$. Observe that the expected average degree in $G_p$ is $2p\,e(G)/n$, where $n$ is the number of vertices in $G$.
Theorem 1.2.
For each $\varepsilon > 0$, there is a $c_0 > 0$ such that the following holds. Let the graph $G$ and probability $p$ be such that the expected average degree $2p\,e(G)/n$ is at least $c_0$. Then, with probability at least $1 - \varepsilon$, the following statements (a) and (b) hold:
(a) the random graph $G_p$ satisfies $q^*(G_p) < q^*(G) + \varepsilon$; and

(b) given any partition $\mathcal{A}$ of the vertex set, in a linear number of operations (seeing only $G_p$) the greedy amalgamating algorithm finds a partition $\mathcal{B}$ with $q_{\mathcal{B}}(G) > q_{\mathcal{A}}(G_p) - \varepsilon$.
Part (b) says roughly that, given a good partition for $G_p$, we can quickly construct a good partition for $G$. Using also part (a), we may see that, if the partition $\mathcal{A}$ for $G_p$ is near-optimal with high probability, then the constructed partition $\mathcal{B}$ is near-optimal for $G$ with high probability. See Section 3 for the greedy amalgamating algorithm used to construct $\mathcal{B}$. The proof of Theorem 1.2 (in Section 5) will show explicitly how large we may take $c_0$. Following the proof, we give an example which shows a lower bound on how large $c_0$ must be.
The assumption in Theorem 1.2 that the expected average degree is large is of course much stronger than the assumption in Theorem 1.1, but still may be much smaller than . If we go much further, and assume that at most an -proportion of edges are missed, that is , then deterministically we have by [32], see Section 6.1.
1.3 Modularity of the random graph obtained by limited search
Let be the graph on vertex set with edge set a uniformly random -edge subset of . The graph may also be sampled by a randomised procedure to find edges of , where we stop once we find ‘enough’ edges: we are given , there is an unknown graph on , and we query possible edges of , uniformly at random, until we find edges.
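The randomised query procedure can be simulated directly; note that the first $m$ edges met when the possible pairs are queried in a uniformly random order form a uniformly random $m$-edge subset of the edge set, matching the model described above. A sketch (names are ours):

```python
import random

def observe_until_m_edges(n, underlying_edges, m, seed=0):
    """Query distinct vertex pairs of {0,...,n-1} in uniformly random order,
    recording each queried pair that is an edge of the (unknown) underlying
    graph, and stop once m edges have been found."""
    rng = random.Random(seed)
    present = {frozenset(e) for e in underlying_edges}
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    rng.shuffle(pairs)
    found = []
    for u, v in pairs:
        if frozenset((u, v)) in present:
            found.append((u, v))
            if len(found) == m:
                break
    return found
```

This is only a simulation of the observation model; the point of the corollaries is that the resulting graph inherits the conclusions of Theorems 1.1 and 1.2.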
Corollary 1.3.
Given there exists such that the following holds. For all , the random graph satisfies with probability .
Corollary 1.4.
Given there exists such that the following holds. For all , the random graph satisfies the conclusions of Theorem 1.2 when is replaced by .
These corollaries follow easily from Theorems 1.1 and 1.2 respectively using Lemma 6.4, which shows that and (and similarly for ) are whp close in value for . The lemma will be proved in Section 6.2 (the lemma also shows that these quantities are close in expectation, which will be used in Section 9).
[Figure 1.1, lower part: for each random instance of $G_p$, we examine how well the modularity maximising partition of $G_p$ performs as a partition on the underlying graph $G$. For each sampled graph we plot the score $q_{\mathcal{B}}(G)$, where $\mathcal{A}$ is the highest scoring partition on $G_p$ found in 200 runs of Louvain and Leiden, and $\mathcal{B}$ is the partition $\mathcal{A}$ modified as in Theorem 1.2(b) (using Lemma 3.1). See also Figure A.1 for simulations run on a larger underlying graph.]
1.4 Estimating modularity by vertex sampling (‘parameter estimation’)
Another commonly considered model of partially observing an underlying graph $G$ is to sample a vertex subset $S$ of constant size and observe the corresponding induced subgraph $G[S]$. In Section 7 and Theorem 1.5 we consider this model. We find that for dense graphs $G$, with high probability we may estimate the modularity $q^*(G)$ to within a small additive error by sampling a constant number of vertices; but for graphs with a subquadratic number of edges this result does not hold: modularity is not estimable.
Theorem 1.5.
(a) For fixed with , modularity is estimable for graphs with density at least . (b) For any given function , modularity is not estimable for -vertex graphs with density at least .
1.5 Outline of the paper
The plan of the rest of the paper is as follows. In Section 2, we first show an application of our results to the stochastic block model in Section 2.1, then provide an overview of the further results of this paper in Section 2.2 and lastly give some background and the relation of this paper to previous results in Section 2.3.
Sections 3 to 5 give the proofs of the main results of the paper: Section 3 gives a crucial preliminary lemma for the proofs, the ‘fattening lemma’; and Theorem 1.1 and Theorem 1.2 are proven in Section 4 and Section 5 respectively. (Indeed we also prove versions of these results which take into account the number of parts in the partition.)
In Section 6 we give a ‘robustness’ lemma showing that and do not change much if we change a small proportion of edges; then we prove Lemma 6.4 on and , which completes the proof of Corollaries 1.3 and 1.4; and finally we give related results on concentration of modularity. We prove Theorem 1.5 on estimating modularity by vertex sampling in Section 7.
The later sections, Sections 8 and 9, contain further related results. In Section 8 we see that under-sampling tends to lead to over-estimation of modularity (using Theorem 1.1). In Section 9 we show that Theorem 1.1 implies results on the expected modularity of random graphs with constant average degree. In Section 10 we give results analogous to Theorems 1.1 and 1.2 for weighted networks and show an application of these results for the stochastic block model. Finally Section 11 contains a few concluding remarks and open questions.
In the appendix we give simulations run on a larger underlying graph in Section A.
2 Further results and background on existing results
2.1 Bootstrapping to sparser graphs and application to stochastic block model.
Modularity preserves signal a little below the connectivity threshold.
This property sets modularity apart from other measures of community structure in a network. If $\mu(G)$ is the min-cut, normalised min-cut or the spectral gap of the Laplacian of $G$, then $\mu(G) = 0$ if $G$ is disconnected. Thus, given a connected graph $G$, for $\mu(G_p)$ to approximate $\mu(G)$ for these measures we must take $p$ large enough that $G_p$ is likely connected.
To see that our results can hold below the connectivity threshold, consider (so ) and with a large constant. Then with probability , see for example Theorem 1.3 of [32]. But whp is disconnected – indeed it will have a linear proportion of the vertices and edges inside the giant component, and a linear proportion outside. Furthermore we may take our underlying graph to be disconnected - for example if consists of two disjoint equal sized cliques then it has modularity , and this will again be approximated by for with some large constant .
Stochastic block model, a little below the connectivity threshold .
There are two versions of the balanced $k$-community stochastic block model, where edges appear independently with probability $p_{\mathrm{in}}$ within blocks and probability $p_{\mathrm{out}}$ between blocks. In the first version, the vertex set is partitioned deterministically into $k$ sets (blocks) of size $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$; and in the second, each vertex independently and uniformly picks a block to join (so the blocks have size about $n/k$ whp).
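A sketch of the deterministic-blocks version of this model; assigning vertex $i$ to block $i \bmod k$ is one concrete way to realise blocks of near-equal size, and the names `sbm_edges`, `p_in`, `p_out` are ours:

```python
import random

def sbm_edges(n, k, p_in, p_out, seed=0):
    """Balanced k-community SBM (deterministic blocks): vertex i is placed in
    block i % k, and each pair is joined independently with probability p_in
    inside a block and p_out between blocks."""
    rng = random.Random(seed)
    block = [i % k for i in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p_in if block[u] == block[v] else p_out):
                edges.append((u, v))
    return edges, block
```

The second version differs only in that each vertex picks its block independently and uniformly at random.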
We first give an example where bounds at some (not too small) edge density are known from spectral results, and a version of Theorem 1.2 allows us to bootstrap these results to sparser models. After that we present Theorem 2.1 concerning the general -community model.
The bootstrapping example involves . Write for the maximum modularity value over all partitions into at most two parts, and note that is at least the modularity score of the planted bipartition. Thus by direct calculation - see for example similar calculations in [21, 32] - if then
(2.1) |
Using existing spectral results from [11, 27] we may deduce that the lower bound in (2.1) is tight for some values of . (We suppress the details here, see Remark 5.3.) Interestingly, one may then use Proposition 5.2 (a version of Theorem 1.2 for -partitions) to bootstrap these results to sparser graphs, where - see Remark 5.3. Note that this includes values of and for which the spectral results do not hold (since for part of the range is disconnected whp).
The tightness of (2.1) tells us that whp the planted partition has asymptotically maximal modularity value over all bipartitions. We now consider the modularity value for the -block model, and see that the planted partition is asymptotically optimal over all partitions.
Theorem 2.1.
Let be an integer, and let and satisfy and as . Then for or
and whp the planted partition has this modularity value.
We shall prove Theorem 2.1 in Section 10, using a deterministic lemma, Lemma 10.6, together with Theorem 10.5, which is a version of Theorem 1.2 with weighted underlying graph. We note that [2, 3] showed that whp the modularity optimal partition will agree with the planted partition except for vertices (for and for some fixed , ). Thus we could also have proven Theorem 2.1 for such via robustness results, see Lemma 6.1, and the likely modularity value of the planted partition.
The recent paper [21] gives explicit bounds on the modularity. We provide a simple stand-alone proof of Theorem 2.1, following [21] in using weighted graphs. Our result extends that in [21] as we reduce the upper bound there by a factor to match the lower bound, and we extend the range of from to . Theorem 2.1 did not appear in the earlier arXiv version [33] of our paper.
2.2 Further results in this paper
Robustness, concentration and closeness of modularity.
For either the modularity $q^*(G)$ or the modularity score $q_{\mathcal{A}}(G)$ of a given partition, modifying a single edge can change the value by at most $2/e(G)$. This was known for $q^*$ (see §5 of [32]). In Section 6.1 we prove that the bound holds also for any given partition $\mathcal{A}$, and in Example 6.2 we show examples such that the factor 2 is necessary in both cases. These robustness results give the concentration theorem below - see Section 6.
Theorem 2.2.
There is a constant such that for each graph , each partition and each the following holds with . For each
By Theorem 2.2 we have concentration for around as soon as is large. In more detail, if for a (large) constant . But, results on binomial random graphs (Theorem 1.1 (b) of [32]) show that, when is , we need large average expected degree (i.e. large) to avoid being much larger than whp. Thus we get concentration of as soon as is a large constant, however, this may not be concentration around until the average expected degree is a large constant.
Under-sampling and over-estimating modularity.
In ecological networks, each interaction observed reveals that an edge is present in the underlying network, and the effect of sampling effort can be modelled by taking the observed network after varying numbers of observations. It was noted in multiple papers on ecological networks that a lower sampling effort, under-sampling, can lead to overestimating the modularity of the underlying ecological network [45]. Our paper provides some theoretical explanation - see Section 8 for a statement of the result.
Expected modularity of the binomial random graph .
Given a constant $c > 0$, for each $n$ we let $\bar q(n, c) = \mathbb{E}\,q^*(G(n, c/n))$, where $G(n, c/n)$ is the binomial (or Erdős–Rényi) random graph with edge-probability $c/n$. It was conjectured in [32] that for each $c$, $\bar q(n, c)$ tends to a limit as $n \to \infty$, and it was noted there that in this case the limit function would be continuous in $c$. From Theorem 1.1 we may deduce that the limit function would also be non-increasing in $c$ - see Section 9.
Modularity and edge-sampling on weighted networks.
Network data of interest for clustering often has weights associated with each edge. Though we have stated the modularity score of a partition for binary edge weights, it is simple to take the weight of edges inside each part (instead of the number of edges) and to take the degree of a vertex $v$ to be the sum of the weights of the edges incident to $v$ (instead of the number of edges). This weighted modularity is often used, and indeed the popular community detection algorithm Louvain can take weighted networks as input [4]. Our Theorems 1.1 and 1.2 have analogues for weighted networks - see Section 10.
2.3 Background on existing results, and our contribution
Modularity : use in community detection.
Modularity was introduced in [36]; it gives a score to each vertex partition (i.e. community division), and partitions with higher scores are considered to better capture the communities in the network. It is NP-hard to find a partition with the highest modularity score (i.e. a partition $\mathcal{A}$ such that $q_{\mathcal{A}}(G) = q^*(G)$) [7], and community detection algorithms do not do this. However, it is fast to compute the modularity of a particular partition, and hence it can be feasible to choose which modifications to make to a partition by picking the candidate partition with the highest modularity score. Louvain [4] and Leiden [44] are examples of this. The algorithms are fast and have had success in recovering ground truth communities on real world networks. However, there are no theoretical guarantees for either that the partition found is near optimal, though recently [10] showed that a Louvain-like algorithm recovers the communities in the stochastic block model for a wide parameter range.
Modularity-based clustering algorithms are the most commonly used to detect communities on large network data [24, 23] - see [15] and [39] for surveys. The widespread use in applications makes modularity an important graph parameter to understand theoretically.
Informing applied network theory : privacy.
Sharing network data can lead to privacy concerns - one approach is to share a subsampled graph $G_p$ instead of $G$ [41]. A claimed advantage is that $G_p$ retains many properties of the underlying graph and that parameters of $G$ may be estimated knowing only $G_p$ and $p$. Examples considered in [41] include vertex degrees and number of triangles.
Our contribution is to determine when the modularity of $G$ is well approximated by the modularity of $G_p$. Furthermore, in part (b) of Theorem 1.2, we see that for expected average degree large enough we can likely obtain a near optimal partition of the underlying graph whilst seeing only the shared graph $G_p$. (Here we mean possible in an information theoretic sense - we do not consider the complexity of finding the partition.) Given the shared graph and a near optimal partition of the shared graph, we construct a partition which is likely to be near optimal for the underlying (non-shared) graph $G$.
Robustness and percolation : existing results for random subgraphs of fixed graphs.
Given a graph $G$ with property $\mathcal{P}$ we say that $G$ robustly has property $\mathcal{P}$ if whp $G_p$ has property $\mathcal{P}$ [43]. A seminal such result is in [22], which showed that there is an absolute constant $C$ such that for $p \ge C \log n / n$ and underlying graph $G$ with minimum degree at least $n/2$, whp the observed graph $G_p$ is Hamiltonian. This was later strengthened to a hitting time result, i.e. taking edges of the underlying graph in a random order, in [19].
Another property that has been considered is expansion - a measure of the number of edges leaving each vertex set relative to the size of that set. For -regular with good expansion properties the expansion in the giant component of was studied in [37] and [14], see also [13]. Additionally non-planarity of for with a specified minimum degree was studied in [17] and for planar underlying graphs [26] determines the threshold for such that is 3-colourable for every planar .
Our contribution is to consider robustness of modularity : for with at least a large constant number of edges and large enough we get likely lower bounds on and for with average degree at least a large constant we also get likely upper bounds on .
Parameter estimation, vertex sampling and network measures : existing work.
There is a well developed field of parameter estimation which asks for which parameters is it possible to estimate the parameter of an underlying graph given access to an induced subgraph on a random vertex subset of constant size. For an excellent introduction, including the relation to the theory of graph limits see [6, 28]. There are related results in [5] : this paper analysed testability of graph parameters relating to minimising the number of edges between parts, some of which were normalised with respect to the size of the parts but in a different way to modularity, and found some of these parameters not to be testable. Our results complement these by showing that modularity is not testable in general but is testable for dense graphs.
Modularity : existing results for random graphs.
Recall that the modularity value of a graph is always in the interval $[0,1)$, with higher values taken to indicate a higher extent of community structure. The modularity of the binomial random graph $G(n,p)$ exhibits three phases, see [32]: for sparse graphs (with expected average degree at most about 1) the modularity is near 1 whp, for dense graphs (with expected average degree tending to infinity) it is near 0 whp, and in between (with expected average degree between two constants) it is bounded away from 0 and 1 whp. Note that for $G = K_n$ we have $G(n,p) = (K_n)_p$ and $q^*(K_n) = 0$ [7], and hence the result in the dense regime is recovered by our Theorem 1.2. A more precise estimate in part of this range is also shown in [32].
Random regular graphs have received recent attention in [25]; and in particular it is shown that the random cubic graph whp has modularity in the interval , improving on earlier results. For large whp [31, 40] - note that this is the same order as for a binomial random graph with expected degree . The random hyperbolic graph whp has modularity asymptotically [8], and so does the preferential attachment tree, though if edges are added at each step as in the model then whp [40]. The modularity of the stochastic block model, and of a degree-corrected version, have been considered in [21, 2, 3, 46], see also Theorem 2.1.
3 The fattening lemma for vertex partitions
Given a graph $G$ and $\varepsilon > 0$, we say that a vertex partition $\mathcal{A}$ is $\varepsilon$-fat (or $\varepsilon$-good) if each part has volume at least $\varepsilon \cdot \mathrm{vol}(G)$. We shall describe a greedy algorithm which, given a graph $G$, $\varepsilon > 0$ and a vertex partition $\mathcal{A}$, amalgamates some parts in $\mathcal{A}$ to yield an $\varepsilon$-fat partition with modularity score at most a little less than that of $\mathcal{A}$. Note that an $\varepsilon$-fat partition has at most $1/\varepsilon$ parts.
Lemma 3.1 (The fattening lemma).
For each non-empty graph and each , there is an -fat partition of such that . Indeed, given any partition of , the greedy amalgamating algorithm uses a linear number of operations and constructs an -fat partition such that .
For comparison, note the neat result of Dinh and Thai [12] that, for each graph and positive integer , we have
(3.1) |
where is the maximum modularity score over partitions with at most parts. Observe that if is an -fat partition for and , then . However, note that neither approximation result implies the other. The constant 2 in Lemma 3.1 is best possible, as shown by the following example.
Example 3.2.
Fix . Let the odd integer be sufficiently large that
so there exists such that and . Let the graph consist of disjoint triangles. Thus has vertices and . The only $\varepsilon$-fat partition for this graph is then the trivial partition, with modularity score 0. Also the maximum modularity is achieved with the connected components partition, see Proposition 1.3 of [31]. Thus
so in Lemma 3.1 we cannot replace 2 by .
To prove Lemma 3.1, it of course suffices to prove the second part. The greedy amalgamating algorithm to yield a good -fat partition is essentially a greedy algorithm for numbers, and we describe that first. The main step in the number greedy algorithm involves bipartitions, and the following well known problem. Before stating the problem, let us note some standard easy inequalities which are useful for the number (bi-) partitioning problem and for considering degree tax. Given with , we have
(3.2) |
The number (bi-) partitioning problem
Given a positive integer and a positive -vector with , let
Thus . Determining is the number partitioning problem, one of Garey and Johnson’s six basic NP-hard problems. It is well known that if each then . (To see this, observe that as we increase the last partial sum which is at most is at least . This result also follows immediately from (3.4) below.)
We might expect that is large when is small, that is when is large. We would like to find the largest constant such that always . We must have : for example if is odd and each , then . We shall show that is the right answer for , that is, we always have ; and further there is a simple greedy bi-partitioning algorithm which achieves this.
Assume that the elements are ordered so that $x_1 \ge x_2 \ge \cdots \ge x_n$. We have two bins, and we add the elements one by one to a smaller of the bins. In detail, the greedy bi-partitioning algorithm is as follows. Initially set $B_1 = B_2 = \emptyset$. For $i = 1, \dots, n$, if $\mathrm{sum}(B_1) \le \mathrm{sum}(B_2)$ then insert $x_i$ into $B_1$, else insert $x_i$ into $B_2$. At the end, let $S_j = \mathrm{sum}(B_j)$ for $j = 1, 2$. The algorithm clearly uses at most $n$ comparisons and $n$ additions.
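The greedy bi-partitioning algorithm above can be sketched as follows (names are ours). The test checks the easy invariant that the final imbalance between the two bins never exceeds the largest element:

```python
def greedy_bipartition(xs):
    """Greedy number bi-partitioning: process elements in non-increasing
    order, always inserting the next element into a currently smaller bin."""
    bins = ([], [])
    sums = [0, 0]
    for x in sorted(xs, reverse=True):
        j = 0 if sums[0] <= sums[1] else 1  # pick a smaller bin
        bins[j].append(x)
        sums[j] += x
    return bins, sums
```

Each insertion goes into a smaller bin, so by induction the absolute difference of the bin sums is always at most the largest element.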
Lemma 3.3.
(3.3) |
Proof of Lemma 3.3.
When the algorithm has finished, let be the total of the values in the bin containing ; and let , the total in the other bin. We shall use induction on . The result is trivial if , since both sides of (3.3) are 0. Let and assume that the result holds for all inputs of length . We shall consider two cases.
(a)

(b)
Suppose that , so . Let and , so . Let where (and note that ). By the induction hypothesis,
But . Hence
Hence, again (3.3) holds.
Now we have established the induction step, and thus proven the lemma.∎
Next we consider partitioning numbers into several parts. Let and let with each and . Let and let be a partition of . Let for each . The corresponding cost for and is
Observe that . Given , we say that the partition is -fat if for each . Observe that the trivial partition is always -fat.
The number greedy partitioning algorithm starts with the trivial partition. While there is a part such that, setting and we have , it picks the first such part and uses the greedy number bi-partitioning algorithm to split into two parts each with sum at least .
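A sketch of the number greedy partitioning algorithm just described, under the assumption that the fatness condition is a lower bound `s_min` on each part's sum (`s_min` and the function name are ours):

```python
def greedy_fat_partition(xs, s_min):
    """Start from the trivial partition and repeatedly split a part via the
    greedy bi-partitioning rule, as long as both halves of the split would
    have sum at least s_min (the fatness threshold)."""
    def split(part):
        b1, b2, s1, s2 = [], [], 0, 0
        for x in sorted(part, reverse=True):
            if s1 <= s2:
                b1.append(x); s1 += x
            else:
                b2.append(x); s2 += x
        return (b1, s1), (b2, s2)

    parts = [list(xs)]
    changed = True
    while changed:
        changed = False
        for i, part in enumerate(parts):
            if len(part) < 2:
                continue
            (b1, s1), (b2, s2) = split(part)
            if min(s1, s2) >= s_min:
                parts[i:i + 1] = [b1, b2]  # replace the part by its two halves
                changed = True
                break
    return parts
```

Each accepted split strictly increases the number of parts, so the procedure terminates, and every final part has sum at least `s_min` (assuming the trivial partition does).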
Lemma 3.4.
Let and let with each and ; and let . Then the number greedy partitioning algorithm finds an -fat partition of with , using operations.
Proof.
Let the final partition obtained by the greedy amalgamating algorithm be (for some ). Fix . Let . Let denote the cost of the trivial partition of with , so . By Lemma 3.3, . But since the algorithm stopped with , we have . Hence for each , and so
as required. ∎
Finally, let us deduce the fattening lemma, Lemma 3.1, from Lemma 3.4. The greedy amalgamating algorithm which we apply to the partition of is essentially the number greedy partitioning algorithm applied to the partition of into singletons , where the weight of is .
Proof of Lemma 3.1.
4 Modularity of with large expected number of edges
4.1 Proof of Theorem 1.1
We shall use a tail bound for random variables with a binomial or similar distribution. We use a variant of the Chernoff bounds, which follows for example from Theorems 2.1 and 2.8 of [18] by considering . In this and the next section we shall always have or : we will need general in Section 10.
Lemma 4.1.
Let , and let random variables be independent, with for each . Let and . Then
or equivalently
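The displayed bounds are missing from this copy; a standard form consistent with Theorems 2.1 and 2.8 of [18], scaled so that each summand takes values in $\{0, b\}$ (the exact constants in the paper's statement may differ), reads:

```latex
% X = X_1 + ... + X_n with the X_i independent, X_i \in \{0, b\}, \mu = E[X].
\[
  \Pr\big(X \ge \mu + t\big) \;\le\; \exp\!\Big(-\frac{t^2}{2b\,(\mu + t/3)}\Big),
  \qquad
  \Pr\big(X \le \mu - t\big) \;\le\; \exp\!\Big(-\frac{t^2}{2b\,\mu}\Big),
  \qquad t \ge 0;
\]
% and consequently, for 0 < \varepsilon \le 1:
\[
  \Pr\big(|X - \mu| \ge \varepsilon \mu\big)
  \;\le\; 2\exp\!\Big(-\frac{\varepsilon^2 \mu}{3b}\Big).
\]
```

The combined two-sided form follows from the two one-sided bounds with $t = \varepsilon\mu$, since $2(1 + \varepsilon/3) \le 8/3 \le 3$ for $\varepsilon \le 1$.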
Proof of Theorem 1.1.
First note that we may assume that and . Let . By Lemma 3.1 there is an -fat partition for such that (thus by our choice of ). Observe that the number of parts in is at most , and thus by (3.2). To prove Theorem 1.1 we consider first the edge contribution and then the degree tax. Let . By Lemma 4.1 (with ), for
(4.1) |
For a graph on , let denote the number of ‘internal’ edges of within the parts of , so that . Then by Lemma 4.1 as above, for
But
so . Hence, for , with probability at least
(4.2) |
Also, by (4.1), for we have with probability at least . Hence by (4.2), for , with probability at least
(4.3) |
The degree tax is only a little more complicated. Let , so . By Lemma 4.1 with , for , with probability at least we have
But, since ,
and so, for , with probability at least we have
Recall that has parts and thus for , with probability at least
Also, by (4.1), for , with probability at least we have and so . Hence, for , with probability at least
(4.4) |
Now, for , with probability at least , both (4.3) and (4.4) hold. (With a little more care we could replace the factor here by but that would not make a significant difference.) Since , the failure probability here is at most . Choose sufficiently large that this probability is at most ; and note that we may take . In the next paragraph we shall choose , so (4.4) holds for our choice of and any .
Consider the ratios on the right sides of the inequalities (4.3) and (4.4). Choose sufficiently large that for
(4.5) |
and
(4.6) |
Note that the left side in (4.5) is increasing in , and the left side in (4.6) is decreasing in , so we just need the inequalities to hold for . Rearranging, we see the inequality (4.5) is equivalent to
and thus we must have . In fact we may take and this will ensure both (4.5) and (4.6) hold.
Finally, set , so ; and let us check that this value for works. With probability at least both (4.3) and (4.4) hold. Suppose that both (4.3) and (4.4) do hold: if also then , so (4.5) and (4.6) hold; and then
(4.7) |
Hence, if , then with probability at least , as required. This completes the proof of Theorem 1.1. ∎
In the above proof of Theorem 1.1 we took . If we used a more detailed and precise form of Lemma 4.1 - see for example Theorem 21.6 of [16] - we could improve several bounds in the proof but we would not improve the asymptotic estimate for . If we simply used Chebyshev’s inequality we find we can take .
The following example shows that, for the conclusion of Theorem 1.1 to hold, must be at least as .
Example 4.2.
Let be small. Let the integer be sufficiently large that the conclusion of Theorem 1.1 holds. Let and , and assume that . Let be the -edge graph consisting of a star with edges together with isolated edges. Then the connected components partition is the optimal partition for , since each component has modularity 0 [27]. Thus
Now let , and note that . The probability that contains none of the isolated edges is . But any star has modularity 0, so . Thus we must have , and so .
4.2 A -part analogue to Theorem 1.1
Recall that $q^*_{\le k}(G)$ denotes the maximum value of $q_{\mathcal{A}}(G)$ over all vertex partitions $\mathcal{A}$ with at most $k$ parts. We note here a variant of Theorem 1.1 where we replace each instance of $q^*$ by $q^*_{\le k}$. (Observe that the inequality (3.1) of Dinh and Thai with $k$ large shows that Proposition 4.3 implies Theorem 1.1. A similar connection will hold for Proposition 5.2 and Theorem 1.2.)
Proposition 4.3.
Given $\varepsilon > 0$ there exists $c_0$ such that the following holds. For each graph $G$ and probability $p$ such that $p\,e(G) \ge c_0$, and for each $k \ge 1$, the random graph $G_p$ satisfies $q^*_{\le k}(G_p) > q^*_{\le k}(G) - \varepsilon$ with probability at least $1 - \varepsilon$.
Proof.
The proof is almost immediate following that of Theorem 1.1. As in that proof, first note we may assume that and . Let be a partition with at most parts and such that . Let and then by Lemma 3.1 there is an -fat partition such that and such that is a refinement of . Note in particular this means that and so has at most parts.
5 Modularity of with large expected degree
In this section we first prove Theorem 1.2, and then we give a -part analogue Proposition 5.2 and use it to complete the discussion of the bootstrapping example from Section 2.1.
5.1 Proof of Theorem 1.2
The rough idea of the proof of Theorem 1.2 is that we can use the fattening lemma, Lemma 3.1, to bound the probability that a vertex partition behaves badly by the probability that a fat vertex partition behaves similarly badly, and we can use probabilistic methods to handle fat partitions. However, even after the streamlining provided by Lemma 3.1 the proof is quite intricate and we will need to define a large number of events, many of which are ‘bad events’ parameterised by deviation from the ideal case.
Proof of Theorem 1.2.
Let . We shall choose sufficiently large that certain inequalities below hold. It will suffice to choose for a sufficiently large constant . Let be a (fixed) -vertex graph and let ; and assume that .
We now define the ‘bad’ events , , and (there will be more later!). Let be the event . Let . Let be the event that there is a partition such that , where as in Lemma 3.1. Let be the set of partitions which are -fat for . Let be the event that for some with we have (that is, is not -fat for but is -fat for ). Observe that if some partition is -fat for , then holds. Let be the event that there is a partition such that .
We shall show that
(5.1)
and
(5.2)
Note that (5.1) yields . It will follow that the probability that (a) or (b) in the theorem fails is at most . By Theorem 1.1, if is sufficiently large (depending only on ) then . We shall show that, if we choose sufficiently large (depending only on ) then also
(5.3)
and
(5.4)
which will complete the proof of Theorem 1.2. Next come the details: the proofs of (5.1)-(5.4). We focus initially on part (a) and the containment (5.1).
We show next that the inequality (5.3) holds. Let us observe first that if for some (large) then ; so we are concerned only with large graphs.
Proof of (5.3).
Let have . Suppose that the edges within are and the edges with exactly one end in are . For let be 2 if is in and 0 otherwise; and for let be 1 if is in and 0 otherwise. Let , and let . Then and . Form by adding further independent random variables with so that . Now by Lemma 4.1 applied to (with )
Let be the (bad) event that for some with . Then, by a union bound, , which is at most if for a sufficiently large constant . By Lemma 4.1 (with ), (if is sufficiently large). Then we have
as required. ∎
We now start to prove that the inequality (5.4) holds. The proof proceeds by considering first degree tax in Claim (5.5), and then edge contribution in Claim (5.7).
Degree tax Let be the event that for some with we have . We claim that
(5.5)
Proof of Claim (5.5).
Let be the event that . By Lemma 4.1 (with ), (if is sufficiently large). Suppose that holds: then for each partition
Thus
(5.6)
Edge contribution Let be the event that for each partition we have . We claim that (if is sufficiently large)
(5.7)
Proof of Claim (5.7).
Each partition has at most parts, so . Let be the event that, for some partition such that , we have . For each given such partition , is stochastically at most a binomial random variable with mean , so by Lemma 4.1 (with ) and a union bound we obtain , if for a sufficiently large constant .
Let be the event that, for some partition such that , we have . For each given such partition , is a binomial random variable with mean at least , so by Lemma 4.1 (with ) and a union bound we obtain , if for a sufficiently large constant .
Now consider any partition . If fails and then ; and so, if also fails, then . Thus if both and fail then holds, so
so the Claim (5.7) holds, as required. ∎
Proof of (5.4).
We have now completed the proofs of (5.1), (5.3) and (5.4); so it remains only to prove the containment (5.2).
Proof of (5.2).
Recall from Lemma 3.1 that , where . Hence, if we let be the event that there is a partition which is -fat for and satisfies then . Let be the event that there is a partition (that is, is -fat for ) which satisfies . Then , by the definition of the event . Also, by our choice of , if holds then there exists such that , i.e. holds; and thus . Summing up, we have ; so (5.2) holds, as required. ∎
In the proof of Theorem 1.2 we took . The following example shows that, for the conclusion (a) in Theorem 1.2 to hold, must be at least .
Example 5.1.
Recall from [32, Theorem 1.3] that there is a constant such that, for each whp . Let and let (so ). Let , let be (so ) and let . Then the expected average degree in is , but whp .
5.2 A -part analogue to Theorem 1.2
As in Section 4.2, recall that . We present here a variant of Theorem 1.2(a) where we replace each instance of by . In Section 2 it was mentioned in the discussion concerning the stochastic block model following (2.1) that the case of this variant will be used in Remark 5.3 to obtain the relevant upper bound.
Proposition 5.2.
For each , there is a such that the following holds. Let the graph and probability satisfy , and let . Then with probability the random graph satisfies .
Proof.
We may follow the proof of Theorem 1.2(a), with a few natural small changes which we detail below. Note that since we do not prove an analogue of Theorem 1.2(b), we need only prove analogues of (5.1), (5.3) and (5.4), i.e. not (5.2).
For partitions with at most parts we define analogues of the events and (the event is unchanged, and no analogue of is needed since we do not prove the analogue of (5.2)). Let be the event . By Theorem 1.2 (b) (with replaced by ) applied to an optimal partition with at most parts (and noting that has at most parts), we have .
As before let . Let be the set of partitions which are -fat for and have at most -parts; and let be the event that there is a partition such that . We must establish the analogues of the statements (5.1), (5.3) and (5.4).
Proof of analogue of (5.1).
Remark 5.3.
We may now complete the discussion of the bootstrapping example from Section 2.1. We give details for how to show that equality holds in (2.1) when . This will follow from spectral results on denser graphs, robustness of modularity and Proposition 5.2.
Recall that and where is the spectral gap of , see Lemma 6.1 of [32]. Now let satisfy the condition . Then a result from [11] says that, for and , we have whp; and thus
(5.9)
We can use Proposition 5.2 to bootstrap (5.9) to sparser and (and with no condition corresponding to ). Let ; let (arbitrarily slowly) as , with ; and let and . Let us show that still (5.9) holds. Let and , for some constant such that , so whp by (5.9)
Also, whp . Let (so ). Then the sampled random graph satisfies , and whp . Hence by Proposition 5.2 we have ; whp and we have shown that (5.9) holds with the current choice of , as we aimed to do.
6 Robustness of modularity, and closeness and concentration for and
6.1 Robustness
We shall use deterministic robustness results relating to two graph operations, namely moving or deleting a set of edges, and concerning two graph parameters, namely the modularity score for a given partition , and the (maximum) modularity . We first briefly collect some already known results together with one new bound in Lemma 6.1 below. For edge deletion the bound on is new while the bound on follows from Lemma 5.1 in [32]; and for moving edges the bounds follow from Lemma 5.3 and its proof in the same paper. See also Example 6.2 below which gives examples showing the bounds in Lemma 6.1 are asymptotically tight.
Lemma 6.1.
Let be a graph and let be a partition of .
-
(a)
If is a non-empty proper subset of , and , then
-
(b)
If is a set of edges with , and , then
We give examples to show tightness of Lemma 6.1 before proceeding with the proof. See the concluding remarks for a related open problem.
Example 6.2.
The examples here show tightness of the lemma: it is not possible to replace the factors 2 and 1 above by smaller constants in any of the four cases.
For the examples let us first recall some simple properties of modularity. Firstly, the placement of any isolated vertices in a partition is irrelevant. Secondly, for any part in any optimal partition of a graph, the graph induced by the non-isolated vertices in is connected [7]. Thirdly, for any leaf vertex with pendant edge , the vertices and are in the same part in any optimal partition [7, 42].
-
(a)
Let be the graph consisting of a star on edges together with a disjoint edge . Take and thus is a star on edges together with two isolated vertices. We consider the bipartition which in places the vertices of the star in one part and the vertices of the disjoint edge in the other. Note that is an optimal partition of and of . Since is a star plus isolated vertices, , and we may calculate
Noting that , for this choice of and we have
and thus we may get arbitrarily close to the factor 2.
-
(b)
Let be the graph above except that we add an isolated vertex. Thus is the graph consisting of a star with central vertex on edges together with a disjoint edge and an isolated vertex . Let be obtained from by removing and adding the edge : thus is a star on edges together with two isolated vertices.
We consider the bipartition which places the vertices of the star and the isolated vertex together in one part and the disjoint edge in the other part. Then is an optimal partition of and of by the same argument as in part (a), and . Since is a star plus isolated vertices .
Noting that , for this choice of and we have
and thus we may get arbitrarily close to the factor 1.
To prove the inequality concerning in Lemma 6.1(a), we will use Lemma 6.3 bounding the edge contribution minus twice the degree tax.
Lemma 6.3.
Let be a graph on edges and let be a vertex partition. Then
Proof.
For each part , let us write and for the proportion of edges internal to part and proportion between part and the rest of the graph respectively. Note that . We also set for the proportion of edges within parts of the partition. Similarly let , so is the proportion of edges between parts, and note that for each . Also observe that .
Proof of Lemma 6.1(a), bound on .
We begin by following the proof of Lemma 5.1 in [32]. As there, let where is the set of edges in which lie within the parts of and . That is, is the proportion of the edges in which are in and lie within the parts of , and is the proportion which are in and lie between the parts.
Note that the statement we wish to prove follows from the following two inequalities.
(6.2)
and
(6.3)
The first of these, (6.2), is equation (5.2) in [32] and was proven there, so it remains only to prove (6.3). Directly from the definitions we may calculate the difference in edge contributions, (this is (5.4) in [32]),
(6.4)
Note also the following simple bound on the possible increase in the degree tax. Since we have
and so
Thus by (6.4) we have (this is (5.6) in [32])
(6.5)
Now we may apply Lemma 6.3 to infer that the RHS of (6.5) is at most which proves (6.3). Finally, by (6.2) and (6.3),
as required.∎
6.2 Closeness of and
Given a graph and , how close are and , where ? This question gives rise to Lemma 6.4, which we now state and prove. Note that Corollaries 1.3 and 1.4 then follow by Lemma 6.4 and Theorem 1.1 and 1.2 respectively.
Lemma 6.4.
Let be a graph, let , and let . Then for any partition of
(6.6)
Also, we can couple and so that, for any partition of , if as then
(6.7)
Proof.
Let , so with mean and variance . Couple and so that their edge sets are nested: to do this, we may list the edges of in a uniformly random order, let have the first edges and let have the first edges. By Lemma 6.1
(6.8)
But
where we use (6.8) for the second inequality; and the first inequality in (6.6) follows. The second inequality in (6.6) may be proved in just the same way. Also, by Chebyshev’s inequality, for
So, setting , whp . Hence
and we have proved the first inequality in (6.7). The second inequality may be proved in just the same way. This completes the proof of Lemma 6.4. ∎
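The nested coupling used in the proof above can be written out concretely: list the edges of the underlying graph in a uniformly random order, give one graph the first m edges and the other a Binomial number of them. A minimal sketch (the function name and encoding are our own):

```python
import random

def nested_coupling(edges, m, p, seed=0):
    """Couple G_m and G_p so that their edge sets are nested: shuffle the
    edges of G uniformly at random, let G_m take the first m edges and let
    G_p take the first X edges, where X ~ Bin(e(G), p)."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    x = sum(rng.random() < p for _ in edges)  # X ~ Bin(e(G), p)
    return order[:m], order[:x]
```

By construction one of the two edge sets always contains the other, which is exactly what the robustness bound of Lemma 6.1 needs.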
6.3 Concentration of and
We restate Theorem 2.2 from the introduction, adding also concentration results for the random edge model. Recall that, given a fixed underlying graph , for the random graph is obtained by sampling uniformly among all -edge subsets of .
For the random graphs and , Theorem 7.1 of [32] showed that the modularity values are highly concentrated about their expected value. A very similar result is true, with similar proof, for the case of edge sampling. We use the results on to deduce the results on .
Theorem 6.5.
-
(a)
Given graph , a partition , and , for each
-
(b)
There is a constant such that for each graph , each partition and each the following holds with . For each
The results in Theorem 6.5 concerning and extend Theorem 7.1 of [32]. As well as the robustness results above, in the proof we use Lemma 4.1, and the following concentration result from [30, Theorem 7.4] (see also Example 7.3).
Lemma 6.6.
Let be a finite set, let be an integer such that , and consider the set of all -element subsets of . Suppose that the function satisfies whenever (i.e. the -element subsets and are minimally different). If the random variable is uniformly distributed over , then
Proof of Theorem 6.5 part (a).
Proof of Theorem 6.5 part (b).
Let and let . We will first show the more detailed statement that, for each , we have
(6.9)
from which the result for in part (b) of the theorem follows. We give the proof in detail for concentration of , and then indicate the few differences to be made in showing the result for .
For any graph and any partition the modularity score lies in the interval , see [7]; and hence we may assume that . Define the event and let denote the complement of ,
(6.10)
The proof proceeds by bounding separately the terms on the right in (6.10). Firstly, by using part (a) of the theorem and conditioning on where , we have a bound on the first term of (6.10)
The third term is also straightforward to bound. By Lemma 4.1 and since ,
It now remains to bound the second term of (6.10). Let be a random subgraph of obtained by keeping each edge with probability , independently of , and let . By coupling and by Lemma 6.1, for
For any and any random variable with finite mean and thus for ,
(6.11)
Note and so
since we assumed that and so . By Lemma 4.1
and thus we have a bound for the second term
which completes the proof of (6.9). It is now possible to choose small enough that the inequality holds for all and we may take as the coefficient of the exponential; the details of this calculation appear, for example, in the last part of the proof of Theorem 7.1 of [32]. This completes the proof of the first half of part (b).
To prove the corresponding result for the same technique may be used; in particular we bound the probability that is large by considering the sum of three terms as in (6.10). The first and second term can then be handled in the same way. For the third term we can get a slightly better bound by noticing that since for any graph we may assume instead of . ∎
7 Estimating modularity by sampling a fixed number of vertices: proof of Theorem 1.5
A graph parameter is estimable (or testable), see [6] and [28], if for every there is a positive integer such that if is a graph with at least vertices, then for a uniformly random k-subset of we have . We shall see here that the modularity value is estimable for dense graphs but not more generally. For convenience we restate Theorem 1.5 from the introduction.
Theorem (restatement of Theorem 1.5).
(a) For fixed with , modularity is estimable for graphs with density at least . (b) For any given function , modularity is not estimable for -vertex graphs with density at least .
We need some definitions. If is a graph and then is the number of ordered pairs such that there is an edge in between and . If and are graphs with the same vertex set , then the cut distance . Given a graph and , we let denote the -blow-up of , where each vertex of is replaced by independent copies of itself; and thus and .
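The labelled cut distance just defined can be evaluated by brute force on very small vertex sets; the sketch below uses our own convention that an edge between S and T contributes once for each ordered pair it induces, matching the ordered-pair count in the definition above.

```python
from itertools import chain, combinations

def cut_distance(n, edges_g, edges_h):
    """Brute-force labelled cut distance between two graphs on vertex set
    {0, ..., n-1}: max over subsets S, T of |e_G(S,T) - e_H(S,T)| / n^2,
    where e(S,T) counts ordered pairs (u, v) with u in S, v in T joined by
    an edge. Exponential in n, so suitable only for tiny examples."""
    def subsets(vs):
        return chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))
    def e(edges, s, t):
        s, t = set(s), set(t)
        return sum((u in s and v in t) + (v in s and u in t) for u, v in edges)
    vs = list(range(n))
    return max(abs(e(edges_g, s, t) - e(edges_h, s, t)) / n ** 2
               for s in subsets(vs) for t in subsets(vs))
```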
Example 7.1.
. The unique form of an optimal partition of is into one path and one , with modularity . For we can balance the partition, into two copies of , with modularity score .
Proof of Theorem 1.5(a).
We shall use Theorem 15.1 of [28], which says that a graph parameter is estimable if and only if the following three conditions hold.
-
(i)
If and are graphs on the same vertex set and then .
-
(ii)
For every graph , has a limit as .
-
(iii)
if (where denotes the graph obtained from by adding a single isolated vertex).
Observe that always , so we need be concerned here only about conditions (i) and (ii). We shall show that condition (ii) concerning blow-ups always holds. After that, we shall show that condition (i) concerning cut distances holds, as long as the graphs are suitably dense. Finally we give examples for sparse graphs which show that in this case modularity is not estimable.
Blow-ups of a graph : condition (ii)
Recall that is the -blow-up of . Observe that always . For if is an optimal partition for , with parts, then the natural corresponding -part partition for has modularity score . Thus also for every .
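The blow-up construction and the natural lifted partition described above can be written down directly; the encodings below (a vertex v becomes copies (v, 0), ..., (v, k-1)) are our own choices for illustration.

```python
def blow_up(edges, k):
    """k-blow-up of a graph: each vertex v is replaced by k independent
    copies, and each edge uv becomes the complete bipartite graph between
    the copies of u and the copies of v, so e(G_k) = k^2 * e(G)."""
    return [((u, i), (v, j)) for u, v in edges
            for i in range(k) for j in range(k)]

def lift_partition(partition, k):
    """The natural corresponding partition of the blow-up: all copies of
    the vertices of a part are placed together."""
    return [{(v, i) for v in part for i in range(k)} for part in partition]
```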
Let be a (fixed) graph. We need to show that tends to a limit as . Let be where the sup is over all . We shall see that in fact
(7.1)
If has no edges then for all ; so we may assume that . Let , let and let satisfy . Then
Hence by a robustness result, Lemma 5.1 of [32], (see also Lemma 6.1 in this paper) we have that ; and so
Let . Let be such that . Then, for all such that , letting be such that (so and thus ) we have
Now (7.1) follows, as required.
Cut distance and modularity : condition (i) for dense graphs
Fix with . A graph is called -dense if . Let . We want to show that there exists such that for all -dense graphs , with the same vertex set and such that we have . With foresight, we shall take .
By Lemma 1 in [12] there is a and a partition for such that . It suffices to show that
(7.2)
(since then , and we may similarly deduce that ).
To prove (7.2) we first consider the edge contribution then the degree tax . Let .
Edge contribution . Note that , so ; and similarly , so . Thus
The second term here (minus the sum) is at least . Also, since for , the first term is at least . Hence
(7.3)
Degree tax . Since we have
Thus, using the last inequality if ,
(7.4)
Also for each . We claim that
(7.5)
To show this, let . For
Thus
which completes the proof of (7.5). Using (7.5) and then (7.4), we find
since for . Thus
(7.6)
Putting the results (7.3) on and (7.6) on together we have
by our choice of . Hence (7.2) holds, as required. This completes the proof of Theorem 1.5(a), for dense graphs. ∎
We now give a pair of constructions which will demonstrate that modularity is not estimable.
Example 7.2.
Let and let as , arbitrarily slowly, so that in particular . Then there are connected graphs on vertex set such that ; and as , and , and so . Thus the graphs are ‘nearly dense’ and are close in cut distance , but their modularity values are not close.
For we may let , and let be a collection of disjoint -cliques (together with at most -cliques) joined by edges to form a path. Then so for sufficiently large. Also it is easy to see that the partition of the vertex set into the cliques satisfies .
For we may consider a binomial random graph , which with probability at least is connected, has at least edges, and has modularity at most for a suitable by Theorem 1.1(c) of [32] (or by other results in that paper).
Proof of Theorem 1.5(b).
8 Under-sampling and overestimating modularity
When we sample few edges from a graph it seems that we tend to overestimate its modularity; that is, tends to be significantly larger than . For example, if is the complete graph and , then but in probability as , see Theorem 1.1 of [32]. Our Theorem 1.1 shows that when the expected number of edges observed is large, although we may overestimate modularity we are unlikely to underestimate it by much. In this section we use Theorem 1.1 to prove that when the sampling probability is bounded away from 0, increasing is unlikely to increase overestimation by much.
To state the result precisely we give one definition. For random variables and and , we say that -nearly (stochastically) dominates if
(8.1)
Observe that if say and take values in and -nearly dominates then , since in this case
We now give the main result of this section.
Proposition 8.1.
Let and . Then there exists such that, for any graph with at least edges and any sampling probabilities with , it holds that -nearly dominates .
The case shows that as in Theorem 1.1, except that now we have the lower bound on the sampling probability (and depends on ).
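Nearly stochastic dominance can be checked empirically on finite samples. Since the display (8.1) is not reproduced in this copy, the particular form below, namely that the upper tail of X is never more than eps below the corresponding tail of Y, is our assumption; the helper is purely illustrative.

```python
def nearly_dominates(xs, ys, eps):
    """Empirical check of an assumed form of eps-nearly stochastic
    dominance: P(X >= t) >= P(Y >= t) - eps for every threshold t.
    xs and ys are finite samples, treated as uniform distributions."""
    def tail(sample, t):
        return sum(1 for s in sample if s >= t) / len(sample)
    for t in sorted(set(xs) | set(ys)):
        if tail(xs, t) < tail(ys, t) - eps:
            return False
    return True
```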
Proof.
By Theorem 1.1 there exists such that for all graphs and all such that , we have
Let be the set of all graphs with . Then
(8.2)
Let be such that
Fix a graph on vertex set with ; and note that by the above, for each
(8.3)
We couple and in the natural way. Let , let and be independent. For each graph on vertex set , let be the graph on with edge set ; and observe that . We have
Hence for every
so -nearly dominates as required. ∎
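The natural coupling in the proof above amounts to two-stage edge sampling: keep each edge with the larger probability p1, then independently keep each survivor with probability p2/p1, so that overall each edge survives with probability p2 and the sparser sample sits inside the denser one. A sketch (names are ours):

```python
import random

def two_stage_sampling(edges, p1, p2, seed=0):
    """Realise G_{p1} and G_{p2} (with p2 <= p1) on one probability space:
    an edge enters G_{p1} with probability p1, and enters G_{p2} exactly
    when it also survives an independent p2/p1 coin, so with overall
    probability p2. By construction G_{p2} is a subgraph of G_{p1}."""
    rng = random.Random(seed)
    g1, g2 = [], []
    for e in edges:
        if rng.random() < p1:
            g1.append(e)
            if rng.random() < p2 / p1:
                g2.append(e)
    return g1, g2
```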
9 Expected modularity when average degree is constant
The modularity of the Erdős-Rényi (or binomial) random graph is investigated in [32]. Given a constant we let for each . By Theorem 1.1 of that paper, for we have as . Let for each .
Conjecture 9.1 ([32]).
For each , tends to a limit as .
It was noted in that paper that if the conjecture holds then the function would be uniformly continuous for . From Theorem 1.1 (in the present paper) we shall deduce that also would be non-increasing in . We collect results on in the following proposition.
Proposition 9.2.
-
(i)
for , we have as ;
and if Conjecture 9.1 holds then
-
(ii)
for
-
(iii)
as
-
(iv)
is (uniformly) continuous for
-
(v)
is non-increasing for .
All but part of this result comes directly from [32]: part (as we already noted) and part are from Theorem 1.1; part is from Theorem 1.3 and part is from Lemma 7.4. Part will follow immediately from inequality (9.1) below, so to complete the proof of Proposition 9.2 it remains only to prove the following lemma.
Lemma 9.3.
Let . For each , there exists such that for all we have ; and thus
(9.1)
Proof.
By Theorem 1.1 there exists such that for all graphs and all such that we have
and so
Let . Let be the set of all graphs with . Let be sufficiently large that for each we have . Let . Let and be independent. For each graph on , let be the graph on with edge set ; and observe that . Note also that satisfies . Given a graph , since and are independent,
Hence
since . Thus for each . It follows that
and since this holds for each , the inequality (9.1) follows. ∎
10 Modularity and edge-sampling on weighted networks
In network applications it can be useful to consider graphs in which the edges have weights. Following the notation of [9], let be a non-empty vertex set, and let satisfy for all vertices and . For simplicity, let us assume that for each . We call a weight function on . Let denote the maximum of all the values .
Define the (weighted) degree of a vertex by setting . Similarly, define the (weighted) volume of a vertex set by , and (corresponding to ) let .
Assume that is not identically zero, that is . For a given partition of , define the modularity score of on by
where
Define the modularity of by . As in the unweighted case, , and we may ignore vertices with degree 0. If is identically 0 we set and . If is -valued then and are the usual modularity score and modularity value respectively for the graph corresponding to . The values and are unchanged under re-scaling , so we may assume that for each ; and in this case we call a probability weight function.
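The weighted modularity score just defined can be computed in the same way as in the unweighted case, with edge counts replaced by weight sums; the encoding of the weight function as a dictionary on unordered pairs is our own choice.

```python
def weighted_modularity_score(weights, partition):
    """Weighted modularity score: sum over parts A of
    w_in(A)/w_total - (vol_w(A)/(2*w_total))^2, where w_total is the sum of
    all edge weights, w_in(A) the weight inside A, and vol_w(A) the weighted
    volume of A. `weights` maps pairs (u, v) to values in [0, 1]."""
    total = sum(weights.values())
    deg = {}
    for (u, v), w in weights.items():
        deg[u] = deg.get(u, 0.0) + w
        deg[v] = deg.get(v, 0.0) + w
    score = 0.0
    for part in partition:
        part = set(part)
        w_in = sum(w for (u, v), w in weights.items()
                   if u in part and v in part)
        vol = sum(deg.get(v, 0.0) for v in part)
        score += w_in / total - (vol / (2 * total)) ** 2
    return score
```

With 0/1-valued weights this recovers the usual modularity score, and rescaling all weights leaves the score unchanged, as noted above.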
Given a weight function on and we define a random weight function by considering each edge independently and keeping unchanged with probability , and otherwise setting it to 0. Also, given a probability weight function , let be the random (unweighted) graph obtained by considering each edge independently, and including edge with probability , and otherwise having no edge . For a special case of see Definition 3.1 in [21]. The model has also been considered in network science applications and is referred to as a probabilistic network (to distinguish it from a binary one, in which each edge has probability either 0 or 1 of appearing) [20, 38].
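Both random models just described are simple to simulate; the sketch below (our own encoding, with the weight function as a dictionary on edges) draws the random weight function and the random unweighted graph.

```python
import random

def sample_weight_function(weights, p, seed=0):
    """Random weight function: keep each edge weight unchanged with
    probability p, and otherwise set it to 0, independently over edges."""
    rng = random.Random(seed)
    return {e: (w if rng.random() < p else 0.0) for e, w in weights.items()}

def sample_probabilistic_network(weights, seed=0):
    """Random unweighted graph from a probability weight function: include
    each edge uv independently with probability w(uv)."""
    rng = random.Random(seed)
    return [e for e, w in weights.items() if rng.random() < w]
```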
10.1 Weighted underlying to weighted observed graph
Given a probability weight function on and a probability , we defined the random weight function above. For such weight functions we have results very like Theorems 1.1 and 1.2. Note that the theorems below have conditions on the sum of weights being large; recall also that the modularity score of is unchanged under re-scaling. Thus, given a general weight function , we may best apply the theorems to a re-scaling of obtained by dividing through by (the maximum weight of an edge).
Theorem 10.1.
Given and , there exists such that the following holds. Let , and let the probability weight function on satisfy . Then, with probability , the random weight function satisfies .
Theorem 10.2.
Given , there exists such that the following holds. Let , and let the probability weight function on satisfy . Then, with probability ,
-
(a)
the random weight function satisfies ; and
-
(b)
given any partition of the vertex set, in a linear number of operations (seeing only ) the greedy amalgamating algorithm finds a partition with .
Observe that Theorem 1.1 is the special case of Theorem 10.1 when is -valued, and similarly Theorem 1.2 is a special case of Theorem 10.2. In order to prove Theorems 10.1 and 10.2, we may use almost the same proofs as before. We need a natural minor variant of Lemma 3.1. Given a weight function on , call a set of vertices -fat if , and call a partition of -fat if each part is -fat. The following lemma is very similar to Lemma 3.1 and may be proved in exactly the same way.
Lemma 10.3 (The fattening lemma for weighted graphs).
For each non-zero weight function on , and each , there is an -fat partition of such that . Indeed, given any partition of , using a linear number of operations, by amalgamating parts we can construct an -fat partition such that .
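The amalgamation step in the fattening lemmas can be illustrated at the level of part volumes. The paper's exact procedure and fatness threshold are not reproduced in this copy, so the sketch below assumes a part is 'fat' once its volume reaches eps times the total volume, and greedily merges the two smallest parts until every remaining part is fat; only volumes are tracked, not the parts themselves.

```python
import bisect

def fatten_volumes(part_volumes, eps, total_vol):
    """Greedy amalgamation sketch (assumed threshold eps * total_vol):
    repeatedly merge the two parts of smallest volume until every
    remaining part is fat, or only one part remains. Returns the sorted
    list of volumes of the amalgamated partition."""
    vols = sorted(part_volumes)
    while len(vols) > 1 and vols[0] < eps * total_vol:
        merged = vols.pop(0) + vols.pop(0)
        bisect.insort(vols, merged)
    return vols
```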
Proof of Theorem 10.1..
The proof is very similar to that of Theorem 1.1 so we indicate just a few key steps where there are differences.
As in that proof we may assume that and , and we set . By the weighted fattening lemma, Lemma 10.3, there exists an -fat partition (where ) such that . Let . Then corresponding to (4.1), for (and noting that for each edge )
Let denote the sum of ‘internal’ edge weights within the parts of . Then corresponding to (4.3), for , with probability at least
For the degree tax, corresponding to (4.4) we find the following. For , with probability at least
Thus for , with probability at least both the last two displayed inequalities hold. The failure probability is at most , and we may thus choose sufficiently large that the probability is at most ; and indeed we may take .
The rest of the proof follows the non-weighted case by making similar minor adaptations.∎
Proof of Theorem 10.2.
As in the last proof, we may follow the proof of the non-weighted version, Theorem 1.2, with the following adaptations. In place of the fattening lemma, use the weighted version, Lemma 10.3; replace instances of and by and respectively. We may still apply Lemma 4.1 with since all edges have weight at most 1. ∎
10.2 Weighted underlying to unweighted observed graph
We will see that the proofs in this subsection follow our proofs of Theorems 1.1 and 1.2 almost line by line, replacing all instances of with and all instances of with .
Theorem 10.4.
There exists such that the following holds. Let the probability weight function on satisfy . Then with probability at least the random graph satisfies .
Theorem 10.5.
Given , there exists such that the following holds. Let the probability weight function on satisfy . Then with probability at least ,
-
(a)
the random graph satisfies ; and
-
(b)
given any partition of the vertex set, in a linear number of operations (seeing only ) the greedy amalgamating algorithm finds a partition with .
Proof of Theorem 10.4.
The proof follows that of the non-weighted case, Theorem 1.1, line by line with the following adaptations. In place of the fattening lemma used on the underlying graph, use the weighted version, Lemma 10.3; and replace instances of and by and respectively. Note that Lemma 4.1 still applies with - more details are given in the proof of Theorem 10.5. ∎
Proof of Theorem 10.5.
As in the proof of Theorem 1.2, let and let for a sufficiently large constant . We again set . Let be a fixed probability weight function on , where ; and assume that .
Then the corresponding events and may all be defined as before, replacing instances of and with and respectively. Note that in the definition of , one constructs partition from seeing only the observed graph , and thus using the usual fattening lemma, Lemma 3.1 rather than the weighted one. Then the proof proceeds by noting is small due to Theorem 10.4 and proving the statements corresponding to (5.1) to (5.4).
Corresponding to the proof of (5.1) only the usual substitutions of and by and are required. For the observation after that, a small change is needed. Notice that since we have that and by assumption. Thus as in the earlier proof.
Corresponding to the proof of (5.3) there are a few minor changes. Let have . Suppose the (unordered) pairs of vertices in with are labelled (some vertices may be repeated), and the pairs of vertices with exactly one endpoint in and positive edge weight are labelled . For let be 2 if edge is in and let be 0 otherwise; and for let if edge is in and let be 0 otherwise (so the random variable satisfies if and if , and is non-zero with probability .) Corresponding to the original proof, and , and the rest of the proof corresponding to (5.3) follows in the same manner – note that Lemma 4.1 (with ) still applies.
Corresponding to the proof of Claim (5.7), let be the event that, for some partition such that , we have . We note that is stochastically at most a sum of Bernoulli random variables, such that the mean (of the sum) is so we may apply Lemma 4.1 (with ). The rest of the proof corresponding to Claim (5.7) and indeed the entire proof continues with similar adaptations. ∎
10.3 Application: stochastic block model
In this subsection we show that Theorem 2.1 on the modularity value of the stochastic block model follows quickly from Theorem 10.5 (a weighted version of Theorem 1.2) and the deterministic result Lemma 10.6. For and let
Since rescaling a weight function does not change , for simplicity we set in the following lemma.
Lemma 10.6.
Let . For let be a partition of where . Let and , and let be the weight function on vertex set with if and are in the same block and with otherwise. Then for the planted partition , and .
Theorem 4.2 in the recent paper by Koshelev [21] concerns the same weighted graph as above and gives an upper bound on without the factor , i.e. . The proof in [21] proceeds by calculating the eigenvalues of the weighted adjacency matrix of . The proof below of Lemma 10.6 involves a simple weighted notion, , of ‘per-unit-modularity’ as used in [34, 27].
Proof.
It will be straightforward to show that the planted partition has modularity score as claimed: the main part of the proof is to show that is at most this value.
First we show that we may write the modularity of the weight function as a weighted sum of a function on vertex sets - see (10.1). The proof of the upper bound will proceed by bounding the maximum value of over vertex sets . By definition, for any partition of
where
(10.1)
Let and let be an -fat partition of . By Lemma 10.3 it will suffice for us to show that ; and since is a weighted average of the values , it will suffice to show that for every with . Fix such a set . Note that , and so also .
Define such that , and note that with . Observe that the weighted complete graph on is approximately regular with weighted degree
for each vertex (where the error term is in ). Thus and . Also
since . Thus
Now, for any , given that
We now show . If some and the other are zero (in other words, if is ) then and . If then . Also, suppose that say where and . Then , and . Thus
(and the upper bound is achieved when is a block ). But, as we noted earlier, is a weighted average of the values for , and so
Hence
and we have the upper bound claimed.
Now take as the planted partition , and note that for each . Now since is a weighted average of the values we see that
and we are done. ∎
Proof of Theorem 2.1..
There is a version of the stochastic block model in which vertices are assigned to blocks independently and uniformly at random. Consider the variant of Theorem 2.1 using this version of the stochastic block model. Note that the part sizes will whp be . Thus by a coupling argument the edge set will differ by edges whp from the original model; and by the robustness result, Lemma 6.1, we find that whp the modularity values and are both as before.
11 Concluding remarks
Three phases of modularity
As mentioned in Section 2.3, the modularity of the binomial random graph exhibits three phases [32]. We restate the result as a sampling one. Recall that [7]. Let and consider . Then in the sparse case (where and ) we have near 1 whp, in the dense case (where ) is near whp, and in between (where for constants ) is bounded away from and whp. The question below asks if we may extend this three phase behaviour from complete graphs to a larger class of underlying graphs. Let us restrict our attention to graphs without isolated vertices.
Question 11.1.
For which classes of graphs do we have parts (i) and (ii) of the following three phase result (where we write for )?
- (i) If and then whp.
- (ii) There exist and such that if then whp.
- (iii) If then whp.
Firstly, observe that part (iii) is implied by Theorem 1.2: thus the open question is for which classes of graphs we have parts (i) and (ii). Secondly, we can only get three genuine phases if is unbounded. We have already noted that the three phase result holds for complete graphs, and by double exposure it must hold also for the random graph where as .
Part (i) is false for complete bipartite graphs for any fixed , since for any subgraph of (as has at most components, ignoring isolated vertices). Similarly, if and are fixed and has components then part (i) is false. Also, for (ii) to hold, we clearly must have bounded away from 1.
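For concreteness, the following minimal sketch (our own code; it assumes only the standard Newman–Girvan score of [36], and the function names are ours) implements the modularity score of a vertex partition and the edge-sampling model discussed above.

```python
import random

def modularity_score(edges, parts):
    """Newman-Girvan modularity score of a vertex partition:
    the sum over parts A of e(A)/m - (vol(A)/(2m))^2, where e(A)
    counts edges inside A, vol(A) sums degrees over A, and m = |E|."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    score = 0.0
    for part in parts:
        pset = set(part)
        e_in = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in pset)
        score += e_in / m - (vol / (2 * m)) ** 2
    return score

def sample_edges(edges, p, rng=random):
    """Edge-sampling model: keep each edge of the underlying graph
    independently with probability p (no false positives)."""
    return [e for e in edges if rng.random() < p]

# Two disjoint triangles, partitioned into the two triangles: q = 1/2.
tri2 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(modularity_score(tri2, [[0, 1, 2], [3, 4, 5]]))  # 0.5
```

The score of any single-part partition is 0, and maximising over partitions (which is NP-hard in general) gives the modularity of the graph.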
Random false positives
In our model, the observed graph has random false negatives, that is, there are edges in which are non-edges in , but no false positives. Theorem 1.2 shows that the modularity value is robust to random false negatives: so long as the observed graph has high enough expected degree, the modularity value of the observed graph is close to the modularity of the underlying graph. (For an underlying graph with edges we may have a chance of not seeing each edge.) However, perhaps random false positives may cause a large increase or decrease in the modularity value, possibly as much as adversarially added false positives.
Suppose that we are given all the edges in , but we may falsely think that some non-edges – the false positives – are also edges of . (We have not allowed false positives until now.) What can we say about modularity? Let us consider graphs with vertices and edges which we see fully and correctly; and suppose that there are false positives randomly chosen from the non-edges of the graph, so the observed graph has edges.
If is much smaller than then (unsurprisingly) we have no difficulties: by Lemma 6.1, if then so is a good estimate of . At the other extreme, if is much larger than then there is little point in using as an estimate for . For by Lemma 6.1, if and we denote by the graph formed just from the false positive edges, then ; so that is essentially determined by , whatever the value of .
The interest is thus in the balanced case, when for some constant . We will see in Example 11.2 that sometimes randomly adding false positives may increase the modularity nearly as much as adversarially choosing edges to add. Part (a) of the robustness lemma, Lemma 6.1, gives the following ‘adversarial’ bound. If the graph has edges, and is any graph obtained from by adding edges, then for any vertex partition ,
(11.1)
Example 11.2.
Let and suppose that . Let consist of a star on together with isolated vertices; and note that and . Form by adding new edges uniformly at random. Then the difference in modularity values is approximately , matching the adversarial bound (11.1).
To see this we may check that whp consists of the star together with isolated edges (and isolated vertices - which we may ignore). But for such a graph , an optimal partition consists of one part and parts of size 2 corresponding to the isolated edges - since the parts in an optimal partition must induce connected subgraphs, and connected components which themselves have modularity zero must not be split in an optimal partition [27]. (Note that the modularity values of a star and of a single edge are zero [7]). Now, this partition captures all edges and so
Hence whp
which is close to the adversarial (worst case) bound in (11.1) when is small.
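To sanity-check this example numerically, here is a small self-contained sketch (our own construction and function names, not from the paper): it builds the star plus a disjoint matching of added edges, which is whp the situation described above, and evaluates the partition consisting of the star's vertex set together with one part per added edge.

```python
def modularity_of_partition(edges, parts):
    """Newman-Girvan score: sum over parts of e(A)/m - (vol(A)/(2m))^2."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for part in parts:
        pset = set(part)
        e_in = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in pset)
        q += e_in / m - (vol / (2 * m)) ** 2
    return q

def star_plus_matching(s, t):
    """Star with s edges (centre 0, leaves 1..s) plus t disjoint extra edges,
    together with the partition {star vertices} + {each extra edge a part}."""
    star = [(0, i) for i in range(1, s + 1)]
    extra = [(s + 1 + 2 * j, s + 2 + 2 * j) for j in range(t)]
    parts = [list(range(s + 1))] + [list(e) for e in extra]
    return star + extra, parts

s, t = 100, 10
edges, parts = star_plus_matching(s, t)
q = modularity_of_partition(edges, parts)
# Every edge lies inside a part, so q = 1 - (s**2 + t) / (s + t)**2 here.
assert abs(q - (1 - (s**2 + t) / (s + t) ** 2)) < 1e-12
print(round(q, 4))  # 0.1727
```

With t = 0 the graph is a bare star and the score is 0, in line with the fact that stars have modularity zero [7].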
However, how much we may decrease the modularity by adding random false positives is open. Indeed, it is open how much we may decrease the modularity by adversarially adding edges: note that Example 6.2 which showed tightness of Lemma 6.1 only showed tightness for how much the modularity may increase by adding edges.
Question 11.3.
Let be a graph with edges, let be obtained from by randomly adding edges, and let be obtained from by adding a set of edges which maximises . What upper bounds do we have for or for ? Can we beat the upper bound significantly?
References
- [1] L. A. Adamic and N. Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, 2005.
- [2] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 2009.
- [3] P. J. Bickel, A. Chen, Y. Zhao, E. Levina, and J. Zhu. Correction to the proof of consistency of community detection. The Annals of Statistics, 2015.
- [4] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 2008.
- [5] M. Bolla, T. Kói, and A. Krámli. Testability of minimum balanced multiway cut densities. Discrete Applied Mathematics, 160(7-8), 2012.
- [6] C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós, and K. Vesztergombi. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics, 219(6), 2008. doi:10.1016/j.aim.2008.07.008.
- [7] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 2008. doi:10.1109/TKDE.2007.190689.
- [8] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1), 2022.
- [9] F. Chung. Spectral graph theory, volume 92. American Mathematical Soc. Providence, RI, 1997. doi:10.1090/cbms/092.
- [10] V. Cohen-Addad, A. Kosowski, F. Mallmann-Trenn, and D. Saulpic. On the power of Louvain in the stochastic block model. Advances in Neural Information Processing Systems, 33, 2020.
- [11] S. Deng, S. Ling, and T. Strohmer. Strong Consistency, Graph Laplacians, and the Stochastic Block Model. The Journal of Machine Learning Research, 22(1), 2021.
- [12] T. N. Dinh and M. T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). IEEE, 2011.
- [13] S. Diskin, J. Erde, M. Kang, and M. Krivelevich. Isoperimetric inequalities and supercritical percolation on high-dimensional product graphs. arXiv preprint arXiv:2304.00016, 2023.
- [14] S. Diskin and M. Krivelevich. Expansion in supercritical random subgraphs of expanders and its consequences. arXiv preprint arXiv:2205.04852, 2022.
- [15] S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010. doi:10.1016/j.physrep.2009.11.002.
- [16] A. Frieze and M. Karoński. Introduction to random graphs. Cambridge University Press, 2015. doi:10.1017/cbo9781316339831.
- [17] A. Frieze and M. Krivelevich. On the non-planarity of a random subgraph. Combinatorics, Probability and Computing, 22(5):722–732, 2013.
- [18] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs, volume 45. John Wiley & Sons, 2011.
- [19] T. Johansson. On Hamilton cycles in Erdős-Rényi subgraphs of large graphs. Random Structures & Algorithms, 57(1):132–149, 2020.
- [20] A. Kaveh, M. Magnani, and C. Rohner. Comparing node degrees in probabilistic networks. Journal of Complex Networks, 7(5), 2019.
- [21] M. Koshelev. Modularity in planted partition model. Computational Management Science, 20(1):34, 2023.
- [22] M. Krivelevich, C. Lee, and B. Sudakov. Robust Hamiltonicity of Dirac graphs. Transactions of the American Mathematical Society, 366(6):3095–3130, 2014.
- [23] R. Lambiotte and M. T. Schaub. Modularity and Dynamics on Complex Networks. Cambridge University Press, 2021.
- [24] A. Lancichinetti and S. Fortunato. Limits of modularity maximization in community detection. Physical Review E, 84(6), 2011. doi:10.1103/physreve.84.066122.
- [25] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4), 2022.
- [26] C. Liu and F. Wei. Phase transition of degeneracy in minor-closed families. Advances in Applied Mathematics, 146, 2023.
- [27] B. Louf, C. McDiarmid, and F. Skerman. Modularity and graph expansion. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ITCS.2024.78.
- [28] L. Lovász. Large networks and graph limits, volume 60. American Mathematical Soc., 2012.
- [29] D. Lusseau. The emergent properties of a dolphin social network. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270, 2003.
- [30] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, volume 141 of London Mathematical Society Lecture Note Series. Cambridge University Press, 1989.
- [31] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4), 2018. doi:10.1093/comnet/cnx046.
- [32] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1), 2020. doi:10.1002/rsa.20910.
- [33] C. McDiarmid and F. Skerman. Modularity and edge sampling. arXiv preprint arXiv:2112.13190v1, 2021.
- [34] K. Meeks and F. Skerman. The parameterised complexity of computing the maximum modularity of a graph. Algorithmica, 82(8), 2020. doi:10.1007/s00453-019-00649-7.
- [35] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010. doi:10.1093/acprof:oso/9780199206650.001.0001.
- [36] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004. doi:10.1103/physreve.69.026113.
- [37] E. Ofek. On the expansion of the giant component in percolated (n, d, ) graphs. Combinatorics, Probability and Computing, 16(3):445–457, 2007.
- [38] T. Poisot, A. R. Cirtwill, K. Cazelles, D. Gravel, M.-J. Fortin, and D. B. Stouffer. The structure of probabilistic networks. Methods in Ecology and Evolution, 7(3), 2016.
- [39] M. A. Porter, J. Onnela, and P. J. Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
- [40] L. O. Prokhorenkova, A. Raigorodskii, and P. Prałat. Modularity of complex networks models. Internet Mathematics, 2017.
- [41] D. Romanini, S. Lehmann, and M. Kivelä. Privacy and uniqueness of neighborhoods in social networks. Scientific reports, 11(1):1–15, 2021.
- [42] F. Skerman. Modularity of Networks. PhD thesis, University of Oxford, 2016.
- [43] B. Sudakov. Robustness of graph properties. Surveys in Combinatorics 2017, 440:372, 2017.
- [44] V. A. Traag, L. Waltman, and N. J. Van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 2019.
- [45] J. Vizentin-Bugoni, P. K. Maruyama, V. J. Debastiani, L. Duarte, B. Dalsgaard, and M. Sazima. Influences of sampling effort on detected patterns and structuring processes of a neotropical plant–hummingbird network. Journal of Animal Ecology, 85(1), 2016.
- [46] Y. Zhao, E. Levina, and J. Zhu. Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40(4):2266, 2012.