Modularity and partially observed graphs.
Abstract
Suppose that there is an unknown underlying graph $G$ on a large vertex set, and we can test only a proportion of the possible edges to check whether they are present in $G$. If $G$ has high modularity, is the observed graph likely to have high modularity? We see that this is indeed the case under a mild condition, in a natural model where we test edges at random. We find that $q^*(G_p) > q^*(G) - \varepsilon$ with probability at least $1 - \varepsilon$, as long as the expected number of edges in the observed graph is large enough. Similarly, $q^*(G_p) < q^*(G) + \varepsilon$ with probability at least $1 - \varepsilon$, under the stronger condition that the expected average degree in the observed graph is large enough. Further, under this stronger condition, finding a good partition for the observed graph helps us to find a good partition for $G$.
We also consider the vertex sampling model for partially observing the underlying graph: we find that for dense underlying graphs we may estimate the modularity by sampling constantly many vertices and observing the corresponding induced subgraph, but this does not hold for underlying graphs with a subquadratic number of edges. Finally we deduce some related results, for example showing that under-sampling tends to lead to overestimation of modularity.
1 Introduction and main results
The modularity of a graph is a measure of the extent to which the graph breaks into separate communities. For a given graph $G$, each partition $\mathcal{A}$ of the vertices has a modularity score $q_{\mathcal{A}}(G)$, with higher values indicating that the partition better captures community structure in $G$. The modularity $q^*(G)$ of the graph is defined to be the maximum over all vertex partitions of the modularity score, and satisfies $0 \le q^*(G) < 1$. See Section 1.1 for definitions and some background.
Suppose that there is an unknown underlying graph $G$ on a large given vertex set, and we can test only a small proportion of the possible edges to check whether they are present in $G$, or perhaps we can test many possible edges but there is a chance that we fail to spot an edge. We assume that there are no false positives, where there is no edge in $G$ but we think there is one, except briefly in the concluding remarks. We consider two questions. (a) If $G$ has high modularity, is the observed graph likely to have high modularity? In other words, if the observed graph has low modularity can we assert with some confidence that $G$ has low modularity? (b) Conversely, if $G$ has low modularity, is the observed graph likely to have low modularity? We find that, in a natural model where we test possible edges independently at random, the answer to (a) is yes, under the condition that the expected number of edges found is large enough; and the answer to (b) is yes, under the stronger condition that the expected average degree in the observed graph is large enough.
To investigate these questions we use the random graph models and described in subsections 1.2 and 1.3 below. First we recall the definition of modularity.
1.1 Modularity : definition and notation
Given a graph $G$, modularity gives a score to each partition of the vertex set $V(G)$; and the (maximum) modularity $q^*(G)$ of $G$ is the maximum of these scores over all vertex partitions. Let $\mathbf{1}_{uv}$ be the indicator that $uv$ is an edge. For a set $A$ of vertices, let $e(A)$ denote the number of edges within $A$, and let $\mathrm{vol}(A)$ denote the sum of the degrees over the vertices in $A$.
Definition ([36], see also [35]).
Let $G$ be a graph with $m \ge 1$ edges. For a vertex partition $\mathcal{A}$ of $V(G)$, the modularity score of $\mathcal{A}$ on $G$ is
$$q_{\mathcal{A}}(G) \;=\; \frac{1}{2m}\sum_{A \in \mathcal{A}} \sum_{u,v \in A} \Big(\mathbf{1}_{uv} - \frac{d_u d_v}{2m}\Big) \;=\; \frac{1}{m}\sum_{A \in \mathcal{A}} e(A) \;-\; \frac{1}{4m^2}\sum_{A \in \mathcal{A}} \mathrm{vol}(A)^2.$$
The modularity of $G$ is $q^*(G) = \max_{\mathcal{A}} q_{\mathcal{A}}(G)$, where the maximum is over all vertex partitions $\mathcal{A}$. (For an empty graph $G$ (i.e. with no edges, so $m = 0$): we set $q_{\mathcal{A}}(G) = 0$ for each vertex partition $\mathcal{A}$, and $q^*(G) = 0$.)
Directly from the definition we have $0 \le q^*(G) < 1$ for all graphs. Note that isolated vertices are irrelevant: they are not counted in the formula for the modularity score. The second expression for $q_{\mathcal{A}}(G)$ expresses it as the difference of the edge contribution or coverage $\frac{1}{m}\sum_{A} e(A)$, and the degree tax $\frac{1}{4m^2}\sum_{A} \mathrm{vol}(A)^2$.
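To make the definition above concrete, here is a short Python sketch computing the modularity score of a given partition directly as coverage minus degree tax (the function and variable names are ours, not the paper's):

```python
from collections import defaultdict

def modularity_score(edges, parts):
    """Modularity score q_A(G): coverage minus degree tax, per the definition."""
    m = len(edges)
    if m == 0:
        return 0.0  # convention for graphs with no edges
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    part_of = {}
    for i, part in enumerate(parts):
        for v in part:
            part_of[v] = i
    internal = [0] * len(parts)   # e(A): number of edges inside each part
    vol = [0] * len(parts)        # vol(A): sum of degrees over the part
    for u, v in edges:
        if part_of[u] == part_of[v]:
            internal[part_of[u]] += 1
    for v, d in deg.items():
        vol[part_of[v]] += d
    coverage = sum(internal) / m
    degree_tax = sum(w * w for w in vol) / (4 * m * m)
    return coverage - degree_tax
```

For instance, for two disjoint triangles the connected-components partition scores $1 - 1/2 = 1/2$, while the trivial one-part partition scores $0$.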
1.2 Modularity of the random graph obtained by edge-sampling
Given a graph $G$ and a probability $p \in (0, 1]$, let $G_p$ be the random subgraph of $G$ on the same vertex set obtained by considering each edge of $G$ independently and keeping it in the graph with probability $p$ (and otherwise deleting it). Thus the binomial or Erdős–Rényi random graph $G(n,p)$ may be written as $(K_n)_p$, where $K_n$ is the $n$-vertex complete graph. Let $e(H)$ denote the number of edges in a graph $H$: thus the expected number of edges in $G_p$ is $p\,e(G)$. The first of our two theorems concerns when we want to have $q^*(G_p) > q^*(G) - \varepsilon$ with probability near 1.
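The edge-sampling model is straightforward to simulate; a minimal sketch (function name and seeding are our choices):

```python
import random

def edge_sample(edges, p, seed=None):
    """Return the edge set of G_p: each edge of G is kept
    independently with probability p, else deleted."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]
```

With `p = 1.0` every edge is kept and with `p = 0.0` none are, so the expected number of kept edges is `p * len(edges)`.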
Theorem 1.1.
Given $\varepsilon > 0$ there exists $c_0 > 0$ such that the following holds. For each graph $G$ and probability $p$ such that $p\,e(G) \ge c_0$, the random graph $G_p$ satisfies $q^*(G_p) > q^*(G) - \varepsilon$ with probability at least $1 - \varepsilon$.
The proof of Theorem 1.1 (in Section 4) will show explicitly how large we may take $c_0$. Following the proof, we give an example which shows a lower bound on how large $c_0$ must be. See Figure 1.1 for simulations illustrating Theorem 1.1 (and Theorem 1.2), and see Figure A.1 for simulations run on a larger underlying graph.
The second of our theorems concerns when we want also to have $q^*(G_p) < q^*(G) + \varepsilon$ with probability near 1, and it shows that this will happen if the expected average degree is sufficiently large. We see also that, in this case, finding a good partition for $G_p$ helps us to find a good partition for $G$. Observe that the expected average degree in $G_p$ is $2p\,e(G)/n$, where $n$ is the number of vertices in $G$.
Theorem 1.2.
For each $\varepsilon > 0$, there is a $c_0 > 0$ such that the following holds. Let the graph $G$ and probability $p$ be such that the expected average degree $2p\,e(G)/n$ is at least $c_0$. Then, with probability at least $1 - \varepsilon$, the following statements (a) and (b) hold:
(a) the random graph $G_p$ satisfies $q^*(G_p) < q^*(G) + \varepsilon$; and

(b) given any partition $\mathcal{A}$ of the vertex set, in a linear number of operations (seeing only $G_p$) the greedy amalgamating algorithm finds a partition $\mathcal{B}$ with $q_{\mathcal{B}}(G) > q_{\mathcal{A}}(G_p) - \varepsilon$.
Part (b) says roughly that, given a good partition for $G_p$, we can quickly construct a good partition for $G$. Using also part (a), we may see that, if the partition $\mathcal{A}$ for $G_p$ is near-optimal with high probability, then the constructed partition $\mathcal{B}$ is near-optimal for $G$ with high probability. See Section 3 for the greedy amalgamating algorithm used to construct $\mathcal{B}$. The proof of Theorem 1.2 (in Section 5) will show explicitly how large we may take $c_0$. Following the proof, we give an example which shows a lower bound on how large $c_0$ must be.
The assumption in Theorem 1.2 that the expected average degree is large is of course much stronger than the assumption in Theorem 1.1, but still may be much smaller than . If we go much further, and assume that at most an -proportion of edges are missed, that is , then deterministically we have by [32], see Section 6.1.
1.3 Modularity of the random graph obtained by limited search
Let be the graph on vertex set with edge set a uniformly random -edge subset of . The graph may also be sampled by a randomised procedure to find edges of , where we stop once we find ‘enough’ edges: we are given , there is an unknown graph on , and we query possible edges of , uniformly at random, until we find edges.
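The randomised query procedure can be simulated directly; note that the first $m$ edges met when the possible pairs are queried in a uniformly random order form a uniformly random $m$-edge subset of the edge set, matching the model described above. A sketch (names are ours):

```python
import random

def observe_until_m_edges(n, underlying_edges, m, seed=0):
    """Query distinct vertex pairs of {0,...,n-1} in uniformly random order,
    recording each queried pair that is an edge of the (unknown) underlying
    graph, and stop once m edges have been found."""
    rng = random.Random(seed)
    present = {frozenset(e) for e in underlying_edges}
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    rng.shuffle(pairs)
    found = []
    for u, v in pairs:
        if frozenset((u, v)) in present:
            found.append((u, v))
            if len(found) == m:
                break
    return found
```

This is only a simulation of the observation model; the point of the corollaries is that the resulting graph inherits the conclusions of Theorems 1.1 and 1.2.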
Corollary 1.3.
Given there exists such that the following holds. For all , the random graph satisfies with probability .
Corollary 1.4.
Given there exists such that the following holds. For all , the random graph satisfies the conclusions of Theorem 1.2 when is replaced by .
These corollaries follow easily from Theorems 1.1 and 1.2 respectively using Lemma 6.4, which shows that and (and similarly for ) are whp close in value for . The lemma will be proved in Section 6.2 (the lemma also shows that these quantities are close in expectation, which will be used in Section 9).
[Figure 1.1, lower part: for each random instance of $G_p$, we examine how well the modularity maximising partition of $G_p$ performs as a partition on the underlying graph $G$. For each sampled graph we plot the score $q_{\mathcal{B}}(G)$, where $\mathcal{A}$ is the highest scoring partition on $G_p$ found in 200 runs of Louvain and Leiden, and $\mathcal{B}$ is the partition $\mathcal{A}$ modified as in Theorem 1.2(b) (using Lemma 3.1). See also Figure A.1 for simulations run on a larger underlying graph.]
1.4 Estimating modularity by vertex sampling (‘parameter estimation’)
Another commonly considered model of partially observing an underlying graph $G$ is to sample a vertex subset $S$ of constant size and observe the corresponding induced subgraph $G[S]$. In Section 7 and Theorem 1.5 we consider this model. We find that for dense graphs $G$, with high probability we may estimate the modularity $q^*(G)$ to within a small additive error by sampling a constant number of vertices; but for graphs with a subquadratic number of edges this result does not hold: modularity is not estimable.
Theorem 1.5.
(a) For fixed with , modularity is estimable for graphs with density at least . (b) For any given function , modularity is not estimable for -vertex graphs with density at least .
1.5 Outline of the paper
The plan of the rest of the paper is as follows. In Section 2, we first show an application of our results to the stochastic block model in Section 2.1, then provide an overview of the further results of this paper in Section 2.2 and lastly give some background and the relation of this paper to previous results in Section 2.3.
Sections 3 to 5 give the proofs of the main results of the paper: Section 3 gives a crucial preliminary lemma for the proofs, the ‘fattening lemma’; and Theorem 1.1 and Theorem 1.2 are proven in Section 4 and Section 5 respectively. (Indeed we also prove versions of these results which take into account the number of parts in the partition.)
In Section 6 we give a ‘robustness’ lemma showing that and do not change much if we change a small proportion of edges; then we prove Lemma 6.4 on and , which completes the proof of Corollaries 1.3 and 1.4; and finally we give related results on concentration of modularity. We prove Theorem 1.5 on estimating modularity by vertex sampling in Section 7.
The later sections, Sections 8 and 9, contain further related results. In Section 8 we see that under-sampling tends to lead to over-estimation of modularity (using Theorem 1.1). In Section 9 we show that Theorem 1.1 implies results on the expected modularity of random graphs with constant average degree. In Section 10 we give results analogous to Theorems 1.1 and 1.2 for weighted networks and show an application of these results for the stochastic block model. Finally Section 11 contains a few concluding remarks and open questions.
In the appendix we give simulations run on a larger underlying graph in Section A.
2 Further results and background on existing results
2.1 Bootstrapping to sparser graphs and application to stochastic block model.
Modularity preserves signal a little below the connectivity threshold.
This property sets modularity apart from other measures of community structure in a network. If $\mu(G)$ is the min-cut, normalised min-cut or the spectral gap of the Laplacian of $G$, then $\mu(G) = 0$ if $G$ is disconnected. Thus, given a connected graph $G$, for $\mu(G_p)$ to approximate $\mu(G)$ for these measures we must take $p$ large enough that $G_p$ is likely connected.
To see that our results can hold below the connectivity threshold, consider (so ) and with a large constant. Then with probability , see for example Theorem 1.3 of [32]. But whp is disconnected – indeed it will have a linear proportion of the vertices and edges inside the giant component, and a linear proportion outside. Furthermore we may take our underlying graph to be disconnected - for example if consists of two disjoint equal sized cliques then it has modularity , and this will again be approximated by for with some large constant .
Stochastic block model, a little below the connectivity threshold .
There are two versions of the balanced $k$-community stochastic block model, where edges appear independently with probability $p_{\mathrm{in}}$ within blocks and probability $p_{\mathrm{out}}$ between blocks. In the first version, the vertex set is partitioned deterministically into $k$ sets (blocks) of size $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$; and in the second, each vertex independently and uniformly picks a block to join (so the blocks have size about $n/k$ whp).
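A sketch of the deterministic-blocks version of this model; assigning vertex $i$ to block $i \bmod k$ is one concrete way to realise blocks of near-equal size, and the names `sbm_edges`, `p_in`, `p_out` are ours:

```python
import random

def sbm_edges(n, k, p_in, p_out, seed=0):
    """Balanced k-community SBM (deterministic blocks): vertex i is placed in
    block i % k, and each pair is joined independently with probability p_in
    inside a block and p_out between blocks."""
    rng = random.Random(seed)
    block = [i % k for i in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p_in if block[u] == block[v] else p_out):
                edges.append((u, v))
    return edges, block
```

The second version differs only in that each vertex picks its block independently and uniformly at random.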
We first give an example where bounds at some (not too small) edge density are known from spectral results, and a version of Theorem 1.2 allows us to bootstrap these results to sparser models. After that we present Theorem 2.1 concerning the general -community model.
The bootstrapping example involves . Write for the maximum modularity value over all partitions into at most two parts, and note that is at least the modularity score of the planted bipartition. Thus by direct calculation - see for example similar calculations in [21, 32] - if then
(2.1) |
Using existing spectral results from [11, 27] we may deduce that the lower bound in (2.1) is tight for some values of . (We suppress the details here, see Remark 5.3.) Interestingly, one may then use Proposition 5.2 (a version of Theorem 1.2 for -partitions) to bootstrap these results to sparser graphs, where - see Remark 5.3. Note that this includes values of and for which the spectral results do not hold (since for part of the range is disconnected whp).
The tightness of (2.1) tells us that whp the planted partition has asymptotically maximal modularity value over all bipartitions. We now consider the modularity value for the -block model, and see that the planted partition is asymptotically optimal over all partitions.
Theorem 2.1.
Let be an integer, and let and satisfy and as . Then for or
and whp the planted partition has this modularity value.
We shall prove Theorem 2.1 in Section 10, using a deterministic lemma, Lemma 10.6, together with Theorem 10.5, which is a version of Theorem 1.2 with weighted underlying graph. We note that [2, 3] showed that whp the modularity optimal partition will agree with the planted partition except for vertices (for and for some fixed , ). Thus we could also have proven Theorem 2.1 for such via robustness results, see Lemma 6.1, and the likely modularity value of the planted partition.
The recent paper [21] gives explicit bounds on the modularity. We provide a simple stand-alone proof of Theorem 2.1, following [21] in using weighted graphs. Our result extends that in [21] as we reduce the upper bound there by a factor to match the lower bound, and we extend the range of from to . Theorem 2.1 did not appear in the earlier arXiv version [33] of our paper.
2.2 Further results in this paper
Robustness, concentration and closeness of modularity.
For either the modularity $q^*(G)$ or the modularity score $q_{\mathcal{A}}(G)$ of a given partition, modifying a single edge can change the value by at most $2/e(G)$. This was known for $q^*$ (see §5 of [32]). In Section 6.1 we prove that the bound holds also for any given partition $\mathcal{A}$, and in Example 6.2 we show examples such that the factor 2 is necessary in both cases. These robustness results give the concentration theorem below - see Section 6.
Theorem 2.2.
There is a constant such that for each graph , each partition and each the following holds with . For each
By Theorem 2.2 we have concentration for around as soon as is large. In more detail, if for a (large) constant . But, results on binomial random graphs (Theorem 1.1 (b) of [32]) show that, when is , we need large average expected degree (i.e. large) to avoid being much larger than whp. Thus we get concentration of as soon as is a large constant, however, this may not be concentration around until the average expected degree is a large constant.
Under-sampling and over-estimating modularity.
In ecological networks, each interaction observed reveals that an edge is present in the underlying network, and the effect of sampling effort can be modelled by taking the observed network after varying numbers of observations. It was noted in multiple papers on ecological networks that a lower sampling effort, under-sampling, can lead to overestimating the modularity of the underlying ecological network [45]. Our paper provides some theoretical explanation - see Section 8 for a statement of the result.
Expected modularity of the binomial random graph .
Given a constant $c > 0$, for each $n$ we let $\bar q(n, c) = \mathbb{E}\,q^*(G(n, c/n))$, where $G(n, c/n)$ is the binomial (or Erdős–Rényi) random graph with edge-probability $c/n$. It was conjectured in [32] that for each $c$, $\bar q(n, c)$ tends to a limit as $n \to \infty$, and it was noted there that in this case the limit function would be continuous in $c$. From Theorem 1.1 we may deduce that the limit function would also be non-increasing in $c$ - see Section 9.
Modularity and edge-sampling on weighted networks.
Network data of interest for clustering often has weights associated with each edge. Though we have stated the modularity score of a partition for binary edge weights, it is simple to take the weight of edges inside each part (instead of the number of edges) and to take the degree of a vertex $v$ to be the sum of the weights of the edges incident to $v$ (instead of the number of edges). This weighted modularity is often used, and indeed the popular community detection algorithm Louvain can take weighted networks as input [4]. Our Theorems 1.1 and 1.2 have analogues for weighted networks - see Section 10.
2.3 Background on existing results, and our contribution
Modularity : use in community detection.
Modularity was introduced in [36]; it gives a score to each vertex partition (i.e. community division), and partitions with higher scores are considered to better capture the communities in the network. It is NP-hard to find a partition with the highest modularity score (i.e. a partition $\mathcal{A}$ such that $q_{\mathcal{A}}(G) = q^*(G)$) [7], and community detection algorithms do not do this. However, it is fast to compute the modularity of a particular partition, and hence it can be feasible to choose which modifications to make to a partition by picking the candidate partition with the highest modularity score. Louvain [4] and Leiden [44] are examples of this. The algorithms are fast and have had success in recovering ground truth communities on real world networks. However, there are no theoretical guarantees for either that the partition found is near optimal, though recently [10] showed that a Louvain-like algorithm recovers the communities in the stochastic block model for a wide parameter range.
Modularity-based clustering algorithms are the most commonly used to detect communities on large network data [24, 23] - see [15] and [39] for surveys. The widespread use in applications makes modularity an important graph parameter to understand theoretically.
Informing applied network theory : privacy.
Sharing network data can lead to privacy concerns - one approach is to share a subsampled graph $G_p$ instead of $G$ [41]. A claimed advantage is that $G_p$ retains many properties of the underlying graph and that parameters of $G$ may be estimated knowing only $G_p$ and $p$. Examples considered in [41] include vertex degrees and number of triangles.
Our contribution is to determine when the modularity of $G$ is well approximated by the modularity of $G_p$. Furthermore, in part (b) of Theorem 1.2, we see that for expected average degree large enough we can likely obtain a near optimal partition of the underlying graph whilst seeing only the shared graph $G_p$. (Here we mean possible in an information theoretic sense - we do not consider the complexity of finding the partition.) Given the shared graph and a near optimal partition of the shared graph, we construct a partition which is likely to be near optimal for the underlying (non-shared) graph $G$.
Robustness and percolation : existing results for random subgraphs of fixed graphs.
Given a graph $G$ with property $\mathcal{P}$ we say that $G$ robustly has property $\mathcal{P}$ if whp $G_p$ has property $\mathcal{P}$ [43]. A seminal such result is in [22], which showed that there is an absolute constant $C$ such that for $p \ge C \log n / n$ and underlying graph $G$ with minimum degree at least $n/2$, whp the observed graph $G_p$ is Hamiltonian. This was later strengthened to a hitting time result, i.e. taking edges of the underlying graph in a random order, in [19].
Another property that has been considered is expansion - a measure of the number of edges leaving each vertex set relative to the size of that set. For -regular with good expansion properties the expansion in the giant component of was studied in [37] and [14], see also [13]. Additionally non-planarity of for with a specified minimum degree was studied in [17] and for planar underlying graphs [26] determines the threshold for such that is 3-colourable for every planar .
Our contribution is to consider robustness of modularity : for with at least a large constant number of edges and large enough we get likely lower bounds on and for with average degree at least a large constant we also get likely upper bounds on .
Parameter estimation, vertex sampling and network measures : existing work.
There is a well developed field of parameter estimation which asks for which parameters is it possible to estimate the parameter of an underlying graph given access to an induced subgraph on a random vertex subset of constant size. For an excellent introduction, including the relation to the theory of graph limits see [6, 28]. There are related results in [5] : this paper analysed testability of graph parameters relating to minimising the number of edges between parts, some of which were normalised with respect to the size of the parts but in a different way to modularity, and found some of these parameters not to be testable. Our results complement these by showing that modularity is not testable in general but is testable for dense graphs.
Modularity : existing results for random graphs.
Recall that the modularity value of a graph is always in the interval $[0,1)$, with higher values taken to indicate a higher extent of community structure. The modularity of the binomial random graph $G(n,p)$ exhibits three phases, see [32]: for sparse graphs (with expected average degree at most about 1) the modularity is near 1 whp, for dense graphs (with expected average degree tending to infinity) it is near 0 whp, and in between (with expected average degree between two constants) it is bounded away from 0 and 1 whp. Note that for $G = K_n$ we have $G(n,p) = (K_n)_p$ and $q^*(K_n) = 0$ [7], and hence the result in the dense regime is recovered by our Theorem 1.2. A more precise estimate in part of this range is also shown in [32].
Random regular graphs have received recent attention in [25]; and in particular it is shown that the random cubic graph whp has modularity in the interval , improving on earlier results. For large whp [31, 40] - note that this is the same order as for a binomial random graph with expected degree . The random hyperbolic graph whp has modularity asymptotically [8], and so does the preferential attachment tree, though if edges are added at each step as in the model then whp [40]. The modularity of the stochastic block model, and of a degree-corrected version, have been considered in [21, 2, 3, 46], see also Theorem 2.1.
3 The fattening lemma for vertex partitions
Given a graph $G$ and $\varepsilon > 0$, we say that a vertex partition $\mathcal{A}$ is $\varepsilon$-fat (or $\varepsilon$-good) if each part has volume at least $\varepsilon \cdot \mathrm{vol}(G)$. We shall describe a greedy algorithm which, given a graph $G$, $\varepsilon > 0$ and a vertex partition $\mathcal{A}$, amalgamates some parts in $\mathcal{A}$ to yield an $\varepsilon$-fat partition with modularity score at most a little less than that of $\mathcal{A}$. Note that an $\varepsilon$-fat partition has at most $1/\varepsilon$ parts.
Lemma 3.1 (The fattening lemma).
For each non-empty graph and each , there is an -fat partition of such that . Indeed, given any partition of , the greedy amalgamating algorithm uses a linear number of operations and constructs an -fat partition such that .
For comparison, note the neat result of Dinh and Thai [12] that, for each graph and positive integer , we have
(3.1) |
where is the maximum modularity score over partitions with at most parts. Observe that if is an -fat partition for and , then . However, note that neither approximation result implies the other. The constant 2 in Lemma 3.1 is best possible, as shown by the following example.
Example 3.2.
Fix . Let the odd integer be sufficiently large that
so there exists such that and . Let the graph consist of disjoint triangles. Thus has vertices and . The only $\varepsilon$-fat partition for this graph is then the trivial partition, with modularity score 0. Also the maximum modularity is achieved with the connected components partition, see Proposition 1.3 of [31]. Thus
so in Lemma 3.1 we cannot replace 2 by .
To prove Lemma 3.1, it of course suffices to prove the second part. The greedy amalgamating algorithm to yield a good -fat partition is essentially a greedy algorithm for numbers, and we describe that first. The main step in the number greedy algorithm involves bipartitions, and the following well known problem. Before stating the problem, let us note some standard easy inequalities which are useful for the number (bi-) partitioning problem and for considering degree tax. Given with , we have
(3.2) |
The number (bi-) partitioning problem
Given a positive integer and a positive -vector with , let
Thus . Determining is the number partitioning problem, one of Garey and Johnson’s six basic NP-hard problems. It is well known that if each then . (To see this, observe that as we increase the last partial sum which is at most is at least . This result also follows immediately from (3.4) below.)
We might expect that is large when is small, that is when is large. We would like to find the largest constant such that always . We must have : for example if is odd and each , then . We shall show that is the right answer for , that is, we always have ; and further there is a simple greedy bi-partitioning algorithm which achieves this.
Assume that the elements are ordered so that $x_1 \ge x_2 \ge \cdots \ge x_n$. We have two bins, and we add the elements one by one to a smaller of the bins. In detail, the greedy bi-partitioning algorithm is as follows. Initially set $B_1 = B_2 = \emptyset$. For $i = 1, \dots, n$, if $\mathrm{sum}(B_1) \le \mathrm{sum}(B_2)$ then insert $x_i$ into $B_1$, else insert $x_i$ into $B_2$. At the end, let $S_j = \mathrm{sum}(B_j)$ for $j = 1, 2$. The algorithm clearly uses at most $n$ comparisons and $n$ additions.
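The greedy bi-partitioning algorithm above can be sketched as follows (names are ours). The test checks the easy invariant that the final imbalance between the two bins never exceeds the largest element:

```python
def greedy_bipartition(xs):
    """Greedy number bi-partitioning: process elements in non-increasing
    order, always inserting the next element into a currently smaller bin."""
    bins = ([], [])
    sums = [0, 0]
    for x in sorted(xs, reverse=True):
        j = 0 if sums[0] <= sums[1] else 1  # pick a smaller bin
        bins[j].append(x)
        sums[j] += x
    return bins, sums
```

Each insertion goes into a smaller bin, so by induction the absolute difference of the bin sums is always at most the largest element.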
Lemma 3.3.
(3.3) |
Proof of Lemma 3.3.
When the algorithm has finished, let be the total of the values in the bin containing ; and let , the total in the other bin. We shall use induction on . The result is trivial if , since both sides of (3.3) are 0. Let and assume that the result holds for all inputs of length . We shall consider two cases.
(a)

(b)
Suppose that , so . Let and , so . Let where (and note that ). By the induction hypothesis,
But . Hence
Hence, again (3.3) holds.
Now we have established the induction step, and thus proven the lemma.∎
Next we consider partitioning numbers into several parts. Let and let with each and . Let and let be a partition of . Let for each . The corresponding cost for and is
Observe that . Given , we say that the partition is -fat if for each . Observe that the trivial partition is always -fat.
The number greedy partitioning algorithm starts with the trivial partition. While there is a part such that, setting and we have , it picks the first such part and uses the greedy number bi-partitioning algorithm to split into two parts each with sum at least .
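A sketch of the number greedy partitioning algorithm just described, under the assumption that the fatness condition is a lower bound `s_min` on each part's sum (`s_min` and the function name are ours):

```python
def greedy_fat_partition(xs, s_min):
    """Start from the trivial partition and repeatedly split a part via the
    greedy bi-partitioning rule, as long as both halves of the split would
    have sum at least s_min (the fatness threshold)."""
    def split(part):
        b1, b2, s1, s2 = [], [], 0, 0
        for x in sorted(part, reverse=True):
            if s1 <= s2:
                b1.append(x); s1 += x
            else:
                b2.append(x); s2 += x
        return (b1, s1), (b2, s2)

    parts = [list(xs)]
    changed = True
    while changed:
        changed = False
        for i, part in enumerate(parts):
            if len(part) < 2:
                continue
            (b1, s1), (b2, s2) = split(part)
            if min(s1, s2) >= s_min:
                parts[i:i + 1] = [b1, b2]  # replace the part by its two halves
                changed = True
                break
    return parts
```

Each accepted split strictly increases the number of parts, so the procedure terminates, and every final part has sum at least `s_min` (assuming the trivial partition does).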
Lemma 3.4.
Let and let with each and ; and let . Then the number greedy partitioning algorithm finds an -fat partition of with , using operations.
Proof.
Let the final partition obtained by the greedy amalgamating algorithm be (for some ). Fix . Let . Let denote the cost of the trivial partition of with , so . By Lemma 3.3, . But since the algorithm stopped with , we have . Hence for each , and so
as required. ∎
Finally, let us deduce the fattening lemma, Lemma 3.1, from Lemma 3.4. The greedy amalgamating algorithm which we apply to the partition of is essentially the number greedy partitioning algorithm applied to the partition of into singletons , where the weight of is .
Proof of Lemma 3.1.
4 Modularity of with large expected number of edges
4.1 Proof of Theorem 1.1
We shall use a tail bound for random variables with a binomial or similar distribution. We use a variant of the Chernoff bounds, which follows for example from Theorems 2.1 and 2.8 of [18] by considering . In this and the next section we shall always have or : we will need general in Section 10.
Lemma 4.1.
Let , and let random variables be independent, with for each . Let and . Then
or equivalently
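The displayed bounds are missing from this copy; a standard form consistent with Theorems 2.1 and 2.8 of [18], scaled so that each summand takes values in $\{0, b\}$ (the exact constants in the paper's statement may differ), reads:

```latex
% X = X_1 + ... + X_n with the X_i independent, X_i \in \{0, b\}, \mu = E[X].
\[
  \Pr\big(X \ge \mu + t\big) \;\le\; \exp\!\Big(-\frac{t^2}{2b\,(\mu + t/3)}\Big),
  \qquad
  \Pr\big(X \le \mu - t\big) \;\le\; \exp\!\Big(-\frac{t^2}{2b\,\mu}\Big),
  \qquad t \ge 0;
\]
% and consequently, for 0 < \varepsilon \le 1:
\[
  \Pr\big(|X - \mu| \ge \varepsilon \mu\big)
  \;\le\; 2\exp\!\Big(-\frac{\varepsilon^2 \mu}{3b}\Big).
\]
```

The combined two-sided form follows from the two one-sided bounds with $t = \varepsilon\mu$, since $2(1 + \varepsilon/3) \le 8/3 \le 3$ for $\varepsilon \le 1$.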
Proof of Theorem 1.1.
First note that we may assume that and . Let . By Lemma 3.1 there is an -fat partition for such that (thus by our choice of ). Observe that the number of parts in is at most , and thus by (3.2). To prove Theorem 1.1 we consider first the edge contribution and then the degree tax. Let . By Lemma 4.1 (with ), for
(4.1) |
For a graph on , let denote the number of ‘internal’ edges of within the parts of , so that . Then by Lemma 4.1 as above, for
But
so . Hence, for , with probability at least
(4.2) |
Also, by (4.1), for we have with probability at least . Hence by (4.2), for , with probability at least
(4.3) |
The degree tax is only a little more complicated. Let , so . By Lemma 4.1 with , for , with probability at least we have
But, since ,
and so, for , with probability at least we have
Recall that has parts and thus for , with probability at least
Also, by (4.1), for , with probability at least we have and so . Hence, for , with probability at least
(4.4) |
Now, for , with probability at least , both (4.3) and (4.4) hold. (With a little more care we could replace the factor here by but that would not make a significant difference.) Since , the failure probability here is at most . Choose sufficiently large that this probability is at most ; and note that we may take . In the next paragraph we shall choose , so (4.4) holds for our choice of and any .
Consider the ratios on the right sides of the inequalities (4.3) and (4.4). Choose sufficiently large that for
(4.5) |
and
(4.6) |
Note that the left side in (4.5) is increasing in , and the left side in (4.6) is decreasing in , so we just need the inequalities to hold for . Rearranging, we see the inequality (4.5) is equivalent to
and thus we must have . In fact we may take and this will ensure both (4.5) and (4.6) hold.
Finally, set , so ; and let us check that this value for works. With probability at least both (4.3) and (4.4) hold. Suppose that both (4.3) and (4.4) do hold: if also then , so (4.5) and (4.6) hold; and then
(4.7) |
Hence, if , then with probability at least , as required. This completes the proof of Theorem 1.1. ∎
In the above proof of Theorem 1.1 we took . If we used a more detailed and precise form of Lemma 4.1 - see for example Theorem 21.6 of [16] - we could improve several bounds in the proof but we would not improve the asymptotic estimate for . If we simply used Chebyshev’s inequality we find we can take .
The following example shows that, for the conclusion of Theorem 1.1 to hold, must be at least as .
Example 4.2.
Let be small. Let the integer be sufficiently large that the conclusion of Theorem 1.1 holds. Let and , and assume that . Let be the -edge graph consisting of a star with edges together with isolated edges. Then the connected components partition is the optimal partition for , since each component has modularity 0 [27]. Thus
Now let , and note that . The probability that contains none of the isolated edges is . But any star has modularity 0, so . Thus we must have , and so .
4.2 A -part analogue to Theorem 1.1
Recall that $q^*_{\le k}(G)$ denotes the maximum value of $q_{\mathcal{A}}(G)$ over all vertex partitions $\mathcal{A}$ with at most $k$ parts. We note here a variant of Theorem 1.1 where we replace each instance of $q^*$ by $q^*_{\le k}$. (Observe that the inequality (3.1) of Dinh and Thai with $k$ large shows that Proposition 4.3 implies Theorem 1.1. A similar connection will hold for Proposition 5.2 and Theorem 1.2.)
Proposition 4.3.
Given $\varepsilon > 0$ there exists $c_0$ such that the following holds. For each graph $G$ and probability $p$ such that $p\,e(G) \ge c_0$, and for each $k \ge 1$, the random graph $G_p$ satisfies $q^*_{\le k}(G_p) > q^*_{\le k}(G) - \varepsilon$ with probability at least $1 - \varepsilon$.
Proof.
The proof is almost immediate following that of Theorem 1.1. As in that proof, first note we may assume that and . Let be a partition with at most parts and such that . Let and then by Lemma 3.1 there is an -fat partition such that and such that is a refinement of . Note in particular this means that and so has at most parts.
5 Modularity of with large expected degree
In this section we first prove Theorem 1.2, and then we give a -part analogue Proposition 5.2 and use it to complete the discussion of the bootstrapping example from Section 2.1.
5.1 Proof of Theorem 1.2
The rough idea of the proof of Theorem 1.2 is that we can use the fattening lemma, Lemma 3.1, to bound the probability that a vertex partition behaves badly by the probability that a fat vertex partition behaves similarly badly, and we can use probabilistic methods to handle fat partitions. However, even after the streamlining provided by Lemma 3.1 the proof is quite intricate and we will need to define a large number of events, many of which are ‘bad events’ parameterised by deviation from the ideal case.
Proof of Theorem 1.2.
Let . We shall choose sufficiently large that certain inequalities below hold. It will suffice to choose for a sufficiently large constant . Let be a (fixed) -vertex graph and let ; and assume that .
We now define the ‘bad’ events , , and (there will be more later!). Let be the event . Let . Let be the event that there is a partition such that , where as in Lemma 3.1. Let be the set of partitions which are -fat for . Let be the event that for some with we have (that is, is not -fat for but is -fat for ). Observe that if some partition is -fat for , then holds. Let be the event that there is a partition such that .
We shall show that
(5.1)
and
(5.2)
Note that (5.1) yields . It will follow that the probability that (a) or (b) in the theorem fails is at most . By Theorem 1.1, if is sufficiently large (depending only on ) then . We shall show that, if we choose sufficiently large (depending only on ) then also
(5.3)
and
(5.4)
which will complete the proof of Theorem 1.2. Next come the details: the proofs of (5.1)-(5.4). We focus initially on part (a) and the containment (5.1).
We show next that the inequality (5.3) holds. Let us observe first that if for some (large) then ; so we are concerned only with large graphs.
Proof of (5.3).
Let have . Suppose that the edges within are and the edges with exactly one end in are . For let be 2 if is in and 0 otherwise; and for let be 1 if is in and 0 otherwise. Let , and let . Then and . Form by adding further independent random variables with so that . Now by Lemma 4.1 applied to (with )
Let be the (bad) event that for some with . Then, by a union bound, , which is at most if for a sufficiently large constant . By Lemma 4.1 (with ), (if is sufficiently large). Then we have
as required. ∎
We now start to prove that the inequality (5.4) holds. The proof proceeds by considering first degree tax in Claim (5.5), and then edge contribution in Claim (5.7).
Degree tax Let be the event that for some with we have . We claim that
(5.5)
Proof of Claim (5.5).
Let be the event that . By Lemma 4.1 (with ), (if is sufficiently large). Suppose that holds: then for each partition
Thus
(5.6)
Edge contribution Let be the event that for each partition we have . We claim that (if is sufficiently large)
(5.7)
Proof of Claim (5.7).
Each partition has at most parts, so . Let be the event that, for some partition such that , we have . For each given such partition , is stochastically at most a binomial random variable with mean , so by Lemma 4.1 (with ) and a union bound we obtain , if for a sufficiently large constant .
Let be the event that, for some partition such that , we have . For each given such partition , is a binomial random variable with mean at least , so by Lemma 4.1 (with ) and a union bound we obtain , if for a sufficiently large constant .
Now consider any partition . If fails and then ; and so, if also fails, then . Thus if both and fail then holds, so
so the Claim (5.7) holds, as required. ∎
Proof of (5.4).
We have now completed the proofs of (5.1), (5.3) and (5.4); so it remains only to prove the containment (5.2).
Proof of (5.2).
Recall from Lemma 3.1 that , where . Hence, if we let be the event that there is a partition which is -fat for and satisfies then . Let be the event that there is a partition (that is, is -fat for ) which satisfies . Then , by the definition of the event . Also, by our choice of , if holds then there exists such that , i.e. holds; and thus . Summing up, we have ; so (5.2) holds, as required. ∎
In the proof of Theorem 1.2 we took . The following example shows that, for the conclusion (a) in Theorem 1.2 to hold, must be at least .
Example 5.1.
Recall from [32, Theorem 1.3] that there is a constant such that, for each whp . Let and let (so ). Let , let be (so ) and let . Then the expected average degree in is , but whp .
5.2 A -part analogue to Theorem 1.2
As in Section 4.2, recall that . We present here a variant of Theorem 1.2(a) where we replace each instance of by . In Section 2 it was mentioned in the discussion concerning the stochastic block model following (2.1) that the case of this variant will be used in Remark 5.3 to obtain the relevant upper bound.
Proposition 5.2.
For each , there is a such that the following holds. Let the graph and probability satisfy , and let . Then with probability the random graph satisfies .
Proof.
We may follow the proof of Theorem 1.2(a), with a few natural small changes which we detail below. Note that since we do not prove an analogue of Theorem 1.2(b), we need only prove analogues of (5.1), (5.3) and (5.4), i.e. not (5.2).
For partitions with at most parts we define analogues of the events and (the event is unchanged, and no analogue of is needed since we do not prove the analogue of (5.2)). Let be the event . By Theorem 1.2 (b) (with replaced by ) applied to an optimal partition with at most parts (and noting that has at most parts), we have .
As before let . Let be the set of partitions which are -fat for and have at most -parts; and let be the event that there is a partition such that . We must establish the analogues of the statements (5.1), (5.3) and (5.4).
Proof of analogue of (5.1).
Remark 5.3.
We may now complete the discussion of the bootstrapping example from Section 2.1. We give details for how to show that equality holds in (2.1) when . This will follow from spectral results on denser graphs, robustness of modularity and Proposition 5.2.
Recall that and where is the spectral gap of , see Lemma 6.1 of [32]. Now let satisfy the condition . Then a result from [11] says that, for and , we have whp; and thus
(5.9)
We can use Proposition 5.2 to bootstrap (5.9) to sparser and (and with no condition corresponding to ). Let ; let (arbitrarily slowly) as , with ; and let and . Let us show that still (5.9) holds. Let and , for some constant such that , so whp by (5.9)
Also, whp . Let (so ). Then the sampled random graph satisfies , and whp . Hence by Proposition 5.2 we have ; whp and we have shown that (5.9) holds with the current choice of , as we aimed to do.
6 Robustness of modularity, and closeness and concentration for and
6.1 Robustness
We shall use deterministic robustness results relating to two graph operations, namely moving or deleting a set of edges, and concerning two graph parameters, namely the modularity score for a given partition , and the (maximum) modularity . We first briefly collect some already known results together with one new bound in Lemma 6.1 below. For edge deletion the bound on is new while the bound on follows from Lemma 5.1 in [32]; and for moving edges the bounds follow from Lemma 5.3 and its proof in the same paper. See also Example 6.2 below which gives examples showing the bounds in Lemma 6.1 are asymptotically tight.
Lemma 6.1.
Let be a graph and let be a partition of .
-
(a)
If is a non-empty proper subset of , and , then
-
(b)
If is a set of edges with , and , then
We give examples to show tightness of Lemma 6.1 before proceeding with the proof. See the concluding remarks for a related open problem.
Example 6.2.
The examples here show tightness of the lemma: it is not possible to replace the factors 2 and 1 above by smaller constants in any of the four cases.
For the examples let us first recall some simple properties of modularity. Firstly, the placement of any isolated vertices in a partition is irrelevant. Secondly, for any part in any optimal partition of a graph, the graph induced by the non-isolated vertices in is connected [7]. Thirdly, for any leaf vertex with pendant edge , the vertices and are in the same part in any optimal partition [7, 42].
-
(a)
Let be the graph consisting of a star on edges together with a disjoint edge . Take and thus is a star on edges together with two isolated vertices. We consider the bipartition which in places the vertices of the star in one part and the vertices of the disjoint edge in the other. Note that is an optimal partition of and of . Since is a star plus isolated vertices, , and we may calculate
Noting that , for this choice of and we have
and thus we may get arbitrarily close to the factor 2.
-
(b)
Let be the graph above except that we add an isolated vertex. Thus is the graph consisting of a star with central vertex on edges together with a disjoint edge and an isolated vertex . Let be obtained from by removing and adding the edge : thus is a star on edges together with two isolated vertices.
We consider the bipartition which places the vertices of the star and the isolated vertex together in one part and the disjoint edge in the other part. Then is an optimal partition of and of by the same argument as in part (a), and . Since is a star plus isolated vertices .
Noting that , for this choice of and we have
and thus we may get arbitrarily close to the factor 1.
To prove the inequality concerning in Lemma 6.1(a), we will use Lemma 6.3 bounding the edge contribution minus twice the degree tax.
Lemma 6.3.
Let be a graph on edges and let be a vertex partition. Then
Proof.
For each part , let us write and for the proportion of edges internal to part and proportion between part and the rest of the graph respectively. Note that . We also set for the proportion of edges within parts of the partition. Similarly let , so is the proportion of edges between parts, and note that for each . Also observe that .
Proof of Lemma 6.1(a), bound on .
We begin by following the proof of Lemma 5.1 in [32]. As there, let where is the set of edges in which lie within the parts of and . That is, is the proportion of the edges in which are in and lie within the parts of , and is the proportion which are in and lie between the parts.
Note that the statement we wish to prove follows from the following two inequalities.
(6.2)
and
(6.3)
The first of these, (6.2), is equation (5.2) in [32] and was proven there, so it remains only to prove (6.3). Directly from the definitions we may calculate the difference in edge contributions, (this is (5.4) in [32]),
(6.4)
Note also the following simple bound on the possible increase in the degree tax. Since we have
and so
Thus by (6.4) we have (this is (5.6) in [32])
(6.5)
Now we may apply Lemma 6.3 to infer that the RHS of (6.5) is at most which proves (6.3). Finally, by (6.2) and (6.3),
as required.∎
6.2 Closeness of and
Given a graph and , how close are and , where ? This question gives rise to Lemma 6.4, which we now state and prove. Note that Corollaries 1.3 and 1.4 then follow by Lemma 6.4 and Theorem 1.1 and 1.2 respectively.
Lemma 6.4.
Let be a graph, let , and let . Then for any partition of
(6.6)
Also, we can couple and so that, for any partition of , if as then
(6.7)
Proof.
Let , so with mean and variance . Couple and so that their edge sets are nested: to do this, we may list the edges of in a uniformly random order, let have the first edges and let have the first edges. By Lemma 6.1
(6.8)
But
where we use (6.8) for the second inequality; and the first inequality in (6.6) follows. The second inequality in (6.6) may be proved in just the same way. Also, by Chebyshev’s inequality, for
So, setting , whp . Hence
and we have proved the first inequality in (6.7). The second inequality may be proved in just the same way. This completes the proof of Lemma 6.4. ∎
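The nested coupling used in the proof above can be written out concretely: list the edges of the underlying graph in a uniformly random order, give one graph the first m edges and the other a Binomial number of them. A minimal sketch (the function name and encoding are our own):

```python
import random

def nested_coupling(edges, m, p, seed=0):
    """Couple G_m and G_p so that their edge sets are nested: shuffle the
    edges of G uniformly at random, let G_m take the first m edges and let
    G_p take the first X edges, where X ~ Bin(e(G), p)."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    x = sum(rng.random() < p for _ in edges)  # X ~ Bin(e(G), p)
    return order[:m], order[:x]
```

By construction one of the two edge sets always contains the other, which is exactly what the robustness bound of Lemma 6.1 needs.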
6.3 Concentration of and
We restate Theorem 2.2 from the introduction, adding also concentration results for the random edge model. Recall that, given a fixed underlying graph , for the random graph is obtained by sampling uniformly among all -edge subsets of .
For the random graphs and , Theorem 7.1 of [32] showed that the modularity values are highly concentrated about their expected value. A very similar result is true, with similar proof, for the case of edge sampling. We use the results on to deduce the results on .
Theorem 6.5.
-
(a)
Given graph , a partition , and , for each
-
(b)
There is a constant such that for each graph , each partition and each the following holds with . For each
The results in Theorem 6.5 concerning and extend Theorem 7.1 of [32]. As well as the robustness results above, in the proof we use Lemma 4.1, and the following concentration result from [30, Theorem 7.4] (see also Example 7.3).
Lemma 6.6.
Let be a finite set, let be an integer such that , and consider the set of all -element subsets of . Suppose that the function satisfies whenever (i.e. the -element subsets and are minimally different). If the random variable is uniformly distributed over , then
Proof of Theorem 6.5 part (a).
Proof of Theorem 6.5 part (b).
Let and let . We will first show the more detailed statement that, for each , we have
(6.9)
from which the result for in part (b) of the theorem follows. We give the proof in detail for concentration of , and then indicate the few differences to be made in showing the result for .
For any graph and any partition the modularity score lies in the interval , see [7]; and hence we may assume that . Define the event and let denote the complement of ,
(6.10)
The proof proceeds by bounding separately the terms on the right in (6.10). Firstly, by using part (a) of the theorem and conditioning on where , we have a bound on the first term of (6.10)
The third term is also straightforward to bound. By Lemma 4.1 and since ,
It now remains to bound the second term of (6.10). Let be a random subgraph of obtained by keeping each edge with probability , independently of , and let . By coupling and by Lemma 6.1, for
For any and any random variable with finite mean and thus for ,
(6.11)
Note and so
since we assumed that and so . By Lemma 4.1
and thus we have a bound for the second term
which completes the proof of (6.9). It is now possible to choose small enough that the inequality holds for all and we may take as the coefficient of the exponential; the details of this calculation appear, for example, in the last part of the proof of Theorem 7.1 of [32]. This completes the proof of the first half of part (b).
To prove the corresponding result for the same technique may be used; in particular we bound the probability that is large by considering the sum of three terms as in (6.10). The first and second term can then be handled in the same way. For the third term we can get a slightly better bound by noticing that since for any graph we may assume instead of . ∎
7 Estimating modularity by sampling a fixed number of vertices: proof of Theorem 1.5
A graph parameter is estimable (or testable), see [6] and [28], if for every there is a positive integer such that if is a graph with at least vertices, then for a uniformly random k-subset of we have . We shall see here that the modularity value is estimable for dense graphs but not more generally. For convenience we restate Theorem 1.5 from the introduction.
Theorem (restatement of Theorem 1.5).
(a) For fixed with , modularity is estimable for graphs with density at least . (b) For any given function , modularity is not estimable for -vertex graphs with density at least .
We need some definitions. If is a graph and then is the number of ordered pairs such that there is an edge in between and . If and are graphs with the same vertex set , then the cut distance . Given a graph and , we let denote the -blow-up of , where each vertex of is replaced by independent copies of itself; and thus and .
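The labelled cut distance just defined can be evaluated by brute force on very small vertex sets; the sketch below uses our own convention that an edge between S and T contributes once for each ordered pair it induces, matching the ordered-pair count in the definition above.

```python
from itertools import chain, combinations

def cut_distance(n, edges_g, edges_h):
    """Brute-force labelled cut distance between two graphs on vertex set
    {0, ..., n-1}: max over subsets S, T of |e_G(S,T) - e_H(S,T)| / n^2,
    where e(S,T) counts ordered pairs (u, v) with u in S, v in T joined by
    an edge. Exponential in n, so suitable only for tiny examples."""
    def subsets(vs):
        return chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))
    def e(edges, s, t):
        s, t = set(s), set(t)
        return sum((u in s and v in t) + (v in s and u in t) for u, v in edges)
    vs = list(range(n))
    return max(abs(e(edges_g, s, t) - e(edges_h, s, t)) / n ** 2
               for s in subsets(vs) for t in subsets(vs))
```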
Example 7.1.
. The unique form of an optimal partition of is into one path and one , with modularity . For we can balance the partition, into two copies of , with modularity score .
Proof of Theorem 1.5(a).
We shall use Theorem 15.1 of [28], which says that a graph parameter is estimable if and only if the following three conditions hold.
-
(i)
If and are graphs on the same vertex set and then .
-
(ii)
For every graph , has a limit as .
-
(iii)
if (where denotes the graph obtained from by adding a single isolated vertex).
Observe that always , so we need be concerned here only about conditions (i) and (ii). We shall show that condition (ii) concerning blow-ups always holds. After that, we shall show that condition (i) concerning cut distances holds, as long as the graphs are suitably dense. Finally we give examples for sparse graphs which show that in this case modularity is not estimable.
Blow-ups of a graph : condition (ii)
Recall that is the -blow-up of . Observe that always . For if is an optimal partition for , with parts, then the natural corresponding -part partition for has modularity score . Thus also for every .
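The blow-up construction and the natural lifted partition described above can be written down directly; the encodings below (a vertex v becomes copies (v, 0), ..., (v, k-1)) are our own choices for illustration.

```python
def blow_up(edges, k):
    """k-blow-up of a graph: each vertex v is replaced by k independent
    copies, and each edge uv becomes the complete bipartite graph between
    the copies of u and the copies of v, so e(G_k) = k^2 * e(G)."""
    return [((u, i), (v, j)) for u, v in edges
            for i in range(k) for j in range(k)]

def lift_partition(partition, k):
    """The natural corresponding partition of the blow-up: all copies of
    the vertices of a part are placed together."""
    return [{(v, i) for v in part for i in range(k)} for part in partition]
```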
Let be a (fixed) graph. We need to show that tends to a limit as . Let be where the sup is over all . We shall see that in fact
(7.1)
If has no edges then for all ; so we may assume that . Let , let and let satisfy . Then
Hence by a robustness result, Lemma 5.1 of [32], (see also Lemma 6.1 in this paper) we have that ; and so
Let . Let be such that . Then, for all such that , letting be such that (so and thus ) we have
Now (7.1) follows, as required.
Cut distance and modularity : condition (i) for dense graphs
Fix with . A graph is called -dense if . Let . We want to show that there exists such that for all -dense graphs , with the same vertex set and such that we have . With foresight, we shall take .
By Lemma 1 in [12] there is a and a partition for such that . It suffices to show that
(7.2)
(since then , and we may similarly deduce that ).
To prove (7.2) we first consider the edge contribution then the degree tax . Let .
Edge contribution . Note that , so ; and similarly , so . Thus
The second term here (minus the sum) is at least . Also, since for , the first term is at least . Hence
(7.3)
Degree tax . Since we have
Thus, using the last inequality if ,
(7.4)
Also for each . We claim that
(7.5)
To show this, let . For
Thus
which completes the proof of (7.5). Using (7.5) and then (7.4), we find
since for . Thus
(7.6)
Putting the results (7.3) on and (7.6) on together we have
by our choice of . Hence (7.2) holds, as required. This completes the proof of Theorem 1.5(a), for dense graphs. ∎
We now give a pair of constructions which will demonstrate that modularity is not estimable.
Example 7.2.
Let and let as , arbitrarily slowly, so that in particular . Then there are connected graphs on vertex set such that ; and as , and , and so . Thus the graphs are ‘nearly dense’ and are close in cut distance , but their modularity values are not close.
For we may let , and let be a collection of disjoint -cliques (together with at most -cliques) joined by edges to form a path. Then so for sufficiently large. Also it is easy to see that the partition of the vertex set into the cliques satisfies .
For we may consider a binomial random graph , which with probability at least is connected, has at least edges, and has modularity at most for a suitable by Theorem 1.1(c) of [32] (or by other results in that paper).
Proof of Theorem 1.5(b).
8 Under-sampling and overestimating modularity
When we sample few edges from a graph it seems that we tend to overestimate its modularity; that is, tends to be significantly larger than . For example, if is the complete graph and , then but in probability as , see Theorem 1.1 of [32]. Our Theorem 1.1 shows that when the expected number of edges observed is large, although we may overestimate modularity we are unlikely to underestimate it by much. In this section we use Theorem 1.1 to prove that when the sampling probability is bounded away from 0, increasing is unlikely to increase overestimation by much.
To state the result precisely we give one definition. For random variables and and , we say that -nearly (stochastically) dominates if
(8.1)
Observe that if say and take values in and -nearly dominates then , since in this case
We now give the main result of this section.
Proposition 8.1.
Let and . Then there exists such that, for any graph with at least edges and any sampling probabilities with , it holds that -nearly dominates .
The case shows that as in Theorem 1.1, except that now we have the lower bound on the sampling probability (and depends on ).
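Nearly stochastic dominance can be checked empirically on finite samples. Since the display (8.1) is not reproduced in this copy, the particular form below, namely that the upper tail of X is never more than eps below the corresponding tail of Y, is our assumption; the helper is purely illustrative.

```python
def nearly_dominates(xs, ys, eps):
    """Empirical check of an assumed form of eps-nearly stochastic
    dominance: P(X >= t) >= P(Y >= t) - eps for every threshold t.
    xs and ys are finite samples, treated as uniform distributions."""
    def tail(sample, t):
        return sum(1 for s in sample if s >= t) / len(sample)
    for t in sorted(set(xs) | set(ys)):
        if tail(xs, t) < tail(ys, t) - eps:
            return False
    return True
```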
Proof.
By Theorem 1.1 there exists such that for all graphs and all such that , we have
Let be the set of all graphs with . Then
(8.2)
Let be such that
Fix a graph on vertex set with ; and note that by the above, for each
(8.3)
We couple and in the natural way. Let , let and be independent. For each graph on vertex set , let be the graph on with edge set ; and observe that . We have
Hence for every
so -nearly dominates as required. ∎
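The natural coupling in the proof above amounts to two-stage edge sampling: keep each edge with the larger probability p1, then independently keep each survivor with probability p2/p1, so that overall each edge survives with probability p2 and the sparser sample sits inside the denser one. A sketch (names are ours):

```python
import random

def two_stage_sampling(edges, p1, p2, seed=0):
    """Realise G_{p1} and G_{p2} (with p2 <= p1) on one probability space:
    an edge enters G_{p1} with probability p1, and enters G_{p2} exactly
    when it also survives an independent p2/p1 coin, so with overall
    probability p2. By construction G_{p2} is a subgraph of G_{p1}."""
    rng = random.Random(seed)
    g1, g2 = [], []
    for e in edges:
        if rng.random() < p1:
            g1.append(e)
            if rng.random() < p2 / p1:
                g2.append(e)
    return g1, g2
```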
9 Expected modularity when average degree is constant
The modularity of the Erdős-Rényi (or binomial) random graph is investigated in [32]. Given a constant we let for each . By Theorem 1.1 of that paper, for we have as . Let for each .
Conjecture 9.1 ([32]).
For each , tends to a limit as .
It was noted in that paper that if the conjecture holds then the function would be uniformly continuous for . From Theorem 1.1 (in the present paper) we shall deduce that also would be non-increasing in . We collect results on in the following proposition.
Proposition 9.2.
-
(i)
for , we have as ;
and if Conjecture 9.1 holds then
-
(ii)
for
-
(iii)
as
-
(iv)
is (uniformly) continuous for
-
(v)
is non-increasing for .
All but part of this result comes directly from [32]: part (as we already noted) and part are from Theorem 1.1; part is from Theorem 1.3 and part is from Lemma 7.4. Part will follow immediately from inequality (9.1) below, so to complete the proof of Proposition 9.2 it remains only to prove the following lemma.
Lemma 9.3.
Let . For each , there exists such that for all we have ; and thus
(9.1)
Proof.
By Theorem 1.1 there exists such that for all graphs and all such that we have
and so
Let . Let be the set of all graphs with . Let be sufficiently large that for each we have . Let . Let and be independent. For each graph on , let be the graph on with edge set ; and observe that . Note also that satisfies . Given a graph , since and are independent,
Hence
since . Thus for each . It follows that
and since this holds for each , the inequality (9.1) follows. ∎
10 Modularity and edge-sampling on weighted networks
In network applications it can be useful to consider graphs in which the edges have weights. Following the notation of [9], let be a non-empty vertex set, and let satisfy for all vertices and . For simplicity, let us assume that for each . We call a weight function on . Let denote the maximum of all the values .
Define the (weighted) degree of a vertex by setting . Similarly, define the (weighted) volume of a vertex set by , and (corresponding to ) let .
Assume that is not identically zero, that is . For a given partition of , define the modularity score of on by
where
Define the modularity of by . As in the unweighted case, , and we may ignore vertices with degree 0. If is identically 0 we set and . If is -valued then and are the usual modularity score and modularity value respectively for the graph corresponding to . The values and are unchanged under re-scaling , so we may assume that for each ; and in this case we call a probability weight function.
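The weighted modularity score just defined can be computed in the same way as in the unweighted case, with edge counts replaced by weight sums; the encoding of the weight function as a dictionary on unordered pairs is our own choice.

```python
def weighted_modularity_score(weights, partition):
    """Weighted modularity score: sum over parts A of
    w_in(A)/w_total - (vol_w(A)/(2*w_total))^2, where w_total is the sum of
    all edge weights, w_in(A) the weight inside A, and vol_w(A) the weighted
    volume of A. `weights` maps pairs (u, v) to values in [0, 1]."""
    total = sum(weights.values())
    deg = {}
    for (u, v), w in weights.items():
        deg[u] = deg.get(u, 0.0) + w
        deg[v] = deg.get(v, 0.0) + w
    score = 0.0
    for part in partition:
        part = set(part)
        w_in = sum(w for (u, v), w in weights.items()
                   if u in part and v in part)
        vol = sum(deg.get(v, 0.0) for v in part)
        score += w_in / total - (vol / (2 * total)) ** 2
    return score
```

With 0/1-valued weights this recovers the usual modularity score, and rescaling all weights leaves the score unchanged, as noted above.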
Given a weight function on and we define a random weight function by considering each edge independently and keeping unchanged with probability , and otherwise setting it to 0. Also, given a probability weight function , let be the random (unweighted) graph obtained by considering each edge independently, and including edge with probability , and otherwise having no edge . For a special case of see Definition 3.1 in [21]. The model has also been considered in network science applications and is referred to as a probabilistic network (to distinguish it from a binary one, in which each edge has probability either 0 or 1 of appearing) [20, 38].
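Both random models just described are simple to simulate; the sketch below (our own encoding, with the weight function as a dictionary on edges) draws the random weight function and the random unweighted graph.

```python
import random

def sample_weight_function(weights, p, seed=0):
    """Random weight function: keep each edge weight unchanged with
    probability p, and otherwise set it to 0, independently over edges."""
    rng = random.Random(seed)
    return {e: (w if rng.random() < p else 0.0) for e, w in weights.items()}

def sample_probabilistic_network(weights, seed=0):
    """Random unweighted graph from a probability weight function: include
    each edge uv independently with probability w(uv)."""
    rng = random.Random(seed)
    return [e for e, w in weights.items() if rng.random() < w]
```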
10.1 Weighted underlying to weighted observed graph
Given a probability weight function on and a probability , we defined the random weight function above. For such weight functions we have results very like Theorems 1.1 and 1.2. Note that the theorems below have conditions on the sum of weights being large; recall also that the modularity score of is unchanged under re-scaling. Thus, given a general weight function , we may best apply the theorems to a re-scaling of obtained by dividing through by (the maximum weight of an edge).
Theorem 10.1.
Given and , there exists such that the following holds. Let , and let the probability weight function on satisfy . Then, with probability , the random weight function satisfies .
Theorem 10.2.
Given , there exists such that the following holds. Let , and let the probability weight function on satisfy . Then, with probability ,
-
(a)
the random weight function satisfies ; and
-
(b)
given any partition of the vertex set, in a linear number of operations (seeing only ) the greedy amalgamating algorithm finds a partition with .
Observe that Theorem 1.1 is the special case of Theorem 10.1 when is -valued, and similarly Theorem 1.2 is a special case of Theorem 10.2. In order to prove Theorems 10.1 and 10.2, we may use almost the same proofs as before. We need a natural minor variant of Lemma 3.1. Given a weight function on , call a set of vertices -fat if , and call a partition of -fat if each part is -fat. The following lemma is very similar to Lemma 3.1 and may be proved in exactly the same way.
Lemma 10.3 (The fattening lemma for weighted graphs).
For each non-zero weight function on , and each , there is an -fat partition of such that . Indeed, given any partition of , using a linear number of operations, by amalgamating parts we can construct an -fat partition such that .
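The amalgamation step in the fattening lemmas can be illustrated at the level of part volumes. The paper's exact procedure and fatness threshold are not reproduced in this copy, so the sketch below assumes a part is 'fat' once its volume reaches eps times the total volume, and greedily merges the two smallest parts until every remaining part is fat; only volumes are tracked, not the parts themselves.

```python
import bisect

def fatten_volumes(part_volumes, eps, total_vol):
    """Greedy amalgamation sketch (assumed threshold eps * total_vol):
    repeatedly merge the two parts of smallest volume until every
    remaining part is fat, or only one part remains. Returns the sorted
    list of volumes of the amalgamated partition."""
    vols = sorted(part_volumes)
    while len(vols) > 1 and vols[0] < eps * total_vol:
        merged = vols.pop(0) + vols.pop(0)
        bisect.insort(vols, merged)
    return vols
```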
Proof of Theorem 10.1..
The proof is very similar to that of Theorem 1.1 so we indicate just a few key steps where there are differences.
As in that proof we may assume that and , and we set . By the weighted fattening lemma, Lemma 10.3, there exists an -fat partition (where ) such that . Let . Then corresponding to (4.1), for (and noting that for each edge )
Let denote the sum of ‘internal’ edge weights within the parts of . Then corresponding to (4.3), for , with probability at least
For the degree tax, corresponding to (4.4) we find the following. For , with probability at least
Thus for , with probability at least both the last two displayed inequalities hold. The failure probability is at most , and we may thus choose sufficiently large that the probability is at most ; and indeed we may take .
The rest of the proof follows the non-weighted case by making similar minor adaptations.∎
Proof of Theorem 10.2.
As in the last proof, we may follow the proof of the non-weighted version, Theorem 1.2, with the following adaptations. In place of the fattening lemma, use the weighted version, Lemma 10.3; replace instances of and by and respectively. We may still apply Lemma 4.1 with since all edges have weight at most 1. ∎
10.2 Weighted underlying to unweighted observed graph
We will see that the proofs in this subsection follow our proofs of Theorems 1.1 and 1.2 almost line by line, replacing all instances of with and all instances of with .
Theorem 10.4.
There exists such that the following holds. Let the probability weight function on satisfy . Then with probability at least the random graph satisfies .
Theorem 10.5.
Given , there exists such that the following holds. Let the probability weight function on satisfy . Then with probability at least ,
-
(a)
the random graph satisfies ; and
-
(b)
given any partition of the vertex set, in a linear number of operations (seeing only ) the greedy amalgamating algorithm finds a partition with .
Proof of Theorem 10.4.
The proof follows that of the non-weighted case, Theorem 1.1, line by line with the following adaptations. In place of the fattening lemma used on the underlying graph, use the weighted version, Lemma 10.3; and replace instances of and by and respectively. Note that Lemma 4.1 still applies with - more details are given in the proof of Theorem 10.5. ∎
Proof of Theorem 10.5.
As in the proof of Theorem 1.2, let and let for a sufficiently large constant . We again set . Let be a fixed probability weight function on , where ; and assume that .
Then the corresponding events and may all be defined as before, replacing instances of and with and respectively. Note that in the definition of , one constructs partition from seeing only the observed graph , and thus using the usual fattening lemma, Lemma 3.1 rather than the weighted one. Then the proof proceeds by noting is small due to Theorem 10.4 and proving the statements corresponding to (5.1) to (5.4).
Corresponding to the proof of (5.1) only the usual substitutions of and by and are required. For the observation after that, a small change is needed. Notice that since we have that and by assumption. Thus as in the earlier proof.
Corresponding to the proof of (5.3) there are a few minor changes. Let have . Suppose the (unordered) pairs of vertices in with are labelled (some vertices may be repeated), and the pairs of vertices with exactly one endpoint in and positive edge weight are labelled . For let be 2 if edge is in and let be 0 otherwise; and for let if edge is in and let be 0 otherwise (so the random variable satisfies if and if , and is non-zero with probability .) Corresponding to the original proof, and , and the rest of the proof corresponding to (5.3) follows in the same manner – note that Lemma 4.1 (with ) still applies.
Corresponding to the proof of Claim (5.7), let be the event that, for some partition such that , we have . We note that is stochastically at most a sum of Bernoulli random variables, such that the mean (of the sum) is so we may apply Lemma 4.1 (with ). The rest of the proof corresponding to Claim (5.7) and indeed the entire proof continues with similar adaptations. ∎
10.3 Application: stochastic block model
In this subsection we show that Theorem 2.1 on the modularity value of the stochastic block model follows quickly from Theorem 10.5 (a weighted version of Theorem 1.2) and the deterministic result Lemma 10.6. For and let
Since rescaling a weight function does not change , for simplicity we set in the following lemma.
Lemma 10.6.
Let . For let be a partition of where . Let and , and let be the weight function on vertex set with if and are in the same block and with otherwise. Then for the planted partition , and .
Theorem 4.2 in the recent paper by Koshelev [21] concerns the same weighted graph as above and gives an upper bound on without the factor , i.e. . The proof in [21] proceeds by calculating the eigenvalues of the weighted adjacency matrix of . The proof below of Lemma 10.6 involves a simple weighted notion, , of ‘per-unit-modularity’ as used in [34, 27].
Proof.
It will be straightforward to show that the planted partition has modularity score as claimed: the main part of the proof is to show that is at most this value.
First we show that we may write the modularity of the weight function as a weighted sum of a function on vertex sets - see (10.1). The proof of the upper bound will proceed by bounding the maximum value of over vertex sets . By definition, for any partition of
where
(10.1)
Let and let be an -fat partition of . By Lemma 10.3 it will suffice for us to show that ; and since is a weighted average of the values , it will suffice to show that for every with . Fix such a set . Note that , and so also .
Define such that , and note that with . Observe that the weighted complete graph on is approximately regular with weighted degree
for each vertex (where the error term is in ). Thus and . Also
since . Thus
Now, for any , given that
We now show . If some and the other are zero (in other words, if is ) then and . If then . Also, suppose that say where and . Then , and . Thus
(and the upper bound is achieved when is a block ). But, as we noted earlier, is a weighted average of the values for , and so
Hence
and we have the upper bound claimed.
Now take as the planted partition , and note that for each . Now since is a weighted average of the values we see that
and we are done. ∎
Proof of Theorem 2.1..
There is a version of the stochastic block model in which vertices are assigned to blocks independently and uniformly at random. Consider the variant of Theorem 2.1 using this version of the stochastic block model. Note that the part sizes will whp be . Thus by a coupling argument the edge set will differ by edges whp from the original model; and by the robustness result, Lemma 6.1, we find that whp the modularity values and are both as before.
11 Concluding remarks
Three phases of modularity
As mentioned in Section 2.3, the modularity of the binomial random graph exhibits three phases [32]. We restate the result as a sampling one. Recall that [7]. Let and consider . Then in the sparse case (where and ) we have near 1 whp, in the dense case (where ) is near whp, and in between (where for constants ) is bounded away from and whp. The question below asks if we may extend this three phase behaviour from complete graphs to a larger class of underlying graphs. Let us restrict our attention to graphs without isolated vertices.
Question 11.1.
For which classes of graphs do we have parts (i) and (ii) of the following three phase result (where we write for )?
- (i) If and then whp.
- (ii) There exist and such that if then whp.
- (iii) If then whp.
Firstly, observe that part (iii) is implied by Theorem 1.2: thus the open question is for which classes of graphs we have parts (i) and (ii). Secondly, we can only get three genuine phases if is unbounded. We have already noted that the three phase result holds for complete graphs, and by double exposure it must hold also for the random graph where as .
Part (i) is false for complete bipartite graphs for any fixed , since for any subgraph of (as has at most components, ignoring isolated vertices). Similarly, if and are fixed and has components then part (i) is false. Also, for (ii) to hold, we clearly must have bounded away from 1.
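For concreteness, the following minimal sketch (our own code; it assumes only the standard Newman–Girvan score of [36], and the function names are ours) implements the modularity score of a vertex partition and the edge-sampling model discussed above.

```python
import random

def modularity_score(edges, parts):
    """Newman-Girvan modularity score of a vertex partition:
    the sum over parts A of e(A)/m - (vol(A)/(2m))^2, where e(A)
    counts edges inside A, vol(A) sums degrees over A, and m = |E|."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    score = 0.0
    for part in parts:
        pset = set(part)
        e_in = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in pset)
        score += e_in / m - (vol / (2 * m)) ** 2
    return score

def sample_edges(edges, p, rng=random):
    """Edge-sampling model: keep each edge of the underlying graph
    independently with probability p (no false positives)."""
    return [e for e in edges if rng.random() < p]

# Two disjoint triangles, partitioned into the two triangles: q = 1/2.
tri2 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(modularity_score(tri2, [[0, 1, 2], [3, 4, 5]]))  # 0.5
```

The score of any single-part partition is 0, and maximising over partitions (which is NP-hard in general) gives the modularity of the graph.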
Random false positives
In our model, the observed graph has random false negatives, that is, there are edges in which are non-edges in , but no false positives. Theorem 1.2 shows that the modularity value is robust to random false negatives: so long as the observed graph has high enough expected degree, the modularity value of the observed graph is close to the modularity of the underlying graph. (For an underlying graph with edges we may have a chance of not seeing each edge.) However, perhaps random false positives may cause a large increase or decrease in the modularity value, possibly as much as adversarially added false positives.
Suppose that we are given all the edges in , but we may falsely think that some non-edges – the false positives – are also edges of . (We have not allowed false positives until now.) What can we say about modularity? Let us consider graphs with vertices and edges which we see fully and correctly; and suppose that there are false positives randomly chosen from the non-edges of the graph, so the observed graph has edges.
If is much smaller than then (unsurprisingly) we have no difficulties: by Lemma 6.1, if then so is a good estimate of . At the other extreme, if is much larger than then there is little point in using as an estimate for . For by Lemma 6.1, if and we denote by the graph formed just from the false positive edges, then ; so that is essentially determined by , whatever the value of .
The interest is thus in the balanced case, when for some constant . We will see in Example 11.2 that sometimes randomly adding false positives may increase the modularity nearly as much as adversarially choosing edges to add. Part (a) of the robustness lemma, Lemma 6.1, gives the following ‘adversarial’ bound. If the graph has edges, and is any graph obtained from by adding edges, then for any vertex partition ,
(11.1)
Example 11.2.
Let and suppose that . Let consist of a star on together with isolated vertices; and note that and . Form by adding new edges uniformly at random. Then the difference in modularity values is approximately , matching the adversarial bound (11.1).
To see this we may check that whp consists of the star together with isolated edges (and isolated vertices - which we may ignore). But for such a graph , an optimal partition consists of one part and parts of size 2 corresponding to the isolated edges - since the parts in an optimal partition must induce connected subgraphs, and connected components which themselves have modularity zero must not be split in an optimal partition [27]. (Note that the modularity values of a star and of a single edge are zero [7]). Now, this partition captures all edges and so
Hence whp
which is close to the adversarial (worst case) bound in (11.1) when is small.
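To sanity-check this example numerically, here is a small self-contained sketch (our own construction and function names, not from the paper): it builds the star plus a disjoint matching of added edges, which is whp the situation described above, and evaluates the partition consisting of the star's vertex set together with one part per added edge.

```python
def modularity_of_partition(edges, parts):
    """Newman-Girvan score: sum over parts of e(A)/m - (vol(A)/(2m))^2."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for part in parts:
        pset = set(part)
        e_in = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in pset)
        q += e_in / m - (vol / (2 * m)) ** 2
    return q

def star_plus_matching(s, t):
    """Star with s edges (centre 0, leaves 1..s) plus t disjoint extra edges,
    together with the partition {star vertices} + {each extra edge a part}."""
    star = [(0, i) for i in range(1, s + 1)]
    extra = [(s + 1 + 2 * j, s + 2 + 2 * j) for j in range(t)]
    parts = [list(range(s + 1))] + [list(e) for e in extra]
    return star + extra, parts

s, t = 100, 10
edges, parts = star_plus_matching(s, t)
q = modularity_of_partition(edges, parts)
# Every edge lies inside a part, so q = 1 - (s**2 + t) / (s + t)**2 here.
assert abs(q - (1 - (s**2 + t) / (s + t) ** 2)) < 1e-12
print(round(q, 4))  # 0.1727
```

With t = 0 the graph is a bare star and the score is 0, in line with the fact that stars have modularity zero [7].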
However, how much we may decrease the modularity by adding random false positives is open. Indeed, it is open how much we may decrease the modularity by adversarially adding edges: note that Example 6.2 which showed tightness of Lemma 6.1 only showed tightness for how much the modularity may increase by adding edges.
Question 11.3.
Let be a graph with edges, let be obtained from by randomly adding edges, and let be obtained from by adding a set of edges which maximises . What upper bounds do we have for or for ? Can we beat the upper bound significantly?
References
- [1] L. A. Adamic and N. Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, 2005.
- [2] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 2009.
- [3] P. J. Bickel, A. Chen, Y. Zhao, E. Levina, and J. Zhu. Correction to the proof of consistency of community detection. The Annals of Statistics, 2015.
- [4] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 2008.
- [5] M. Bolla, T. Kói, and A. Krámli. Testability of minimum balanced multiway cut densities. Discrete Applied Mathematics, 160(7-8), 2012.
- [6] C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós, and K. Vesztergombi. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics, 219(6), 2008. doi:10.1016/j.aim.2008.07.008.
- [7] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 2008. doi:10.1109/TKDE.2007.190689.
- [8] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1), 2022.
- [9] F. Chung. Spectral graph theory, volume 92. American Mathematical Soc. Providence, RI, 1997. doi:10.1090/cbms/092.
- [10] V. Cohen-Addad, A. Kosowski, F. Mallmann-Trenn, and D. Saulpic. On the power of Louvain in the stochastic block model. Advances in Neural Information Processing Systems, 33, 2020.
- [11] S. Deng, S. Ling, and T. Strohmer. Strong Consistency, Graph Laplacians, and the Stochastic Block Model. The Journal of Machine Learning Research, 22(1), 2021.
- [12] T. N. Dinh and M. T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). IEEE, 2011.
- [13] S. Diskin, J. Erde, M. Kang, and M. Krivelevich. Isoperimetric inequalities and supercritical percolation on high-dimensional product graphs. arXiv preprint arXiv:2304.00016, 2023.
- [14] S. Diskin and M. Krivelevich. Expansion in supercritical random subgraphs of expanders and its consequences. arXiv preprint arXiv:2205.04852, 2022.
- [15] S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010. doi:10.1016/j.physrep.2009.11.002.
- [16] A. Frieze and M. Karoński. Introduction to random graphs. Cambridge University Press, 2015. doi:10.1017/cbo9781316339831.
- [17] A. Frieze and M. Krivelevich. On the non-planarity of a random subgraph. Combinatorics, Probability and Computing, 22(5):722–732, 2013.
- [18] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs, volume 45. John Wiley & Sons, 2011.
- [19] T. Johansson. On Hamilton cycles in Erdős-Rényi subgraphs of large graphs. Random Structures & Algorithms, 57(1):132–149, 2020.
- [20] A. Kaveh, M. Magnani, and C. Rohner. Comparing node degrees in probabilistic networks. Journal of Complex Networks, 7(5), 2019.
- [21] M. Koshelev. Modularity in planted partition model. Computational Management Science, 20(1):34, 2023.
- [22] M. Krivelevich, C. Lee, and B. Sudakov. Robust Hamiltonicity of Dirac graphs. Transactions of the American Mathematical Society, 366(6):3095–3130, 2014.
- [23] R. Lambiotte and M. T. Schaub. Modularity and Dynamics on Complex Networks. Cambridge University Press, 2021.
- [24] A. Lancichinetti and S. Fortunato. Limits of modularity maximization in community detection. Physical Review E, 84(6), 2011. doi:10.1103/physreve.84.066122.
- [25] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4), 2022.
- [26] C. Liu and F. Wei. Phase transition of degeneracy in minor-closed families. Advances in Applied Mathematics, 146, 2023.
- [27] B. Louf, C. McDiarmid, and F. Skerman. Modularity and graph expansion. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ITCS.2024.78.
- [28] L. Lovász. Large networks and graph limits, volume 60. American Mathematical Soc., 2012.
- [29] D. Lusseau. The emergent properties of a dolphin social network. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270, 2003.
- [30] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, volume 141 of London Mathematical Society Lecture Note Series. Cambridge University Press, 1989.
- [31] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4), 2018. doi:10.1093/comnet/cnx046.
- [32] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1), 2020. doi:10.1002/rsa.20910.
- [33] C. McDiarmid and F. Skerman. Modularity and edge sampling. arXiv preprint arXiv:2112.13190v1, 2021.
- [34] K. Meeks and F. Skerman. The parameterised complexity of computing the maximum modularity of a graph. Algorithmica, 82(8), 2020. doi:10.1007/s00453-019-00649-7.
- [35] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010. doi:10.1093/acprof:oso/9780199206650.001.0001.
- [36] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004. doi:10.1103/physreve.69.026113.
- [37] E. Ofek. On the expansion of the giant component in percolated (n, d, ) graphs. Combinatorics, Probability and Computing, 16(3):445–457, 2007.
- [38] T. Poisot, A. R. Cirtwill, K. Cazelles, D. Gravel, M.-J. Fortin, and D. B. Stouffer. The structure of probabilistic networks. Methods in Ecology and Evolution, 7(3), 2016.
- [39] M. A. Porter, J. Onnela, and P. J. Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
- [40] L. O. Prokhorenkova, A. Raigorodskii, and P. Prałat. Modularity of complex networks models. Internet Mathematics, 2017.
- [41] D. Romanini, S. Lehmann, and M. Kivelä. Privacy and uniqueness of neighborhoods in social networks. Scientific reports, 11(1):1–15, 2021.
- [42] F. Skerman. Modularity of Networks. PhD thesis, University of Oxford, 2016.
- [43] B. Sudakov. Robustness of graph properties. Surveys in Combinatorics 2017, 440:372, 2017.
- [44] V. A. Traag, L. Waltman, and N. J. Van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 2019.
- [45] J. Vizentin-Bugoni, P. K. Maruyama, V. J. Debastiani, L. Duarte, B. Dalsgaard, and M. Sazima. Influences of sampling effort on detected patterns and structuring processes of a neotropical plant–hummingbird network. Journal of Animal Ecology, 85(1), 2016.
- [46] Y. Zhao, E. Levina, and J. Zhu. Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40(4):2266, 2012.