
Modularity and partially observed graphs.

Colin McDiarmid Email: [email protected]. Department of Statistics, University of Oxford. Fiona Skerman Email: [email protected]. Partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the project AI4Research at Uppsala University. Part of this work was done while visiting the Simons Institute for the Theory of Computing, supported by a Simons-Berkeley Research Fellowship. Department of Mathematics, Uppsala University.
Abstract

Suppose that there is an unknown underlying graph $G$ on a large vertex set, and we can test only a proportion of the possible edges to check whether they are present in $G$. If $G$ has high modularity, is the observed graph $G'$ likely to have high modularity? We see that this is indeed the case under a mild condition, in a natural model where we test edges at random. We find that $q^*(G')\geq q^*(G)-\varepsilon$ with probability at least $1-\varepsilon$, as long as the expected number of edges in $G'$ is large enough. Similarly, $q^*(G')\leq q^*(G)+\varepsilon$ with probability at least $1-\varepsilon$, under the stronger condition that the expected average degree in $G'$ is large enough. Further, under this stronger condition, finding a good partition for $G'$ helps us to find a good partition for $G$.

We also consider the vertex sampling model for partially observing the underlying graph: we find that for dense underlying graphs we may estimate the modularity by sampling constantly many vertices and observing the corresponding induced subgraph, but this does not hold for underlying graphs with a subquadratic number of edges. Finally we deduce some related results, for example showing that under-sampling tends to lead to overestimation of modularity.

1 Introduction and main results

The modularity of a graph is a measure of the extent to which the graph breaks into separate communities. For a given graph $G$, each partition $\mathcal{A}$ of the vertices has a modularity score $q_{\mathcal{A}}(G)$, with higher values indicating that the partition better captures community structure in $G$. The modularity $q^*(G)$ of the graph $G$ is defined to be the maximum over all vertex partitions of the modularity score, and satisfies $0\leq q^*(G)<1$. See Section 1.1 for definitions and some background.

Suppose that there is an unknown underlying graph $G$ on a large given vertex set, and we can test only a small proportion of the possible edges to check whether they are present in $G$; or perhaps we can test many possible edges but there is a chance that we fail to spot an edge. We assume that there are no false positives (where there is no edge in $G$ but we think there is one), except briefly in the concluding remarks. We consider two questions. (a) If $G$ has high modularity, is the observed graph $G'$ likely to have high modularity? In other words, if $G'$ has low modularity, can we assert with some confidence that $G$ has low modularity? (b) Conversely, if $G$ has low modularity, is $G'$ likely to have low modularity? We find that, in a natural model where we test possible edges independently at random, the answer to (a) is yes, under the condition that the expected number of edges found is large enough; and the answer to (b) is yes, under the stronger condition that the expected average degree in $G'$ is large enough.

To investigate these questions we use the random graph models $G_p$ and $G_m$ described in subsections 1.2 and 1.3 below. First we recall the definition of modularity.


1.1 Modularity : definition and notation

Given a graph $G=(V,E)$, modularity gives a score to each partition of the vertex set $V$; and the (maximum) modularity $q^*(G)$ of $G$ is the maximum of these scores over all vertex partitions. Let $\mathbf{1}_{uv\in E}$ be the indicator that $uv$ is an edge. For a set $A$ of vertices, let $e(A)$ denote the number of edges within $A$, and let $\mathrm{vol}(A)$ denote the sum of the degrees $d_v$ over the vertices $v$ in $A$.

Definition ([36], see also [35]).

Let $G$ be a graph with $m\geq 1$ edges. For a vertex partition $\mathcal{A}$ of $G$, the modularity score of $\mathcal{A}$ on $G$ is

$$q_{\mathcal{A}}(G)=\frac{1}{2m}\sum_{A\in\mathcal{A}}\sum_{u,v\in A}\left(\mathbf{1}_{uv\in E}-\frac{d_u d_v}{2m}\right)=\frac{1}{m}\sum_{A\in\mathcal{A}}e(A)-\frac{1}{4m^{2}}\sum_{A\in\mathcal{A}}\mathrm{vol}(A)^{2}\,.$$

The modularity of $G$ is $q^*(G)=\max_{\mathcal{A}}q_{\mathcal{A}}(G)$, where the maximum is over all vertex partitions $\mathcal{A}$. (For an empty graph $G$, i.e. with no edges, $m=0$: we set $q_{\mathcal{A}}(G)=0$ for each vertex partition $\mathcal{A}$, and $q^*(G)=0$.)

Directly from the definition we have $0\leq q^*(G)<1$ for all graphs. Note that isolated vertices are irrelevant - they are not counted in the formula for the modularity score. The second expression for $q_{\mathcal{A}}(G)$ expresses it as the difference of the edge contribution or coverage $q^{E}_{\mathcal{A}}(G)=\tfrac{1}{m}\sum_{A}e(A)$ and the degree tax $q^{D}_{\mathcal{A}}(G)=\tfrac{1}{4m^{2}}\sum_{A}\mathrm{vol}(A)^{2}$.
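For readers who want to experiment, the score can be computed directly from the second expression above (coverage minus degree tax). The following minimal Python sketch is not from the paper; it takes a list of edges and a vertex-to-part map, and the function name is our own.

```python
from collections import defaultdict

def modularity_score(edges, partition):
    """Modularity score q_A(G): coverage minus degree tax.

    edges: list of (u, v) pairs; partition: dict mapping vertex -> part label.
    Returns 0.0 for an empty graph, as in the definition above.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    deg = defaultdict(int)
    internal = defaultdict(int)      # e(A): edges with both endpoints in part A
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    vol = defaultdict(int)           # vol(A): sum of degrees over vertices in A
    for w, d in deg.items():
        vol[partition[w]] += d
    coverage = sum(internal.values()) / m
    degree_tax = sum(x * x for x in vol.values()) / (4 * m * m)
    return coverage - degree_tax
```

For instance, two disjoint triangles split into their components score $1 - 72/144 = 1/2$, matching the two-cliques example later in the paper, while the trivial one-part partition scores $0$.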


1.2 Modularity of the random graph $G_p$ obtained by edge-sampling

Given a graph $G$ and $0<p\leq 1$, let $G_p$ be the random subgraph of $G$ on the same vertex set obtained by considering each edge of $G$ independently and keeping it in the graph with probability $p$ (and otherwise deleting it). Thus the binomial or Erdős-Rényi random graph $G_{n,p}$ may be written as $(K_n)_p$, where $K_n$ is the $n$-vertex complete graph. Let $e(H)$ denote the number of edges in a graph $H$: thus the expected number of edges in $G_p$ is $e(G)p$. The first of our two theorems concerns when we want to have $q^*(G_p)>q^*(G)-\varepsilon$ with probability near 1.
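The edge-sampling model is straightforward to simulate. The sketch below is illustrative only (the helper names `edge_sample` and `binomial_graph` are ours, not the paper's); it also shows the identity $G_{n,p}=(K_n)_p$ in code.

```python
import itertools
import random

def edge_sample(edges, p, seed=None):
    """Edge set of G_p: keep each edge of G independently with probability p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]

def binomial_graph(n, p, seed=None):
    """G_{n,p} = (K_n)_p: edge-sample the complete graph on n vertices."""
    return edge_sample(list(itertools.combinations(range(n), 2)), p, seed)
```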

Theorem 1.1.

Given $\varepsilon>0$ there exists $c=c(\varepsilon)$ such that the following holds. For each graph $G$ and probability $p$ such that $e(G)p\geq c$, the random graph $G_p$ satisfies $q^*(G_p)>q^*(G)-\varepsilon$ with probability at least $1-\varepsilon$.

The proof of Theorem 1.1 (in Section 4) will show that we may take $c(\varepsilon)=\Theta(\varepsilon^{-3}\log\varepsilon^{-1})$. Following the proof, we give an example which shows that $c(\varepsilon)$ must be at least $\Omega(\varepsilon^{-1}\log\varepsilon^{-1})$. See Figure 1.1 for simulations illustrating Theorem 1.1 (and Theorem 1.2), and see Figure A.1 for simulations run on a larger underlying graph.

The second of our theorems concerns when we want also to have $q^*(G_p)<q^*(G)+\varepsilon$ with probability near 1, and it shows that this will happen if the expected average degree is sufficiently large. We see also that, in this case, finding a good partition $\mathcal{A}$ for $G_p$ helps us to find a good partition $\mathcal{A}'$ for $G$. Observe that the expected average degree in $G_p$ is $2e(G)p/v(G)$, where $v(G)$ is the number of vertices in $G$.

Theorem 1.2.

For each $\varepsilon>0$, there is a $c=c(\varepsilon)$ such that the following holds. Let the graph $G$ and probability $p$ satisfy $e(G)p/v(G)\geq c$. Then, with probability at least $1-\varepsilon$, the following statements (a) and (b) hold:

  • (a) the random graph $G_p$ satisfies $|q^*(G_p)-q^*(G)|<\varepsilon$; and

  • (b) given any partition $\mathcal{A}$ of the vertex set, in a linear number of operations (seeing only $G_p$) the greedy amalgamating algorithm finds a partition $\mathcal{A}'$ with $q_{\mathcal{A}'}(G)\geq q_{\mathcal{A}}(G_p)-\varepsilon$.

Part (b) says roughly that, given a good partition $\mathcal{A}$ for $G_p$, we can quickly construct a good partition $\mathcal{A}'$ for $G$. Using also part (a), we may see that, if the partition $\mathcal{A}$ for $G_p$ satisfies $q_{\mathcal{A}}(G_p)\geq q^*(G_p)-\varepsilon$ with probability at least $1-\varepsilon$, then the partition $\mathcal{A}'$ satisfies $q_{\mathcal{A}'}(G)>q^*(G)-2\varepsilon$ with probability at least $1-2\varepsilon$. See Section 3 for the greedy amalgamating algorithm used to construct $\mathcal{A}'$. The proof of Theorem 1.2 (in Section 5) will show that we may take $c(\varepsilon)=\Theta(\varepsilon^{-3}\log\varepsilon^{-1})$. Following the proof, we give an example which shows that $c(\varepsilon)$ must be at least $\Omega(\varepsilon^{-2})$.

The assumption in Theorem 1.2 that the expected average degree is large is of course much stronger than the assumption in Theorem 1.1, but still $e(G_p)$ may be much smaller than $e(G)$. If we go much further, and assume that at most an $\varepsilon$-proportion of edges are missed, that is $|E\setminus E'|\leq\varepsilon|E|$, then deterministically we have $|q^*(G')-q^*(G)|\leq 2\varepsilon$ by [32]; see Section 6.1.


1.3 Modularity of the random graph $G_m$ obtained by limited search

Let $G_m$ be the graph on vertex set $[n]$ whose edge set is a uniformly random $m$-edge subset of $E(G)$. The graph $G_m$ may also be sampled by a randomised procedure to find edges of $G$, where we stop once we find ‘enough’ edges: we are given $n$, there is an unknown graph $G$ on $V=[n]$, and we query possible edges of $G$, uniformly at random, until we find $m$ edges.
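The limited-search procedure can be sketched as a simulation (illustrative only, not the paper's code; the name `limited_search` is ours). Querying the possible edges in a uniformly random order and stopping after $m$ hits indeed yields a uniformly random $m$-edge subset of $E(G)$, provided $G$ has at least $m$ edges.

```python
import itertools
import random

def limited_search(n, edge_set, m, seed=None):
    """Query possible edges of an unknown graph on {0, ..., n-1} in a
    uniformly random order, stopping once m edges are found; returns G_m."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    rng.shuffle(pairs)
    found = []
    for uv in pairs:
        if uv in edge_set:           # one query of a possible edge
            found.append(uv)
            if len(found) == m:
                break
    return found
```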

Corollary 1.3.

Given $\varepsilon>0$ there exists $m_0=m_0(\varepsilon)$ such that the following holds. For all $m\geq m_0$, the random graph $G_m$ satisfies $q^*(G_m)>q^*(G)-\varepsilon$ with probability $>1-\varepsilon$.

Corollary 1.4.

Given $\varepsilon>0$ there exists $c=c(\varepsilon)$ such that the following holds. For all $m\geq cn$, the random graph $G_m$ satisfies the conclusions of Theorem 1.2 when $G_p$ is replaced by $G_m$.

These corollaries follow easily from Theorems 1.1 and 1.2 respectively, using Lemma 6.4, which shows that $q^*(G_p)$ and $q^*(G_m)$ (and similarly for $q_{\mathcal{A}}$) are whp close in value for $p=m/e(G)$. The lemma will be proved in Section 6.2 (the lemma also shows that these quantities are close in expectation, which will be used in Section 9).

[Figure 1.1 consists of two panels. Upper panel: $\tilde{q}(G_p)$, the estimated modularity of the sampled graph, plotted against $p$. Lower panel: $q_{\mathcal{A}'(G_p)}(G)$, the modularity score of the underlying graph using a partition estimated from the sampled graph, plotted against $p$.]
Figure 1.1: Simulation results. The dolphin social network [29] with 62 vertices and 159 edges was taken to be the underlying graph $G$. It is known that $q^*(G)=0.529$ to three decimal places [7]. In the upper part of the figure each red point corresponds to the estimated modularity $\tilde{q}(G_p)$ of an instance of the sampled graph $G_p$. For each edge probability $p=0.1,0.2,\ldots,0.9$, the graph $G_p$ was sampled 50 times. For each sampled graph $G_p$ we took the maximum modularity score of the partitions output by 200 runs each of the Louvain [4] and Leiden [44] algorithms. Noise was added in the $x$-direction so that individual points can be distinguished.
In the lower part of the figure we examine, for each random instance of $G_p$, how well the modularity maximising partition of $G_p$ performs as a partition of the underlying graph $G$. For each sampled graph $G_p$ we plot the score $q_{\mathcal{A}'(G_p)}(G)$, where $\mathcal{A}(G_p)$ is the highest scoring partition on $G_p$ found in 200 runs of Louvain and Leiden, and $\mathcal{A}'(G_p)$ is this partition modified as in Theorem 1.2(b) (with $\eta=0.05$ in Lemma 3.1). See also Figure A.1 for simulations run on a larger underlying graph.

1.4 Estimating modularity by vertex sampling (‘parameter estimation’)

Another commonly considered model of partially observing an underlying graph $G$ is to sample a vertex subset $U$ of constant size $k$ and observe the corresponding induced subgraph $G[U]$. In Section 7 and Theorem 1.5 we consider this model. We find that for dense graphs $G$ we may, with probability at least $1-\varepsilon$, estimate the modularity to within an $\varepsilon$ error (that is, $|q^*(G)-q^*(G[U])|<\varepsilon$) by sampling a constant number $k=k(\varepsilon)$ of vertices; but for graphs with a subquadratic number of edges this result does not hold: modularity is not estimable.

Theorem 1.5.

(a) For fixed $\rho$ with $0<\rho<1$, modularity is estimable for graphs with density at least $\rho$. (b) For any given function $\rho(n)=o(1)$, modularity is not estimable for $n$-vertex graphs with density at least $\rho(n)$.


1.5 Outline of the paper

The plan of the rest of the paper is as follows. In Section 2, we first show an application of our results to the stochastic block model in Section 2.1, then provide an overview of the further results of this paper in Section 2.2 and lastly give some background and the relation of this paper to previous results in Section 2.3.

Sections 3 to 5 give the proofs of the main results of the paper: Section 3 gives a crucial preliminary lemma for the proofs, the ‘fattening lemma’; and Theorem 1.1 and Theorem 1.2 are proven in Section 4 and Section 5 respectively. (Indeed we also prove versions of these results which take into account the number of parts in the partition.)

In Section 6 we give a ‘robustness’ lemma showing that $q^*(G)$ and $q_{\mathcal{A}}(G)$ do not change much if we change a small proportion of edges; then we prove Lemma 6.4 on $G_p$ and $G_m$, which completes the proof of Corollaries 1.3 and 1.4; and finally we give related results on concentration of modularity. We prove Theorem 1.5 on estimating modularity by vertex sampling in Section 7.

The later sections, Sections 8 and 9, contain further related results. In Section 8 we see that under-sampling tends to lead to over-estimation of modularity (using Theorem 1.1). In Section 9 we show that Theorem 1.1 implies results on the expected modularity of random graphs $G_p$ and $G_m$ with constant average degree. In Section 10 we give results analogous to Theorems 1.1 and 1.2 for weighted networks and show an application of these results to the stochastic block model. Finally, Section 11 contains a few concluding remarks and open questions.

In Appendix A we give simulations run on a larger underlying graph.


2 Further results and background on existing results


2.1 Bootstrapping to sparser graphs and application to stochastic block model.

Modularity preserves signal a little below the connectivity threshold.

This property sets modularity apart from other measures of community structure in a network. If $s(H)$ is the min-cut, normalised min-cut or the spectral gap of the Laplacian of $H$, then $s(H)=0$ if $H$ is disconnected. Thus, given a connected graph $G$, for $s(G_p)$ to approximate $s(G)$ for these measures $s$ we must take $p$ large enough that $G_p$ is likely connected.

To see that our results can hold below the connectivity threshold, consider $G=K_n$ (so $q^*(G)=0$) and $p=c/n$ with $c$ a large constant. Then $|q^*(G_p)-q^*(G)|<\varepsilon$ with probability $1-\varepsilon$; see for example Theorem 1.3 of [32]. But whp $G_p$ is disconnected - indeed it will have a linear proportion of the vertices and edges inside the giant component, and a linear proportion outside. Furthermore, we may take our underlying graph to be disconnected - for example, if $G$ consists of two disjoint equal sized cliques then it has modularity $1/2$, and this will again be approximated by $q^*(G_p)$ for $p=c/n$ with some large constant $c$.


Stochastic block model, a little below the connectivity threshold $\Theta(n^{-1}\log n)$.

Let $k\geq 2$ and $0\leq q\leq p\leq 1$. There are two versions of the balanced $k$-community stochastic block model, where edges appear independently with probability $p$ within blocks and probability $q$ between blocks. In the first version, $G_{n,k,p,q}$, the vertex set $V=[n]$ is partitioned deterministically into $k$ sets (blocks) $V_i$ of size $\lfloor n/k\rfloor$ or $\lceil n/k\rceil$; and in the second, $G'_{n,k,p,q}$, each vertex independently and uniformly picks a block to join (so the blocks have size about $n/k$ whp).

We first give an example where bounds at some (not too small) edge density are known from spectral results, and a version of Theorem 1.2 allows us to bootstrap these results to sparser models. After that we present Theorem 2.1 concerning the general $k$-community model.

The bootstrapping example involves $G_{n,2,p,q}$. Write $q^*_{\leq 2}$ for the maximum modularity value over all partitions into at most two parts, and note that $q^*_{\leq 2}$ is at least the modularity score of the planted bipartition. Thus by direct calculation - see for example similar calculations in [21, 32] - if $n^2 p\rightarrow\infty$ then

$$q^{*}_{\leq 2}(G_{n,2,p,q})\geq\frac{p-q}{2(p+q)}+o(1)\quad\text{whp}. \tag{2.1}$$

Using existing spectral results from [11, 27] we may deduce that the lower bound in (2.1) is tight for some values of $p,q=\Theta(n^{-1}\log n)$. (We suppress the details here; see Remark 5.3.) Interestingly, one may then use Proposition 5.2 (a version of Theorem 1.2 for $k$-partitions) to bootstrap these results to sparser graphs, where $p,q=\omega(n^{-1})$ - see Remark 5.3. Note that this includes values of $p$ and $q$ for which the spectral results do not hold (since for part of the range $G_{n,2,p,q}$ is disconnected whp).

The tightness of (2.1) tells us that whp the planted partition has asymptotically maximal modularity value over all bipartitions. We now consider the modularity value for the $k$-block model, and see that the planted partition is asymptotically optimal over all partitions.

Theorem 2.1.

Let $k\geq 2$ be an integer, and let $p=p(n)$ and $q=q(n)$ satisfy $0\leq q\leq p\leq 1$ and $np\to\infty$ as $n\to\infty$. Then for $G=G_{n,k,p,q}$ or $G=G'_{n,k,p,q}$

$$q^{*}(G)=\frac{(p-q)\,(1-1/k)}{p+(k-1)q}+o(1)\quad\text{whp},$$

and whp the planted partition has this modularity value.
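To make the limiting value concrete, the function below (an illustrative sketch with a name of our choosing, not code from the paper) evaluates the leading term of Theorem 2.1. Note that for $k=2$ it reduces to the bound in (2.1), it vanishes when $q=p$, and it tends to $1-1/k$ when $q=0$ (disjoint communities).

```python
def sbm_planted_modularity(k, p, q):
    """Leading term of q*(G_{n,k,p,q}) in Theorem 2.1: the asymptotic
    modularity of the planted partition in the balanced k-block model."""
    return (p - q) * (1 - 1 / k) / (p + (k - 1) * q)
```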

We shall prove Theorem 2.1 in Section 10, using a deterministic lemma, Lemma 10.6, together with Theorem 10.5, which is a version of Theorem 1.2 with a weighted underlying graph. We note that [2, 3] showed that whp the modularity optimal partition will agree with the planted partition except for $o(n)$ vertices (for $p=\omega(1/n)$ and $q=\rho p$ for some fixed $\rho$ with $0<\rho<1$). Thus we could also have proven Theorem 2.1 for such $p,q$ via robustness results (see Lemma 6.1) and the likely modularity value of the planted partition.

The recent paper [21] gives explicit bounds on the modularity. We provide a simple stand-alone proof of Theorem 2.1, following [21] in using weighted graphs. Our result extends that in [21]: we reduce the upper bound there by a factor $1-1/k$ to match the lower bound, and we extend the range of $p$ from $\omega(n^{-1/2})$ to $\omega(n^{-1})$. Theorem 2.1 did not appear in the earlier arXiv version [33] of our paper.


2.2 Further results in this paper


Robustness, concentration and closeness of modularity.

For either $q_{\mathcal{A}}(G)$ or $q^*(G)$, modifying a single edge can change the value by at most $2/e(G)$. This was known for $q^*(G)$ (see §5 of [32]). In Section 6.1 we prove that the $2/e(G)$ bound holds also for any given partition $\mathcal{A}$, and Example 6.2 shows that the factor 2 is necessary in both cases. These robustness results give the concentration theorem below - see Section 6.

Theorem 2.2.

There is a constant $\eta>0$ such that for each graph $G$, each partition $\mathcal{A}=\mathcal{A}(G)$ and each $0<p<1$ the following holds with $\mu=\mu(G,p)=e(G)p$. For each $t\geq 0$,

$$\mathbb{P}\big(\,|q_{\mathcal{A}}(G_p)-\mathbb{E}[q_{\mathcal{A}}(G_p)]|\geq t\,\big)<2\,e^{-\eta\mu t^{2}}\quad\text{and}\quad\mathbb{P}\big(\,|q^{*}(G_p)-\mathbb{E}[q^{*}(G_p)]|\geq t\,\big)<2\,e^{-\eta\mu t^{2}}.$$

By Theorem 2.2 we have concentration for $q^*(G_p)$ around $\mathbb{E}[q^*(G_p)]$ as soon as $e(G)p$ is large. In more detail, $\mathbb{P}(|q^*(G_p)-\mathbb{E}[q^*(G_p)]|\geq\varepsilon)<\delta$ if $e(G)p>c\varepsilon^{-2}\log\delta^{-1}$ for a (large) constant $c$. But results on binomial random graphs (Theorem 1.1(b) of [32]) show that, when $G$ is $K_n$, we need a large expected average degree (i.e. $e(G)p/v(G)$ large) to avoid $q^*(G_p)$ being much larger than $q^*(G)$ whp. Thus we get concentration of $q^*(G_p)$ as soon as $e(G)p$ is a large constant; however, this may not be concentration around $q^*(G)$ until the expected average degree $e(G)p/v(G)$ is a large constant.

Under-sampling and over-estimating modularity.

In ecological networks each interaction observed reveals that an edge is present in the underlying network, and the effect of sampling effort can be modelled by taking the observed network after varying numbers of observations. It was noted in multiple papers on ecological networks that a lower sampling effort, under-sampling, can lead to overestimating the modularity of the underlying ecological network [45]. Our paper provides some theoretical explanation - see Section 8 for a statement of the result.


Expected modularity of the binomial random graph $G_{n,c/n}$.

Given a constant $c>0$, for each $n\geq c$ we let $\bar{q}(n,c)=\mathbb{E}[q^*(G_{n,c/n})]$, where $G_{n,c/n}$ is the binomial (or Erdős-Rényi) random graph with edge-probability $c/n$. It was conjectured in [32] that for each $c>0$, $\bar{q}(n,c)$ tends to a limit $\bar{q}(c)$ as $n\to\infty$, and it was noted there that in this case the function $\bar{q}(c)$ would be continuous in $c$. From Theorem 1.1 we may deduce that $\bar{q}(c)$ would also be non-increasing in $c$ - see Section 9.

Modularity and edge-sampling on weighted networks.

Network data of interest for clustering often has a weight associated with each edge. Though we have stated the modularity score of a partition for binary edge weights, it is simple to take the weight of the edges inside each part (instead of the number of edges) and to take the degree of a vertex $v$ to be the sum of the weights of the edges incident to $v$ (instead of the number of edges). This weighted modularity is often used, and indeed the popular community detection algorithm Louvain can take weighted networks as input [4]. Our Theorems 1.1 and 1.2 have analogues for weighted networks - see Section 10.
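The weighted score is a one-line change of bookkeeping from the binary case: replace edge counts by edge weights throughout. The sketch below is illustrative (our own helper name, not the paper's code); with all weights equal to 1 it reduces to the binary modularity score.

```python
from collections import defaultdict

def weighted_modularity_score(weighted_edges, partition):
    """Weighted modularity score: edge counts replaced by edge weights.

    weighted_edges: list of (u, v, w) triples with w > 0;
    partition: dict mapping vertex -> part label.
    """
    W = sum(w for _, _, w in weighted_edges)   # total weight, in place of m
    if W == 0:
        return 0.0
    strength = defaultdict(float)              # weighted degree of each vertex
    internal = defaultdict(float)              # weight of edges inside each part
    for u, v, w in weighted_edges:
        strength[u] += w
        strength[v] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    vol = defaultdict(float)                   # weighted volume of each part
    for x, s in strength.items():
        vol[partition[x]] += s
    return sum(internal.values()) / W - sum(s * s for s in vol.values()) / (4 * W * W)
```

Note that the score is invariant under rescaling all weights by a common factor, as one would expect from the formula.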


2.3 Background on existing results, and our contribution

Modularity : use in community detection.

Modularity was introduced in [36]; it gives a score to each vertex partition (i.e. community division), and partitions with higher scores are considered to better capture the communities in the network. It is NP-hard to find a partition $\mathcal{A}$ with the highest modularity score (i.e. such that $q_{\mathcal{A}}(G)=q^*(G)$) [7], and community detection algorithms do not do this. However, it is fast to compute the modularity of a particular partition, and hence it can be feasible to choose which modifications to make to a partition by picking the candidate partition with the highest modularity score. Louvain [4] and Leiden [44] are examples of this approach. These algorithms are fast and have had success in recovering ground truth communities on real world networks. However, there are no theoretical guarantees for either that the partition found is near optimal, though recently [10] showed that a Louvain-like algorithm recovers the communities in the stochastic block model for a wide parameter range.

Modularity-based clustering algorithms are the most commonly used to detect communities on large network data [24, 23] - see [15] and [39] for surveys. The widespread use in applications makes modularity an important graph parameter to understand theoretically.


Informing applied network theory : privacy.

Sharing network data can lead to privacy concerns - one approach is to share a subsampled graph $G_p$ instead of $G$ [41]. A claimed advantage is that $G_p$ retains many properties of the underlying graph $G$ and that parameters of $G$ may be estimated knowing only $p$ and $G_p$. Examples considered in [41] include vertex degrees and numbers of triangles.

Our contribution is to determine when the modularity of $G$ is well approximated by the modularity of $G_p$. Furthermore, in part (b) of Theorem 1.2, we see that for $p$ large enough we can likely obtain a near optimal partition of the underlying graph $G$ whilst seeing only the shared graph $G_p$. (Here we mean possible in an information theoretic sense - we do not consider the complexity of finding the partition.) Given the shared graph $G_p$ and a near optimal partition $\mathcal{A}$ of the shared graph, we construct a partition which is likely to be near optimal for the underlying (non-shared) graph $G$.


Robustness and percolation : existing results for random subgraphs of fixed graphs.

Given a graph $G$ with property $\mathcal{P}$, we say that $G$ robustly has property $\mathcal{P}$ if whp $G_p$ has property $\mathcal{P}$ [43]. A seminal such result is in [22], which showed that there is an absolute constant $c$ such that, for $p\geq c(\log n)/n$ and any underlying graph $G$ with minimum degree at least $n/2$, whp the observed graph $G_p$ is Hamiltonian. This was later strengthened to a hitting time result, i.e. taking edges of the underlying graph in a random order, in [19].

Another property that has been considered is expansion - a measure of the number of edges leaving each vertex set relative to the size of that set. For $d$-regular $G$ with good expansion properties, the expansion in the giant component of $G_p$ was studied in [37] and [14], see also [13]. Additionally, non-planarity of $G_p$ for $G$ with a specified minimum degree was studied in [17], and for planar underlying graphs [26] determines the threshold for $p$ such that $G_p$ is 3-colourable for every planar $G$.

Our contribution is to consider robustness of modularity: for $G$ with at least a large constant number of edges and $p$ large enough we get likely lower bounds on $q^*(G_p)$, and for $G$ with average degree at least a large constant we also get likely upper bounds on $q^*(G_p)$.


Parameter estimation, vertex sampling and network measures : existing work.

There is a well developed field of parameter estimation which asks for which parameters it is possible to estimate the parameter of an underlying graph given access to an induced subgraph on a random vertex subset of constant size. For an excellent introduction, including the relation to the theory of graph limits, see [6, 28]. There are related results in [5]: that paper analysed testability of graph parameters relating to minimising the number of edges between parts, some of which were normalised with respect to the sizes of the parts but in a different way to modularity, and found some of these parameters not to be testable. Our results complement these by showing that modularity is not testable in general but is testable for dense graphs.


Modularity : existing results for random graphs.

Recall that the modularity value of a graph is always in the interval $[0,1)$, with higher values taken to indicate a greater extent of community structure. The modularity of the binomial random graph $G_{n,p}$ exhibits three phases, see [32]: for sparse graphs (where $np\leq 1+o(1)$) modularity is near 1 whp, for dense graphs (where $np\rightarrow\infty$) it is near 0 whp, and in between (where $c_1\leq np\leq c_2$ for constants $1<c_1<c_2$) it is bounded away from 0 and 1 whp. Note that for $G=K_n$ we have $e(G)p/v(G)\sim\tfrac{1}{2}np$ and $q^*(K_n)=0$ [7], and hence this result on $G_{n,p}$ in the dense regime is recovered by our Theorem 1.2. It is also shown in [32] that, for $1/n<p<0.99$, whp $q^*(G_{n,p})=\Theta(1/\sqrt{np})$.

Random regular graphs have received recent attention in [25]; in particular it is shown that the random cubic graph $G_{n,3}$ whp has modularity in the interval $[0.667,0.790]$, improving on earlier results. For large $r$, whp $q^*(G_{n,r})=\Theta(1/\sqrt{r})$ [31, 40] - note that this is the same order as for a binomial random graph with expected degree $r$. The random hyperbolic graph whp has modularity asymptotically 1 [8], and so does the preferential attachment tree, though if $h\geq 2$ edges are added at each step as in the model $G_n^h$ then $\Omega(1/\sqrt{h})=q^*(G_n^h)<0.94$ whp [40]. The modularity of the stochastic block model, and of a degree-corrected version, has been considered in [21, 2, 3, 46]; see also Theorem 2.1.


3 The fattening lemma for vertex partitions

Given a graph $G$ and $0<\eta\leq 1$, we say that a vertex partition $\mathcal{A}$ is $\eta$-fat (or $\eta$-good) if each part has volume at least $\eta\,\mathrm{vol}(G)$. We shall describe a greedy algorithm which, given a graph $G$, $0<\eta\leq 1$ and a vertex partition $\mathcal{B}$, amalgamates some parts in $\mathcal{B}$ to yield an $\eta$-fat partition $\mathcal{A}=\mathcal{A}(G,\eta,\mathcal{B})$ with modularity score at most a little less than that of $\mathcal{B}$. Note that $|\mathcal{A}|\leq|\mathcal{B}|$.

Lemma 3.1 (The fattening lemma).

For each non-empty graph $G$ and each $0<\eta\leq 1$, there is an $\eta$-fat partition $\mathcal{A}$ of $V(G)$ such that $q_{\mathcal{A}}(G)>q^*(G)-2\eta$. Indeed, given any partition $\mathcal{B}$ of $V(G)$, the greedy amalgamating algorithm uses a linear number of operations and constructs an $\eta$-fat partition $\mathcal{A}=\mathcal{A}(G,\eta,\mathcal{B})$ such that $q_{\mathcal{A}}(G)>q_{\mathcal{B}}(G)-2\eta$.
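As a rough illustration of the amalgamation idea only (the paper's actual greedy amalgamating algorithm, developed in the rest of this section via the number greedy algorithm, is what carries the $2\eta$ score guarantee), one can sweep through the parts and merge consecutive parts until each group of parts has volume at least $\eta\,\mathrm{vol}(G)$:

```python
def greedy_amalgamate(part_volumes, eta):
    """Sweep-and-merge sketch (not the paper's exact algorithm): accumulate
    consecutive parts into a group until its volume reaches eta * vol(G),
    then start a new group; a small leftover is merged into the last group.

    part_volumes: list of (label, volume) pairs with positive volumes.
    Returns a list of label groups, each of volume >= eta * vol(G)
    (for 0 < eta <= 1).
    """
    total = sum(v for _, v in part_volumes)
    groups, current, cur_vol = [], [], 0
    for label, v in part_volumes:
        current.append(label)
        cur_vol += v
        if cur_vol >= eta * total:
            groups.append(current)
            current, cur_vol = [], 0
    if current:                        # leftover below the threshold
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)     # one group of everything is eta-fat
    return groups
```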

For comparison, note the neat result of Dinh and Thai [12] that, for each graph $G$ and positive integer $k$, we have

$$q^{*}_{\leq k}(G)\geq q^{*}(G)\,(1-1/k), \tag{3.1}$$

where $q^*_{\leq k}(G)$ is the maximum modularity score over partitions with at most $k$ parts. Observe that if $\mathcal{A}$ is an $\eta$-fat partition for $G$ and $k=\lfloor 1/\eta\rfloor$, then $q_{\mathcal{A}}(G)\leq q^*_{\leq k}(G)$. Note, however, that neither approximation result implies the other. The constant 2 in Lemma 3.1 is best possible, as shown by the following example.

Example 3.2.

Fix 0<b<20<b<2. Let the odd integer kk be sufficiently large that

11k>b(1216k),1-\tfrac{1}{k}>b\,(\tfrac{1}{2}-\tfrac{1}{6k}),

so there exists η\eta such that η>1216k\eta>\tfrac{1}{2}-\tfrac{1}{6k} and 11k>bη1-\tfrac{1}{k}>b\eta. Let the graph GG consist of kk disjoint triangles. Thus GG has 3k3k vertices and vol(G)=6k{\rm vol}(G)=6k. Since η>1216k\eta>\tfrac{1}{2}-\tfrac{1}{6k} the only η\eta-fat partition for GG is the trivial partition 𝒯\mathcal{T}, with modularity score 0. Also q(G)=11kq^{*}(G)=1-\tfrac{1}{k} (achieved with the connected components partition), see Proposition 1.3 of [31]. Thus

q(G)q𝒯(G)=11k>bη,q^{*}(G)-q_{\mathcal{T}}(G)=1-\tfrac{1}{k}>b\eta,

so in Lemma 3.1 we cannot replace 2 by bb.
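The modularity scores appearing in Example 3.2 are easy to check numerically. The following Python sketch is ours, not part of the paper: the hypothetical helper modularity computes the modularity score of a partition as edge contribution minus degree tax, and on k disjoint triangles it confirms that the connected components partition scores 1 - 1/k while the trivial partition scores 0.

```python
def modularity(n, edges, parts):
    """Modularity score of a vertex partition: the fraction of edges
    inside parts (edge contribution) minus the sum over parts of
    (vol(part)/vol(G))^2 (degree tax)."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    vol = 2 * m
    part_of = {}
    for j, p in enumerate(parts):
        for u in p:
            part_of[u] = j
    internal = sum(1 for u, v in edges if part_of[u] == part_of[v])
    tax = sum((sum(deg[u] for u in p) / vol) ** 2 for p in parts)
    return internal / m - tax

# k disjoint triangles, as in Example 3.2
k = 7
edges = [(3*i + a, 3*i + b) for i in range(k) for a, b in [(0, 1), (1, 2), (0, 2)]]
components = [[3*i, 3*i + 1, 3*i + 2] for i in range(k)]
trivial = [list(range(3 * k))]
```

Here the components partition keeps all 3k edges internal and pays degree tax k(6/6k)^2 = 1/k, giving score 1 - 1/k, while the trivial partition scores 1 - 1 = 0.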

To prove Lemma 3.1, it of course suffices to prove the second part. The greedy amalgamating algorithm to yield a good η\eta-fat partition is essentially a greedy algorithm for numbers, and we describe that first. The main step in the number greedy algorithm involves bipartitions, via the following well-known problem. Before stating the problem, let us note some standard easy inequalities which are useful for the number (bi-) partitioning problem and for considering degree tax. Given x1,,xk0x_{1},\ldots,x_{k}\geq 0 with ixi=s\sum_{i}x_{i}=s, we have

sminixis2/kixi2smaxixi.s\,\min_{i}x_{i}\leq s^{2}/k\leq\sum_{i}x_{i}^{2}\leq s\,\max_{i}x_{i}. (3.2)

The number (bi-) partitioning problem

Given a positive integer nn and a positive nn-vector 𝒙=(x1,,xn)\boldsymbol{x}=(x_{1},\ldots,x_{n}) with ixi=1\sum_{i}x_{i}=1, let

λ(𝒙)=maxA[n]min{iAxi,1iAxi}.\lambda(\boldsymbol{x})=\max_{A\subseteq[n]}\,\min\,\{\sum_{i\in A}x_{i},1-\sum_{i\in A}x_{i}\}.

Thus 0λ(𝒙)120\leq\lambda(\boldsymbol{x})\leq\frac{1}{2}. Determining λ(𝒙)\lambda(\boldsymbol{x}) is the number partitioning problem, whose decision version is one of Garey and Johnson’s six basic NP-complete problems. It is well known that if each xi13x_{i}\leq\frac{1}{3} then λ(𝒙)13\lambda(\boldsymbol{x})\geq\frac{1}{3}. (To see this, observe that as we increase jj, the last partial sum x1+x2++xjx_{1}+x_{2}+\ldots+x_{j} which is at most 23\frac{2}{3} must be at least 13\frac{1}{3}. This result also follows immediately from (3.4) below.)

We might expect that λ(𝒙)\lambda(\boldsymbol{x}) is large when ixi2\sum_{i}x_{i}^{2} is small, that is when 1ixi21\!-\!\sum_{i}x_{i}^{2} is large. We would like to find the largest constant α\alpha such that always λ(𝒙)α(1ixi2)\lambda(\boldsymbol{x})\geq\alpha(1\!-\!\sum_{i}x_{i}^{2}). We must have α12\alpha\leq\frac{1}{2}: for example if n>1n>1 is odd and each xi=1/nx_{i}=1/n, then λ(𝒙)=(n1)/2n=12(11/n)=12(1ixi2)\lambda(\boldsymbol{x})=(n-1)/2n=\frac{1}{2}(1-1/n)=\frac{1}{2}(1-\sum_{i}x_{i}^{2}). We shall show that 12\frac{1}{2} is the right answer for α\alpha, that is, we always have λ(𝒙)12(1ixi2)\lambda(\boldsymbol{x})\geq\frac{1}{2}(1-\sum_{i}x_{i}^{2}); and further there is a simple greedy bi-partitioning algorithm which achieves this.

Assume that the elements are ordered so that x1x2xn>0x_{1}\geq x_{2}\geq\cdots\geq x_{n}>0. We have two bins, and we add the elements one by one to a smaller of the bins. In detail, the greedy bi-partitioning algorithm is as follows. Initially set A=B=A=B=\emptyset. For j=1,,nj=1,\ldots,n, if iAxiiBxi\sum_{i\in A}x_{i}\leq\sum_{i\in B}x_{i} then insert jj into AA, else insert jj into BB. At the end, let γ(𝒙)=min{iAxi,iBxi}\gamma(\boldsymbol{x})=\min\{\sum_{i\in A}x_{i},\sum_{i\in B}x_{i}\}. The algorithm clearly uses at most nn comparisons and nn additions.
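For concreteness, the greedy bi-partitioning algorithm can be sketched in Python (the function name is ours; the sort is included for convenience, whereas above the elements are assumed to be ordered already):

```python
def greedy_bipartition(x):
    """Greedy bi-partitioning: take the elements in non-increasing order
    and add each one to a bin whose current total is smaller.
    Returns gamma(x), the smaller of the two final bin totals."""
    a_sum, b_sum = 0.0, 0.0
    for xi in sorted(x, reverse=True):
        if a_sum <= b_sum:
            a_sum += xi
        else:
            b_sum += xi
    return min(a_sum, b_sum)
```

On x = (1/5, ..., 1/5) this returns 2/5 = (n-1)/2n for n = 5, so the bound of Lemma 3.3 below is attained with equality here.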

Lemma 3.3.
γ(𝒙)12(1ixi2).\gamma(\boldsymbol{x})\geq\tfrac{1}{2}(1-\sum_{i}x_{i}^{2}). (3.3)

Note by (3.2) we have ixi2x1\sum_{i}x_{i}^{2}\leq x_{1}, so (3.3) gives

γ(𝒙)12(1x1).\gamma(\boldsymbol{x})\geq\tfrac{1}{2}(1-x_{1}). (3.4)
Proof of Lemma 3.3.

When the algorithm has finished, let TT be the total of the values xix_{i} in the bin containing xnx_{n}; and let T¯=1T\bar{T}=1-T, the total in the other bin. We shall use induction on nn. The result is trivial if n=1n=1, since both sides of (3.3) are 0. Let n2n\geq 2 and assume that the result holds for all inputs of length n1n-1. We shall consider two cases.

  • (a)

    Suppose that T12T\geq\frac{1}{2}, so TT¯T\geq\bar{T}. Then TT¯xnT-\bar{T}\leq x_{n}, so γ(𝒙)=T¯12(1xn)\gamma(\boldsymbol{x})=\bar{T}\geq\frac{1}{2}(1-x_{n}). But ixi2xn\sum_{i}x_{i}^{2}\geq x_{n} by (3.2), so (3.3) holds (without using the induction hypothesis).

  • (b)

    Suppose that T<12T<\frac{1}{2}, so γ(𝒙)=T<T¯\gamma(\boldsymbol{x})=T<\bar{T}. Let x=xnx=x_{n} and s=i=1nxi2s=\sum_{i=1}^{n}x_{i}^{2}, so 0<x,s<10<x,s<1. Let 𝒚=(y1,,yn1)\boldsymbol{y}=(y_{1},\ldots,y_{n-1}) where yi=xi/(1x)y_{i}=x_{i}/(1-x) (and note that iyi=1\sum_{i}y_{i}=1). By the induction hypothesis,

    γ(𝒚)12(1iyi2)=12(1sx2(1x)2).\gamma(\boldsymbol{y})\geq\tfrac{1}{2}(1-\sum_{i}y_{i}^{2})=\tfrac{1}{2}(1-\tfrac{s-x^{2}}{(1-x)^{2}}).

    But γ(𝒙)=x+(1x)γ(𝒚)\gamma(\boldsymbol{x})=x+(1-x)\gamma(\boldsymbol{y}). Hence

    γ(𝒙)\displaystyle\gamma(\boldsymbol{x}) \displaystyle\geq x+12(1x)(1sx2(1x)2)=12(1+xsx21x)\displaystyle x+\tfrac{1}{2}(1-x)(1-\tfrac{s-x^{2}}{(1-x)^{2}})\;\;=\;\;\tfrac{1}{2}(1+x-\tfrac{s-x^{2}}{1-x})
    =\displaystyle= 121x2s+x21x=121s1x>12(1s).\displaystyle\tfrac{1}{2}\tfrac{1-x^{2}-s+x^{2}}{1-x}\;\;=\;\;\tfrac{1}{2}\tfrac{1-s}{1-x}\;\;>\;\;\tfrac{1}{2}(1-s).

    Hence, again (3.3) holds.

Now we have established the induction step, and thus proven the lemma.∎

Next we consider partitioning numbers into several parts. Let n1n\geq 1 and let 𝒙=(x1,,xn)\boldsymbol{x}=(x_{1},\ldots,x_{n}) with each xi>0x_{i}>0 and ixi=1\sum_{i}x_{i}=1. Let k1k\geq 1 and let 𝒜=(A1,,Ak){\mathcal{A}}=(A_{1},\ldots,A_{k}) be a partition of [n][n]. Let Sj=iAjxiS_{j}=\sum_{i\in A_{j}}x_{i} for each j=1,,kj=1,\ldots,k. The corresponding cost for 𝒜{\mathcal{A}} and 𝒙\boldsymbol{x} is

c(𝒜,𝒙)=j[k]Sj2i[n]xi2.c({\mathcal{A}},\boldsymbol{x})=\sum_{j\in[k]}{S_{j}}^{2}-\sum_{i\in[n]}x_{i}^{2}.

Observe that 0c(𝒜,𝒙)<10\leq c({\mathcal{A}},\boldsymbol{x})<1. Given 0<η10<\eta\leq 1, we say that the partition 𝒜{\mathcal{A}} is (𝒙,η)(\boldsymbol{x},\eta)-fat if SjηS_{j}\geq\eta for each jj. Observe that the trivial partition is always (𝒙,η)(\boldsymbol{x},\eta)-fat.

The number greedy partitioning algorithm starts with the trivial partition. While there is a part AA such that, setting S=iAxiS=\sum_{i\in A}x_{i} and 𝒚=(xi/S:iA)\boldsymbol{y}=(x_{i}/S:i\in A) we have γ(𝒚)η/S\gamma(\boldsymbol{y})\geq\eta/S, it picks the first such part AA and uses the greedy number bi-partitioning algorithm to split AA into two parts each with sum at least η\eta.
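A Python sketch of the number greedy partitioning algorithm (the names are ours; note that the splitting test γ(𝒚) ≥ η/S is checked in absolute terms, that is, a part is split exactly when the smaller side of its greedy bipartition carries weight at least η):

```python
def greedy_bipartition_split(items, w):
    """Greedily split items into two bins, heaviest first; return the two
    bins and the smaller bin total (gamma of the normalised weights of the
    part, multiplied by the total weight S of the part)."""
    bins, sums = ([], []), [0.0, 0.0]
    for i in sorted(items, key=lambda i: -w[i]):
        j = 0 if sums[0] <= sums[1] else 1
        bins[j].append(i)
        sums[j] += w[i]
    return bins[0], bins[1], min(sums)


def greedy_number_partition(w, eta):
    """Number greedy partitioning: start from the trivial partition of
    range(len(w)) and repeatedly split the first part whose greedy
    bipartition leaves weight at least eta on each side."""
    parts = [list(range(len(w)))]
    changed = True
    while changed:
        changed = False
        for idx, part in enumerate(parts):
            a, b, small = greedy_bipartition_split(part, w)
            if small >= eta:  # gamma(y) >= eta / S, in absolute terms
                parts[idx:idx + 1] = [a, b]
                changed = True
                break
    return parts
```

For example, on x = (0.3, 0.25, 0.2, 0.15, 0.1) with η = 0.3 the algorithm performs one split, yielding parts of weight 0.55 and 0.45 with cost 0.28 < 2η.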

Lemma 3.4.

Let n1n\geq 1 and let 𝐱=(x1,,xn)\boldsymbol{x}=(x_{1},\ldots,x_{n}) with each xi>0x_{i}>0 and ixi=1\sum_{i}x_{i}=1; and let 0<η10<\eta\leq 1. Then the number greedy partitioning algorithm finds an (𝐱,η)(\boldsymbol{x},\eta)-fat partition 𝒜{\mathcal{A}} of [n][n] with c(𝒜,𝐱)<2ηc({\mathcal{A}},\boldsymbol{x})<2\eta, using O(n)O(n) operations.

Proof.

Let the final partition obtained by the number greedy partitioning algorithm be 𝒜=(A1,,Ak){\mathcal{A}}=(A_{1},\ldots,A_{k}) (for some k1k\geq 1). Fix j[k]j\in[k]. Let 𝒙(j)=(xi/Sj:iAj)\boldsymbol{x}^{(j)}=(x_{i}/S_{j}:i\in A_{j}). Let cjc_{j} denote the cost of the trivial partition of AjA_{j} with 𝒙(j)\boldsymbol{x}^{(j)}, so cj=1iAj(xi/Sj)20c_{j}=1-\sum_{i\in A_{j}}(x_{i}/S_{j})^{2}\geq 0. By Lemma 3.3, γ(𝒙(j))12(1iAj(xi/Sj)2)=12cj\gamma(\boldsymbol{x}^{(j)})\geq\tfrac{1}{2}(1-\sum_{i\in A_{j}}(x_{i}/S_{j})^{2})=\tfrac{1}{2}c_{j}. But since the algorithm stopped with 𝒜{\mathcal{A}}, we have γ(𝒙(j))<η/Sj\gamma(\boldsymbol{x}^{(j)})<\eta/S_{j}. Hence cj<2η/Sjc_{j}<2\eta/S_{j} for each j[k]j\in[k], and so

c(𝒜,𝒙)\displaystyle c({\mathcal{A}},\boldsymbol{x}) =\displaystyle= j[k](Sj2iAjxi2)=j[k]Sj2(1iAj(xi/Sj)2)\displaystyle\sum_{j\in[k]}({S_{j}}^{2}-\!\sum_{i\in A_{j}}x_{i}^{2})\;=\;\sum_{j\in[k]}{S_{j}}^{2}(1-\!\sum_{i\in A_{j}}(x_{i}/S_{j})^{2})
=\displaystyle= j[k]Sj2cj<j[k]Sj2(2η/Sj)= 2η,\displaystyle\sum_{j\in[k]}{S_{j}}^{2}c_{j}\;<\sum_{j\in[k]}{S_{j}}^{2}(2\eta/S_{j})\;=\;2\eta,

as required. ∎

Finally, let us deduce the fattening lemma, Lemma 3.1, from Lemma 3.4. The greedy amalgamating algorithm which we apply to the partition =(B1,,Bn)\mathcal{B}=(B_{1},\ldots,B_{n}) of V(G)V(G) is essentially the number greedy partitioning algorithm applied to the weights on [n][n], where the weight of ii is vol(Bi)/vol(G){\rm vol}(B_{i})/{\rm vol}(G): the parts of the resulting partition of [n][n] tell us which parts BiB_{i} to amalgamate.

Proof of Lemma 3.1.

Let =(B1,,Bn)\mathcal{B}=(B_{1},\ldots,B_{n}) be a partition of V(G)V(G); and let 𝒙=(x1,,xn)\boldsymbol{x}=(x_{1},\ldots,x_{n}) where xi=vol(Bi)/vol(G)x_{i}={\rm vol}(B_{i})/{\rm vol}(G). By Lemma 3.4 the number greedy partitioning algorithm finds an (𝒙,η)(\boldsymbol{x},\eta)-fat partition 𝒜=(A1,,Ak){\mathcal{A}}=(A_{1},\ldots,A_{k}) of [n][n] with c(𝒜,𝒙)<2ηc({\mathcal{A}},\boldsymbol{x})<2\eta. Now 𝒜{\mathcal{A}} yields the partition 𝒜~=(A~1,,A~k)\tilde{{\mathcal{A}}}=(\tilde{A}_{1},\ldots,\tilde{A}_{k}) of V(G)V(G), where A~j=iAjBi\tilde{A}_{j}=\cup_{i\in A_{j}}B_{i}; and vol(A~j)=(iAjxi)vol(G){\rm vol}(\tilde{A}_{j})=(\sum_{i\in A_{j}}x_{i})\,{\rm vol}(G), so 𝒜~\tilde{{\mathcal{A}}} is η\eta-fat for GG. Also q𝒜~E(G)qE(G)q_{\tilde{{\mathcal{A}}}}^{E}(G)\geq q_{\mathcal{B}}^{E}(G), and q𝒜~D(G)qD(G)=c(𝒜,𝒙)q_{\tilde{{\mathcal{A}}}}^{D}(G)-q_{\mathcal{B}}^{D}(G)=c({\mathcal{A}},\boldsymbol{x}). Thus

q𝒜~(G)\displaystyle q_{\tilde{{\mathcal{A}}}}(G) \displaystyle\geq qE(G)q𝒜~D(G)=qE(G)(c(𝒜,𝒙)+qD(G))\displaystyle q_{\mathcal{B}}^{E}(G)-q_{\tilde{{\mathcal{A}}}}^{D}(G)\;=\;q_{\mathcal{B}}^{E}(G)-(c({\mathcal{A}},\boldsymbol{x})+q_{\mathcal{B}}^{D}(G))
=\displaystyle= q(G)c(𝒜,𝒙)>q(G)2η.\displaystyle q_{\mathcal{B}}(G)-c({\mathcal{A}},\boldsymbol{x})\;>\;q_{\mathcal{B}}(G)-2\eta.

This completes the proof of Lemma 3.1. ∎
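The proof just given translates directly into a Python sketch of the greedy amalgamating algorithm (all names are ours, and the number-partitioning helpers are repeated so that the sketch stands alone): compute the weights x_i = vol(B_i)/vol(G), run the number greedy partitioning algorithm on them, and amalgamate the parts of ℬ according to the resulting groups.

```python
def greedy_bipartition_split(items, w):
    # Greedy bi-partition of items by weight, heaviest first; returns the
    # two bins and the smaller bin total.
    bins, sums = ([], []), [0.0, 0.0]
    for i in sorted(items, key=lambda i: -w[i]):
        j = 0 if sums[0] <= sums[1] else 1
        bins[j].append(i)
        sums[j] += w[i]
    return bins[0], bins[1], min(sums)


def greedy_number_partition(w, eta):
    # Number greedy partitioning (Lemma 3.4): split while some part can be
    # greedily bipartitioned with at least eta weight on each side.
    parts = [list(range(len(w)))]
    changed = True
    while changed:
        changed = False
        for idx, part in enumerate(parts):
            a, b, small = greedy_bipartition_split(part, w)
            if small >= eta:
                parts[idx:idx + 1] = [a, b]
                changed = True
                break
    return parts


def modularity(n, edges, parts):
    # Modularity score: fraction of internal edges minus degree tax.
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    part_of = {u: j for j, p in enumerate(parts) for u in p}
    internal = sum(1 for u, v in edges if part_of[u] == part_of[v])
    tax = sum((sum(deg[u] for u in p) / (2 * m)) ** 2 for p in parts)
    return internal / m - tax


def fatten(n, edges, B, eta):
    # Greedy amalgamating algorithm (Lemma 3.1): amalgamate parts of B,
    # guided by number greedy partitioning of the normalised part volumes.
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    vol = 2 * len(edges)
    x = [sum(deg[u] for u in part) / vol for part in B]
    groups = greedy_number_partition(x, eta)
    return [[u for i in g for u in B[i]] for g in groups]
```

For four disjoint triangles with ℬ the connected components partition and η = 0.3, this merges the triangles into two pairs: the resulting partition is η-fat and its modularity score 1/2 exceeds q_ℬ(G) - 2η = 0.75 - 0.6.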


4 Modularity of GpG_{p} with large expected number of edges

4.1 Proof of Theorem 1.1

We shall use a tail bound for random variables with a binomial or similar distribution. We use a variant of the Chernoff bounds, which follows for example from Theorems 2.1 and 2.8 of [18] by considering S/bS/b. In this and the next section we shall always have b=1b=1 or b=2b=2\,: we will need general bb in Section 10.

Lemma 4.1.

Let b>0b>0, and let random variables X1,,XkX_{1},\ldots,X_{k} be independent, with 0Xjb0\leq X_{j}\leq b for each jj. Let S=j=1kXjS=\sum_{j=1}^{k}X_{j} and μ=𝔼(S)\mu={\mathbb{E}}(S). Then

(|Sμ|εμ)2e13ε2μ/b for 0<ε1;{\mathbb{P}}(|S-\mu|\geq\varepsilon\mu)\leq 2e^{-\tfrac{1}{3}\varepsilon^{2}\mu/b}\;\;\;\mbox{ for }0<\varepsilon\leq 1;

or equivalently

(|Sμ|xμ)2e13x2/b for 0<xμ.{\mathbb{P}}(|S-\mu|\geq x\sqrt{\mu})\leq 2e^{-\frac{1}{3}x^{2}/b}\;\;\;\mbox{ for }0<x\leq\sqrt{\mu}\,.
Proof of Theorem 1.1.

First note that we may assume that 0<ε<10<\varepsilon<1 and q(G)εq^{*}(G)\geq\varepsilon. Let η=ε/4\eta=\varepsilon/4. By Lemma 3.1 there is an η\eta-fat partition 𝒜=(A1,,Ak){\mathcal{A}}=(A_{1},\ldots,A_{k}) for GG such that q𝒜(G)q(G)2ηq_{{\mathcal{A}}}(G)\geq q^{*}(G)-2\eta (thus q𝒜(G)2ηq_{{\mathcal{A}}}(G)\geq 2\eta by our choice of η=ε/4\eta=\varepsilon/4). Observe that the number kk of parts in 𝒜{\mathcal{A}} is at most η1\eta^{-1}, and thus q𝒜D(G)ηq_{{\mathcal{A}}}^{D}(G)\geq\eta by (3.2). To prove Theorem 1.1 we consider first the edge contribution and then the degree tax. Let t=(e(G)p)1/2t=(e(G)p)^{1/2}. By Lemma 4.1 (with b=1b=1), for 0xt0\leq x\leq t

(|e(Gp)e(G)p|e(G)px/t)2ex2/3.{\mathbb{P}}(|e(G_{p})-e(G)p|\geq e(G)p\cdot x/t)\leq 2e^{-x^{2}/3}\,. (4.1)

For a graph HH on V(G)V(G), let e𝒜int(H)e_{{\mathcal{A}}}^{\rm int}(H) denote the number of ‘internal’ edges of HH within the parts of 𝒜{\mathcal{A}}, so that q𝒜E(H)=e𝒜int(H)/e(H)q_{{\mathcal{A}}}^{E}(H)=e_{{\mathcal{A}}}^{\rm int}(H)/e(H). Then by Lemma 4.1 as above, for 0<x(e𝒜int(G)p)1/20<x\leq(e_{{\mathcal{A}}}^{\rm int}(G)p)^{1/2}

(e𝒜int(Gp)e𝒜int(G)p(1x(e𝒜int(G)p)1/2))2ex2/3.{\mathbb{P}}\big{(}e_{{\mathcal{A}}}^{\rm int}(G_{p})\leq e_{{\mathcal{A}}}^{\rm int}(G)p\,(1-x\,({e_{{\mathcal{A}}}^{\rm int}(G)p})^{-1/2})\big{)}\leq 2e^{-x^{2}/3}\,.

But

q𝒜E(G)>q(G)2η+q𝒜D(G)q(G)η3η,q_{{\mathcal{A}}}^{E}(G)>q^{*}(G)-2\eta+q_{{\mathcal{A}}}^{D}(G)\geq q^{*}(G)-\eta\geq 3\eta,

so e𝒜int(G)3ηe(G)e_{{\mathcal{A}}}^{\rm int}(G)\geq 3\eta\,e(G). Hence, for 0<x(3η)1/2t0<x\leq(3\eta)^{1/2}t, with probability at least 12ex2/31-2e^{-x^{2}/3}

e𝒜int(Gp)e𝒜int(G)p(1x(3ηe(G)p)1/2)=e𝒜int(G)p(1(3η)1/2x/t).e_{{\mathcal{A}}}^{\rm int}(G_{p})\geq e_{{\mathcal{A}}}^{\rm int}(G)\,p\,(1-x(3\eta\,e(G)p)^{-1/2})=e_{{\mathcal{A}}}^{\rm int}(G)\,p\,(1-(3\eta)^{-1/2}x/t). (4.2)

Also, by (4.1), for 0<xt0<x\leq t we have e(Gp)e(G)p(1+x/t)e(G_{p})\leq e(G)p\,(1+x/t) with probability at least 12ex2/31-2e^{-x^{2}/3}. Hence by (4.2), for 0<x(3η)1/2t0<x\leq(3\eta)^{1/2}t, with probability at least 14ex2/31-4e^{-x^{2}/3}

q𝒜E(Gp)=e𝒜int(Gp)e(Gp)q𝒜E(G)1(3η)1/2x/t1+x/t.q_{{\mathcal{A}}}^{E}(G_{p})=\frac{e_{{\mathcal{A}}}^{\rm int}(G_{p})}{e(G_{p})}\geq q_{{\mathcal{A}}}^{E}(G)\frac{1-(3\eta)^{-1/2}x/t}{1+x/t}. (4.3)

The degree tax is only a little more complicated. Let Xi=volGp(Ai)X_{i}={\rm vol}_{G_{p}}(A_{i}), so 𝔼[Xi]=volG(Ai)p{\mathbb{E}}[X_{i}]={\rm vol}_{G}(A_{i})p. By Lemma 4.1 with b=2b=2, for 0<x(volG(Ai)p)1/20<x\leq({\rm vol}_{G}(A_{i})p)^{1/2}, with probability at least 12ex2/61-2e^{-x^{2}/6} we have

XivolG(Ai)p(1+x(volG(Ai)p)1/2).X_{i}\leq{\rm vol}_{G}(A_{i})p\,\big{(}1+x\,\big{(}{{\rm vol}_{G}(A_{i})p}\big{)}^{-1/2}\big{)}\,.

But, since volG(Ai)ηvol(G){\rm vol}_{G}(A_{i})\geq\eta\,{\rm vol}(G),

(volG(Ai)p)1/2(ηvol(G)p)1/2=(2η)1/2/t;\big{(}{\rm vol}_{G}(A_{i})p\big{)}^{-1/2}\leq\big{(}\eta\,{\rm vol}(G)p\big{)}^{-1/2}=(2\eta)^{-1/2}/t;

and so, for 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12ex2/61-2e^{-x^{2}/6} we have

XivolG(Ai)p(1+(2η)1/2x/t).X_{i}\leq{\rm vol}_{G}(A_{i})p\,\big{(}1+(2\eta)^{-1/2}x/t\big{)}\,.

Recall that 𝒜{\mathcal{A}} has kk parts and thus for 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12kex2/61-2ke^{-x^{2}/6}

iXi2ivolG(Ai)2p2(1+(2η)1/2x/t)2.\sum_{i}X_{i}^{2}\leq\sum_{i}{\rm vol}_{G}(A_{i})^{2}p^{2}\,\big{(}1+(2\eta)^{-1/2}x/t\big{)}^{2}.

Also, by (4.1), for 0<xt0<x\leq t, with probability at least 12ex2/31-2e^{-x^{2}/3} we have e(Gp)e(G)p(1x/t)e(G_{p})\geq e(G)p(1-x/t) and so vol(Gp)2vol(G)2p2(1x/t)2{\rm vol}(G_{p})^{2}\geq{\rm vol}(G)^{2}p^{2}(1-x/t)^{2}. Hence, for 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12(k+1)ex2/61-2(k+1)e^{-x^{2}/6}

q𝒜D(Gp)q𝒜D(G)(1+(2η)1/2x/t1x/t)2.q_{{\mathcal{A}}}^{D}(G_{p})\leq q_{{\mathcal{A}}}^{D}(G)\left(\frac{1+(2\eta)^{-1/2}x/t}{1-x/t}\right)^{2}. (4.4)

Now, for 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12(k+3)ex2/61-2(k+3)e^{-x^{2}/6}, both (4.3) and (4.4) hold. (With a little more care we could replace the factor (k+3)(k+3) here by (k+2)(k+2) but that would not make a significant difference.) Since kη1=4/εk\leq\eta^{-1}=4/\varepsilon, the failure probability here is at most 2(4/ε+3)ex2/62(4/\varepsilon+3)e^{-x^{2}/6}. Choose x=x(ε)x=x(\varepsilon) sufficiently large that this probability is at most ε\varepsilon; and note that we may take x=Θ(logε1)x=\Theta(\sqrt{\log\varepsilon^{-1}}). In the next paragraph we shall choose t0(2η)1/2xt_{0}\geq(2\eta)^{-1/2}x, so (4.4) holds for our choice of xx and any tt0t\geq t_{0}.

Consider the ratios on the right sides of the inequalities (4.3) and (4.4). Choose t0t_{0} sufficiently large that for tt0t\geq t_{0}

1(3η)1/2x/t1+x/t1η\frac{1-(3\eta)^{-1/2}x/t}{1+x/t}\geq 1-\eta (4.5)

and

(1+(2η)1/2x/t1x/t)21+η.\left(\frac{1+(2\eta)^{-1/2}x/t}{1-x/t}\right)^{2}\leq 1+\eta. (4.6)

Note that the left side in (4.5) is increasing in tt, and the left side in (4.6) is decreasing in tt, so we just need the inequalities to hold for t0t_{0}. Rearranging, we see the inequality (4.5) is equivalent to

tx(31/2η3/2+η11).t\geq x(3^{-1/2}\eta^{-3/2}+\eta^{-1}-1).

Thus we must have t031/2η3/2xt_{0}\geq 3^{-1/2}\eta^{-3/2}x. In fact we may take t0=Θ(η3/2x)=Θ(ε3/2logε1)t_{0}=\Theta(\eta^{-3/2}x)=\Theta(\varepsilon^{-3/2}\sqrt{\log\varepsilon^{-1}}), and this choice ensures that both (4.5) and (4.6) hold.

Finally, set c=t02c=t_{0}^{2}, so c=Θ(ε3logε1)c=\Theta(\varepsilon^{-3}\log\varepsilon^{-1}); and let us check that this value for cc works. With probability at least 1ε1-\varepsilon both (4.3) and (4.4) hold. Suppose that both (4.3) and (4.4) do hold: if also e(G)pce(G)p\geq c then tt0t\geq t_{0}, so (4.5) and (4.6) hold; and then

q𝒜(Gp)q𝒜E(G)(1η)q𝒜D(G)(1+η)>q𝒜(G)2ηq(G)ε.q_{{\mathcal{A}}}(G_{p})\geq q_{{\mathcal{A}}}^{E}(G)(1-\eta)-q_{{\mathcal{A}}}^{D}(G)(1+\eta)>q_{{\mathcal{A}}}(G)-2\eta\geq q^{*}(G)-\varepsilon. (4.7)

Hence, if e(G)pce(G)p\geq c, then q𝒜(Gp)>q(G)εq_{{\mathcal{A}}}(G_{p})>q^{*}(G)-\varepsilon with probability at least 1ε1-\varepsilon, as required. This completes the proof of Theorem 1.1. ∎

In the above proof of Theorem 1.1 we took c=Θ(ε3logε1)c=\Theta(\varepsilon^{-3}\log\varepsilon^{-1}). If we used a more detailed and precise form of Lemma 4.1 (see for example Theorem 21.6 of [16]) we could improve several bounds in the proof, but we would not improve the asymptotic estimate for cc. If we simply used Chebyshev’s inequality, we would find that we can take c=Θ(ε5)c=\Theta(\varepsilon^{-5}).

The following example shows that, for the conclusion of Theorem 1.1 to hold, c(ε)c(\varepsilon) must be at least Ω(ε1logε1)\Omega(\varepsilon^{-1}\log\varepsilon^{-1}) as ε0\varepsilon\to 0.

Example 4.2.

Let ε>0\varepsilon>0 be small. Let the integer c=c(ε)c=c(\varepsilon) be sufficiently large that the conclusion of Theorem 1.1 holds. Let m=2cm=2c and k=εmk=\lceil\varepsilon m\rceil, and assume that k+1<mk+1<m. Let GG be the mm-edge graph consisting of a star with mkm-k edges together with kk isolated edges. Then the connected components partition is the optimal partition for GG, since each component has modularity 0 [27]. Thus

q(G)=1(mk)2+km2=2kmk2+km2>kmε.q^{*}(G)=1-\tfrac{(m-k)^{2}+k}{m^{2}}=\tfrac{2k}{m}-\tfrac{k^{2}+k}{m^{2}}>\tfrac{k}{m}\geq\varepsilon.

Now let p=12p=\frac{1}{2}, and note that e(G)p=mpce(G)p=mp\geq c. The probability that GpG_{p} contains none of the kk isolated edges is (1p)k>2εm1(1-p)^{k}>2^{-\varepsilon m-1}. But any star has modularity 0, so (q(Gp)=0)>2εm1{\mathbb{P}}(q^{*}(G_{p})=0)>2^{-\varepsilon m-1}. Thus we must have 2εm<2ε2^{-\varepsilon m}<2\varepsilon, and so c>12ε1log2(2ε)1c>\frac{1}{2}\varepsilon^{-1}\log_{2}(2\varepsilon)^{-1}.


4.2 A kk-part analogue to Theorem 1.1

Recall that qk(G)=max|𝒜|kq𝒜(G)q^{*}_{\leq k}(G)=\max_{|{\mathcal{A}}|\leq k}q_{\mathcal{A}}(G), the maximum value of q𝒜(G)q_{\mathcal{A}}(G) over all vertex partitions 𝒜{\mathcal{A}} with at most kk parts. We note here a variant of Theorem 1.1 where we replace each instance of qq^{*} by qkq^{*}_{\leq k}. (Observe that the inequality (3.1) of Dinh and Thai with large kk shows that Proposition 4.3 implies Theorem 1.1. A similar connection will hold for Proposition 5.2 and Theorem 1.2.)

Proposition 4.3.

Given ε>0\varepsilon>0 there exists c=c(ε)c=c(\varepsilon) such that the following holds. For each graph GG and probability pp such that e(G)pce(G)p\geq c, and for each k2k\geq 2, the random graph GpG_{p} satisfies qk(Gp)>qk(G)εq^{*}_{\leq k}(G_{p})>q^{*}_{\leq k}(G)-\varepsilon with probability 1ε\geq 1-\varepsilon.

Proof.

The proof is almost immediate, following that of Theorem 1.1. As in that proof, first note that we may assume that 0<ε<10<\varepsilon<1 and qk(G)εq^{*}_{\leq k}(G)\geq\varepsilon. Let 𝒜{\mathcal{A}} be a partition with at most kk parts and such that q𝒜(G)=qk(G)q_{{\mathcal{A}}}(G)=q^{*}_{\leq k}(G). Let η=ε/4\eta=\varepsilon/4; then by Lemma 3.1 there is an η\eta-fat partition 𝒜{\mathcal{A}}^{\prime} such that q𝒜(G)q𝒜(G)2ηq_{{\mathcal{A}}^{\prime}}(G)\geq q_{{\mathcal{A}}}(G)-2\eta and such that 𝒜{\mathcal{A}} is a refinement of 𝒜{\mathcal{A}}^{\prime}. In particular this means that |𝒜||𝒜|k|{\mathcal{A}}^{\prime}|\leq|{\mathcal{A}}|\leq k, and so 𝒜{\mathcal{A}}^{\prime} has at most kk parts.

Then we may proceed exactly as in the proof of Theorem 1.1 and the analogue of line (4.7) would say that then with probability at least 1ε1-\varepsilon we have

q𝒜(Gp)>q𝒜(G)ε.q_{{\mathcal{A}}^{\prime}}(G_{p})>q_{\mathcal{A}}(G)-\varepsilon.

But now, since q𝒜(G)=qk(G)q_{\mathcal{A}}(G)=q^{*}_{\leq k}(G) and since |𝒜|k|{\mathcal{A}}^{\prime}|\leq k this implies that with probability at least 1ε1-\varepsilon we have qk(Gp)q𝒜(Gp)>qk(G)εq^{*}_{\leq k}(G_{p})\geq q_{{\mathcal{A}}^{\prime}}(G_{p})>q^{*}_{\leq k}(G)-\varepsilon as required. ∎


5 Modularity of GpG_{p} with large expected degree

In this section we first prove Theorem 1.2, and then we give a kk-part analogue Proposition 5.2 and use it to complete the discussion of the bootstrapping example from Section 2.1.

5.1 Proof of Theorem 1.2

The rough idea of the proof of Theorem 1.2 is that we can use the fattening lemma, Lemma 3.1, to bound the probability that a vertex partition behaves badly by the probability that a fat vertex partition behaves similarly badly, and we can use probabilistic methods to handle fat partitions. However, even after the streamlining provided by Lemma 3.1 the proof is quite intricate and we will need to define a large number of events, many of which are ‘bad events’ parameterised by deviation from the ideal case.


Proof of Theorem 1.2.

Let ε>0\varepsilon>0. We shall choose c=c(ε)c=c(\varepsilon) sufficiently large that certain inequalities below hold. It will suffice to choose c>Kε3logε1c>K\varepsilon^{-3}\log\varepsilon^{-1} for a sufficiently large constant KK. Let G=(V,E)G=(V,E) be a (fixed) nn-vertex graph and let 0<p<10<p<1; and assume that e(G)pcne(G)p\geq cn.

We now define the ‘bad’ events 0\mathcal{B}_{0}, 1\mathcal{B}_{1}, 2\mathcal{B}_{2} and 0{\mathcal{E}}_{0} (there will be more later!). Let 0\mathcal{B}_{0} be the event {q(Gp)<q(G)ε}\{q^{*}(G_{p})<q^{*}(G)-\varepsilon\}. Let η=ε9\eta=\tfrac{\varepsilon}{9}. Let 0{\mathcal{E}}_{0} be the event that there is a partition 𝒜0{\mathcal{A}}_{0} such that q𝒜0(G)<q𝒜0(Gp)εq_{{\mathcal{A}}^{\prime}_{0}}(G)<q_{{\mathcal{A}}_{0}}(G_{p})-\varepsilon, where 𝒜0=𝒜(Gp,η,𝒜0){\mathcal{A}}^{\prime}_{0}={\mathcal{A}}(G_{p},\eta,{\mathcal{A}}_{0}) as in Lemma 3.1. Let 𝒬\mathcal{Q} be the set of partitions which are η2\frac{\eta}{2}-fat for GG. Let 1\mathcal{B}_{1} be the event that for some AVA\subseteq V with volG(A)<η2vol(G){\rm vol}_{G}(A)<\frac{\eta}{2}{\rm vol}(G) we have volGp(A)ηvol(Gp){\rm vol}_{G_{p}}(A)\geq\eta{\rm vol}(G_{p}) (that is, AA is not η/2\eta/2-fat for GG but is η\eta-fat for GpG_{p}). Observe that if some partition 𝒜𝒬{\mathcal{A}}\not\in\mathcal{Q} is η\eta-fat for GpG_{p}, then 1\mathcal{B}_{1} holds. Let 2\mathcal{B}_{2} be the event that there is a partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} such that q𝒜(Gp)>q𝒜(G)+ε2q_{{\mathcal{A}}}(G_{p})>q_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{2}.

We shall show that

{q(Gp)>q(G)+ε}12\{q^{*}(G_{p})>q^{*}(G)+\varepsilon\}\subseteq\mathcal{B}_{1}\cup\mathcal{B}_{2} (5.1)

and

012.{\mathcal{E}}_{0}\subseteq\mathcal{B}_{1}\cup\mathcal{B}_{2}. (5.2)

Note that (5.1) yields {|q(Gp)q(G)|>ε}012\{|q^{*}(G_{p})-q^{*}(G)|>\varepsilon\}\subseteq\mathcal{B}_{0}\cup\mathcal{B}_{1}\cup\mathcal{B}_{2}. It will follow that the probability that (a) or (b) in the theorem fails is at most (0)+(1)+(2){\mathbb{P}}(\mathcal{B}_{0})+{\mathbb{P}}(\mathcal{B}_{1})+{\mathbb{P}}(\mathcal{B}_{2}). By Theorem 1.1, if cncn is sufficiently large (depending only on ε\varepsilon) then (0)<ε/3{\mathbb{P}}(\mathcal{B}_{0})<\varepsilon/3. We shall show that, if we choose cc sufficiently large (depending only on ε\varepsilon) then also

(1)<ε/3{\mathbb{P}}(\mathcal{B}_{1})<\varepsilon/3 (5.3)

and

(2)<ε/3{\mathbb{P}}(\mathcal{B}_{2})<\varepsilon/3 (5.4)

which will complete the proof of Theorem 1.2. Next come the details: the proofs of (5.1)-(5.4). We focus initially on part (a) and the containment (5.1).

Proof of (5.1).

By Lemma 3.1 (and since 2ηε/22\eta\leq\varepsilon/2),

{q(Gp)>q(G)+ε}\displaystyle\{q^{*}(G_{p})>q^{*}(G)+\varepsilon\} \displaystyle\subseteq {𝒜ηfat forGp:q𝒜(Gp)>q(G)+ε2}\displaystyle\{\exists\,{\mathcal{A}}\>\eta\!-\!\mbox{fat for}\,G_{p}:q_{{\mathcal{A}}}(G_{p})>q^{*}(G)+\tfrac{\varepsilon}{2}\}
\displaystyle\subseteq {𝒜ηfat forGp:q𝒜(Gp)>q𝒜(G)+ε2}\displaystyle\{\exists\,{\mathcal{A}}\>\eta\!-\!\mbox{fat for}\,G_{p}:q_{{\mathcal{A}}}(G_{p})>q_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{2}\}
\displaystyle\subseteq {𝒜𝒬:𝒜 is ηfat forGp}2\displaystyle\{\exists\,{\mathcal{A}}\not\in\mathcal{Q}:{\mathcal{A}}\mbox{ is }\eta\!-\!\mbox{fat for}\,G_{p}\}\cup\mathcal{B}_{2}
\displaystyle\subseteq 12.\displaystyle\mathcal{B}_{1}\cup\mathcal{B}_{2}.

Thus the containment (5.1) holds. ∎

We show next that the inequality (5.3) holds. Let us observe first that if e(G)p/v(G)ce(G)p/v(G)\geq c for some (large) cc then v(G)2cv(G)\geq 2c\,; so we are concerned only with large graphs.

Proof of (5.3).

Let AVA\subseteq V have volG(A)<η2vol(G){\rm vol}_{G}(A)<\frac{\eta}{2}{\rm vol}(G). Suppose that the edges within AA are e1,,eje_{1},\ldots,e_{j} and the edges with exactly one end in AA are ej+1,,eke_{j+1},\ldots,e_{k}. For i=1,,ji=1,\ldots,j let XiX_{i} be 2 if eie_{i} is in GpG_{p} and 0 otherwise; and for i=j+1,,ki=j+1,\ldots,k let XiX_{i} be 1 if eie_{i} is in GpG_{p} and 0 otherwise. Let S=i=1kXiS=\sum_{i=1}^{k}X_{i}, and let μ=𝔼(S)\mu={\mathbb{E}}(S). Then volGp(A)=S{\rm vol}_{G_{p}}(A)=S and μ=volG(A)p<η2vol(G)p\mu={\rm vol}_{G}(A)p<\frac{\eta}{2}{\rm vol}(G)p. Form SS^{\prime} by adding further independent random variables XiX_{i} with 0Xi10\leq X_{i}\leq 1 so that μ:=𝔼[S]=η2vol(G)p\mu^{\prime}:={\mathbb{E}}[S^{\prime}]=\frac{\eta}{2}{\rm vol}(G)p. Now by Lemma 4.1 applied to SS^{\prime} (with b=2b=2)

(volGp(A)23ηvol(G)p)\displaystyle{\mathbb{P}}({\rm vol}_{G_{p}}(A)\geq\tfrac{2}{3}\eta\,{\rm vol}(G)p) =\displaystyle= (S23ηvol(G)p)\displaystyle{\mathbb{P}}(S\geq\tfrac{2}{3}\eta\,{\rm vol}(G)p)
\displaystyle\leq (S23ηvol(G)p)=(S43μ)\displaystyle{\mathbb{P}}(S^{\prime}\geq\tfrac{2}{3}\eta\,{\rm vol}(G)p)\;=\;{\mathbb{P}}(S^{\prime}\geq\tfrac{4}{3}\mu^{\prime})
\displaystyle\leq 2e154μ= 2eη108vol(G)p 2eηcn54.\displaystyle\leq 2e^{-\frac{1}{54}\mu^{\prime}}\;=\;2e^{-\frac{\eta}{108}{\rm vol}(G)p}\;\leq\;2e^{-\frac{\eta cn}{54}}.

Let 3\mathcal{B}_{3} be the (bad) event that volGp(A)23ηvol(G)p{\rm vol}_{G_{p}}(A)\geq\tfrac{2}{3}\eta\,{\rm vol}(G)p\, for some AVA\subseteq V with volG(A)<12ηvol(G){\rm vol}_{G}(A)<\frac{1}{2}\eta{\rm vol}(G). Then, by a union bound, (3)2n2eηcn54{\mathbb{P}}(\mathcal{B}_{3})\leq 2^{n}\cdot 2e^{-\frac{\eta cn}{54}}, which is at most ε6\tfrac{\varepsilon}{6} if cK/εc\geq K/\varepsilon for a sufficiently large constant KK. By Lemma 4.1 (with b=1b=1), (vol(Gp)<23vol(G)p)ε6{\mathbb{P}}({\rm vol}(G_{p})<\tfrac{2}{3}{\rm vol}(G)p)\leq\tfrac{\varepsilon}{6} (if cncn is sufficiently large). Then we have

(1)(3)+(vol(Gp)<23vol(G)p)ε3,{\mathbb{P}}(\mathcal{B}_{1})\leq{\mathbb{P}}(\mathcal{B}_{3})+{\mathbb{P}}({\rm vol}(G_{p})<\tfrac{2}{3}{\rm vol}(G)p)\leq\tfrac{\varepsilon}{3},

as required. ∎

We now start to prove that the inequality (5.4) holds. The proof proceeds by considering first degree tax in Claim (5.5), and then edge contribution in Claim (5.7).


Degree tax   Let 4\mathcal{B}_{4} be the event that for some AVA\subseteq V with volG(A)η2vol(G){\rm vol}_{G}(A)\geq\frac{\eta}{2}{\rm vol}(G) we have volGp(A)<(1ε9)volG(A)p{\rm vol}_{G_{p}}(A)<(1-\tfrac{\varepsilon}{9}){\rm vol}_{G}(A)p. We claim that

(4)ε12.{\mathbb{P}}(\mathcal{B}_{4})\leq\tfrac{\varepsilon}{12}. (5.5)
Proof of Claim (5.5).

Let AVA\subseteq V have volG(A)η2vol(G){\rm vol}_{G}(A)\geq\tfrac{\eta}{2}\,{\rm vol}(G). Then volGp(A){\rm vol}_{G_{p}}(A) has mean volG(A)pη2vol(G)pηcn{\rm vol}_{G}(A)p\geq\tfrac{\eta}{2}{\rm vol}(G)p\geq\eta cn. Hence, by Lemma 4.1 (with b=2b=2) and a union bound over the at most 2n2^{n} such sets AA, we obtain (5.5) if cKε3c\geq K\varepsilon^{-3} for a sufficiently large constant KK. ∎

Let 1{\mathcal{E}}_{1} be the event that vol(Gp)((1ε9)/(1ε8))vol(G)p{\rm vol}(G_{p})\leq\big{(}(1\!-\!\tfrac{\varepsilon}{9})/(1\!-\!\tfrac{\varepsilon}{8})\big{)}\,{\rm vol}(G)p. By Lemma 4.1 (with b=1b=1), (¯1)<ε12{\mathbb{P}}(\overline{{\mathcal{E}}}_{1})<\tfrac{\varepsilon}{12} (if cncn is sufficiently large). Suppose that 14¯{\mathcal{E}}_{1}\land\overline{\mathcal{B}_{4}} holds: then for each partition 𝒜=(A1,,Ak)𝒬{\mathcal{A}}=(A_{1},\ldots,A_{k})\in\mathcal{Q}

q𝒜D(Gp)\displaystyle q^{D}_{{\mathcal{A}}}(G_{p}) \displaystyle\geq i((1ε9)volG(Ai)p)2(((1ε9)/(1ε8))vol(G)p)2\displaystyle\frac{\sum_{i}\big{(}(1-\tfrac{\varepsilon}{9}){\rm vol}_{G}(A_{i})p\big{)}^{2}}{(((1-\tfrac{\varepsilon}{9})/(1-\tfrac{\varepsilon}{8})){\rm vol}(G)p)^{2}}
=\displaystyle= (1ε8)2q𝒜D(G)>(1ε4)q𝒜D(G)q𝒜D(G)ε4.\displaystyle\big{(}1-\tfrac{\varepsilon}{8}\big{)}^{2}q^{D}_{{\mathcal{A}}}(G)\;>\;(1-\tfrac{\varepsilon}{4})q^{D}_{{\mathcal{A}}}(G)\;\geq\;q^{D}_{{\mathcal{A}}}(G)-\tfrac{\varepsilon}{4}.

Thus

(𝒜𝒬:q𝒜D(Gp)q𝒜D(G)ε4)(1¯)+(4)ε6.{\mathbb{P}}(\exists\,{\mathcal{A}}\in\mathcal{Q}:q^{D}_{{\mathcal{A}}}(G_{p})\leq q^{D}_{{\mathcal{A}}}(G)-\tfrac{\varepsilon}{4})\leq{\mathbb{P}}(\overline{{\mathcal{E}}_{1}})+{\mathbb{P}}(\mathcal{B}_{4})\leq\tfrac{\varepsilon}{6}. (5.6)

Edge contribution   Let 2{\mathcal{E}}_{2} be the event that for each partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} we have e𝒜int(Gp)max{(1+η)e𝒜int(G)p,ηe(G)p}e_{{\mathcal{A}}}^{\rm int}(G_{p})\leq\max\{(1+\eta)e_{{\mathcal{A}}}^{\rm int}(G)p,\,\eta\,e(G)p\}. We claim that (if cc is sufficiently large)

(¯2)ε12.{\mathbb{P}}(\overline{{\mathcal{E}}}_{2})\leq\tfrac{\varepsilon}{12}. (5.7)
Proof of Claim (5.7).

Each partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} has at most 2η\frac{2}{\eta} parts, so |𝒬|(1+2η)n|\mathcal{Q}|\leq(1+\frac{2}{\eta})^{n}. Let 5\mathcal{B}_{5} be the event that, for some partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} such that e𝒜int(G)<η2e(G)e_{{\mathcal{A}}}^{\rm int}(G)<\tfrac{\eta}{2}e(G), we have e𝒜int(Gp)ηe(G)pe_{{\mathcal{A}}}^{\rm int}(G_{p})\geq\eta e(G)p. For each given such partition 𝒜{\mathcal{A}}, e𝒜int(Gp)e_{{\mathcal{A}}}^{\rm int}(G_{p}) is stochastically at most a binomial random variable with mean η2e(G)p\tfrac{\eta}{2}e(G)p, so by Lemma 4.1 (with b=1b=1) and a union bound we obtain (5)ε24{\mathbb{P}}(\mathcal{B}_{5})\leq\tfrac{\varepsilon}{24}, if c>Kε1logε1c>K\varepsilon^{-1}\log\varepsilon^{-1} for a sufficiently large constant KK.

Let 6\mathcal{B}_{6} be the event that, for some partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} such that e𝒜int(G)η2e(G)e_{{\mathcal{A}}}^{\rm int}(G)\geq\tfrac{\eta}{2}e(G), we have e𝒜int(Gp)>(1+η)e𝒜int(G)pe_{{\mathcal{A}}}^{\rm int}(G_{p})>(1+\eta)e_{{\mathcal{A}}}^{\rm int}(G)p. For each given such partition 𝒜{\mathcal{A}}, e𝒜int(Gp)e_{{\mathcal{A}}}^{\rm int}(G_{p}) is a binomial random variable with mean at least η2e(G)pη2cn\tfrac{\eta}{2}e(G)p\geq\tfrac{\eta}{2}cn, so by Lemma 4.1 (with b=1b=1) and a union bound we obtain (6)ε24{\mathbb{P}}(\mathcal{B}_{6})\leq\tfrac{\varepsilon}{24}, if c>Kε3logε1c>K\varepsilon^{-3}\log\varepsilon^{-1} for a sufficiently large constant KK.

Now consider any partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q}. If 5\mathcal{B}_{5} fails and e𝒜int(Gp)ηe(G)pe_{{\mathcal{A}}}^{\rm int}(G_{p})\geq\eta e(G)p then e𝒜int(G)η2e(G)e_{{\mathcal{A}}}^{\rm int}(G)\geq\tfrac{\eta}{2}e(G); and so, if also 6\mathcal{B}_{6} fails, then e𝒜int(Gp)(1+η)e𝒜int(G)pe_{{\mathcal{A}}}^{\rm int}(G_{p})\leq(1+\eta)e_{{\mathcal{A}}}^{\rm int}(G)p. Thus if both 5\mathcal{B}_{5} and 6\mathcal{B}_{6} fail then 2{\mathcal{E}}_{2} holds, so

(¯2)(5)+(6)ε12,{\mathbb{P}}(\overline{{\mathcal{E}}}_{2})\leq{\mathbb{P}}(\mathcal{B}_{5})+{\mathbb{P}}(\mathcal{B}_{6})\leq\tfrac{\varepsilon}{12},

so the Claim (5.7) holds, as required. ∎

Having proved the claims (5.5) and (5.7), we may now complete the proof of (5.4).

Proof of (5.4).

Let 3{\mathcal{E}}_{3} be the event that e(Gp)(1+η)1e(G)pe(G_{p})\geq(1+\eta)^{-1}e(G)p. By Lemma 4.1 (with b=1b=1), (¯3)ε12{\mathbb{P}}(\overline{{\mathcal{E}}}_{3})\leq\tfrac{\varepsilon}{12} if c>Kε2logε1c>K\varepsilon^{-2}\log\varepsilon^{-1} for a sufficiently large constant KK. If 2{\mathcal{E}}_{2} and 3{\mathcal{E}}_{3} hold, then for each partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} we have

q𝒜E(Gp)\displaystyle q^{E}_{{\mathcal{A}}}(G_{p}) \displaystyle\leq max{(1+η)e𝒜int(G)p,ηe(G)p}(1+η)1e(G)p\displaystyle\frac{\max\{(1+\eta)e_{{\mathcal{A}}}^{\rm int}(G)p,\,\eta\,e(G)p\}}{(1+\eta)^{-1}e(G)p}
=\displaystyle= max{(1+η)2q𝒜E(G),η+η2}\displaystyle\max\{(1+\eta)^{2}q^{E}_{{\mathcal{A}}}(G),\,\eta+\eta^{2}\}
<\displaystyle< q𝒜E(G)+2η+η2<q𝒜E(G)+ε4.\displaystyle q^{E}_{{\mathcal{A}}}(G)+2\eta+\eta^{2}\;\;<\;\;q^{E}_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{4}.

Hence, using also (5.7), if c>Kε3logε1c>K\varepsilon^{-3}\log\varepsilon^{-1} for a sufficiently large constant KK,

(𝒜𝒬:q𝒜E(Gp)q𝒜E(G)+ε4)(¯2)+(¯3)ε6.{\mathbb{P}}(\exists{\mathcal{A}}\in\mathcal{Q}:q^{E}_{{\mathcal{A}}}(G_{p})\geq q^{E}_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{4})\leq{\mathbb{P}}(\overline{{\mathcal{E}}}_{2})+{\mathbb{P}}(\overline{{\mathcal{E}}}_{3})\leq\tfrac{\varepsilon}{6}. (5.8)

But now, by (5.6) and (5.8),

(2)\displaystyle{\mathbb{P}}(\mathcal{B}_{2}) =\displaystyle= (𝒜𝒬:q𝒜(Gp)>q𝒜(G)+ε2)\displaystyle{\mathbb{P}}(\exists{\mathcal{A}}\!\in\!\mathcal{Q}:q_{{\mathcal{A}}}(G_{p})>q_{{\mathcal{A}}}(G)\!+\!\tfrac{\varepsilon}{2})
\displaystyle\leq (𝒜𝒬:q𝒜D(Gp)<q𝒜D(G)ε4)+(𝒜𝒬:q𝒜E(Gp)q𝒜E(G)+ε4)ε3.\displaystyle{\mathbb{P}}(\exists{\mathcal{A}}\!\in\!\mathcal{Q}:q^{D}_{{\mathcal{A}}}(G_{p})\!<\!q^{D}_{{\mathcal{A}}}(G)\!-\!\tfrac{\varepsilon}{4})+{\mathbb{P}}(\exists{\mathcal{A}}\!\in\!\mathcal{Q}:q^{E}_{{\mathcal{A}}}(G_{p})\!\geq\!q^{E}_{{\mathcal{A}}}(G)\!+\!\tfrac{\varepsilon}{4})\leq\tfrac{\varepsilon}{3}.

This completes the proof of (5.4). ∎

We have now completed the proofs of (5.1), (5.3) and (5.4); so it remains only to prove the containment (5.2).

Proof of (5.2).

Recall from Lemma 3.1 that q𝒜0(Gp)q𝒜0(Gp)2ηq_{{\mathcal{A}}^{\prime}_{0}}(G_{p})\geq q_{{\mathcal{A}}_{0}}(G_{p})-2\eta, where 𝒜0=𝒜(Gp,η,𝒜0){\mathcal{A}}^{\prime}_{0}={\mathcal{A}}(G_{p},\eta,{\mathcal{A}}_{0}). Hence, if we let 4{\mathcal{E}}_{4} be the event that there is a partition 𝒜1{\mathcal{A}}_{1} which is η\eta-fat for GpG_{p} and satisfies q𝒜1(G)<q𝒜1(Gp)ε+2ηq_{{\mathcal{A}}_{1}}(G)<q_{{\mathcal{A}}_{1}}(G_{p})-\varepsilon+2\eta then 04{\mathcal{E}}_{0}\subseteq{\mathcal{E}}_{4}. Let 5{\mathcal{E}}_{5} be the event that there is a partition 𝒜2Q{\mathcal{A}}_{2}\in Q (that is, 𝒜2{\mathcal{A}}_{2} is η2\frac{\eta}{2}-fat for GG) which satisfies q𝒜2(G)<q𝒜2(Gp)ε+2ηq_{{\mathcal{A}}_{2}}(G)<q_{{\mathcal{A}}_{2}}(G_{p})-\varepsilon+2\eta. Then 415{\mathcal{E}}_{4}\subseteq\mathcal{B}_{1}\cup{\mathcal{E}}_{5}, by the definition of the event 1\mathcal{B}_{1}. Also, by our choice of η\eta, if 5{\mathcal{E}}_{5} holds then there exists 𝒜Q{\mathcal{A}}\in Q such that q𝒜(G)<q𝒜(Gp)ε2q_{{\mathcal{A}}}(G)<q_{{\mathcal{A}}}(G_{p})-\frac{\varepsilon}{2}, i.e. 2\mathcal{B}_{2} holds; and thus 52{\mathcal{E}}_{5}\subseteq\mathcal{B}_{2}. Summing up, we have 041512{\mathcal{E}}_{0}\subseteq{\mathcal{E}}_{4}\subseteq\mathcal{B}_{1}\cup{{\mathcal{E}}_{5}}\subseteq\mathcal{B}_{1}\cup\mathcal{B}_{2}\,; so (5.2) holds, as required. ∎

We have now completed the proofs of (5.1)-(5.4), and thus completed the proof of Theorem 1.2.∎

In the proof of Theorem 1.2 we took c(ε)=Θ(ε3logε1)c(\varepsilon)=\Theta(\varepsilon^{-3}\log\varepsilon^{-1}). The following example shows that, for the conclusion (a) in Theorem 1.2 to hold, c(ε)c(\varepsilon) must be at least Ω(ε2)\Omega(\varepsilon^{-2}).

Example 5.1.

Recall from [32, Theorem 1.3] that there is a constant a>0a>0 such that, for each c1c\geq 1 whp q(Gn,c/n)>a/cq^{*}(G_{n,c/n})>a/\sqrt{c}. Let ε>0\varepsilon>0 and let c=c(ε)=a2/ε2c=c(\varepsilon)=a^{2}/\varepsilon^{2} (so ε=a/c\varepsilon=a/\sqrt{c}). Let nc+1n\geq c+1, let GG be KnK_{n} (so q(G)=0q^{*}(G)=0) and let p=(c+1)/np=(c+1)/n. Then the expected average degree in GpG_{p} is (n1)pc(n-1)p\geq c, but whp q(Gp)>q(G)+εq^{*}(G_{p})>q^{*}(G)+\varepsilon.
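The closeness q𝒜(Gp) ≈ q𝒜(G) underlying the estimates in this section is easy to observe empirically. The sketch below is illustrative only: the underlying graph (two disjoint cliques, for which the planted bipartition 𝒜 has q𝒜(G) = 1/2 exactly), the parameter values, and all names are our own choices, not taken from the paper.

```python
import random
from itertools import combinations

random.seed(1)

def modularity_score(edges, parts):
    """q_A = sum over parts A_i of e(A_i)/m - (vol(A_i)/(2m))^2."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for part in parts:
        pset = set(part)
        e_int = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in part)
        q += e_int / m - (vol / (2 * m)) ** 2
    return q

n = 200                                      # vertices per clique
G = list(combinations(range(n), 2)) + \
    [(u + n, v + n) for u, v in combinations(range(n), 2)]
A = [list(range(n)), list(range(n, 2 * n))]  # the planted bipartition

p = 0.1                                      # expected average degree ~ 20
Gp = [e for e in G if random.random() < p]   # keep each edge independently

qG = modularity_score(G, A)                  # exactly 1/2
qGp = modularity_score(Gp, A)                # close to 1/2 whp
assert abs(qG - 0.5) < 1e-9
assert abs(qGp - qG) < 0.05
```

Since every edge of G lies within a part, the sampled score deviates from 1/2 only through the fluctuation of the volume split, which is of order 1/√e(Gp), in line with the quantitative bounds above.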


5.2 A kk-part analogue to Theorem 1.2

As in Section 4.2, recall that qk(G)=max|𝒜|kq𝒜(G)q^{*}_{\leq k}(G)=\max_{|{\mathcal{A}}|\leq k}q_{\mathcal{A}}(G). We present here a variant of Theorem 1.2(a) in which each instance of qq^{*} is replaced by qkq^{*}_{\leq k}. As mentioned in Section 2, in the discussion of the stochastic block model following (2.1), the case k=2k=2 of this variant will be used in Remark 5.3 to obtain the relevant upper bound.

Proposition 5.2.

For each ε>0\varepsilon>0, there is a c=c(ε)c=c(\varepsilon) such that the following holds. Let the graph GG and probability pp satisfy e(G)p/v(G)ce(G)p/v(G)\geq c, and let k2k\geq 2. Then with probability 1ε\geq 1-\varepsilon the random graph GpG_{p} satisfies |qk(Gp)qk(G)|<ε|q^{*}_{\leq k}(G_{p})-q^{*}_{\leq k}(G)|<\varepsilon.

Proof.

We may follow the proof of Theorem 1.2(a), with a few natural small changes which we detail below. Note that since we do not prove an analogue of Theorem 1.2(b), we need only prove analogues of (5.1), (5.3) and (5.4), i.e. not (5.2).

For partitions with at most kk parts we define analogues of the events 0\mathcal{B}_{0} and 2\mathcal{B}_{2} (the event 1\mathcal{B}_{1} is unchanged, and no analogue of 0{\mathcal{E}}_{0} is needed since we do not prove the analogue of (5.2)). Let 0(k)\mathcal{B}_{0}^{(k)} be the event {qk(Gp)<qk(G)ε}\{q^{*}_{\leq k}(G_{p})<q^{*}_{\leq k}(G)-\varepsilon\}. By Theorem 1.2 (b) (with ε\varepsilon replaced by ε/3\varepsilon/3) applied to an optimal partition 𝒜{\mathcal{A}} with at most kk parts (and noting that 𝒜{\mathcal{A}}^{\prime} has at most kk parts), we have (0(k))ε/3{\mathbb{P}}(\mathcal{B}_{0}^{(k)})\leq\varepsilon/3.

As before, let η=ε9\eta=\tfrac{\varepsilon}{9}. Let 𝒬(k)\mathcal{Q}^{(k)} be the set of partitions which are η2\frac{\eta}{2}-fat for GG and have at most kk parts; and let 2(k)\mathcal{B}_{2}^{(k)} be the event that there is a partition 𝒜𝒬(k){\mathcal{A}}\in\mathcal{Q}^{(k)} such that q𝒜(Gp)>q𝒜(G)+ε2q_{{\mathcal{A}}}(G_{p})>q_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{2}. We must establish the analogues of the statements (5.1), (5.3) and (5.4).

Proof of analogue of (5.1).

We want to show that {qk(Gp)>qk(G)+ε}12(k)\{q^{*}_{\leq k}(G_{p})>q^{*}_{\leq k}(G)+\varepsilon\}\subseteq\mathcal{B}_{1}\cup\mathcal{B}_{2}^{(k)}. As before by Lemma 3.1 (and since 2ηε/22\eta\leq\varepsilon/2),

{qk(Gp)>qk(G)+ε}\displaystyle\{q^{*}_{\leq k}(G_{p})>q^{*}_{\leq k}(G)+\varepsilon\} \displaystyle\subseteq {𝒜:|𝒜|k,𝒜 is ηfat forGp,q𝒜(Gp)>qk(G)+ε2}\displaystyle\{\exists\,{\mathcal{A}}:\>|{\mathcal{A}}|\leq k,\>{\mathcal{A}}\mbox{ is }\eta\!-\!\mbox{fat for}\,G_{p},\>q_{{\mathcal{A}}}(G_{p})>q^{*}_{\leq k}(G)+\tfrac{\varepsilon}{2}\}
\displaystyle\subseteq {𝒜:|𝒜|k,𝒜 is ηfat forGp,q𝒜(Gp)>q𝒜(G)+ε2}\displaystyle\{\exists\,{\mathcal{A}}:\>|{\mathcal{A}}|\leq k,\>{\mathcal{A}}\mbox{ is }\eta\!-\!\mbox{fat for}\,G_{p},\>q_{{\mathcal{A}}}(G_{p})>q_{{\mathcal{A}}}(G)+\tfrac{\varepsilon}{2}\}
\displaystyle\subseteq {𝒜𝒬(k):|𝒜|k,𝒜 is ηfat forGp}2(k)\displaystyle\{\exists\,{\mathcal{A}}\not\in\mathcal{Q}^{(k)}:\>|{\mathcal{A}}|\leq k,\>{\mathcal{A}}\mbox{ is }\eta\!-\!\mbox{fat for}\,G_{p}\}\cup\mathcal{B}_{2}^{(k)}
\displaystyle\subseteq 12(k),\displaystyle\mathcal{B}_{1}\cup\mathcal{B}_{2}^{(k)},

as required. ∎

Since the event 1\mathcal{B}_{1} is unchanged, the statement of (5.3) is unchanged and thus (1)<ε/3{\mathbb{P}}(\mathcal{B}_{1})<\varepsilon/3. The analogue of (5.4) is to prove that (2(k))<ε/3{\mathbb{P}}(\mathcal{B}_{2}^{(k)})<\varepsilon/3. But 2(k)2\mathcal{B}_{2}^{(k)}\subseteq\mathcal{B}_{2} and thus (2(k))(2)<ε/3{\mathbb{P}}(\mathcal{B}_{2}^{(k)})\leq{\mathbb{P}}(\mathcal{B}_{2})<\varepsilon/3 by (the original) (5.4). We have now completed the proof of Proposition 5.2. ∎

Remark 5.3.

We may now complete the discussion of the bootstrapping example from Section 2.1. We give details of how to show that equality holds in (2.1) when p,q=ω(n1)p,q=\omega(n^{-1}). This will follow from spectral results on denser graphs, robustness of modularity, and Proposition 5.2.

Recall that q(G)λ¯q^{*}(G)\leq\bar{\lambda} and q2(G)λ¯/2q^{*}_{\leq 2}(G)\leq\bar{\lambda}/2 where λ¯\bar{\lambda} is the spectral gap of GG, see Lemma 6.1 of [32]. Now let a>b>0a>b>0 satisfy the condition ab>2\sqrt{a}-\sqrt{b}>\sqrt{2}. Then a result from [11] says that, for p=n1alognp=n^{-1}a\log n and q=n1blognq=n^{-1}b\log n, we have λ¯=(ab)/(a+b)+o(1)\bar{\lambda}=(a-b)/(a+b)+o(1) whp; and thus

q2(Gn,2,p,q)=pq2(p+q)+o(1)whp.q^{*}_{\leq 2}(G_{n,2,p,q})=\frac{p-q}{2(p+q)}+o(1)\;\;\;\mbox{whp}. (5.9)

We can use Proposition 5.2 to bootstrap (5.9) to sparser pp and qq (and with no condition corresponding to ab>2\sqrt{a}-\sqrt{b}>\sqrt{2}). Let α>β>0\alpha>\beta>0; let ω=ω(n)\omega=\omega(n)\to\infty (arbitrarily slowly) as nn\to\infty, with ω(n)logn\omega(n)\leq\log n; and let p=αω/np=\alpha\,\omega/n and q=βω/nq=\beta\,\omega/n. Let us show that still (5.9) holds. Let p+=cα(logn)/np^{+}=c\alpha(\log n)/n and q+=cβ(logn)/nq^{+}=c\beta(\log n)/n, for some constant c1c\geq 1 such that cαcβ>2\sqrt{c\alpha}-\sqrt{c\beta}>\sqrt{2}, so whp by (5.9)

q2(Gn,2,p+,q+)=p+q+2(p++q+)+o(1)=pq2(p+q)+o(1).q^{*}_{\leq 2}(G_{n,2,p^{+},q^{+}})=\frac{p^{+}-q^{+}}{2(p^{+}+q^{+})}+o(1)=\frac{p-q}{2(p+q)}+o(1).

Also, whp e(Gn,2,p+,q+)=14(α+β)cnlogn+o(n2/3)e(G_{n,2,p^{+},q^{+}})=\frac{1}{4}(\alpha+\beta)cn\log n+o(n^{2/3}). Let x=ω/(clogn)x=\omega/(c\log n) (so 0x10\leq x\leq 1). Then the sampled random graph (Gn,2,p+,q+)x(G_{n,2,p^{+},q^{+}})_{x} satisfies (Gn,2,p+,q+)x=sGn,2,p,q(G_{n,2,p^{+},q^{+}})_{x}=_{s}G_{n,2,p,q}, and whp e((Gn,2,p+,q+)x)/n=14(α+β)ω+o(1)e((G_{n,2,p^{+},q^{+}})_{x})/n=\frac{1}{4}(\alpha+\beta)\omega+o(1)\to\infty. Hence by Proposition 5.2 we have q2(Gn,2,p,q)=q2(Gn,2,p+,q+)+o(1)q^{*}_{\leq 2}(G_{n,2,p,q})=q^{*}_{\leq 2}(G_{n,2,p^{+},q^{+}})+o(1) whp; and we have shown that (5.9) holds with the current choice of p,qp,q, as we aimed to do.


6 Robustness of modularity, and closeness and concentration for q(Gp)q^{*}(G_{p}) and q(Gm)q^{*}(G_{m})

6.1 Robustness

We shall use deterministic robustness results relating to two graph operations, namely moving or deleting a set of edges, and concerning two graph parameters, namely the modularity score q𝒜(G)q_{\mathcal{A}}(G) for a given partition 𝒜{\mathcal{A}}, and the (maximum) modularity q(G)q^{*}(G). We first briefly collect some already known results together with one new bound in Lemma 6.1 below. For edge deletion the bound on q𝒜q_{\mathcal{A}} is new while the bound on qq^{*} follows from Lemma 5.1 in [32]; and for moving edges the bounds follow from Lemma 5.3 and its proof in the same paper. See also Example 6.2 below which gives examples showing the bounds in Lemma 6.1 are asymptotically tight.

Lemma 6.1.

Let H=(V,E)H=(V,E) be a graph and let 𝒜{\mathcal{A}} be a partition of VV.

  • (a)

    If E0E_{0} is a non-empty proper subset of EE, and H=(V,EE0)H^{\prime}=(V,E\setminus E_{0}), then

    |q𝒜(H)q𝒜(H)|,|q(H)q(H)|<2|E0||E|.|q_{\mathcal{A}}(H)-q_{\mathcal{A}}(H^{\prime})|\,,\;|q^{*}(H)-q^{*}(H^{\prime})|<\frac{2\,|E_{0}|}{|E|}.
  • (b)

    If E~E\tilde{E}\neq E is a set of edges with |E~|=|E||\tilde{E}|=|E|, and H~=(V,E~)\tilde{H}=(V,\tilde{E}), then

    |q𝒜(H)q𝒜(H~)|,|q(H)q(H~)|<|EE~||E|.|q_{\mathcal{A}}(H)-q_{\mathcal{A}}(\tilde{H})|\,,\;|q^{*}(H)-q^{*}(\tilde{H})|<\frac{|E\!\bigtriangleup\!\tilde{E}|}{|E|}.

We give examples to show tightness of Lemma 6.1 before proceeding with the proof. See the concluding remarks for a related open problem.

Example 6.2.

The examples here show tightness of the lemma: it is not possible to replace the factors 2 and 1 above by smaller constants in any of the four cases.

For the examples let us first recall some simple properties of modularity. Firstly, the placement of any isolated vertices in a partition is irrelevant. Secondly, for any part AA in any optimal partition 𝒜{\mathcal{A}} of a graph, the graph induced by the non-isolated vertices in AA is connected [7]. Thirdly, for any leaf vertex uu with pendant edge uvuv, the vertices uu and vv are in the same part in any optimal partition [7, 42].

  • (a)

    Let H1H_{1} be the graph consisting of a star on mm edges together with a disjoint edge ee. Take E0={e}E_{0}=\{e\} and thus H1H_{1}^{\prime} is a star on mm edges together with two isolated vertices. We consider the bipartition 𝒜{\mathcal{A}} which in H1H_{1} places the vertices of the star in one part and the vertices of the disjoint edge in the other. Note that 𝒜{\mathcal{A}} is an optimal partition of H1H_{1} and of H1H_{1}^{\prime}. Since H1H_{1}^{\prime} is a star plus isolated vertices, q(H1)=q𝒜(H1)=0q^{*}(H_{1}^{\prime})=q_{\mathcal{A}}(H_{1}^{\prime})=0, and we may calculate

    q𝒜(H1)=1(2m)2+224(m+1)2=2m(m+1)2.q_{\mathcal{A}}(H_{1})=1-\frac{(2m)^{2}+2^{2}}{4(m+1)^{2}}=\frac{2m}{(m+1)^{2}}.

    Noting that |E0|/|E|=1/(m+1)|E_{0}|/|E|=1/(m+1), for this choice of H1H_{1} and H1H_{1}^{\prime} we have

    |q(H1)q(H1)|=|q𝒜(H1)q𝒜(H1)|=2m(m+1)2=2mm+1|E0||E||q^{*}(H_{1})-q^{*}(H_{1}^{\prime})|=|q_{\mathcal{A}}(H_{1})-q_{\mathcal{A}}(H_{1}^{\prime})|=\frac{2m}{(m+1)^{2}}=\frac{2m}{m+1}\frac{|E_{0}|}{|E|}

    and thus we may get arbitrarily close to the factor 2.

  • (b)

    Let H2H_{2} be the graph H1H_{1} above except that we add an isolated vertex. Thus H2H_{2} is the graph consisting of a star with central vertex uu on mm edges together with a disjoint edge ee and an isolated vertex vv. Let H~2\tilde{H}_{2} be obtained from H2H_{2} by removing ee and adding the edge e=uve^{\prime}=uv : thus H~2\tilde{H}_{2} is a star on m+1m+1 edges together with two isolated vertices.

    We consider the bipartition 𝒜{\mathcal{A}} which places the vertices of the star and the isolated vertex together in one part and the disjoint edge in the other part. Then 𝒜{\mathcal{A}} is an optimal partition of H2H_{2} and of H~2\tilde{H}_{2} by the same argument as in part (a), and q(H2)=q(H1)=2m/(m+1)2q^{*}(H_{2})=q^{*}(H_{1})=2m/(m+1)^{2}. Since H~2\tilde{H}_{2} is a star plus isolated vertices q(H~2)=q𝒜(H~2)=0q^{*}(\tilde{H}_{2})=q_{\mathcal{A}}(\tilde{H}_{2})=0.

    Noting that |EE~|/|E|=2/(m+1)|E\bigtriangleup\tilde{E}|/|E|=2/(m+1), for this choice of H2H_{2} and H~2\tilde{H}_{2} we have

    |q(H2)q(H~2)|=|q𝒜(H2)q𝒜(H~2)|=2m(m+1)2=mm+1|EE~||E||q^{*}(H_{2})-q^{*}(\tilde{H}_{2})|=|q_{\mathcal{A}}(H_{2})-q_{\mathcal{A}}(\tilde{H}_{2})|=\frac{2m}{(m+1)^{2}}=\frac{m}{m+1}\frac{|E\bigtriangleup\tilde{E}|}{|E|}

    and thus we may get arbitrarily close to the factor 1.
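The calculations in Example 6.2 can be checked mechanically with exact rational arithmetic. The sketch below is illustrative (the helper name and the choice m = 100 are ours); it computes q𝒜 directly from its definition for the graphs H1 and H1′ of part (a):

```python
from fractions import Fraction

def q_score(edges, parts):
    """Exact modularity score: sum over parts of e(A_i)/m - (vol(A_i)/2m)^2."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = Fraction(0)
    for part in parts:
        pset = set(part)
        e_int = sum(1 for u, v in edges if u in pset and v in pset)
        vol = sum(deg.get(x, 0) for x in part)
        q += Fraction(e_int, m) - Fraction(vol, 2 * m) ** 2
    return q

m = 100
star = [(0, i) for i in range(1, m + 1)]   # star on m edges, centre 0
H1 = star + [("a", "b")]                   # plus the disjoint edge e
A = [list(range(m + 1)), ["a", "b"]]       # star part / edge part

q_H1 = q_score(H1, A)                      # = 2m/(m+1)^2 as in part (a)
assert q_H1 == Fraction(2 * m, (m + 1) ** 2)

q_H1_prime = q_score(star, A)              # star plus isolated vertices
assert q_H1_prime == 0

# ratio of the modularity change to |E_0|/|E| = 1/(m+1): approaches 2
ratio = (q_H1 - q_H1_prime) / Fraction(1, m + 1)
assert ratio == Fraction(2 * m, m + 1)
```

Increasing m drives the ratio 2m/(m+1) arbitrarily close to the factor 2 of Lemma 6.1(a), as claimed.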

To prove the inequality concerning q𝒜q_{\mathcal{A}} in Lemma 6.1(a), we will use Lemma 6.3 bounding the edge contribution minus twice the degree tax.

Lemma 6.3.

Let HH be a graph on m1m\geq 1 edges and let 𝒜{\mathcal{A}} be a vertex partition. Then

q𝒜E(H)2q𝒜D(H)1.q^{E}_{\mathcal{A}}(H)-2q^{D}_{\mathcal{A}}(H)\geq-1\,.
Proof.

For each part Ai𝒜A_{i}\in{\mathcal{A}}, let us write ai=1me(Ai)a_{i}=\frac{1}{m}e(A_{i}) and bi=1me(Ai,Ai¯)b_{i}=\frac{1}{m}e(A_{i},\bar{A_{i}}) for the proportion of edges internal to part AiA_{i} and proportion between part AiA_{i} and the rest of the graph respectively. Note that i(ai+12bi)=1\sum_{i}(a_{i}+\frac{1}{2}b_{i})=1. We also set a=iai=q𝒜E(H)a=\sum_{i}a_{i}=q^{E}_{\mathcal{A}}(H) for the proportion of edges within parts of the partition. Similarly let b=12ibib=\frac{1}{2}\sum_{i}b_{i}, so bb is the proportion of edges between parts, and note that bibb_{i}\leq b for each ii. Also observe that a+b=1a+b=1.

The degree tax can be written

q𝒜D(H)=i(ai+bi/2)2=iai2+iaibi+14ibi2.q^{D}_{\mathcal{A}}(H)=\sum_{i}(a_{i}+b_{i}/2)^{2}=\sum_{i}a_{i}^{2}+\sum_{i}a_{i}b_{i}+\tfrac{1}{4}\sum_{i}b_{i}^{2}. (6.1)

We now upper bound each term of the RHS of (6.1) in turn. First note that iai=a\sum_{i}a_{i}=a implies iai2a2\sum_{i}a_{i}^{2}\leq a^{2}. Next, since each bibb_{i}\leq b and iai=a\sum_{i}a_{i}=a we have iaibiab\sum_{i}a_{i}b_{i}\leq ab. Similarly, each bibb_{i}\leq b and ibi=2b\sum_{i}b_{i}=2b implies that ibi2b(2b)=2b2.\sum_{i}b_{i}^{2}\leq b\,(2b)=2b^{2}. Thus by these bounds and (6.1)

q𝒜D(H)a2+ab+12b2=12+12a2q^{D}_{\mathcal{A}}(H)\leq a^{2}+ab+\tfrac{1}{2}b^{2}=\tfrac{1}{2}+\tfrac{1}{2}a^{2}

since a+b=1a+b=1. Recalling that q𝒜E(H)=aq^{E}_{\mathcal{A}}(H)=a, we get

q𝒜E(H)2q𝒜D(H)a1a2=1+a(1a)1q^{E}_{\mathcal{A}}(H)-2q^{D}_{\mathcal{A}}(H)\geq a-1-a^{2}=-1+a(1-a)\geq-1

where the last inequality follows since 0a10\leq a\leq 1. ∎


Proof of Lemma 6.1(a), bound on q𝒜q_{\mathcal{A}}.

We begin by following the proof of Lemma 5.1 in [32]. As there, let α=α(𝒜)=|E1|/|E|\alpha=\alpha({\mathcal{A}})=|E_{1}|/|E| where E1E_{1} is the set of edges in E0E_{0} which lie within the parts of 𝒜{\mathcal{A}} and β=β(𝒜)=|E0\E1|/|E|\beta=\beta({\mathcal{A}})=|E_{0}\backslash E_{1}|/|E|. That is, α\alpha is the proportion of the edges in EE which are in E0E_{0} and lie within the parts of 𝒜{\mathcal{A}}, and β\beta is the proportion which are in E0E_{0} and lie between the parts.

Note that the statement we wish to prove follows from the following two inequalities.

q𝒜(H)q𝒜(H)<2α+2βq_{\mathcal{A}}(H^{\prime})-q_{\mathcal{A}}(H)<2\alpha+2\beta (6.2)

and

q𝒜(H)q𝒜(H)<2α+β.q_{\mathcal{A}}(H)-q_{\mathcal{A}}(H^{\prime})<2\alpha+\beta\,. (6.3)

The first of these, (6.2), is equation (5.2) in [32] and was proved there, so it remains only to prove (6.3). Directly from the definitions we may calculate the difference in edge contributions (this is (5.4) in [32]):

q𝒜E(H)q𝒜E(H)=α(α+β)q𝒜E(H).q^{E}_{\mathcal{A}}(H)-q^{E}_{\mathcal{A}}(H^{\prime})=\alpha-(\alpha+\beta)\,q^{E}_{\mathcal{A}}(H^{\prime})\,. (6.4)

Note also the following simple bound on the possible increase in the degree tax. Since |E\E0|=(1αβ)|E||E\backslash E_{0}|=(1-\alpha-\beta)|E| we have

q𝒜D(H)>14|E|2A𝒜volH(A)2=(1αβ)2q𝒜D(H)q_{\mathcal{A}}^{D}(H)>\frac{1}{4|E|^{2}}\sum_{A\in{\mathcal{A}}}{\rm vol}_{H^{\prime}}(A)^{2}=(1-\alpha-\beta)^{2}q^{D}_{\mathcal{A}}(H^{\prime})

and so

q𝒜D(H)q𝒜D(H)<2(α+β)q𝒜D(H).q_{\mathcal{A}}^{D}(H^{\prime})-q_{\mathcal{A}}^{D}(H)<2(\alpha+\beta)q_{\mathcal{A}}^{D}(H^{\prime})\,.

Thus by (6.4) we have (this is (5.6) in [32])

q𝒜(H)q𝒜(H)<α(α+β)q𝒜E(H)+2(α+β)q𝒜D(H)=α(α+β)(q𝒜E(H)2q𝒜D(H)).q_{\mathcal{A}}(H)-q_{\mathcal{A}}(H^{\prime})<\alpha\!-\!(\alpha+\beta)q^{E}_{\mathcal{A}}(H^{\prime})+2(\alpha+\beta)q_{\mathcal{A}}^{D}(H^{\prime})=\alpha-(\alpha+\beta)(q^{E}_{\mathcal{A}}(H^{\prime})-2q^{D}_{\mathcal{A}}(H^{\prime})). (6.5)

Now we may apply Lemma 6.3 to infer that the RHS of (6.5) is at most 2α+β2\alpha+\beta, which proves (6.3). Finally, by (6.2) and (6.3),

|q𝒜(H)q𝒜(H)|<2(α+β)=2|E0||E|,|q_{\mathcal{A}}(H)-q_{\mathcal{A}}(H^{\prime})|<2(\alpha+\beta)=\frac{2\,|E_{0}|}{|E|},

as required.∎


6.2 Closeness of q(Gp)q^{*}(G_{p}) and q(Gm)q^{*}(G_{m})

Given a graph GG and 1me(G)1\leq m\leq e(G), how close are q(Gm)q^{*}(G_{m}) and q(Gp)q^{*}(G_{p}), where p=m/e(G)p=m/e(G)? This question gives rise to Lemma 6.4, which we now state and prove. Note that Corollaries 1.3 and 1.4 then follow by Lemma 6.4 together with Theorems 1.1 and 1.2 respectively.

Lemma 6.4.

Let GG be a graph, let 1me(G)1\leq m\leq e(G), and let p=m/e(G)p=m/e(G). Then for any partition 𝒜{\mathcal{A}} of V(G)V(G)

|𝔼[q(Gp)]𝔼[q(Gm)]|,|𝔼[q𝒜(Gp)]𝔼[q𝒜(Gm)]|21pm.|{\mathbb{E}}[q^{*}(G_{p})]-{\mathbb{E}}[q^{*}(G_{m})]|\,,\;|{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]-{\mathbb{E}}[q_{\mathcal{A}}(G_{m})]|\leq 2\sqrt{\tfrac{1-p}{m}}\,. (6.6)

Also, we can couple GpG_{p} and GmG_{m} so that, for any partition 𝒜{\mathcal{A}} of V(G)V(G), if ω(m)\omega(m)\to\infty as mm\to\infty then

|q(Gp)q(Gm)|,|q𝒜(Gp)q𝒜(Gm)|ω(m)1pm whp.|q^{*}(G_{p})-q^{*}(G_{m})|\;,\;|q_{\mathcal{A}}(G_{p})-q_{\mathcal{A}}(G_{m})|\;\leq\;\omega(m)\sqrt{\tfrac{1-p}{m}}\;\;\mbox{ whp}\,. (6.7)
Proof.

Let X=e(Gp)X=e(G_{p}), so XBin(e(G),p)X\sim{\rm Bin}(e(G),p) with mean mm and variance var(X)=m(1p){\rm var}(X)=m(1-p). Couple GpG_{p} and GmG_{m} so that their edge sets are nested: to do this, we may list the edges of GG in a uniformly random order, let GpG_{p} have the first XX edges and let GmG_{m} have the first mm edges. By Lemma 6.1

|q(Gp)q(Gm)|2|Xm|m.|q^{*}(G_{p})-q^{*}(G_{m})|\leq\tfrac{2\,|X-m|}{m}\,. (6.8)

But

(𝔼[q(Gp)]𝔼[q(Gm)])2\displaystyle\left({\mathbb{E}}[q^{*}(G_{p})]-{\mathbb{E}}[q^{*}(G_{m})]\right)^{2} \displaystyle\leq 𝔼[(q(Gp)q(Gm))2]\displaystyle{\mathbb{E}}[(q^{*}(G_{p})-q^{*}(G_{m}))^{2}]
\displaystyle\leq 4m2var(X)=4(1p)m,\displaystyle\tfrac{4}{m^{2}}{\rm var}(X)\;=\;\tfrac{4(1-p)}{m}\,,

where we use (6.8) for the second inequality; and the first inequality in (6.6) follows. The second inequality in (6.6) may be proved in just the same way. Also, by Chebyshev’s inequality, for t>0t>0

(|Xm|t)var(X)t2=m(1p)t2.{\mathbb{P}}(|X-m|\geq t)\leq\tfrac{{\rm var}(X)}{t^{2}}=\tfrac{m(1-p)}{t^{2}}.

So, setting t=12ωm(1p)t=\frac{1}{2}\omega\sqrt{m(1-p)}, whp |Xm|t|X-m|\leq t. Hence

|q(Gp)q(Gm)|2tm=ω1pm whp;|q^{*}(G_{p})-q^{*}(G_{m})|\leq\tfrac{2t}{m}=\omega\,\sqrt{\tfrac{1-p}{m}}\;\;\mbox{ whp}\,;

and we have proved the first inequality in (6.7). The second inequality may be proved in just the same way. This completes the proof of Lemma 6.4. ∎
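The nested coupling in the proof is easy to realise computationally: shuffle the edges of GG once, give GpG_{p} the first X ∼ Bin(e(G), p) of them and GmG_{m} the first m. A minimal sketch, with K60 as an illustrative underlying graph and all names our own:

```python
import random

random.seed(2)

def coupled_subgraphs(edges, p):
    """Couple G_p and G_m with nested edge sets, as in the proof:
    list the edges in uniformly random order, give G_p the first
    X ~ Bin(e(G), p) of them and G_m the first m = p * e(G)."""
    edges = list(edges)
    random.shuffle(edges)
    X = sum(random.random() < p for _ in edges)  # X ~ Bin(e(G), p)
    m = round(p * len(edges))
    return edges[:X], edges[:m]

edges = [(i, j) for i in range(60) for j in range(i + 1, 60)]  # K_60
p = 0.5
m = round(p * len(edges))
Gp_edges, Gm_edges = coupled_subgraphs(edges, p)

# nested edge sets: one sample contains the other
small, large = sorted([Gp_edges, Gm_edges], key=len)
assert set(small) <= set(large)

# |X - m| is typically O(sqrt(m(1-p))), which drives the bound (6.6)
assert abs(len(Gp_edges) - m) <= 6 * (m * (1 - p)) ** 0.5
```

Under this coupling the two graphs differ by |X − m| edge deletions, so Lemma 6.1 converts the binomial fluctuation of X directly into the modularity bounds (6.6) and (6.7).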


6.3 Concentration of q(Gp)q^{*}(G_{p}) and q(Gm)q^{*}(G_{m})

We restate Theorem 2.2 from the introduction, adding also concentration results for the random edge model. Recall that, given a fixed underlying graph GG, for 0<me(G)0<m\leq e(G) the random graph GmG_{m} is obtained by sampling uniformly at random from the mm-edge subgraphs of GG.

For the random graphs Gn,mG_{n,m} and Gn,pG_{n,p}, Theorem 7.1 of [32] showed that the modularity values are highly concentrated about their expected value. A very similar result is true, with similar proof, for the case of edge sampling. We use the results on GmG_{m} to deduce the results on GpG_{p}.

Theorem 6.5.
  • (a)

    Given a graph GG, a partition 𝒜{\mathcal{A}}, and 0me(G)0\leq m\leq e(G), for each t>0t>0

    (|q𝒜(Gm)𝔼[q𝒜(Gm)]|t)<2et2m/2 and (|q(Gm)𝔼[q(Gm)]|t)<2et2m/2.{\mathbb{P}}\Big{(}\big{|}\,q_{\mathcal{A}}(G_{m})-{\mathbb{E}}[q_{\mathcal{A}}(G_{m})]\,\big{|}\geq t\Big{)}<2\,e^{-t^{2}m/2}\;\mbox{ and }\;{\mathbb{P}}\Big{(}\big{|}q^{*}(G_{m})-{\mathbb{E}}[q^{*}(G_{m})]\big{|}\geq t\Big{)}<2e^{-t^{2}m/2}.
  • (b)

    There is a constant η>0\eta>0 such that for each graph GG, each partition 𝒜=𝒜(G){\mathcal{A}}={\mathcal{A}}(G) and each 0<p<10<p<1 the following holds with μ=μ(G,p)=e(G)p\mu=\mu(G,p)=e(G)p. For each t0t\geq 0

    (|q𝒜(Gp)𝔼[q𝒜(Gp)]|t)<2eημt2 and (|q(Gp)𝔼[q(Gp)]|t)<2eημt2.{\mathbb{P}}\Big{(}\big{|}\,q_{\mathcal{A}}(G_{p})-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\,\big{|}\geq t\Big{)}<2\,e^{-\eta\mu t^{2}}\;\mbox{ and }\;{\mathbb{P}}\Big{(}\big{|}\,q^{*}(G_{p})-{\mathbb{E}}[q^{*}(G_{p})]\,\big{|}\geq t\Big{)}<2\,e^{-\eta\mu t^{2}}.

The results in Theorem 6.5 concerning q(Gm)q^{*}(G_{m}) and q(Gp)q^{*}(G_{p}) extend Theorem 7.1 of [32]. As well as the robustness results above, in the proof we use Lemma 4.1 and the following concentration result, Theorem 7.4 of [30] (see also Example 7.3 there).

Lemma 6.6.

Let AA be a finite set, let aa be an integer such that 0a|A|0\leq a\leq|A|, and consider the set (Aa)\binom{A}{a} of all aa-element subsets of AA. Suppose that the function f:(Aa)f:\binom{A}{a}\rightarrow{\mathbb{R}} satisfies |f(S)f(T)|c|f(S)-f(T)|\leq c whenever |ST|=2|S\triangle T|=2 (i.e. the aa-element subsets SS and TT are minimally different). If the random variable XX is uniformly distributed over (Aa)\binom{A}{a}, then

(|f(X)𝔼[f(X)]|t)2e2t2/ac2.{\mathbb{P}}\left(\bigl{|}f(X)-{\mathbb{E}}[f(X)]\bigr{|}\geq t\right)\leq 2e^{-2t^{2}/ac^{2}}.

Proof of Theorem 6.5 part (a).

By Lemma 6.1(b), if E(H)E(H) and E(H~)E(\tilde{H}) are both of size mm and are minimally different then |q𝒜(H)q𝒜(H~)||q_{\mathcal{A}}(H)-q_{\mathcal{A}}(\tilde{H})|, |q(H)q(H~)|<2/m|q^{*}(H)-q^{*}(\tilde{H})|<2/m. Hence, Lemma 6.6 with A=E(G)A=E(G), a=ma=m and taking c=2/mc=2/m for q𝒜q_{\mathcal{A}} and for qq^{*} immediately yields part (a) of Theorem 6.5. ∎

Proof of Theorem 6.5 part (b).

Let M=e(Gp)M=e(G_{p}) and let μ=μ(G,p)=𝔼[M]=e(G)p\mu=\mu(G,p)={\mathbb{E}}[M]=e(G)p. We will first show the more detailed statement that, for each t42/μt\geq 42/\sqrt{\mu}, we have

(|q𝒜(Gp)𝔼[q𝒜(Gp)]|t)6et2μ/150,{\mathbb{P}}\big{(}\big{|}q_{\mathcal{A}}(G_{p})-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\big{|}\geq t\big{)}\leq 6e^{-t^{2}\mu/150}, (6.9)

from which the result for q𝒜q_{\mathcal{A}} in part (b) of the theorem follows. We show the proof in detail for concentration of q𝒜q_{\mathcal{A}} then indicate the few differences to be made in showing the result for qq^{*}.

For any graph and any partition the modularity score lies in the interval [1/2,1)[-1/2,1), see [7]; and hence we may assume that 0t<3/20\leq t<3/2. Define the event ={M>2μ/3}\mathcal{E}=\{M>2\mu/3\} and let c{\mathcal{E}}^{c} denote the complement of {\mathcal{E}},

{\mathbb{P}}\big(\big|q_{\mathcal{A}}(G_{p})-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\big|\geq t\big)\leq{\mathbb{P}}\big(\big(\big|q_{\mathcal{A}}(G_{p})-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})\,|\,M]\big|\geq\tfrac{t}{2}\big)\land{\mathcal{E}}\big)+{\mathbb{P}}\big(\big(\big|{\mathbb{E}}[q_{\mathcal{A}}(G_{p})\,|\,M]-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\big|\geq\tfrac{t}{2}\big)\land{\mathcal{E}}\big)+{\mathbb{P}}({\mathcal{E}}^{c}). (6.10)

The proof proceeds by bounding separately the terms on the right in (6.10). Firstly, by using part (a) of the theorem and conditioning on M=mM=m where m>2μ/3m>2\mu/3, we have a bound on the first term of (6.10)

{\mathbb{P}}\Big(\big(\,\big|\,q_{\mathcal{A}}(G_{p})-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})\,|\,M]\,\big|\geq\tfrac{t}{2}\big)\land{\mathcal{E}}\Big)\leq 2\exp(-\tfrac{1}{2}(\tfrac{t}{2})^{2}(\tfrac{2\mu}{3}))=2\exp(-t^{2}\mu/12).

The third term is also straightforward to bound. By Lemma 4.1 and since t<3/2t<3/2,

(c)2exp(13(13)2μ)=2exp(μ/27)<2exp((4/243)t2μ).{\mathbb{P}}({\mathcal{E}}^{c})\leq 2\exp(-\tfrac{1}{3}(\tfrac{1}{3})^{2}\mu)=2\exp(-\mu/27)<2\exp(-(4/243)t^{2}\mu).

It now remains to bound the second term of (6.10). Let GpG^{\prime}_{p} be a random subgraph of GG obtained by keeping each edge with probability pp, independently of GpG_{p}, and let M=e(Gp)M^{\prime}=e(G_{p}^{\prime}). By coupling and by Lemma 6.1, for 0<m,me(G)0<m,m^{\prime}\leq e(G)

|𝔼[q𝒜(Gp)|M=m]𝔼[q𝒜(Gp)|M=m]|2|mm|m.\Big{|}{\mathbb{E}}\big{[}q_{\mathcal{A}}(G_{p})|M=m]-{\mathbb{E}}[q_{\mathcal{A}}(G^{\prime}_{p})|M^{\prime}=m^{\prime}]\Big{|}\leq\frac{2|m-m^{\prime}|}{m}.

For any xx and any random variable YY with finite mean we have |x𝔼[Y]|𝔼Y[|xY|]\big{|}x-{\mathbb{E}}[Y]\big{|}\leq{\mathbb{E}}_{Y}[|x-Y|]; and thus for 0<me(G)0<m\leq e(G),

|𝔼[q𝒜(Gp)|M=m]𝔼(q𝒜(Gp))|\displaystyle\big{|}{\mathbb{E}}[q_{\mathcal{A}}(G_{p})|M=m]-{\mathbb{E}}(q_{\mathcal{A}}(G^{\prime}_{p}))\big{|} \displaystyle\leq 𝔼M[|𝔼[q𝒜(Gp)|M=m]𝔼[q𝒜(Gp)|M]|]\displaystyle{\mathbb{E}}_{M^{\prime}}\big{[}\big{|}{\mathbb{E}}[q_{\mathcal{A}}(G_{p})|M=m]-{\mathbb{E}}[q_{\mathcal{A}}(G_{p}^{\prime})|M^{\prime}]\big{|}\big{]} (6.11)
\displaystyle\leq (2/m)𝔼M[|mM|]\displaystyle(2/m)\,{\mathbb{E}}_{M^{\prime}}[|m-M^{\prime}|]
\displaystyle\leq (2/m)(|mμ|+𝔼[|Mμ|]).\displaystyle(2/m)\,\big{(}|m-\mu|+{\mathbb{E}}[|M^{\prime}-\mu|]\big{)}.

Note 𝔼[|Mμ|]𝔼[(Mμ)2]μ{\mathbb{E}}[|M^{\prime}-\mu|]\leq\sqrt{{\mathbb{E}}[(M^{\prime}-\mu)^{2}]}\leq\sqrt{\mu} and so

((|𝔼[q𝒜(Gp)|M]𝔼[q𝒜(Gp)]|t2))\displaystyle{\mathbb{P}}\Big{(}\big{(}\big{|}{\mathbb{E}}[q_{\mathcal{A}}(G_{p})|M]-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\big{|}\geq\tfrac{t}{2}\big{)}\land{\mathcal{E}}\Big{)} \displaystyle\leq ((|Mμ|+μtM4))\displaystyle{\mathbb{P}}\Big{(}\big{(}|M-\mu|+\sqrt{\mu}\geq\tfrac{tM}{4}\big{)}\land{\mathcal{E}}\Big{)}
\displaystyle\leq (|Mμ|tμ6μ)\displaystyle{\mathbb{P}}\Big{(}|M-\mu|\geq\tfrac{t\mu}{6}-\sqrt{\mu}\Big{)}
\displaystyle\leq (|Mμ|tμ7)\displaystyle{\mathbb{P}}\Big{(}|M-\mu|\geq\tfrac{t\mu}{7}\Big{)}

since we assumed that t42μ1/2t\geq 42\mu^{-1/2} and so tμ/6μtμ/7t\mu/6-\sqrt{\mu}\geq t\mu/7. By Lemma 4.1

(|Mμ|tμ/7)2exp(13(t/7)2μ)=2exp(t2μ/147),{\mathbb{P}}\Big{(}|M-\mu|\geq t\mu/7\Big{)}\leq 2\exp\Big{(}-\tfrac{1}{3}(t/7)^{2}\mu\Big{)}=2\exp\big{(}-t^{2}\mu/147\big{)},

and thus we have a bound for the second term

((|𝔼[q𝒜(Gp)|M]𝔼[q𝒜(Gp)]|t2))2exp(t2μ/147),{\mathbb{P}}\Big{(}\big{(}\big{|}{\mathbb{E}}[q_{\mathcal{A}}(G_{p})|M]-{\mathbb{E}}[q_{\mathcal{A}}(G_{p})]\big{|}\geq\tfrac{t}{2}\big{)}\land{\mathcal{E}}\Big{)}\leq 2\exp\big{(}-t^{2}\mu/147\big{)},

which completes the proof of (6.9). It is now possible to choose η\eta small enough that the inequality holds for all t0t\geq 0 and we may take 22 as the coefficient of the exponential; the details of this calculation appear, for example, in the last part of the proof of Theorem 7.1 of [32]. This completes the proof of the first half of part (b).

To prove the corresponding result for qq^{*} the same technique may be used; in particular we bound the probability that |q(Gp)𝔼[q(Gp)]||q^{*}(G_{p})-{\mathbb{E}}[q^{*}(G_{p})]| is large by considering the sum of three terms as in (6.10). The first and second term can then be handled in the same way. For the third term we can get a slightly better bound by noticing that since q(H)[0,1)q^{*}(H)\in[0,1) for any graph HH we may assume t<1t<1 instead of t<3/2t<3/2. ∎


7 Estimating modularity by sampling a fixed number of vertices: proof of Theorem 1.5

A graph parameter ff is estimable (or testable), see [6] and [28], if for every ε>0\varepsilon>0 there is a positive integer k=k(ε)k=k(\varepsilon) such that if GG is a graph with at least kk vertices, then for XX a uniformly random k-subset of V(G)V(G) we have (|f(G)f(G[X])|>ε)<ε{\mathbb{P}}(|f(G)-f(G[X])|>\varepsilon)<\varepsilon. We shall see here that the modularity value q(G)q^{*}(G) is estimable for dense graphs but not more generally. For convenience we restate Theorem 1.5 from the introduction.
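The sampling procedure in this definition is simply: pick a uniformly random kk-subset XX of V(G)V(G) and observe the induced subgraph G[X]G[X]. The sketch below illustrates the dense case on two disjoint cliques with the planted bipartition; it tracks only the score of the planted partition (a lower bound for qq^{*}), and all parameter choices and names are illustrative, not from the paper.

```python
import random
from itertools import combinations

random.seed(4)

n, k = 300, 300        # G: two disjoint cliques K_n; sample k of 2n vertices
G = list(combinations(range(n), 2)) + \
    [(u + n, v + n) for u, v in combinations(range(n), 2)]

X = set(random.sample(range(2 * n), k))      # uniformly random k-subset
induced = [(u, v) for u, v in G if u in X and v in X]

# The induced graph is again two disjoint cliques, so for the planted
# bipartition every edge is internal and q_A = 1 - sum_i (vol(A_i)/2m)^2.
m = len(induced)
k1 = sum(1 for x in X if x < n)              # sampled vertices in clique 1
vol1, vol2 = k1 * (k1 - 1), (k - k1) * (k - k1 - 1)
q_sampled = 1 - (vol1 / (2 * m)) ** 2 - (vol2 / (2 * m)) ** 2

# q_A of the full graph is exactly 1/2; the sampled score is close whp
assert abs(q_sampled - 0.5) < 0.15
```

The deviation comes only from the hypergeometric fluctuation of k1, which is of order √k; this is the kind of stability that fails for sparse underlying graphs, as part (b) of the theorem asserts.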

Theorem (restatement of Theorem 1.5).

(a) For fixed ρ\rho with 0<ρ<10<\rho<1, modularity is estimable for graphs with density at least ρ\rho. (b) For any given function ρ(n)=o(1)\rho(n)=o(1), modularity is not estimable for nn-vertex graphs with density at least ρ(n)\rho(n).

We need some definitions. If GG is a graph and S,TV(G)S,T\subseteq V(G) then eG(S,T)e_{G}(S,T) is the number of ordered pairs (s,t)S×T(s,t)\in S\times T such that there is an edge in GG between ss and tt. If GG and GG^{\prime} are graphs with the same vertex set VV, then the cut distance d(G,G):=|V|2maxS,T|eG(S,T)eG(S,T)|d_{\square}(G,G^{\prime}):=|V|^{-2}\max_{S,T}|e_{G}(S,T)-e_{G^{\prime}}(S,T)|. Given a graph GG and bb\in{\mathbb{N}}, we let G(b)G(b) denote the bb-blow-up of GG, where each vertex of GG is replaced by bb independent copies of itself; and thus v(G(b))=bv(G)v(G(b))=b\,v(G) and e(G(b))=b2e(G)e(G(b))=b^{2}\,e(G).

Example 7.1.

q(C5(2))>q(C5)q^{*}(C_{5}(2))>q^{*}(C_{5}). The unique form of an optimal partition of C5C_{5} is into one path P2P_{2} and one P3P_{3}, with modularity 3542+62102=3552100=225\frac{3}{5}-\frac{4^{2}+6^{2}}{10^{2}}=\frac{3}{5}-\frac{52}{100}=\frac{2}{25}. For C5(2)C_{5}(2) we can balance the partition, into two copies of K2,3K_{2,3}, with modularity score 122012=110>225\frac{12}{20}-\frac{1}{2}=\frac{1}{10}>\frac{2}{25}.
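The two scores in Example 7.1 can be verified with exact arithmetic. The sketch below computes q𝒜 from its definition for C5 and its 2-blow-up; the vertex labelling of the blow-up, where vertex i becomes the pair (i,0), (i,1), is our own convention.

```python
from fractions import Fraction

def q_part(edges, parts):
    """Exact modularity score: sum over parts of e(A_i)/m - (vol(A_i)/2m)^2."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = Fraction(0)
    for part in parts:
        s = set(part)
        e_int = sum(1 for u, v in edges if u in s and v in s)
        vol = sum(deg[x] for x in part)
        q += Fraction(e_int, m) - Fraction(vol, 2 * m) ** 2
    return q

C5 = [(i, (i + 1) % 5) for i in range(5)]
assert q_part(C5, [[0, 1], [2, 3, 4]]) == Fraction(2, 25)  # P2 + P3 partition

# 2-blow-up: vertex i becomes (i,0),(i,1); each edge becomes four copies
C5_2 = [((u, a), (v, b)) for u, v in C5 for a in (0, 1) for b in (0, 1)]
balanced = [[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)],
            [(2, 1), (3, 0), (3, 1), (4, 0), (4, 1)]]      # two copies of K_{2,3}
assert q_part(C5_2, balanced) == Fraction(1, 10)
assert Fraction(1, 10) > Fraction(2, 25)
```

The balanced bipartition of the blow-up equalises the volumes (degree tax exactly 1/2) while keeping 12 of the 20 edges internal, which is how the blow-up beats the odd cycle.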


Proof of Theorem 1.5(a).

We shall use Theorem 15.1 of [28], which says that a graph parameter f(G)f(G) is estimable if and only if the following three conditions hold.

  • (i)

    If GnG_{n} and GnG^{\prime}_{n} are graphs on the same vertex set and d(Gn,Gn)0d_{\square}(G_{n},G^{\prime}_{n})\!\rightarrow\!0 then f(Gn)f(Gn)0f(G_{n})\!-\!f(G^{\prime}_{n})\!\rightarrow\!~{}0.

  • (ii)

    For every graph GG, f(G(b))f(G(b)) has a limit as bb\rightarrow\infty.

  • (iii)

    f(GK1)f(G)0f(G\!\cup\!K_{1})-f(G)\rightarrow 0 if v(G)v(G)\rightarrow\infty (where GK1G\!\cup\!K_{1} denotes the graph obtained from GG by adding a single isolated vertex).

Observe that always q(GK1)=q(G)q^{*}(G\!\cup\!K_{1})=q^{*}(G), so we need only be concerned here with conditions (i) and (ii). We shall show that condition (ii), concerning blow-ups, always holds. After that, we shall show that condition (i), concerning cut distances, holds as long as the graphs are suitably dense. Finally we give examples of sparse graphs which show that in this case modularity is not estimable.

Blow-ups of a graph: condition (ii)

Recall that G(b)G(b) is the bb-blow-up of GG. Observe that always q(G(b))q(G)q^{*}(G(b))\geq q^{*}(G). For if 𝒜{\mathcal{A}} is an optimal partition for GG, with kk parts, then the natural corresponding kk-part partition for G(b)G(b) has modularity score q(G)q^{*}(G). Thus also q(G(jb))q(G(b))q^{*}(G(jb))\geq q^{*}(G(b)) for every jj\in{\mathbb{N}}.

Let GG be a (fixed) graph. We need to show that q(G(b))q^{*}(G(b)) tends to a limit as bb\to\infty. Let q(G)q^{**}(G) be supbq(G(b))\sup_{b}q^{*}(G(b)) where the sup is over all bb\in{\mathbb{N}}. We shall see that in fact

q(G(b))q(G) as b.q^{*}(G(b))\to q^{**}(G)\;\;\;\mbox{ as }\;b\to\infty. (7.1)

If GG has no edges then q(G(b))=0q^{*}(G(b))=0 for all bb\in{\mathbb{N}}; so we may assume that e(G)1e(G)\geq 1. Let aa\in{\mathbb{N}}, let jj\in{\mathbb{N}} and let bb\in{\mathbb{N}} satisfy (j1)ab<ja(j-1)a\leq b<ja. Then

e(G(ja))e(G(b))e(G(ja))=j2a2b2j2a2j2(j1)2j2<2j.\frac{e(G(ja))-e(G(b))}{e(G(ja))}=\frac{j^{2}a^{2}-b^{2}}{j^{2}a^{2}}\leq\frac{j^{2}-(j-1)^{2}}{j^{2}}<\frac{2}{j}.

Hence by a robustness result, Lemma 5.1 of [32] (see also Lemma 6.1 in this paper), we have that |q(G(b))q(G(ja))|<4j|q^{*}(G(b))-q^{*}(G(ja))|<\tfrac{4}{j}; and so

q(G(b))>q(G(ja))4j.q^{*}(G(b))>q^{*}(G(ja))-\tfrac{4}{j}.

Let ε>0\varepsilon>0. Let aa\in{\mathbb{N}} be such that q(G(a))q(G)ε/2q^{*}(G(a))\geq q^{**}(G)-\varepsilon/2. Then, for all bb\in{\mathbb{N}} such that b8a/εb\geq 8a/\varepsilon, letting jj\in{\mathbb{N}} be such that (j1)ab<ja(j-1)a\leq b<ja  (so j>8/εj>8/\varepsilon and thus 4/j<ε/24/j<\varepsilon/2) we have

q(G(b))q(G(ja))ε/2q(G(a))ε/2q(G)ε.q^{*}(G(b))\geq q^{*}(G(ja))-\varepsilon/2\geq q^{*}(G(a))-\varepsilon/2\geq q^{**}(G)-\varepsilon\,.

Now (7.1) follows, as required.

Cut distance and modularity : condition (i) for dense graphs

Fix ρ\rho with 0<ρ<10<\rho<1. A graph GG is called ρ\rho-dense if e(G)ρv(G)2/2e(G)\geq\rho\,v(G)^{2}/2. Let 0<ε<10<\varepsilon<1. We want to show that there exists δ>0\delta>0 such that for all ρ\rho-dense graphs GG, GG^{\prime} with the same vertex set and such that d(G,G)δd_{\square}(G,G^{\prime})\leq\delta we have |q(G)q(G)|ε|q^{*}(G)-q^{*}(G^{\prime})|\leq\varepsilon. With foresight, we shall take δ=ρε16+ 4/ε\delta=\tfrac{\rho\,\varepsilon}{16\,+\,4/\varepsilon}.

By Lemma 1 in [12] there is a kk0=2/εk\leq k_{0}=\lceil 2/\varepsilon\rceil and a partition 𝒜=(A1,,Ak){\mathcal{A}}=(A_{1},\ldots,A_{k}) for GG such that q𝒜(G)q(G)ε/2q_{\mathcal{A}}(G)\geq q^{*}(G)-\varepsilon/2. It suffices to show that

q𝒜(G)q𝒜(G)ε/2q_{\mathcal{A}}(G^{\prime})\geq q_{\mathcal{A}}(G)-\varepsilon/2 (7.2)

(since then q(G)q𝒜(G)q(G)εq^{*}(G^{\prime})\geq q_{\mathcal{A}}(G^{\prime})\geq q^{*}(G)-\varepsilon, and we may similarly deduce that q(G)q(G)εq^{*}(G)\geq q^{*}(G^{\prime})-\varepsilon). To prove (7.2) we first consider the edge contribution q𝒜Eq^{E}_{\mathcal{A}} then the degree tax q𝒜Dq^{D}_{\mathcal{A}}. Let n=v(G)=v(G)n=v(G)=v(G^{\prime}).

Edge contribution q𝒜Eq^{E}_{\mathcal{A}}.   Note that 2|e(G)e(G)|d(G,G)n2δn22|e(G)-e(G^{\prime})|\leq d_{\square}(G,G^{\prime})\,n^{2}\leq\delta n^{2}, so e(G)e(G)+12δn2e(G^{\prime})\leq e(G)+\frac{1}{2}\delta n^{2}; and similarly |intG(Ai)intG(Ai)|12δn2|{\rm int}_{G}(A_{i})-{\rm int}_{G^{\prime}}(A_{i})|\leq\frac{1}{2}\delta n^{2}, so intG(Ai)intG(Ai)12δn2{\rm int}_{G}(A_{i})\geq{\rm int}_{G^{\prime}}(A_{i})-\frac{1}{2}\delta n^{2}. Thus

q𝒜E(G)q𝒜E(G)\displaystyle q^{E}_{\mathcal{A}}(G^{\prime})-q^{E}_{\mathcal{A}}(G) =\displaystyle= i=1k(intG(Ai)e(G)intG(Ai)e(G))\displaystyle\sum_{i=1}^{k}\big{(}\frac{{\rm int}_{G^{\prime}}(A_{i})}{e(G^{\prime})}-\frac{{\rm int}_{G}(A_{i})}{e(G)}\big{)}
\displaystyle\geq i=1k(intG(Ai)12δn2e(G)+12δn2intG(Ai)e(G))\displaystyle\sum_{i=1}^{k}\big{(}\frac{{\rm int}_{G}(A_{i})-\tfrac{1}{2}\delta n^{2}}{e(G)+\tfrac{1}{2}\delta n^{2}}-\frac{{\rm int}_{G}(A_{i})}{e(G)}\big{)}
=\displaystyle= q𝒜E(G)((1+δn22e(G))11)i=1kδn22e(G)+δn2.\displaystyle q^{E}_{\mathcal{A}}(G)\,\left(\big{(}1+\frac{\delta n^{2}}{2e(G)}\big{)}^{-1}-1\right)-\sum_{i=1}^{k}\frac{\delta n^{2}}{2e(G)+\delta n^{2}}.

The second term here (minus the sum) is at least kδn22e(G)k0δ/ρ-k\,\tfrac{\delta n^{2}}{2e(G)}\geq-k_{0}\,\delta/\rho. Also, since (1+x)11x(1+x)^{-1}\geq 1-x for x0x\geq 0, the first term is at least q𝒜E(G)δn22e(G)δ/ρ-q^{E}_{\mathcal{A}}(G)\,\tfrac{\delta n^{2}}{2e(G)}\geq-\delta/\rho. Hence

q𝒜E(G)q𝒜E(G)(k0+1)δ/ρ(2/ε+2)δ/ρ.q^{E}_{\mathcal{A}}(G^{\prime})-q^{E}_{\mathcal{A}}(G)\geq-(k_{0}+1)\,\delta/\rho\geq-(2/\varepsilon+\!2)\,\delta/\rho. (7.3)

Degree tax q𝒜Dq^{D}_{\mathcal{A}}.  Since |vol(G)vol(G)|δn2|{\rm vol}(G)-{\rm vol}(G^{\prime})|\leq\delta n^{2} we have

|vol(G)2vol(G)2|=(vol(G)+vol(G))|vol(G)vol(G)|(vol(G)+vol(G))δn2.|{\rm vol}(G)^{2}-{\rm vol}(G^{\prime})^{2}|=({\rm vol}(G)+{\rm vol}(G^{\prime}))\,|{\rm vol}(G)-{\rm vol}(G^{\prime})|\leq({\rm vol}(G)+{\rm vol}(G^{\prime}))\,\delta n^{2}.

Thus, using the last inequality if vol(G)vol(G){\rm vol}(G^{\prime})\leq{\rm vol}(G),

vol(G)2vol(G)22δn2vol(G)=vol(G)2(12δn2/vol(G))vol(G)2(12δ/ρ).{\rm vol}(G^{\prime})^{2}\geq{\rm vol}(G)^{2}-2\delta n^{2}{\rm vol}(G)={\rm vol}(G)^{2}\,(1-2\delta n^{2}/{\rm vol}(G))\geq{\rm vol}(G)^{2}\,(1-2\delta/\rho). (7.4)

Also |volG(Ai)volG(Ai)|δn2|{\rm vol}_{G}(A_{i})-{\rm vol}_{G^{\prime}}(A_{i})|\leq\delta n^{2} for each ii. We claim that

i=1kvolG(Ai)2i=1kvolG(Ai)22δn2vol(G).\sum_{i=1}^{k}{\rm vol}_{G^{\prime}}(A_{i})^{2}-\sum_{i=1}^{k}{\rm vol}_{G}(A_{i})^{2}\leq 2\delta n^{2}\,{\rm vol}(G^{\prime}). (7.5)

To show this, let I={i[k]:volG(Ai)volG(Ai)}I=\{i\in[k]:{\rm vol}_{G^{\prime}}(A_{i})\geq{\rm vol}_{G}(A_{i})\}. For iIi\in I

volG(Ai)2volG(Ai)2=(volG(Ai)+volG(Ai))(volG(Ai)volG(Ai))2volG(Ai)δn2.{\rm vol}_{G^{\prime}}(A_{i})^{2}-{\rm vol}_{G}(A_{i})^{2}=\big{(}{\rm vol}_{G^{\prime}}(A_{i})+{\rm vol}_{G}(A_{i})\big{)}\big{(}{\rm vol}_{G^{\prime}}(A_{i})-{\rm vol}_{G}(A_{i})\big{)}\leq 2\,{\rm vol}_{G^{\prime}}(A_{i})\cdot\delta n^{2}.

Thus

i=1kvolG(Ai)2i=1kvolG(Ai)2\displaystyle\sum_{i=1}^{k}{\rm vol}_{G^{\prime}}(A_{i})^{2}-\sum_{i=1}^{k}{\rm vol}_{G}(A_{i})^{2} \displaystyle\leq iI(volG(Ai)2volG(Ai)2)\displaystyle\sum_{i\in I}\big{(}{\rm vol}_{G^{\prime}}(A_{i})^{2}-{\rm vol}_{G}(A_{i})^{2}\big{)}
\displaystyle\leq 2δn2iIvolG(Ai)\displaystyle 2\delta n^{2}\,\sum_{i\in I}{\rm vol}_{G^{\prime}}(A_{i})
\displaystyle\leq 2δn2vol(G),\displaystyle 2\delta n^{2}\,{\rm vol}(G^{\prime}),

which completes the proof of (7.5). Using (7.5) and then (7.4), we find

q𝒜D(G)\displaystyle q^{D}_{\mathcal{A}}(G^{\prime}) =\displaystyle= (vol(G))2i=1kvolG(Ai)2\displaystyle({\rm vol}(G^{\prime}))^{-2}\sum_{i=1}^{k}{\rm vol}_{G^{\prime}}(A_{i})^{2}
\displaystyle\leq (vol(G))2((i=1kvolG(Ai)2)+2δn2vol(G))\displaystyle({\rm vol}(G^{\prime}))^{-2}\big{(}\big{(}\sum_{i=1}^{k}{\rm vol}_{G}(A_{i})^{2}\big{)}+2\delta n^{2}{\rm vol}(G^{\prime})\big{)}
\displaystyle\leq (12δ/ρ)1q𝒜D(G)+2δn2/vol(G)\displaystyle(1-2\delta/\rho)^{-1}\,q^{D}_{\mathcal{A}}(G)+2\delta n^{2}/{\rm vol}(G^{\prime})
\displaystyle\leq q𝒜D(G)+4δ/ρ+2δ/ρ\displaystyle q^{D}_{\mathcal{A}}(G)+4\delta/\rho+2\delta/\rho

since (1x)11+2x(1-x)^{-1}\leq 1+2x for 0x120\leq x\leq\frac{1}{2}. Thus

q𝒜D(G)q𝒜D(G)+6δ/ρ.q^{D}_{\mathcal{A}}(G^{\prime})\leq q^{D}_{\mathcal{A}}(G)+6\,\delta/\rho. (7.6)

Putting the results (7.3) on q𝒜Eq^{E}_{\mathcal{A}} and (7.6) on q𝒜Dq^{D}_{\mathcal{A}} together we have

q𝒜(G)q𝒜(G)(2/ε+8)δ/ρq𝒜(G)ε/2q_{\mathcal{A}}(G^{\prime})\geq q_{\mathcal{A}}(G)-(2/\varepsilon+\!8)\,\delta/\rho\geq q_{\mathcal{A}}(G)-\varepsilon/2

by our choice of δ\delta. Hence (7.2) holds, as required. This completes the proof of Theorem 1.5(a), for dense graphs. ∎

We now give a pair of constructions which will demonstrate that modularity is not estimable.

Example 7.2.

Let 0ρ(n)<10\leq\rho(n)<1 and let ρ(n)0\rho(n)\to 0 as nn\to\infty, arbitrarily slowly subject to ρ(n)n/logn\rho(n)\,n/\log n\to\infty. Then there are connected graphs Gn,GnG_{n},G^{\prime}_{n} on vertex set [n][n] such that e(Gn),e(Gn)ρ(n)n2/2e(G_{n}),e(G^{\prime}_{n})\geq\rho(n)\,n^{2}/2; and as nn\to\infty, q(Gn)1q^{*}(G_{n})\to 1 and q(Gn)0q^{*}(G^{\prime}_{n})\to 0, and e(Gn),e(Gn)=o(n2)e(G_{n}),e(G^{\prime}_{n})=o(n^{2}) so d(Gn,Gn)=o(1)d_{\square}(G_{n},G^{\prime}_{n})=o(1). Thus the graphs Gn,GnG_{n},G^{\prime}_{n} are ‘nearly dense’ and are close in cut distance dd_{\square}, but their modularity values are not close.

For GnG_{n} we may let k=k(n)2ρ(n)nk=k(n)\sim 2\rho(n)n, and let GnG_{n} be a collection of disjoint kk-cliques (together with at most k1k-1 (k+1)(k+1)-cliques) joined by edges to form a path. Then e(Gn)(n/k)(k2)nk/2ρ(n)n2e(G_{n})\sim(n/k)\binom{k}{2}\sim nk/2\sim\rho(n)n^{2} so e(Gn)ρ(n)n2/2e(G_{n})\geq\rho(n)n^{2}/2 for nn sufficiently large. Also it is easy to see that the partition 𝒜{\mathcal{A}} of the vertex set into the cliques satisfies q𝒜(Gn)1q_{\mathcal{A}}(G_{n})\sim 1.
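A concrete instance of this construction (our illustration, with 50 equal 20-cliques joined in a path and the (k+1)-clique adjustment ignored) already scores close to 1 under the clique partition:

```python
def clique_path(num_cliques, k):
    # disjoint k-cliques, one edge joining each consecutive pair of cliques
    adj = {v: set() for v in range(num_cliques * k)}
    for c in range(num_cliques):
        block = list(range(c * k, (c + 1) * k))
        for u in block:
            adj[u].update(x for x in block if x != u)
    for c in range(num_cliques - 1):
        u, v = (c + 1) * k - 1, (c + 1) * k   # last vertex of clique c, first of clique c+1
        adj[u].add(v); adj[v].add(u)
    return adj

def modularity_score(adj, parts):
    m = sum(len(N) for N in adj.values()) / 2
    q = 0.0
    for A in parts:
        internal = sum(1 for u in A for v in adj[u] if v in A) / 2
        vol = sum(len(adj[u]) for u in A)
        q += internal / m - (vol / (2 * m)) ** 2
    return q

adj = clique_path(50, 20)
cliques = [set(range(c * 20, (c + 1) * 20)) for c in range(50)]
q_cliques = modularity_score(adj, cliques)   # about 0.975
```

Only 49 of the 9549 edges cross between parts, and the degree tax is about 1/50, so the score is roughly 1 minus both small terms.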

For GnG^{\prime}_{n} we may consider a binomial random graph Gn,ρG_{n,\rho}, which with probability at least 13+o(1)\frac{1}{3}+o(1) is connected, has at least ρ(n)n2/2\rho(n)n^{2}/2 edges, and has modularity at most ε(n)\varepsilon(n) for a suitable ε(n)=o(1)\varepsilon(n)=o(1) by Theorem 1.1(c) of [32] (or by other results in that paper).

Proof of Theorem 1.5(b).

Observe that if GG and GG^{\prime} are two nn-vertex graphs with at most εn2/2\varepsilon n^{2}/2 edges then

d(G,G)max{2e(G),2e(G)}/n2ε.d_{\square}(G,G^{\prime})\leq\max\{2e(G),2e(G^{\prime})\}/n^{2}\leq\varepsilon.

Also recall condition (i) for estimability in the proof of Theorem 1.5(a). We may now see that Theorem 1.5(b) (for graphs that do not have at least constant positive density) follows directly from Example 7.2 above. ∎


8 Under-sampling and overestimating modularity

When we sample few edges from a graph HH it seems that we tend to overestimate its modularity; that is, q(Hp)q^{*}(H_{p}) tends to be significantly larger than q(H)q^{*}(H). For example, if HH is the complete graph KnK_{n} and p=1/np=1/n, then q(H)=0q^{*}(H)=0 but q(Hp)1q^{*}(H_{p})\to 1 in probability as nn\to\infty, see Theorem 1.1 of [32]. Our Theorem 1.1 shows that when the expected number e(H)pe(H)p of edges observed is large, although we may overestimate modularity we are unlikely to underestimate it by much. In this section we use Theorem 1.1 to prove that when the sampling probability pp is bounded away from 0, increasing pp is unlikely to increase overestimation by much.
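A deterministic toy version of this effect: q*(K_n) = 0, but if the observed graph were just a spanning path of K_n, then a partition into consecutive blocks already scores far above 0. A sketch with assumed values n = 100 and blocks of size 10:

```python
def path_adj(n):
    adj = {v: set() for v in range(n)}
    for v in range(n - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj

def modularity_score(adj, parts):
    # q_A(G) = sum over parts A of int(A)/e(G) - (vol(A)/2e(G))^2
    m = sum(len(N) for N in adj.values()) / 2
    q = 0.0
    for A in parts:
        internal = sum(1 for u in A for v in adj[u] if v in A) / 2
        vol = sum(len(adj[u]) for u in A)
        q += internal / m - (vol / (2 * m)) ** 2
    return q

n, b = 100, 10
blocks = [set(range(i, i + b)) for i in range(0, n, b)]
q_path = modularity_score(path_adj(n), blocks)   # about 0.81, versus q*(K_100) = 0
```

Only 9 of the 99 path edges cross between blocks, and the degree tax is about 1/10, so the sparse "observation" scores about 0.81 while the dense underlying graph has modularity 0.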

To state the result precisely we give one definition. For random variables XX and YY and ε>0\varepsilon>0, we say that XX ε\varepsilon-nearly (stochastically) dominates YY if

(Xt)(Yt+ε)ε for each t.{\mathbb{P}}(X\geq t)\geq{\mathbb{P}}(Y\geq t+\varepsilon)-\varepsilon\;\;\;\mbox{ for each }t. (8.1)

Observe that if say XX and YY take values in [0,1][0,1] and XX ε\varepsilon-nearly dominates YY then 𝔼[X]>𝔼[Y]2ε{\mathbb{E}}[X]>{\mathbb{E}}[Y]-2\varepsilon, since in this case

𝔼[X]\displaystyle{\mathbb{E}}[X] \displaystyle\geq 01ε(Xt)𝑑t01ε((Yt+ε)ε)𝑑t\displaystyle\int_{0}^{1-\varepsilon}{\mathbb{P}}(X\geq t)\,dt\;\;\geq\int_{0}^{1-\varepsilon}({\mathbb{P}}(Y\geq t+\varepsilon)-\varepsilon)\,dt
=\displaystyle= ε1(Yu)𝑑uε+ε2𝔼[Y]2ε+ε2.\displaystyle\int_{\varepsilon}^{1}{\mathbb{P}}(Y\geq u)\,du-\varepsilon+\varepsilon^{2}\;\;\geq{\mathbb{E}}[Y]-2\varepsilon+\varepsilon^{2}\,.

We now give the main result of this section.

Proposition 8.1.

Let 0<p0<10<p_{0}<1 and ε>0\varepsilon>0. Then there exists cc such that, for any graph HH with at least cc edges and any sampling probabilities p1,p2p_{1},p_{2} with p0p1<p21p_{0}\leq p_{1}<p_{2}\leq 1, it holds that q(Hp1)\,q^{*}(H_{p_{1}}) ε\varepsilon-nearly dominates q(Hp2)q^{*}(H_{p_{2}}).

The case p2=1p_{2}=1 shows that (q(Hp1)q(H)ε)1ε{\mathbb{P}}(q^{*}(H_{p_{1}})\geq q^{*}(H)-\varepsilon)\geq 1-\varepsilon as in Theorem 1.1, except that now we have the lower bound p0>0p_{0}>0 on the sampling probability pp (and cc depends on p0p_{0}).

Proof.

By Theorem 1.1 there exists c0c_{0} such that for all graphs JJ and all 0<p10<p\leq 1 such that e(J)pc0e(J)p\geq c_{0}, we have

(q(Jp)>q(J)ε/2)>1ε/2.{\mathbb{P}}\big{(}q^{*}(J_{p})>q^{*}(J)-\varepsilon/2\big{)}>1-\varepsilon/2.

Let AA be the set of all graphs JJ with e(J)p0c0e(J)p_{0}\geq c_{0}. Then

(q(Jp)>q(J)ε/2)>1ε/2 for each JA and p0p1.{\mathbb{P}}\big{(}q^{*}(J_{p})>q^{*}(J)-\varepsilon/2\big{)}>1-\varepsilon/2\;\;\;\;\mbox{ for each }J\in A\;\mbox{ and }\;p_{0}\leq p\leq 1. (8.2)

Let cc\in{\mathbb{N}} be such that

(Bin(c,p0)c0/p0)1ε/2.{\mathbb{P}}({\rm Bin}(c,p_{0})\geq c_{0}/p_{0})\geq 1-\varepsilon/2.

Fix a graph HH on vertex set VV with e(H)ce(H)\geq c; and note that by the above, for each p0p1p_{0}\leq p\leq 1

(HpA)(Hp0A)>1ε/2.{\mathbb{P}}(H_{p}\in A)\geq{\mathbb{P}}(H_{p_{0}}\in A)>1-\varepsilon/2. (8.3)

We couple Hp1H_{p_{1}} and Hp2H_{p_{2}} in the natural way. Let p=p1/p2p=p_{1}/p_{2}, let ZGn,pZ\sim G_{n,p} and Hp2H_{p_{2}} be independent. For each graph JJ on vertex set VV, let JZJ_{Z} be the graph on VV with edge set E(J)E(Z)E(J)\cap E(Z); and observe that (Hp2)ZHp1(H_{p_{2}})_{Z}\sim H_{p_{1}}. We have

(q((Hp2)Z)>q(Hp2)ε)\displaystyle{\mathbb{P}}(q^{*}((H_{p_{2}})_{Z})>q^{*}(H_{p_{2}})-\varepsilon)
\displaystyle\!\geq\! KA((Hp2=K)(q(KZ)>q(K)ε))\displaystyle\sum_{K\in A}{\mathbb{P}}\big{(}(H_{p_{2}}=K)\land(q^{*}(K_{Z})>q^{*}(K)-\varepsilon)\big{)}
=\displaystyle\!=\! KA(Hp2=K)(q(Kp)>q(K)ε) since Z and Hp2 are independent, and KZKp\displaystyle\sum_{K\in A}{\mathbb{P}}(H_{p_{2}}=K)\;{\mathbb{P}}(q^{*}(K_{p})>q^{*}(K)-\varepsilon)\;\;\;\mbox{ since $Z$ and $H_{p_{2}}$ are independent, and }K_{Z}\sim K_{p}
\displaystyle\!\geq\! (1ε/2)KA(Hp2=K) by (8.2)\displaystyle(1-\varepsilon/2)\sum_{K\in A}{\mathbb{P}}(H_{p_{2}}=K)\;\;\;\mbox{ by~{}\eqref{eqn.A}}
\displaystyle\!\geq\! (1ε/2)2>1ε by (8.3).\displaystyle(1-\varepsilon/2)^{2}>1-\varepsilon\;\;\;\mbox{ by~{}(\ref{eqn.inAnew})}.

Hence for every tt

(q(Hp1)t)\displaystyle{\mathbb{P}}(q^{*}(H_{p_{1}})\geq t) \displaystyle\geq ((q((Hp2)Z)>q(Hp2)ε)(q(Hp2)t+ε))\displaystyle{\mathbb{P}}\big{(}(q^{*}((H_{p_{2}})_{Z})>q^{*}(H_{p_{2}})-\varepsilon)\land(q^{*}(H_{p_{2}})\geq t+\varepsilon)\big{)}
\displaystyle\geq (q(Hp2)t+ε)(q((Hp2)Z)q(Hp2)ε)\displaystyle{\mathbb{P}}(q^{*}(H_{p_{2}})\geq t+\varepsilon)-{\mathbb{P}}(q^{*}((H_{p_{2}})_{Z})\leq q^{*}(H_{p_{2}})-\varepsilon)
>\displaystyle> (q(Hp2)t+ε)ε,\displaystyle{\mathbb{P}}(q^{*}(H_{p_{2}})\geq t+\varepsilon)-\varepsilon,

so q(Hp1)q^{*}(H_{p_{1}}) ε\varepsilon-nearly dominates q(Hp2)q^{*}(H_{p_{2}}) as required. ∎
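The two-stage coupling used above, keeping an edge with probability p2 and then thinning by Z with p = p1/p2, reproduces the H_{p1} marginal edge by edge. A quick Monte Carlo sanity check of the per-edge probability (our sketch, with an arbitrary fixed seed):

```python
import random

random.seed(0)
p1, p2 = 0.3, 0.6
p = p1 / p2          # thinning probability, chosen so that p2 * p = p1
trials = 200_000

# an edge survives both stages: present in H_{p2}, then kept by Z
kept = sum(1 for _ in range(trials)
           if random.random() < p2 and random.random() < p)
freq = kept / trials  # should be close to p1 = 0.3
```

Since the two stages are independent, the survival probability is exactly p2 · (p1/p2) = p1, which the empirical frequency confirms.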


9 Expected modularity when average degree is constant

The modularity of the Erdős-Rényi (or binomial) random graph Gn,pG_{n,p} is investigated in [32]. Given a constant c>0c>0 we let q¯(n,c)=𝔼[q(Gn,c/n)]\bar{q}(n,c)={\mathbb{E}}[q^{*}(G_{n,c/n})] for each ncn\geq c. By Theorem 1.1 of that paper, for 0<c10<c\leq 1 we have q¯(n,c)1\bar{q}(n,c)\to 1 as nn\to\infty. Let q¯(c)=1\bar{q}(c)=1 for each c(0,1]c\in(0,1].

Conjecture 9.1 ([32]).

For each c>1c>1, q¯(n,c)\bar{q}(n,c) tends to a limit q¯(c)\bar{q}(c) as nn\to\infty.

It was noted in that paper that if the conjecture holds then the function q¯(c)\bar{q}(c) would be uniformly continuous for c(0,)c\in(0,\infty). From Theorem 1.1 (in the present paper) we shall deduce that also q¯(c)\bar{q}(c) would be non-increasing in cc. We collect results on q¯(c)\bar{q}(c) in the following proposition.

Proposition 9.2.
  • (i)

    for 0<c10<c\leq 1, we have q¯(n,c)q¯(c)=1\bar{q}(n,c)\to\bar{q}(c)=1 as nn\to\infty;

and if Conjecture 9.1 holds then

  • (ii)

    0<q¯(c)<10<\bar{q}(c)<1 for c>1c>1

  • (iii)

    q¯(c)=Θ(c12)\bar{q}(c)=\Theta(c^{-\frac{1}{2}}) as cc\to\infty

  • (iv)

    q¯(c)\bar{q}(c) is (uniformly) continuous for c(0,)c\in(0,\infty)

  • (v)

    q¯(c)\bar{q}(c) is non-increasing for c(0,)c\in(0,\infty).

All but part (v)(v) of this result comes directly from [32]: part (i)(i) (as we already noted) and part (ii)(ii) are from Theorem 1.1; part (iii)(iii) is from Theorem 1.3 and part (iv)(iv) is from Lemma 7.4. Part (v)(v) will follow immediately from inequality (9.1) below, so to complete the proof of Proposition 9.2 it remains only to prove the following lemma.

Lemma 9.3.

Let 0<c<c0<c<c^{\prime}. For each ε>0\varepsilon>0, there exists n0n_{0} such that for all nn0n\geq n_{0} we have q¯(n,c)q¯(n,c)<ε\bar{q}(n,c^{\prime})-\bar{q}(n,c)<\varepsilon ; and thus

lim supn(q¯(n,c)q¯(n,c))0.\limsup_{n\to\infty}\;(\bar{q}(n,c^{\prime})-\bar{q}(n,c))\leq 0. (9.1)
Proof.

By Theorem 1.1 there exists c0c_{0} such that for all graphs HH and all 0<p10<p\leq 1 such that e(H)pc0e(H)p\geq c_{0} we have

(q(Hp)>q(H)ε/4)>1ε/4,{\mathbb{P}}(q^{*}(H_{p})>q^{*}(H)-\varepsilon/4)>1-\varepsilon/4\,,

and so

𝔼[q(Hp)](1ε/4)(q(H)ε/4)>q(H)ε/2.{\mathbb{E}}[q^{*}(H_{p})]\geq(1-\varepsilon/4)(q^{*}(H)-\varepsilon/4)>q^{*}(H)-\varepsilon/2.

Let p=c/cp=c/c^{\prime}. Let AA be the set of all graphs HH with e(H)pc0e(H)p\geq c_{0}. Let n0cn_{0}\geq c^{\prime} be sufficiently large that for each nn0n\geq n_{0} we have (Gn,c/nA)>1ε/2{\mathbb{P}}(G_{n,c^{\prime}/n}\in A)>1-\varepsilon/2. Let nn0n\geq n_{0}. Let XGn,c/nX\sim G_{n,c^{\prime}/n} and ZGn,pZ\sim G_{n,p} be independent. For each graph HH on [n][n], let HZH_{Z} be the graph on [n][n] with edge set E(H)E(Z)E(H)\cap E(Z); and observe that HZHpH_{Z}\sim H_{p}. Note also that Y=XZY=X_{Z} satisfies YGn,c/nY\sim G_{n,c/n}. Given a graph HAH\in A, since HZH_{Z} and 𝟏X=H{\mathbf{1}}_{X=H} are independent,

𝔼[q(Y)𝟏X=H]\displaystyle{\mathbb{E}}[q^{*}(Y){\mathbf{1}}_{X=H}] =\displaystyle= 𝔼[q(HZ)𝟏X=H]=𝔼[q(Hp)](X=H)\displaystyle{\mathbb{E}}[q^{*}(H_{Z}){\mathbf{1}}_{X=H}]\;=\;{\mathbb{E}}[q^{*}(H_{p})]\,{\mathbb{P}}(X=H)
\displaystyle\geq (q(H)ε/2)(X=H).\displaystyle(q^{*}(H)-\varepsilon/2)\,{\mathbb{P}}(X=H).

Hence

q¯(n,c)\displaystyle\bar{q}(n,c) =\displaystyle= 𝔼[q(Y)]HA𝔼[q(Y)𝟏X=H]\displaystyle{\mathbb{E}}[q^{*}(Y)]\;\;\geq\;\;\sum_{H\in A}{\mathbb{E}}[q^{*}(Y){\mathbf{1}}_{X=H}]
\displaystyle\geq HA(q(H)ε/2)(X=H)\displaystyle\sum_{H\in A}(q^{*}(H)-\varepsilon/2)\,{\mathbb{P}}(X=H)
>\displaystyle> HAq(H)(X=H)ε/2\displaystyle\sum_{H\in A}q^{*}(H)\,{\mathbb{P}}(X=H)-\varepsilon/2
>\displaystyle> 𝔼[q(X)]ε=q¯(n,c)ε\displaystyle{\mathbb{E}}[q^{*}(X)]-\varepsilon\;\;=\;\;\bar{q}(n,c^{\prime})-\varepsilon

since (XA)<ε/2{\mathbb{P}}(X\not\in A)<\varepsilon/2. Thus q¯(n,c)q¯(n,c)<ε\bar{q}(n,c^{\prime})-\bar{q}(n,c)<\varepsilon for each nn0n\geq n_{0}. It follows that

lim supn(q¯(n,c)q¯(n,c))ε;\limsup_{n\to\infty}\;(\bar{q}(n,c^{\prime})-\bar{q}(n,c))\leq\varepsilon\,;

and since this holds for each ε>0\varepsilon>0, the inequality (9.1) follows. ∎


10 Modularity and edge-sampling on weighted networks

In network applications it can be useful to consider graphs in which the edges have weights. Following the notation of [9], let VV be a non-empty vertex set, and let w:V×Vw:V\times V\to{\mathbb{R}} satisfy w(u,v)=w(v,u)0w(u,v)=w(v,u)\geq 0 for all vertices uu and vv. For simplicity, let us assume that w(v,v)=0w(v,v)=0 for each vv. We call ww a weight function on V2V^{2}. Let max(w)\max(w) denote the maximum of all the values w(u,v)w(u,v).

Define the (weighted) degree of a vertex uu by setting degw(u)=vw(u,v)\deg_{w}(u)=\sum_{v}w(u,v). Similarly, define the (weighted) volume of a vertex set XX by volw(X)=uXdegw(u){\rm vol}_{w}(X)=\sum_{u\in X}\deg_{w}(u), and (corresponding to e(X)e(X)) let ew(X)=12u,vXw(u,v)e_{w}(X)=\tfrac{1}{2}\sum_{u,v\in X}w(u,v).

Assume that ww is not identically zero, that is volw(V)=2ew(V)>0{\rm vol}_{w}(V)=2e_{w}(V)>0. For a given partition 𝒜{\mathcal{A}} of VV, define the modularity score of 𝒜{\mathcal{A}} on ww by

q𝒜(w)\displaystyle q_{\mathcal{A}}(w) =\displaystyle= 1volw(V)A𝒜u,vA(w(u,v)degw(u)degw(v)volw(V))\displaystyle\frac{1}{{\rm vol}_{w}(V)}\sum_{A\in{\mathcal{A}}}\sum_{u,v\in A}\left(w(u,v)-\frac{\deg_{w}(u)\deg_{w}(v)}{{\rm vol}_{w}(V)}\right)
=\displaystyle= q𝒜E(w)q𝒜D(w)\displaystyle q_{{\mathcal{A}}}^{E}(w)-q_{{\mathcal{A}}}^{D}(w)

where

q𝒜E(w)=1volw(V)A𝒜2ew(A) and q𝒜D(w)=1volw(V)2A𝒜volw(A)2.q_{{\mathcal{A}}}^{E}(w)=\frac{1}{{\rm vol}_{w}(V)}\sum_{A\in{\mathcal{A}}}2e_{w}(A)\;\;\mbox{ and }\;\;q_{{\mathcal{A}}}^{D}(w)=\frac{1}{{\rm vol}_{w}(V)^{2}}\sum_{A\in{\mathcal{A}}}{\rm vol}_{w}(A)^{2}.

Define the modularity of ww by q(w)=max𝒜q𝒜(w)q^{*}(w)=\max_{{\mathcal{A}}}q_{{\mathcal{A}}}(w). As in the unweighted case, 0q(w)<10\leq q^{*}(w)<1, and we may ignore vertices with degree 0. If ww is identically 0 we set q𝒜(w)=0q_{{\mathcal{A}}}(w)=0 and q(w)=0q^{*}(w)=0. If ww is {0,1}\{0,1\}-valued then q𝒜(w)q_{{\mathcal{A}}}(w) and q(w)q^{*}(w) are the usual modularity score and modularity value respectively for the graph corresponding to ww. The values q𝒜(w)q_{{\mathcal{A}}}(w) and q(w)q^{*}(w) are unchanged under re-scaling ww, so we may assume that 0w(u,v)10\leq w(u,v)\leq 1 for each u,vVu,v\in V; and in this case we call ww a probability weight function.
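For {0,1}-valued w these definitions reduce to the usual graph modularity. The helper below (our own sketch) computes q_A(w) from a symmetric weight matrix, and recovers the score 2/25 for C5 from Example 7.1.

```python
def weighted_modularity_score(w, parts):
    # w: symmetric matrix with w[u][u] = 0; returns q_A(w) = q^E_A(w) - q^D_A(w)
    n = len(w)
    deg = [sum(w[u][v] for v in range(n)) for u in range(n)]
    volV = sum(deg)                                  # vol_w(V) = 2 e_w(V)
    q = 0.0
    for A in parts:
        two_e = sum(w[u][v] for u in A for v in A)   # 2 e_w(A)
        volA = sum(deg[u] for u in A)
        q += two_e / volV - (volA / volV) ** 2
    return q

# {0,1}-valued weights for C_5, partition into a P_2 and a P_3
n = 5
w = [[0.0] * n for _ in range(n)]
for u in range(n):
    for v in ((u - 1) % n, (u + 1) % n):
        w[u][v] = 1.0
q = weighted_modularity_score(w, [{0, 1}, {2, 3, 4}])   # 2/25
```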

Given a weight function ww on V2V^{2} and 0<p10<p\leq 1 we define a random weight function wpw_{p} by considering each edge uvuv independently and keeping w(u,v)w(u,v) unchanged with probability pp, and otherwise setting it to 0. Also, given a probability weight function ww, let GwG_{w} be the random (unweighted) graph obtained by considering each edge uvuv independently, and including edge uvuv with probability w(u,v)w(u,v), and otherwise having no edge uvuv. For a special case of GwG_{w} see Definition 3.1 in [21]. The model has also been considered in network science applications, and GwG_{w} is referred to as a probabilistic network (to distinguish it from a binary network, in which each edge has probability either 0 or 1 of appearing) [20, 38].
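Both random models are straightforward to simulate. The sketch below (our illustration, with an assumed constant weight 0.3 on all pairs of a 40-vertex set and a fixed seed) draws w_p and G_w, and checks that the number of edges of G_w averages close to e_w(V):

```python
import random

random.seed(1)
n, weight = 40, 0.3
w = [[weight if u != v else 0.0 for v in range(n)] for u in range(n)]
e_w = weight * n * (n - 1) / 2   # e_w(V) = 234

def sample_wp(w, p):
    # w_p: each edge weight kept unchanged with probability p, else set to 0
    n = len(w)
    wp = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                wp[u][v] = wp[v][u] = w[u][v]
    return wp

def sample_Gw(w):
    # G_w: include each edge uv independently with probability w(u, v)
    n = len(w)
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if random.random() < w[u][v]}

mean_edges = sum(len(sample_Gw(w)) for _ in range(200)) / 200
```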


10.1 Weighted underlying to weighted observed graph

Given a probability weight function ww on V2V^{2} and a probability pp, we defined the random weight function wpw_{p} above. For such weight functions we have results very like Theorems 1.1 and 1.2. Note that the theorems below require the sum of weights ew(V)e_{w}(V) to be large; and recall that the modularity score of ww is unchanged under re-scaling. Thus, given a general weight function ww, it is best to apply the theorems to the re-scaling w~\tilde{w} of ww obtained by dividing through by max(w)\max(w) (the maximum weight of an edge).

Theorem 10.1.

Given ε>0\varepsilon>0, there exists c=c(ε)c=c(\varepsilon) such that the following holds. Let 0<p10<p\leq 1, and let the probability weight function ww on V2V^{2} satisfy ew(V)pce_{w}(V)\,p\geq c. Then, with probability 1ε\geq 1-\varepsilon, the random weight function wpw_{p} satisfies q(wp)q(w)εq^{*}(w_{p})\geq q^{*}(w)-\varepsilon.


Theorem 10.2.

Given ε>0\varepsilon>0, there exists c=c(ε)c=c(\varepsilon) such that the following holds. Let 0<p10<p\leq 1, and let the probability weight function ww on V2V^{2} satisfy ew(V)pc|V|e_{w}(V)\,p\geq c\,|V|. Then, with probability 1ε\geq 1-\varepsilon,

  • (a)

    the random weight function wpw_{p} satisfies |q(wp)q(w)|<ε|q^{*}(w_{p})-q^{*}(w)|<\varepsilon; and

  • (b)

    given any partition 𝒜{\mathcal{A}} of the vertex set, in a linear number of operations (seeing only wpw_{p}) the greedy amalgamating algorithm finds a partition 𝒜{\mathcal{A}}^{\prime} with q𝒜(w)q𝒜(wp)εq_{{\mathcal{A}}^{\prime}}(w)\geq q_{{\mathcal{A}}}(w_{p})\!-\!\varepsilon.

Observe that Theorem 1.1 is the special case of Theorem 10.1 when ww is {0,1}\{0,1\}-valued, and similarly Theorem 1.2 is a special case of Theorem 10.2. In order to prove Theorems 10.1 and 10.2, we may use almost the same proofs as before. We need a natural minor variant of Lemma 3.1. Given a weight function ww on V2V^{2}, call a set UU of vertices η\eta-fat if volw(U)ηvolw(V){\rm vol}_{w}(U)\geq\eta\,{\rm vol}_{w}(V), and call a partition 𝒜{\mathcal{A}} of VV η\eta-fat if each part is η\eta-fat. The following lemma is very similar to Lemma 3.1 and may be proved in exactly the same way.

Lemma 10.3 (The fattening lemma for weighted graphs).

For each non-zero weight function ww on V2V^{2}\!, and each 0<η10<\eta\leq 1, there is an η\eta-fat partition 𝒜{\mathcal{A}} of VV such that q𝒜(w)>q(w)2ηq_{{\mathcal{A}}}(w)>q^{*}(w)-2\eta. Indeed, given any partition 𝒜0{\mathcal{A}}_{0} of VV, using a linear number of operations, by amalgamating parts we can construct an η\eta-fat partition 𝒜{\mathcal{A}} such that q𝒜(w)>q𝒜0(w)2ηq_{{\mathcal{A}}}(w)>q_{{\mathcal{A}}_{0}}(w)-2\eta.
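The amalgamation step can be sketched as follows. This is our own simple variant, not necessarily the exact procedure behind Lemmas 3.1 and 10.3: sweep the parts, merging consecutive small ones until each merged part is η-fat, and attach any leftover to the last part.

```python
def fatten_partition(adj, parts, eta):
    # Greedy amalgamation sketch: merge consecutive parts until each merged
    # part is eta-fat, i.e. has vol(A) >= eta * vol(V).
    vol = lambda A: sum(len(adj[u]) for u in A)
    threshold = eta * sum(len(N) for N in adj.values())
    out, bucket = [], set()
    for P in parts:
        if vol(P) >= threshold and not bucket:
            out.append(set(P))
            continue
        bucket |= P
        if vol(bucket) >= threshold:
            out.append(bucket)
            bucket = set()
    if bucket:                    # leftover small bucket: attach to the last part
        if out:
            out[-1] |= bucket
        else:
            out.append(bucket)    # the whole vertex set; eta-fat for any eta <= 1
    return out

def modularity_score(adj, parts):
    m = sum(len(N) for N in adj.values()) / 2
    q = 0.0
    for A in parts:
        internal = sum(1 for u in A for v in adj[u] if v in A) / 2
        vol = sum(len(adj[u]) for u in A)
        q += internal / m - (vol / (2 * m)) ** 2
    return q

# Path P_20 with the all-singletons partition, eta = 0.2
adj = {v: set() for v in range(20)}
for v in range(19):
    adj[v].add(v + 1)
    adj[v + 1].add(v)
singletons = [{v} for v in range(20)]
fat = fatten_partition(adj, singletons, 0.2)
```

On this example the sweep yields four η-fat parts, and the modularity score drops by less than 2η (here it in fact increases, since consecutive path vertices are merged together).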

Proof of Theorem 10.1..

The proof is very similar to that of Theorem 1.1 so we indicate just a few key steps where there are differences.

As in that proof we may assume that 0<ε<10<\varepsilon<1 and q(w)εq^{*}(w)\geq\varepsilon, and we set η=ε/4\eta=\varepsilon/4. By the weighted fattening lemma, Lemma 10.3, there exists an η\eta-fat partition 𝒜={A1,,Ak}{\mathcal{A}}=\{A_{1},\ldots,A_{k}\} (where k1/ηk\leq 1/\eta) such that q𝒜(w)q(w)2η2ηq_{\mathcal{A}}(w)\geq q^{*}(w)-2\eta\geq 2\eta. Let t=(ew(V)p)1/2t=(e_{w}(V)p)^{1/2}. Then corresponding to (4.1), for 0xt0\leq x\leq t (and noting that w(u,v)1w(u,v)\leq 1 for each edge uvuv)

(|ewp(V)ew(V)p|ew(V)px/t)2ex2/3.{\mathbb{P}}\big{(}\;|e_{w_{p}}(V)-e_{w}(V)p|\geq e_{w}(V)p\cdot x/t\;)\leq 2e^{-x^{2}/3}.

Let e𝒜int(w)e^{\rm int}_{\mathcal{A}}(w) denote the sum of ‘internal’ edge weights within the parts of 𝒜{\mathcal{A}}. Then corresponding to (4.3), for 0<x(3η)1/2t0<x\leq(3\eta)^{1/2}t, with probability at least 14ex2/31-4e^{-x^{2}/3}

q𝒜E(wp)=e𝒜int(wp)ewp(V)q𝒜E(w)1(3η)1/2x/t1+x/t.q^{E}_{\mathcal{A}}(w_{p})=\frac{e^{\rm int}_{\mathcal{A}}(w_{p})}{e_{w_{p}}(V)}\geq q^{E}_{\mathcal{A}}(w)\frac{1-(3\eta)^{-1/2}x/t}{1+x/t}.

For the degree tax, corresponding to (4.4) we find the following. For 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12(k+1)ex2/61-2(k+1)e^{-x^{2}/6}

q𝒜D(wp)q𝒜D(w)(1+(2η)1/2x/t1x/t)2.q^{D}_{\mathcal{A}}(w_{p})\leq q^{D}_{\mathcal{A}}(w)\left(\frac{1+(2\eta)^{-1/2}x/t}{1-x/t}\right)^{2}.

Thus for 0<x(2η)1/2t0<x\leq(2\eta)^{1/2}t, with probability at least 12(k+3)ex2/61-2(k+3)e^{-x^{2}/6} both the last two displayed inequalities hold. The failure probability is at most 2(4/ε+3)ex2/62(4/\varepsilon+3)e^{-x^{2}/6}, and we may thus choose x=x(ε)x=x(\varepsilon) sufficiently large that the probability is at most ε\varepsilon; and indeed we may take x=Θ((logε1)1/2)x=\Theta((\log\varepsilon^{-1})^{1/2}).

The rest of the proof follows the non-weighted case by making similar minor adaptations. ∎

Proof of Theorem 10.2.

As in the last proof, we may follow the proof of the non-weighted version, Theorem 1.2, with the following adaptations. In place of the fattening lemma, use the weighted version, Lemma 10.3; replace instances of GG and GpG_{p} by ww and wpw_{p} respectively. We may still apply Lemma 4.1 with 0Xj20\leq X_{j}\leq 2 since all edges have weight at most 1. ∎


10.2 Weighted underlying to unweighted observed graph

We will see that the proofs in this subsection follow our proofs of Theorems 1.1 and 1.2 almost line by line, replacing all instances of GG with ww and all instances of GpG_{p} with GwG_{w}.

Theorem 10.4.

Given ε>0\varepsilon>0, there exists c=c(ε)c=c(\varepsilon) such that the following holds. Let the probability weight function ww on V2V^{2} satisfy ew(V)ce_{w}(V)\geq c. Then with probability at least 1ε1-\varepsilon the random graph GwG_{w} satisfies q(Gw)>q(w)εq^{*}(G_{w})>q^{*}(w)-\varepsilon.


Theorem 10.5.

Given ε>0\varepsilon>0, there exists c=c(ε)c=c(\varepsilon) such that the following holds. Let the probability weight function ww on V2V^{2} satisfy ew(V)c|V|e_{w}(V)\geq c\,|V|. Then with probability at least 1ε1-\varepsilon,

  • (a)

    the random graph GwG_{w} satisfies |q(Gw)q(w)|<ε|q^{*}(G_{w})-q^{*}(w)|<\varepsilon; and

  • (b)

    given any partition 𝒜{\mathcal{A}} of the vertex set, in a linear number of operations (seeing only GwG_{w}) the greedy amalgamating algorithm finds a partition 𝒜{\mathcal{A}}^{\prime} with q𝒜(w)q𝒜(Gw)εq_{{\mathcal{A}}^{\prime}}(w)\geq q_{{\mathcal{A}}}(G_{w})\!-\!\varepsilon.

Proof of Theorem 10.4.

The proof follows that of the non-weighted case, Theorem 1.1, line by line with the following adaptations. In place of the fattening lemma used on the underlying graph, use the weighted version, Lemma 10.3; and replace instances of GG and GpG_{p} by ww and GwG_{w} respectively. Note that Lemma 4.1 still applies with 0Xi20\leq X_{i}\leq 2 - more details are given in the proof of Theorem 10.5. ∎

Proof of Theorem 10.5.

As in the proof of Theorem 1.2, let ε>0\varepsilon>0 and let c>Kε3logε1c>K\varepsilon^{-3}\log\varepsilon^{-1} for a sufficiently large constant KK. We again set η=ε/9\eta=\varepsilon/9. Let ww be a fixed probability weight function on V2V^{2}, where |V|=n|V|=n; and assume that ew(V)cne_{w}(V)\geq cn.

Then the corresponding events 0,1,2\mathcal{B}_{0},\mathcal{B}_{1},\mathcal{B}_{2} and 0{\mathcal{E}}_{0} may all be defined as before, replacing instances of GG and GpG_{p} with ww and GwG_{w} respectively. Note that in the definition of 0{\mathcal{E}}_{0}, one constructs partition 𝒜0{\mathcal{A}}_{0}^{\prime} from 𝒜0{\mathcal{A}}_{0} seeing only the observed graph GwG_{w}, and thus using the usual fattening lemma, Lemma 3.1 rather than the weighted one. Then the proof proceeds by noting (0){\mathbb{P}}(\mathcal{B}_{0}) is small due to Theorem 10.4 and proving the statements corresponding to (5.1) to (5.4).

Corresponding to the proof of (5.1), only the usual substitutions of GG and GpG_{p} by ww and GwG_{w} are required. For the observation after that, a small change is needed. Notice that since max(w)1\max(w)\leq 1 we have n2/2ew(V)n^{2}/2\geq e_{w}(V), and ew(V)cne_{w}(V)\geq cn by assumption. Thus n2cn\geq 2c as in the earlier proof.

Corresponding to the proof of (5.3) there are a few minor changes. Let AVA\subseteq V have volw(A)<η2volw(V){\rm vol}_{w}(A)<\frac{\eta}{2}{\rm vol}_{w}(V). Suppose the (unordered) pairs u,vu,v of vertices in AA with w(u,v)>0w(u,v)>0 are labelled u1v1,,ujvju_{1}v_{1},\ldots,u_{j}v_{j} (some vertices may be repeated), and the pairs of vertices with exactly one endpoint in AA and positive edge weight are labelled uj+1vj+1,,ukvku_{j+1}v_{j+1},\ldots,u_{k}v_{k}. For i=1,,ji=1,\ldots,j let XiX_{i} be 2 if edge uiviu_{i}v_{i} is in GwG_{w} and let XiX_{i} be 0 otherwise; and for i=j+1,,ki=j+1,\ldots,k let Xi=1X_{i}=1 if edge uiviu_{i}v_{i} is in GwG_{w} and let XiX_{i} be 0 otherwise (so the random variable XiX_{i} satisfies 0Xi20\leq X_{i}\leq 2 if iji\leq j and 0Xi10\leq X_{i}\leq 1 if i>ji>j, and is non-zero with probability w(ui,vi)w(u_{i},v_{i}).) Corresponding to the original proof, volGw(A)=iXi{\rm vol}_{G_{w}}(A)=\sum_{i}X_{i} and volw(A)=𝔼[iXi]{\rm vol}_{w}(A)={\mathbb{E}}[\sum_{i}X_{i}], and the rest of the proof corresponding to (5.3) follows in the same manner – note that Lemma 4.1 (with b=2b=2) still applies.

Corresponding to the proof of Claim (5.7), let 5\mathcal{B}_{5} be the event that, for some partition 𝒜𝒬{\mathcal{A}}\in\mathcal{Q} such that e𝒜int(w)<η2ew(V)e_{{\mathcal{A}}}^{\rm int}(w)<\tfrac{\eta}{2}e_{w}(V), we have e𝒜int(Gw)ηew(V)e_{{\mathcal{A}}}^{\rm int}(G_{w})\geq\eta e_{w}(V). We note that e𝒜int(Gw)e_{{\mathcal{A}}}^{\rm int}(G_{w}) is stochastically at most a sum of Bernoulli random variables, such that the mean (of the sum) is ηew(V)/2\eta e_{w}(V)/2 so we may apply Lemma 4.1 (with b=1b=1). The rest of the proof corresponding to Claim (5.7) and indeed the entire proof continues with similar adaptations. ∎


10.3 Application : stochastic block model

In this subsection we show that Theorem 2.1 on the modularity value q(Gn,k,p,q)q^{*}(G_{n,k,p,q}) of the stochastic block model follows quickly from Theorem 10.5 (a weighted version of Theorem 1.2) and the deterministic result Lemma 10.6. For k2k\geq 2 and 0qp0\leq q\leq p let

q(k,p,q)=(pq)(11/k)p+(k1)q.q(k,p,q)=\frac{(p-q)\,(1-1/k)}{p+(k-1)q}.

Since rescaling a weight function ww does not change q(w)q^{*}(w), for simplicity we set α=1\alpha=1 in the following lemma.

Lemma 10.6.

Let k,k2k\in{\mathbb{N}},k\geq 2. For nn\in{\mathbb{N}} let V1VkV_{1}\cup\cdots\cup V_{k} be a partition of V=[n]V=[n] where n/k|Vi|n/k\lfloor n/k\rfloor\leq|V_{i}|\leq\lceil n/k\rceil. Let α=1\alpha=1 and 0β=β(n)α0\leq\beta=\beta(n)\leq\alpha, and let w=w(n,k,α,β)w=w(n,k,\alpha,\beta) be the weight function on vertex set VV with wuv=αw_{uv}=\alpha if uu and vv are in the same block ViV_{i} and with wuv=βw_{uv}=\beta otherwise. Then q𝒜0(w)=q(k,α,β)+o(1)q_{{\mathcal{A}}_{0}}(w)=q(k,\alpha,\beta)+o(1) for the planted partition 𝒜0{\mathcal{A}}_{0}, and q(w)=q(k,α,β)+o(1)q^{*}(w)=q(k,\alpha,\beta)+o(1).

Theorem 4.2 in the recent paper by Koshelev [21] concerns the same weighted graph as above and gives an upper bound on q(w)q^{*}(w) without the factor (11/k)(1-1/k), i.e. q(w)(αβ)/(α+(k1)β)+o(1)q^{*}(w)\leq(\alpha-\beta)/(\alpha+(k-1)\beta)+o(1). The proof in [21] proceeds by calculating the eigenvalues of the weighted adjacency matrix of ww. The proof below of Lemma 10.6 involves a simple weighted notion, fw(A)f_{w}(A), of ‘per-unit-modularity’ as used in [34, 27].

Proof.

It will be straightforward to show that the planted partition 𝒜0{\mathcal{A}}_{0} has modularity score as claimed: the main part of the proof is to show that q(w)q^{*}(w) is at most this value.

First we show that we may write the modularity of the weight function ww as a weighted average of the values of a function fw(A)f_{w}(A) on vertex sets AA; see (10.1). The proof of the upper bound will proceed by bounding the maximum value of fw(A)f_{w}(A) over vertex sets AA. By definition, for any partition 𝒜{\mathcal{A}} of VV

q𝒜(w)\displaystyle q_{\mathcal{A}}(w) =\displaystyle= 1volw(V)A𝒜(2ew(A)volw(A)2volw(V))=A𝒜volw(A)volw(V)fw(A)\displaystyle\frac{1}{{\rm vol}_{w}(V)}\sum_{A\in{\mathcal{A}}}\left(2e_{w}(A)-\frac{{\rm vol}_{w}(A)^{2}}{{\rm vol}_{w}(V)}\right)=\sum_{A\in{\mathcal{A}}}\frac{{\rm vol}_{w}(A)}{{\rm vol}_{w}(V)}f_{w}(A)

where

fw(A)=2ew(A)volw(A)volw(A)volw(V).f_{w}(A)=\frac{2e_{w}(A)}{{\rm vol}_{w}(A)}-\frac{{\rm vol}_{w}(A)}{{\rm vol}_{w}(V)}. (10.1)
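The identity above, expressing q𝒜(w)q_{\mathcal{A}}(w) as a weighted average of the values fw(A)f_{w}(A), can be checked numerically; the sketch below (helper names are ours) compares both sides for a random symmetric weight function on a small vertex set:

```python
import itertools
import random

random.seed(0)
n = 12
# a random symmetric weight function on [n] (zero diagonal, weights in [0, 1])
w = [[0.0] * n for _ in range(n)]
for u, v in itertools.combinations(range(n), 2):
    w[u][v] = w[v][u] = random.random()

deg = [sum(w[u]) for u in range(n)]
vol_V = sum(deg)

def e_w(A):
    """Total weight of unordered pairs inside A."""
    return sum(w[u][v] for u, v in itertools.combinations(sorted(A), 2))

def vol_w(A):
    return sum(deg[u] for u in A)

def f_w(A):
    """Per-unit-modularity f_w(A) = 2 e_w(A) / vol_w(A) - vol_w(A) / vol_w(V)."""
    return 2 * e_w(A) / vol_w(A) - vol_w(A) / vol_V

partition = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]
# modularity score computed directly from the definition ...
q_direct = sum(2 * e_w(A) / vol_V - (vol_w(A) / vol_V) ** 2 for A in partition)
# ... and as the weighted average of the f_w(A)
q_avg = sum(vol_w(A) / vol_V * f_w(A) for A in partition)
assert abs(q_direct - q_avg) < 1e-12
```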

Let η>0\eta>0 and let 𝒜{\mathcal{A}} be an η\eta-fat partition of VV. By Lemma 10.3 it will suffice for us to show that q𝒜(w)q(k,α,β)+o(1)q_{\mathcal{A}}(w)\leq q(k,\alpha,\beta)+o(1); and since q𝒜(w)q_{\mathcal{A}}(w) is a weighted average of the values fw(A)f_{w}(A), it will suffice to show that fw(A)q(k,α,β)+o(1)f_{w}(A)\leq q(k,\alpha,\beta)+o(1) for every AVA\subseteq V with volw(A)ηvolw(V){\rm vol}_{w}(A)\geq\eta\,{\rm vol}_{w}(V). Fix such a set AA. Note that volw(V)=Θ(n2){\rm vol}_{w}(V)=\Theta(n^{2}), and so also volw(A)=Θ(n2){\rm vol}_{w}(A)=\Theta(n^{2}).

Define δi\delta_{i} such that |AVi|=δi|Vi||A\cap V_{i}|=\delta_{i}|V_{i}|, and note that 0δi10\leq\delta_{i}\leq 1 with iδi>0\sum_{i}\delta_{i}>0. Observe that the weighted complete graph on [n][n] is approximately regular with weighted degree

degw(u)=(α+(k1)β)n/k+O(1)\deg_{w}(u)=(\alpha+(k-1)\beta)\,n/k+O(1)

for each vertex uu (where the O(1)O(1) error term is in (2α+β,β)(2,1)(-2\alpha+\beta,\beta)\subseteq(-2,1)). Thus volw(V)=(α+(k1)β)n2/k+O(n){\rm vol}_{w}(V)=(\alpha+(k-1)\beta)n^{2}/k+O(n) and volw(A)=(α+(k1)β)(iδi)n2/k2+O(n){\rm vol}_{w}(A)=(\alpha+(k-1)\beta)(\sum_{i}\delta_{i})n^{2}/k^{2}+O(n). Also

2ew(A)\displaystyle 2e_{w}(A) =\displaystyle= αn2k2(iδi2)+βn2k2(ijδiδj)+O(n)\displaystyle\alpha\frac{n^{2}}{k^{2}}\big{(}\sum_{i}\delta_{i}^{2}\big{)}+\beta\frac{n^{2}}{k^{2}}\big{(}\sum_{i\neq j}\delta_{i}\delta_{j}\big{)}+O(n)
=\displaystyle= n2k2((αβ)(iδi2)+β(iδi)2)+O(n)\displaystyle\frac{n^{2}}{k^{2}}\left((\alpha-\beta)(\sum_{i}\delta_{i}^{2})+\beta(\sum_{i}\delta_{i})^{2}\right)+O(n)

since ijδiδj=(iδi)2iδi2\sum_{i\neq j}\delta_{i}\delta_{j}=(\sum_{i}\delta_{i})^{2}-\sum_{i}\delta_{i}^{2}. Thus

fw(A)\displaystyle f_{w}(A) =\displaystyle= (αβ)(iδi2)+β(iδi)2(α+(k1)β)(iδi)(α+(k1)β)(iδi)k(α+(k1)β)+O(1/n)\displaystyle\frac{(\alpha-\beta)(\sum_{i}\delta_{i}^{2})+\beta(\sum_{i}\delta_{i})^{2}}{(\alpha+(k-1)\beta)(\sum_{i}\delta_{i})}-\frac{(\alpha+(k-1)\beta)(\sum_{i}\delta_{i})}{k(\alpha+(k-1)\beta)}+O\Big{(}1/n\Big{)}
=\displaystyle= (αβ)(iδi2)+(β1k(α+(k1)β))(iδi)2(α+(k1)β)(iδi)+O(1/n)\displaystyle\frac{(\alpha-\beta)(\sum_{i}\delta_{i}^{2})+(\beta-\frac{1}{k}(\alpha+(k-1)\beta))(\sum_{i}\delta_{i})^{2}}{(\alpha+(k-1)\beta)(\sum_{i}\delta_{i})}+O\Big{(}1/n\Big{)}
=\displaystyle= (αβ)((iδi2)1k(iδi)2)(α+(k1)β)(iδi)+O(1/n).\displaystyle\frac{(\alpha-\beta)((\sum_{i}\delta_{i}^{2})-\frac{1}{k}(\sum_{i}\delta_{i})^{2})}{(\alpha+(k-1)\beta)(\sum_{i}\delta_{i})}+O\Big{(}1/n\Big{)}.

Now, for any s>0s>0, given that iδi=s\sum_{i}\delta_{i}=s

fw(A)=(αβ)((iδi2)/ss/k)α+(k1)β+O(1/n).f_{w}(A)=\frac{(\alpha-\beta)\,((\sum_{i}\delta_{i}^{2})/s-s/k)}{\alpha+(k-1)\beta}+O\Big{(}1/n\Big{)}.

We now show (iδi2)/ss/k11/k(\sum_{i}\delta_{i}^{2})/s-s/k\leq 1-1/k. If some δi=1\delta_{i}=1 and the other δj\delta_{j} are zero (in other words, if AA is ViV_{i}) then s=1s=1 and (iδi2)/ss/k=11/k(\sum_{i}\delta_{i}^{2})/s-s/k=1-1/k. If s1s\leq 1 then (iδi2)/ss/kss/k11/k(\sum_{i}\delta_{i}^{2})/s-s/k\leq s-s/k\leq 1-1/k. Also, suppose that s>1s>1 say s=a+xs=a+x where aa\in{\mathbb{N}} and 0x<10\leq x<1. Then iδi2a+x2\sum_{i}\delta_{i}^{2}\leq a+x^{2}, and (iδi2)/ss/ka+x2a+xsk<11/k(\sum_{i}\delta_{i}^{2})/s-s/k\leq\frac{a+x^{2}}{a+x}-\frac{s}{k}<1-1/k. Thus

fw(A)q(k,α,β)+O(1/n)f_{w}(A)\leq q(k,\alpha,\beta)+O\Big{(}1/n\Big{)}

(and the upper bound is achieved when AA is a block ViV_{i}). But, as we noted earlier, q𝒜(w)q_{\mathcal{A}}(w) is a weighted average of the values fw(A)f_{w}(A) for A𝒜A\in{\mathcal{A}}, and so

q𝒜(w)q(k,α,β)+O(1/n).q_{\mathcal{A}}(w)\leq q(k,\alpha,\beta)+O\Big{(}1/n\Big{)}.

Hence

q(w)q(k,α,β)+o(1),q^{*}(w)\leq q(k,\alpha,\beta)+o(1),

and we have the upper bound claimed.
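The elementary inequality used above, (iδi2)/ss/k11/k(\sum_{i}\delta_{i}^{2})/s-s/k\leq 1-1/k for δi[0,1]\delta_{i}\in[0,1] with s=iδi>0s=\sum_{i}\delta_{i}>0, can also be probed numerically (a sanity check only, not part of the proof):

```python
import random

random.seed(1)
k = 5
bound = 1 - 1 / k

def g(delta):
    """(sum of squares)/s - s/k for a vector delta with positive sum s."""
    s = sum(delta)
    return sum(d * d for d in delta) / s - s / k

# random vectors delta in [0,1]^k never exceed the bound 1 - 1/k ...
vals = [g([random.random() for _ in range(k)]) for _ in range(10000)]
assert max(vals) <= bound + 1e-12

# ... and the indicator vector of a single block attains it exactly
delta_star = [1.0] + [0.0] * (k - 1)
assert abs(g(delta_star) - bound) < 1e-12
```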

Now take 𝒜{\mathcal{A}} as the planted partition {V1,,Vk}\{V_{1},\ldots,V_{k}\}, and note that fw(Vi)=q(k,α,β)+O(1/n)f_{w}(V_{i})=q(k,\alpha,\beta)+O(1/n) for each ii. Now since q𝒜(w)q_{\mathcal{A}}(w) is a weighted average of the values fw(Vi)f_{w}(V_{i}) we see that

q𝒜(w)=q(k,α,β)+O(1/n),q_{\mathcal{A}}(w)=q(k,\alpha,\beta)+O(1/n),

and we are done. ∎
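As a numerical illustration of Lemma 10.6 (a sketch, not part of the argument; helper names are ours), one can compute the planted partition's score q𝒜0(w)q_{{\mathcal{A}}_{0}}(w) directly and compare it with q(k,α,β)q(k,\alpha,\beta):

```python
def planted_score(n, k, alpha, beta):
    """Modularity score of the planted partition for the weight function w
    with w_uv = alpha inside blocks and beta between blocks
    (equal blocks of size n // k; assumes for simplicity that k divides n)."""
    b = n // k
    e_in = alpha * b * (b - 1) / 2          # weight inside one block
    deg = alpha * (b - 1) + beta * (n - b)  # weighted degree of each vertex
    vol_V = n * deg
    vol_A = b * deg
    return k * (2 * e_in / vol_V - (vol_A / vol_V) ** 2)

k, alpha, beta = 3, 1.0, 0.25
q_kab = (alpha - beta) * (1 - 1 / k) / (alpha + (k - 1) * beta)
# the gap to the limiting value q(k, alpha, beta) shrinks like O(1/n)
gap = abs(planted_score(3000, k, alpha, beta) - q_kab)
assert gap < 1e-3
```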

Proof of Theorem 2.1.

Let α=1\alpha=1 and β=ρ\beta=\rho (so 0βα0\leq\beta\leq\alpha). Let ε>0\varepsilon>0. Let w=w(n,k,α,β)w=w(n,k,\alpha,\beta) be as in Lemma 10.6, and let w^=pw\hat{w}=p\,w. Then Gn,k,p,qG_{n,k,p,q} is Gw^G_{\hat{w}}, and whp |q(Gw^)q(w^)|ε|q^{*}(G_{\hat{w}})-q^{*}(\hat{w})|\leq\varepsilon by Theorem 10.5, since with V=[n]V=[n] we have ew^(V)/nnp2k+O(1)e_{\hat{w}}(V)/n\geq\frac{np}{2k}+O(1)\to\infty as nn\to\infty. But q(w^)=q(w)q^{*}(\hat{w})=q^{*}(w) and q𝒜(w^)=q𝒜(w)q_{\mathcal{A}}(\hat{w})=q_{\mathcal{A}}(w) for the planted partition 𝒜{\mathcal{A}}, so by Lemma 10.6

q𝒜(Gn,k,p,q)=q(k,α,β)+o(1)whpq_{\mathcal{A}}(G_{n,k,p,q})=q(k,\alpha,\beta)+o(1)\;\;\;\mbox{whp}

and

q(Gn,k,p,q)=q(k,α,β)+o(1)whp,q^{*}(G_{n,k,p,q})=q(k,\alpha,\beta)+o(1)\;\;\;\mbox{whp},

and we are done. ∎

There is a version of the stochastic block model in which vertices are assigned to blocks independently and uniformly at random. Consider the variant of Theorem 2.1 using this version of the stochastic block model. Note that the part sizes will whp be n/k±n1/2lognn/k\pm n^{1/2}\log n. Thus by a coupling argument the edge set will differ by o(m)o(m) edges whp from the original model; and by the robustness result, Lemma 6.1, we find that whp the modularity values q𝒜q_{\mathcal{A}} and qq^{*} are both q(k,α,β)+o(1)q(k,\alpha,\beta)+o(1) as before.


11 Concluding remarks

Three phases of modularity
As mentioned in Section 2.3, the modularity of the binomial random graph Gn,pG_{n,p} exhibits three phases [32]. We restate the result as a sampling one. Recall that q(Kn)=0q^{*}(K_{n})=0 [7]. Let G=KnG=K_{n} and consider nn\to\infty. Then in the sparse case (where e(G)pe(G)p\rightarrow\infty and e(G)p/n12+o(1)e(G)p/n\leq\frac{1}{2}+o(1)) we have q(Gp)q^{*}(G_{p}) near 1 whp, in the dense case (where e(G)p/ne(G)p/n\rightarrow\infty) q(Gp)q^{*}(G_{p}) is near q(G)=0q^{*}(G)=0 whp, and in between (where c1e(G)p/nc2c_{1}\leq e(G)p/n\leq c_{2} for constants 12<c1<c2\frac{1}{2}<c_{1}<c_{2}) q(Gp)q^{*}(G_{p}) is bounded away from q(G)=0q^{*}(G)=0 and 11 whp. The question below asks if we may extend this three phase behaviour from complete graphs to a larger class of underlying graphs. Let us restrict our attention to graphs without isolated vertices.


Question 11.1.

For which classes \mathcal{H} of graphs HH do we have parts (i) and (ii) of the following three phase result (where we write nn for v(H)v(H))?

  • (i)

    If e(H)pe(H)p\rightarrow\infty and e(H)p/n12e(H)p/n\leq\frac{1}{2} then q(Hp)=1+o(1)q^{*}(H_{p})=1+o(1) whp.

  • (ii)

    There exist ε>0\varepsilon>0 and 0<c1<c20<c_{1}<c_{2} such that if c1e(H)p/nc2c_{1}\leq e(H)p/n\leq c_{2} then q(H)+ε<q(Hp)<1εq^{*}(H)+\varepsilon<q^{*}(H_{p})<1-\varepsilon whp.

  • (iii)

    If e(H)p/ne(H)p/n\rightarrow\infty then q(Hp)=q(H)+o(1)q^{*}(H_{p})=q^{*}(H)+o(1) whp.

Firstly, observe that part (iii) is implied by Theorem 1.2: thus the open question is for which classes of graphs we have parts (i) and (ii). Secondly, we can only get three genuine phases if e(H)/ne(H)/n is unbounded. We have already noted that the three phase result holds for complete graphs, and by double exposure it must hold also for the random graph Gn,pG_{n,p} where npnp\to\infty as nn\to\infty.

Part (i) is false for complete bipartite graphs Kk,nK_{k,n} for any fixed kk, since q(G)11/kq^{*}(G)\leq 1-1/k for any subgraph GG of Kk,nK_{k,n} (since GG has at most kk components, ignoring isolated vertices). Similarly, if tt and k1,,ktk_{1},\ldots,k_{t} are fixed and HH has components Kk1,n1,,Kkt,ntK_{k_{1},n_{1}},\ldots,K_{k_{t},n_{t}} then part (i) is false. Also, for (ii) to hold, we clearly must have q(H)q^{*}(H) bounded away from 1.


Random false positives
In our model, the observed graph GpG_{p} has random false negatives, that is, there are edges in GG which are non-edges in GpG_{p}, but no false positives. Theorem 1.2 shows that the modularity value is robust to random false negatives: so long as the observed graph has high enough expected degree, the modularity value of the observed graph is close to the modularity of the underlying graph. (For an underlying graph with Θ(n2)\Theta(n^{2}) edges we may have a 1Θ(1/n)1-\Theta(1/n) chance of not seeing each edge.) However, random false positives may perhaps cause a large increase or decrease in the modularity value, possibly by as much as adversarially added false positives would.

Suppose that we are given all the edges in GG, but we may falsely think that some non-edges – the false positives – are also edges of GG. (We have not allowed false positives until now.) What can we say about modularity? Let us consider graphs GnG_{n} with nn vertices and m=m(n)m=m(n) edges which we see fully and correctly; and suppose that there are m~=m~(n)\tilde{m}=\tilde{m}(n) false positives randomly chosen from the non-edges of the graph, so the observed graph GnG^{\prime}_{n} has m+m~m+\tilde{m} edges.

If m~\tilde{m} is much smaller than mm then (unsurprisingly) we have no difficulties: by Lemma 6.1, if m~(ε/2)m\tilde{m}\leq(\varepsilon/2)m then |q(Gn)q(Gn)|ε|q^{*}(G_{n})-q^{*}(G^{\prime}_{n})|\leq\varepsilon so q(Gn)q^{*}(G^{\prime}_{n}) is a good estimate of q(Gn)q^{*}(G_{n}). At the other extreme, if m~\tilde{m} is much larger than mm then there is little point in using q(Gn)q^{*}(G^{\prime}_{n}) as an estimate for q(Gn)q^{*}(G_{n}). For by Lemma 6.1, if m~(2/ε)m\tilde{m}\geq(2/\varepsilon)m and we denote by G~n\tilde{G}_{n} the graph formed just from the false positive edges, then |q(Gn)q(G~n)|ε|q^{*}(G^{\prime}_{n})-q^{*}(\tilde{G}_{n})|\leq\varepsilon; so that q(Gn)q^{*}(G^{\prime}_{n}) is essentially determined by q(G~n)q^{*}(\tilde{G}_{n}), whatever the value of q(Gn)q^{*}(G_{n}).

The interest is thus in the balanced case, when m~δm\tilde{m}\sim\delta m for some constant δ\delta. We will see in Example 11.2 that sometimes randomly adding false positives may increase the modularity nearly as much as adversarially choosing edges to add. Part (a) of the robustness lemma, Lemma 6.1, gives the following ‘adversarial’ bound. If the graph GG has mm edges, and GG^{\prime} is any graph obtained from GG by adding δm\delta m edges, then for any vertex partition 𝒜{\mathcal{A}},

|q(G)q(G)|,|q𝒜(G)q𝒜(G)|2δ1+δ.|q^{*}(G)-q^{*}(G^{\prime})|,\,|q_{\mathcal{A}}(G)-q_{\mathcal{A}}(G^{\prime})|\;\leq\frac{2\delta}{1+\delta}. (11.1)
Example 11.2.

Let k=k(n)k=k(n)\in{\mathbb{N}} and suppose that 2kn1/22\leq k\ll n^{1/2}. Let GnG_{n} consist of a star on [k][k] together with nkn-k isolated vertices; and note that m=e(Gn)=k1m=e(G_{n})=k-1 and q(Gn)=0q^{*}(G_{n})=0. Form GnG^{\prime}_{n} by adding δm\delta m new edges uniformly at random. Then the difference in modularity values is approximately 2δ/(1+δ)2\delta/(1+\delta), matching the adversarial bound (11.1).

To see this we may check that whp GnG^{\prime}_{n} consists of the star together with δm\delta m isolated edges (and nk2δmn-k-2\delta m isolated vertices, which we may ignore). But for such a graph HH, an optimal partition 𝒜{\mathcal{A}} consists of one part [k][k] and δm\delta m parts of size 2 corresponding to the isolated edges: the parts in an optimal partition must induce connected subgraphs, and connected components which themselves have modularity zero must not be split in an optimal partition [27]. (Note that the modularity values of a star and of a single edge are zero [7].) Now, this partition captures all edges and so

q𝒜(H)=1(2m)2+δm22(2m(1+δ))2=2δ+δ2(1+δ)2δ(1+δ)2m=2δ1+δδ2(1+δ)2+O(1m).q_{{\mathcal{A}}}(H)=1-\frac{(2m)^{2}+\delta m\cdot 2^{2}}{(2m(1+\delta))^{2}}=\frac{2\delta+\delta^{2}}{(1+\delta)^{2}}-\frac{\delta}{(1+\delta)^{2}m}=\frac{2\delta}{1+\delta}-\frac{\delta^{2}}{(1+\delta)^{2}}+O(\frac{1}{m}).

Hence whp

q(Gn)q(Gn)=q(Gn)=2δ1+δδ2(1+δ)2+O(1m),q^{*}(G^{\prime}_{n})-q^{*}(G_{n})=q^{*}(G^{\prime}_{n})=\frac{2\delta}{1+\delta}-\frac{\delta^{2}}{(1+\delta)^{2}}+O(\frac{1}{m})\,,

which is close to the adversarial (worst case) bound in (11.1) when δ\delta is small.
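The modularity score computed in Example 11.2 can be confirmed by evaluating the described partition directly (a numerical sketch; the helper name is ours):

```python
def q_star_plus_matchings(m, t):
    """Modularity score of the partition of a graph consisting of a star
    with m edges plus t isolated edges, with parts: the star vertices, and
    one part per isolated edge.  All edges are internal to some part."""
    vol_V = 2 * (m + t)
    star = 2 * m / vol_V - (2 * m / vol_V) ** 2
    matchings = t * (2 / vol_V - (2 / vol_V) ** 2)
    return star + matchings

m, delta = 1000, 0.5
t = int(delta * m)  # here delta * m is an integer
# closed form from Example 11.2 (before the O(1/m) approximation)
closed_form = (2 * delta + delta ** 2) / (1 + delta) ** 2 \
    - delta / ((1 + delta) ** 2 * m)
assert abs(q_star_plus_matchings(m, t) - closed_form) < 1e-9
```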

However, it is open how much the modularity may decrease when we add random false positives. Indeed, it is even open how much the modularity may decrease when we adversarially add edges: note that Example 6.2, which showed tightness of Lemma 6.1, did so only for how much the modularity may increase by adding edges.

Question 11.3.

Let GG be a graph with mm edges, let GG^{\prime} be obtained from GG by randomly adding δm\delta m edges, and let G′′G^{\prime\prime} be obtained from GG by adding a set of δm\delta m edges which maximises q(G)q(G′′)q^{*}(G)-q^{*}(G^{\prime\prime}). What upper bounds do we have for q(G)q(G)q^{*}(G)-q^{*}(G^{\prime}) or for q(G)q(G′′)q^{*}(G)-q^{*}(G^{\prime\prime})? Can we beat the upper bound 2δ/(1+δ)2\delta/(1+\delta) significantly?

References

  • [1] L. A. Adamic and N. Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, 2005.
  • [2] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 2009.
  • [3] P. J. Bickel, A. Chen, Y. Zhao, E. Levina, and J. Zhu. Correction to the proof of consistency of community detection. The Annals of Statistics, 2015.
  • [4] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 2008.
  • [5] M. Bolla, T. Kói, and A. Krámli. Testability of minimum balanced multiway cut densities. Discrete Applied Mathematics, 160(7-8), 2012.
  • [6] C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós, and K. Vesztergombi. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics, 219(6), 2008. doi:10.1016/j.aim.2008.07.008.
  • [7] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. Knowledge and Data Engineering, IEEE Transactions on, 20(2), 2008. doi:10.1109/TKDE.2007.190689.
  • [8] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1), 2022.
  • [9] F. Chung. Spectral graph theory, volume 92. American Mathematical Soc., Providence, RI, 1997. doi:10.1090/cbms/092.
  • [10] V. Cohen-Addad, A. Kosowski, F. Mallmann-Trenn, and D. Saulpic. On the power of Louvain in the stochastic block model. Advances in Neural Information Processing Systems, 33, 2020.
  • [11] S. Deng, S. Ling, and T. Strohmer. Strong Consistency, Graph Laplacians, and the Stochastic Block Model. The Journal of Machine Learning Research, 22(1), 2021.
  • [12] T. N. Dinh and M. T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). IEEE, 2011.
  • [13] S. Diskin, J. Erde, M. Kang, and M. Krivelevich. Isoperimetric inequalities and supercritical percolation on high-dimensional product graphs. arXiv preprint arXiv:2304.00016, 2023.
  • [14] S. Diskin and M. Krivelevich. Expansion in supercritical random subgraphs of expanders and its consequences. arXiv preprint arXiv:2205.04852, 2022.
  • [15] S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010. doi:10.1016/j.physrep.2009.11.002.
  • [16] A. Frieze and M. Karoński. Introduction to random graphs. Cambridge University Press, 2015. doi:10.1017/cbo9781316339831.
  • [17] A. Frieze and M. Krivelevich. On the non-planarity of a random subgraph. Combinatorics, Probability and Computing, 22(5):722–732, 2013.
  • [18] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs, volume 45. John Wiley & Sons, 2011.
  • [19] T. Johansson. On Hamilton cycles in Erdős-Rényi subgraphs of large graphs. Random Structures & Algorithms, 57(1):132–149, 2020.
  • [20] A. Kaveh, M. Magnani, and C. Rohner. Comparing node degrees in probabilistic networks. Journal of Complex Networks, 7(5), 2019.
  • [21] M. Koshelev. Modularity in planted partition model. Computational Management Science, 20(1):34, 2023.
  • [22] M. Krivelevich, C. Lee, and B. Sudakov. Robust Hamiltonicity of Dirac graphs. Transactions of the American Mathematical Society, 366(6):3095–3130, 2014.
  • [23] R. Lambiotte and M. T. Schaub. Modularity and Dynamics on Complex Networks. Cambridge University Press, 2021.
  • [24] A. Lancichinetti and S. Fortunato. Limits of modularity maximization in community detection. Physical Review E, 84(6), 2011. doi:10.1103/physreve.84.066122.
  • [25] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4), 2022.
  • [26] C. Liu and F. Wei. Phase transition of degeneracy in minor-closed families. Advances in Applied Mathematics, 146, 2023.
  • [27] B. Louf, C. McDiarmid, and F. Skerman. Modularity and graph expansion. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ITCS.2024.78.
  • [28] L. Lovász. Large networks and graph limits, volume 60. American Mathematical Soc., 2012.
  • [29] D. Lusseau. The emergent properties of a dolphin social network. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270, 2003.
  • [30] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, volume 141 of London Mathematical Society Lecture Note Series. Cambridge University Press, 1989.
  • [31] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4), 2018. doi:10.1093/comnet/cnx046.
  • [32] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1), 2020. doi:10.1002/rsa.20910.
  • [33] C. McDiarmid and F. Skerman. Modularity and edge sampling. arXiv preprint arXiv:2112.13190v1, 2021.
  • [34] K. Meeks and F. Skerman. The parameterised complexity of computing the maximum modularity of a graph. Algorithmica, 82(8), 2020. doi:10.1007/s00453-019-00649-7.
  • [35] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010. doi:10.1093/acprof:oso/9780199206650.001.0001.
  • [36] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004. doi:10.1103/physreve.69.026113.
  • [37] E. Ofek. On the expansion of the giant component in percolated (n, d, λ\lambda) graphs. Combinatorics, Probability and Computing, 16(3):445–457, 2007.
  • [38] T. Poisot, A. R. Cirtwill, K. Cazelles, D. Gravel, M.-J. Fortin, and D. B. Stouffer. The structure of probabilistic networks. Methods in Ecology and Evolution, 7(3), 2016.
  • [39] M. A. Porter, J. Onnela, and P. J. Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
  • [40] L. O. Prokhorenkova, A. Raigorodskii, and P. Prałat. Modularity of complex networks models. Internet Mathematics, 2017.
  • [41] D. Romanini, S. Lehmann, and M. Kivelä. Privacy and uniqueness of neighborhoods in social networks. Scientific reports, 11(1):1–15, 2021.
  • [42] F. Skerman. Modularity of Networks. PhD thesis, University of Oxford, 2016.
  • [43] B. Sudakov. Robustness of graph properties. Surveys in Combinatorics 2017, 440:372, 2017.
  • [44] V. A. Traag, L. Waltman, and N. J. Van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports, 9(1), 2019.
  • [45] J. Vizentin-Bugoni, P. K. Maruyama, V. J. Debastiani, L. Duarte, B. Dalsgaard, and M. Sazima. Influences of sampling effort on detected patterns and structuring processes of a neotropical plant–hummingbird network. Journal of Animal Ecology, 85(1), 2016.
  • [46] Y. Zhao, E. Levina, and J. Zhu. Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40(4):2266, 2012.

Appendix A Simulations for larger underlying graph

[Figure A.1 comprises two plots against the sampling probability pp: the upper shows q~(Gp)\tilde{q}(G_{p}), the estimated modularity of the sampled graph; the lower shows q𝒜~(Gp)(G)q_{\tilde{{\mathcal{A}}}(G_{p})}(G), the modularity score of the underlying graph using the partition estimated from the sampled graph.]
Figure A.1: Simulation results. The US political blogs network [1] with 1490 vertices and 16718 edges was taken to be the underlying graph GG, with the directed links between blogs encoded as undirected edges. The simulations were run as described in Figure 1.1, with the exception that in the lower plot we plot the modularity score of 𝒜~(Gp)\tilde{{\mathcal{A}}}(G_{p}), rather than the score of the η\eta-fattened partition 𝒜~(Gp)\tilde{{\mathcal{A}}}^{\prime}(G_{p}).